Parser.
The VTK library is huge: it contains about 700 classes, each with
roughly 10 to 100 methods. Writing a wrapper by hand is practically
impossible, so a good parser is essential. I use a two-stage parser. The
first stage applies the gccxml translator provided by Kitware, an
extension to the gcc compiler that "compiles" code into XML files. These
files can then be parsed by any XML parser to generate code suitable for
incorporation into R. I use R itself for this purpose, with the XML
package (thanks to D. T. Lang). This is not an optimal choice: the
parser is quite slow, and processing the whole VTK code base takes
about 3 hours on my Intel2400.
gccxml.
Installation of gccxml is described by Kitware. To use it you need the
VTK source, and you should make sure that VTK compiles and installs on
the system where you want to use gccxml. After the installation of VTK
is complete, I use a Tcl script that lists the contents of the current
working directory, matches the file names against a regular expression
like (.*)Python.cxx and then "compiles" the original file \1.cxx. The
intent is to convert only the classes that are already wrapped for
Python. The script has to be run in four directories of the VTK source
tree: Common/, Rendering/, Filtering/ and Graphics/. Five other
sublibraries of VTK (IO/, Imaging/, MPI/, Hybrid/ and Patented/) are not
yet included in RVTK.
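
The file selection step can be sketched as follows. This is only an
illustration in Python, not the Tcl script itself; the function name
sources_to_process is hypothetical, and the sketch only computes the
names of the files to process without invoking gccxml.

```python
import re

# Pattern from the text: a Python wrapper source named <Class>Python.cxx.
WRAPPER_RE = re.compile(r"(.*)Python\.cxx")


def sources_to_process(filenames):
    """Map every <Class>Python.cxx in a directory listing to <Class>.cxx."""
    selected = []
    for name in sorted(filenames):
        match = WRAPPER_RE.fullmatch(name)
        if match is None or match.group(1) == "":
            continue  # not a Python wrapper file
        # \1.cxx in the text: the original source of the wrapped class.
        selected.append(match.group(1) + ".cxx")
    return selected


if __name__ == "__main__":
    listing = ["vtkActor.cxx", "vtkActorPython.cxx", "vtkObject.h"]
    print(sources_to_process(listing))
```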
Classes and methods.
After the VTK code has been processed with gccxml we have a large number
of XML files, each about 1 MB in size. Each XML file is named after one
of the VTK classes and in most cases contains the description of just
that class. The aim now is to extract information about the class
methods from these files.
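
As a rough illustration of this extraction step, the sketch below reads
a tiny, hand-written XML fragment and lists the methods of one class.
The element and attribute names (Class, Method, Argument, id, name,
context, returns, type) are assumptions made for the example and may
differ from what gccxml really emits; the real parser is written in R,
and Python is used here only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified dump. Real files are about 1 MB and
# their exact layout is not reproduced here.
SAMPLE = """
<GCC_XML>
  <Class id="_1" name="vtkActor"/>
  <FundamentalType id="_2" name="double"/>
  <FundamentalType id="_3" name="void"/>
  <Method id="_4" name="SetOpacity" context="_1" returns="_3">
    <Argument name="value" type="_2"/>
  </Method>
  <Method id="_5" name="GetOpacity" context="_1" returns="_2"/>
</GCC_XML>
"""


def methods_of(xml_text, class_name):
    """Return name, return type and arguments of each method of a class."""
    root = ET.fromstring(xml_text.strip())
    # Types are referenced by id, so build a lookup table first.
    by_id = {el.get("id"): el for el in root if el.get("id")}
    class_ids = {el.get("id") for el in root.findall("Class")
                 if el.get("name") == class_name}
    result = []
    for method in root.findall("Method"):
        if method.get("context") not in class_ids:
            continue  # belongs to another class
        args = [(a.get("name"), by_id[a.get("type")].get("name"))
                for a in method.findall("Argument")]
        result.append({"name": method.get("name"),
                       "returns": by_id[method.get("returns")].get("name"),
                       "args": args})
    return result


if __name__ == "__main__":
    for info in methods_of(SAMPLE, "vtkActor"):
        print(info)
```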
This is done by an R script that parses an XML file and writes the
corresponding code, which is later packed into the R library. One type
of parser output is a set of files with the .cc extension and the same
base name as the VTK source file. These files contain functions named
R_<ClassName>_<MethodName>, each of which wraps the call to <MethodName>
of <ClassName>. If the class has a "multicast" (overloaded) method, i.e.
several functions with the same name but different numbers of
arguments, then each method instance is handled separately: the RVTK
wrapper contains separate functions named
R_<ClassName>_<MulticastMethodName> with the suffix "_vx", where x
counts the instance number.
These functions must be callable through R's .Call(...) interface. That
means each function must return a SEXP value and all of its input
parameters must have type SEXP. The number of arguments is one more than
in the original C++ method. The extra parameter (always named obj) is a
reference to the C++ class instance wrapped in an R external pointer.
The task of the wrapping function is to convert the SEXPs to base C
types or to pointers to VTK objects, call the underlying C++ method and
return a SEXP, even when the VTK method does not return a value.
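
The naming scheme and the shape of the generated prototypes can be
sketched like this. It is a minimal illustration under assumptions: the
text does not say whether x starts at 0 or 1 (1 is used here), nor where
obj sits in the parameter list (it is put first here), and the names
arg1, arg2, ... are invented for the example.

```python
from collections import Counter


def wrapper_names(class_name, method_names):
    """Name one wrapper per method instance: R_<Class>_<Method>[_vx]."""
    totals = Counter(method_names)
    seen = Counter()
    names = []
    for method in method_names:
        name = f"R_{class_name}_{method}"
        if totals[method] > 1:
            # Several instances share this name, so number them.
            seen[method] += 1  # assumption: counting starts at 1
            name += f"_v{seen[method]}"
        names.append(name)
    return names


def wrapper_prototype(name, n_cpp_args):
    """Build a .Call-compatible prototype: SEXP in, SEXP out."""
    # Assumption: obj, the external pointer to the instance, comes first.
    params = ["SEXP obj"]
    params += [f"SEXP arg{i}" for i in range(1, n_cpp_args + 1)]
    return "SEXP " + name + "(" + ", ".join(params) + ")"


if __name__ == "__main__":
    methods = ["SetPosition", "GetOpacity", "SetPosition"]
    for wrapper in wrapper_names("vtkActor", methods):
        print(wrapper)
    print(wrapper_prototype("R_vtkActor_SetPosition_v1", 3))
```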
The XML file contains all the information needed for that: the method
name, the number of arguments and their types (or class names), and the
way the arguments are passed to the method (by value, as pointers or
weak pointers, as constant values). If an argument is a reference to a
function, things may get more complicated, but luckily VTK seldom uses
functions as arguments to methods. In almost all of these cases the
argument is a function of the form void * func (void *) and is treated
by the parser separately. For conversions of basic types RVTK provides a
number of functions of the form SEXP <c-type>ToSexp(<c-type> *, int dim)
and <c-type> * SexpTo<c-type>(SEXP). For conversion of VTK classes (or
any C++ class) void * SexpToVTK(SEXP) and
VTKToSexp(void * arg, char * typename) are used. The "typename"
argument of VTKToSexp sets the R class of the returned value (which is
in fact an R external pointer). The return value of SexpTo<c-type> may
be an array, whose size is taken from the length of the corresponding
input SEXP argument. The inverse conversion <c-type>ToSexp(...) must be
given the length of the input pointer explicitly.
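
To make the choice of conversion function concrete, here is a small
sketch that produces the C++ call text a generator might emit for one
argument or result. The set BASIC_TYPES, the helper names and the cast
in front of SexpToVTK are assumptions for illustration; only the
SexpTo<c-type>, <c-type>ToSexp, SexpToVTK and VTKToSexp names come from
the text.

```python
# Assumed subset of base C types handled by the typed converters.
BASIC_TYPES = {"int", "float", "double", "char"}


def input_conversion(c_type, sexp_name):
    """C++ expression that turns an input SEXP into a C/C++ pointer."""
    if c_type in BASIC_TYPES:
        # The array length is taken from the SEXP itself.
        return f"SexpTo{c_type}({sexp_name})"
    # Any class goes through the generic external-pointer converter.
    return f"({c_type} *) SexpToVTK({sexp_name})"


def output_conversion(c_type, expr, length):
    """C++ expression that turns a C/C++ pointer back into a SEXP."""
    if c_type in BASIC_TYPES:
        # The length has to be passed explicitly in this direction.
        return f"{c_type}ToSexp({expr}, {length})"
    # The type name becomes the R class of the external pointer.
    return f'VTKToSexp({expr}, "{c_type}")'


if __name__ == "__main__":
    print(input_conversion("double", "bounds"))
    print(output_conversion("double", "c_bounds", 6))
    print(output_conversion("vtkProperty", "result", 1))
```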
Return Values.
Input arguments pose little difficulty. All input SEXPs are processed
with the SexpTo... functions, and the returned pointers are passed to
the C++ method, dereferenced where needed. The harder question is what
value the wrapping function should return. The first guess, returning
the value of the C++ method, is not a good one, because many VTK
functions return data as a side effect, i.e. through the method's input
arguments. Taking this into account, the wrapping function returns an R
list. Besides the value of the method, the list also includes all
arguments that the method takes as references or pointers not marked as
const. The list is named: the return value of the VTK method has the
name ".ret.val", and all the other elements carry the names of the
arguments prefixed with ".".
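
The naming of the returned list can be sketched as below. The argument
description (name, passing mode, const flag) is a hypothetical
representation chosen for the example, and the sketch always includes
".ret.val" because the text does not say what the list holds for methods
that return nothing.

```python
def return_list_names(args):
    """Names of the R list returned by a wrapper.

    args is a list of (name, passing, is_const) tuples, where passing is
    one of "value", "pointer" or "reference" (hypothetical encoding).
    """
    names = [".ret.val"]  # value of the C++ method itself
    for name, passing, is_const in args:
        # Only non-const pointers and references can carry data back.
        if passing in ("pointer", "reference") and not is_const:
            names.append("." + name)
    return names


if __name__ == "__main__":
    example = [("bounds", "pointer", False),
               ("scale", "value", False),
               ("label", "pointer", True)]
    print(return_list_names(example))
```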
Dimensions.
When the return SEXP is built from base C++ types by means of the
<c-type>ToSexp functions, the dimension of the first argument must be
known. This information is not included in the XML dump (gcc has no way
to find the size of the data chunk referenced by a pointer), but the
dump does contain a reference to the source file where the method is
implemented. The parser uses that reference to extract the method header
and then does regular expression matching for patterns like
" <arg_name>[.*]" and " <arg_name>[.*][.*]". If the array dimension is
found in the header, the wrapper function performs an explicit length
coercion with the SET_LENGTH macro applied to the input argument.
Multidimensional arguments, as well as multiple pointers (the fact that
a pointer refers to another pointer can be gathered from the XML dump),
are not handled by the parser, and methods that require them are not
processed. If an argument is not passed by value and its length cannot
be extracted from the source file, the length of the output SEXP is the
same as the length of the input SEXP argument. So it is up to the end
user to provide R vectors with enough space to hold all the data the VTK
method returns. Of course this is not safe, because supplying too short
a vector can easily lead to memory corruption, but there is no
alternative.
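
The header matching described above could look roughly like this. It is
a sketch under assumptions: the function name array_dimension is
invented, the regular expressions are a stricter variant of the patterns
quoted in the text, and only plain integer dimensions are recognized.

```python
import re


def array_dimension(header, arg_name):
    """Declared length of a one-dimensional array argument, else None."""
    # The argument name followed by one or more bracket groups.
    pattern = re.compile(
        r"\b" + re.escape(arg_name) + r"\s*((?:\[[^\]]*\])+)")
    match = pattern.search(header)
    if match is None:
        return None  # plain pointer: no dimension in the header
    dims = re.findall(r"\[([^\]]*)\]", match.group(1))
    if len(dims) != 1:
        return None  # multidimensional arguments are not handled
    text = dims[0].strip()
    return int(text) if text.isdigit() else None


if __name__ == "__main__":
    print(array_dimension("void GetBounds(double bounds[6]);", "bounds"))
    print(array_dimension("void SetMatrix(double m[4][4]);", "m"))
    print(array_dimension("void SetData(double *data);", "data"))
```

When the function returns None, the fallback from the text applies: the
output SEXP keeps the length of the input SEXP.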
If the return value of the C++ method is a pointer to a base C type, its
length is read from the hints file provided by the Kitware programmers.
If the length cannot be found there, the fail-safe length 1 is used.