I'm visiting Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE 2005) and hopefully give a presentation.
Using DSLs on top of Scheme VM
Software development process can benefit of using domain specific languages (DSL) instead of sets of custom libraries for data processing.
* Languages are usually designed better than casual code.
* DSLs provide right abstractions, allowing to write less and express more.
* Popular DSLs are well-documented, tested and have a user base.
For example, if an application extensively works with hierarchical data, it is reasonable to view the data as XML and run powerful XML tools over the XML view. One of such tools is XPath, a language for addressing parts of an XML document.
It is impossible to write an XPath implementation from scratch for each different programming language and each different tree, but developers want to use XPath in different environments. In order to satisfy the needs, the following questions should be answered.
* How to write a reusable DSL implementation?
* How to adapt the implementation for a specific programming language and data structure?
Different approaches are possible. Our work is based on the following ideas.
* Invent a custom programming language for implementing a DSL.
* Use the virtual machine (VM) technology. A DSL implementation should be compiled to a VM, and a host application should execute the compiled code.
* Write an implementation first, and only than develop the best suiting VM for the implementation.
We decided that the programming language Scheme suites best for making a generative XPath implementation. The language is perfect both as a basis for a custom programming language and as a basis for a VM. Moreover, there is no impedance mismatch between the language and the VM.
The following topics can be highlighted:
* embedding Scheme VM,
* preprocessing by compiling Scheme VM code to a host platform,
* transforming to Scheme VM source code.
Embedding is used when a DSL is required at runtime. There is a number of Scheme implementations which are ready for embedding, but before using them as VM the following issues should be solved.
* Mapping between data in the host platform and data in Scheme VM.
* Minimization of overhead in the mapping.
* Garbage collection.
According to our experience, mapping is the most complex task. When mapping from the host platform to Scheme VM, the complexity is a big number of different memory structures. The complexity of the reverse mapping is a flexibility of Scheme S-expressions which might not map correctly due to structural errors.
Another challenge is avoiding mapping data which is not used. For example, it's unreasonable to map a whole big hierarchical structure when only top-level nodes are used. To avoid excessive work, mapping should be lazy and instantiate values on demand.
Finally, garbage collection and live time of values should be considered. There is no a general advice. Each case should be handled personally.
We have developed two test cases that demonstrate embedding of Scheme VM.
* Python abstract syntax trees (AST) as XML. This is an example of developing a custom Scheme implementation from scratch.
* XPath over the file system. This is an example of adding a DSL (XPath) to a legacy application (GNU utility find). A modified version of Guile is used as Scheme VM.
Scheme VM can be hidden in a preprocessor if a DSL is not required at runtime. For example, developers can use XPath queries in C code. A preprocessor can extract all XPath strings, compile them to Scheme VM, then compile Scheme VM to C and replace original XPath expressions by the corresponding C code fragments.
Compiling a DSL to Scheme source code and rewriting the result is an important possibility. For example, Scheme code for an XPath statement can be considered as an execution plan. Using automatic rewriting or manual tuning, it is possible to trace or optimize execution of the XPath statement.
Static compiling and code rewriting are subjects for the future research. We are also interested in a generative Scheme implementation which is adaptable to different environments.