autoSql and autoXml: Code Generators from the Genome Project
autoSql has three types of objects:
Simple: objects that contain no variable-sized arrays.
Object: objects that can contain variable-sized arrays. A next pointer is automatically inserted as the first field in the C structure corresponding to an object.
Table: like objects, but the program generates an SQL table definition as well as a C definition.
Simple objects differ from other objects in how the program treats array declarations. In the field declaration:
simple point[3] triangle; "A three sided figure"
the three points are stored in memory as a C array. If this were declared instead as
object point[3] triangle; "A three sided figure"
the three points would be stored in memory as a singly linked list.
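To make the difference in memory layout concrete, here is a rough sketch that mirrors the two layouts using Python's standard ctypes module. It is not autoSql output: the Point, ArrayTriangle, PointNode and ListTriangle names, and the x, y, z fields, are assumptions made for illustration. The only detail taken from the description above is that the linked-list element carries its next pointer as the first field.

```python
import ctypes


class Point(ctypes.Structure):
    """Hypothetical 'point' record with three integer coordinates."""
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int), ("z", ctypes.c_int)]


class ArrayTriangle(ctypes.Structure):
    """'simple' style: the three points sit inline as a C array."""
    _fields_ = [("points", Point * 3)]


class PointNode(ctypes.Structure):
    """'object' style element: a next pointer first, then the data."""


# The struct refers to itself, so its fields are assigned after the class exists.
PointNode._fields_ = [
    ("next", ctypes.POINTER(PointNode)),
    ("x", ctypes.c_int), ("y", ctypes.c_int), ("z", ctypes.c_int),
]


class ListTriangle(ctypes.Structure):
    """'object' style: the triangle holds only the head of a linked list."""
    _fields_ = [("points", ctypes.POINTER(PointNode))]


def make_list(coords):
    """Chain PointNode records together; returns (head pointer, nodes)."""
    nodes = [PointNode(x=x, y=y, z=z) for (x, y, z) in coords]
    for node, following in zip(nodes, nodes[1:]):
        node.next = ctypes.pointer(following)
    head = ctypes.pointer(nodes[0]) if nodes else ctypes.POINTER(PointNode)()
    return head, nodes


def walk(head):
    """Follow next pointers until NULL, collecting the coordinates."""
    out = []
    node = head
    while node:  # a NULL ctypes pointer is falsy
        rec = node.contents
        out.append((rec.x, rec.y, rec.z))
        node = rec.next
    return out


corners = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]

inline = ArrayTriangle()
for i, (x, y, z) in enumerate(corners):
    inline.points[i] = Point(x, y, z)

linked = ListTriangle()
head, nodes = make_list(corners)  # keep 'nodes' referenced while the list is in use
linked.points = head

print(ctypes.sizeof(ArrayTriangle), ctypes.sizeof(ListTriangle))
print(walk(linked.points))
```

The array form occupies exactly three Point records, while the list form holds a single pointer and reaches each point by following next.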
The following basic field types are supported:
int: 32-bit signed integer
uint: 32-bit unsigned integer
short: 16-bit signed integer
ushort: 16-bit unsigned integer
byte: 8-bit signed integer
ubyte: 8-bit unsigned integer
float: single precision IEEE floating point
char: 8-bit character (can only be used in an array)
string: variable length string up to 255 bytes long
lstring: variable length string up to 2 billion bytes long
Additionally, the simple, object and table types can be used as fields.
An array can be declared as either fixed size or variable size. A variable sized array is declared by putting a field name inside of the brackets in the array declaration. This field must be defined before the array.
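The ordering rule for variable-sized arrays can be illustrated with a small checker. This is only a sketch in Python, not part of autoSql: it assumes a simplified field syntax of the form type[size] name; and the field names in the sample declarations (pointCount, points) are made up for the example.

```python
import re

# Simplified field pattern (an assumption, not the full autoSql grammar):
# an optional simple/object/table keyword, a type, an optional [size], a name, ';'
FIELD = re.compile(
    r'^\s*(?:(?:simple|object|table)\s+)?(\w+)(?:\[(\w+)\])?\s+(\w+)\s*;'
)


def check_array_sizes(lines):
    """Return names of arrays whose size field is not declared earlier."""
    seen = set()
    bad = []
    for line in lines:
        match = FIELD.match(line)
        if not match:
            continue
        _type, size, name = match.groups()
        # A numeric size is a fixed-size array; a name must already be known.
        if size is not None and not size.isdigit() and size not in seen:
            bad.append(name)
        seen.add(name)
    return bad


in_order = [
    'uint pointCount; "Number of points"',
    'simple point[pointCount] points; "The points"',
]
out_of_order = list(reversed(in_order))

print(check_array_sizes(in_order))
print(check_array_sizes(out_of_order))
```

Declaring the count first also means a reader of the generated structure always knows the array length before reaching the array itself.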
Imagine that you've just built an amazing 3-D modeling program. The only problem is that now you need to save the structures in a database. Listing 1 is a way you might build the database with autoSql. Saving it as threeD.as and running
autoSql threeD.as threeD
would end up generating 393 lines of bug-free (I think!) C code and 14 lines of SQL for the investment of 33 lines of specification. (Refer to Listing 2 for the complete autoSql grammar.)
autoXml generates C code for an XML parser given an XML DTD file. It will generate a structure for each “element” in the DTD and populate the structure with a field for each attribute of that element. By default, it will generate a parser that ignores elements and attributes not in the DTD, but otherwise is a validating parser. If you use the -picky flag, it will be fully validating.
The autoXml parser will load the entire file into memory. If this is a problem you'll have to resort to the lower-level xap parser, which is much like the commonly used expat parser, but a bit faster.
If you find yourself befuddled by all the acronyms so far, you're probably new to XML (eXtensible Markup Language). It has a tag-based format, and a simple example of an XML doc might be:
<POLYGON id="square">
  <DESCRIPTION> This is soooo square man </DESCRIPTION>
  <POINT x="0" y="0" />
  <POINT x="0" y="1" />
  <POINT x="1" y="1" />
  <POINT x="1" y="0" />
</POLYGON>
Everything in XML lives between <TAG></TAG> pairs. A tag may have associated text, attributes and subtags. In the example above, POLYGON has the subtags DESCRIPTION and POINT, the attribute id and no text. DESCRIPTION has the text “This is soooo square man” and no subtags or attributes. POINT has the attributes x and y. POINT also illustrates a little XML shortcut: tags containing only attributes can be written <TAG att="something" /> as a shortcut for <TAG att="something"></TAG>.
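To see the tag, attribute, text and subtag pieces separately, the polygon document can be loaded with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module purely as an illustration; it is unrelated to the parser autoXml generates, and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

DOC = """<POLYGON id="square">
  <DESCRIPTION> This is soooo square man </DESCRIPTION>
  <POINT x="0" y="0" />
  <POINT x="0" y="1" />
  <POINT x="1" y="1" />
  <POINT x="1" y="0" />
</POLYGON>"""

root = ET.fromstring(DOC)

# The tag name and its attributes.
print(root.tag, root.attrib)

# Text lives on the DESCRIPTION subtag, not on POLYGON itself.
description = root.find("DESCRIPTION").text.strip()
print(description)

# Each POINT carries only attributes; the values arrive as strings.
points = [(int(p.get("x")), int(p.get("y"))) for p in root.findall("POINT")]
print(points)
```

Note that attribute values are always text at this level; turning them into numbers is up to the program reading them.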
XML is much like HTML but has significant differences. All attribute values must be enclosed in quotes in XML, while quotes are optional in HTML. Tags must strictly nest in XML, while HTML allows tags to be opened but not closed. The tags in HTML are predefined. In XML the definition of tags is up to you.
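The quoting and nesting rules are easy to try out. This small sketch, again using Python's standard xml.etree.ElementTree rather than anything from autoXml, reports whether a fragment is accepted by an XML parser; the is_well_formed helper is a name invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


quoted = '<POINT x="0" y="0" />'
unquoted = '<POINT x=0 y=0 />'  # fine in HTML, rejected as XML
unclosed = '<POLYGON id="square"><POINT x="0" y="0"></POLYGON>'  # POINT never closed

for sample in (quoted, unquoted, unclosed):
    print(is_well_formed(sample), sample)
```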
Tags can be defined two ways in XML: by a DTD file or by an XML schema. There are pros and cons for each method. DTD files are relatively simple and are recognized by a wide variety of parsers and XML browsers. On the other hand, DTD files can't express that a certain attribute has to be numerical. XML schemas are more complex. They are themselves written in a type of XML, which is nice in some ways. They are not as widely supported yet. Currently autoXml only works with DTD files with some modest extensions.
Here is a DTD file that would describe the POLYGON format above:
<!ELEMENT POLYGON (DESCRIPTION? POINT+)>
<!ATTLIST POLYGON id CDATA #REQUIRED>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT POINT>
<!ATTLIST POINT x CDATA #REQUIRED>
<!ATTLIST POINT y CDATA #REQUIRED>
<!ATTLIST POINT z CDATA "0">
The DTD has two major types of definitions: ELEMENTs and ATTLISTs (or attributes). An element definition includes the name of the element and an optional parenthesized list of sub-elements. The sub-elements must be defined elsewhere in the DTD with the exception of the #PCDATA sub-element, which is used to indicate that the element can have text between its tags. Each sub-element may be followed by one of the following characters:
?: the sub-element is optional.
+: the sub-element occurs at least once.
*: the sub-element occurs 0 or more times.
If there is no following character the sub-element occurs exactly once.
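The occurrence characters boil down to simple count checks. Here is a minimal sketch in Python, assuming the sub-element list for POLYGON is written as a dictionary from name to suffix. The helper names are hypothetical, and unlike a real validating parser it ignores the order of sub-elements.

```python
import xml.etree.ElementTree as ET


def occurrence_ok(suffix, count):
    """Check a sub-element count against its DTD occurrence suffix."""
    if suffix == "?":
        return count <= 1   # optional
    if suffix == "+":
        return count >= 1   # at least once
    if suffix == "*":
        return True         # zero or more
    if suffix == "":
        return count == 1   # exactly once
    raise ValueError(f"unknown occurrence suffix {suffix!r}")


def check_children(element, spec):
    """Return the names of sub-elements whose counts violate the spec."""
    return [name for name, suffix in spec.items()
            if not occurrence_ok(suffix, len(element.findall(name)))]


# Mirrors (DESCRIPTION? POINT+) from the POLYGON element definition.
POLYGON_SPEC = {"DESCRIPTION": "?", "POINT": "+"}

good = ET.fromstring('<POLYGON id="a"><POINT x="0" y="0" /></POLYGON>')
bad = ET.fromstring(
    '<POLYGON id="b"><DESCRIPTION>one</DESCRIPTION>'
    '<DESCRIPTION>two</DESCRIPTION></POLYGON>'
)

print(check_children(good, POLYGON_SPEC))
print(check_children(bad, POLYGON_SPEC))
```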
The ATTLIST defines an attribute and associates it with an element. It is good style to keep ATTLISTs together with their ELEMENT. Here are the fields in an ATTLIST:
element: name of element this is associated with.
name: name of this attribute.
type: generally CDATA. Can be a reference or date, but these are not supported by autoXml.
default: this contains a default value to be used if the attribute is not present. The keyword #REQUIRED in this field means that the attribute must be present. The keyword #IMPLIED means that it's okay for this attribute to be missing (in which case it will have a NULL or zero value after it is read by autoXml).
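The default field drives what a program sees when an attribute is left out. The following Python sketch shows one way that logic might look, using the POINT attributes as the example. The resolve_attributes function and the dictionary layout are assumptions for illustration and say nothing about how autoXml implements it internally; a missing #IMPLIED attribute is represented here as None.

```python
REQUIRED = "#REQUIRED"
IMPLIED = "#IMPLIED"

# Attribute name -> default field, following the ATTLIST lines for POINT.
POINT_ATTLIST = {"x": REQUIRED, "y": REQUIRED, "z": "0"}


def resolve_attributes(given, attlist):
    """Fill in defaults for missing attributes, or fail if one is required."""
    resolved = {}
    for name, default in attlist.items():
        if name in given:
            resolved[name] = given[name]
        elif default == REQUIRED:
            raise ValueError(f"missing required attribute {name!r}")
        elif default == IMPLIED:
            resolved[name] = None      # allowed to be absent
        else:
            resolved[name] = default   # literal default from the DTD
    return resolved


print(resolve_attributes({"x": "1", "y": "2"}, POINT_ATTLIST))
print(resolve_attributes({"x": "1", "y": "2", "z": "3"}, POINT_ATTLIST))
```

Calling resolve_attributes with x alone raises ValueError, since y is marked #REQUIRED.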