autoSql and autoXml: Code Generators from the Genome Project

These tools have saved us from the drudgery of writing tens of thousands of lines of repetitive code, and we hope you find them useful.

autoXml Extensions and Limits

autoXml extends the type field of ATTLIST to include INT or FLOAT, so that an attribute holds a numerical value rather than a string. Similarly, you can use #INT or #FLOAT in place of #PCDATA to put a numerical type in the text field. If you include these extensions, please use the .dtdx suffix rather than .dtd on your DTD file.

Currently, autoXml copes with DTD comments only if they start on a line by themselves. It expects every ELEMENT and ATTLIST declaration to fit on a single line. It doesn't handle reference data types beyond saving the reference ID as a string.
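
As a rough illustration of the extensions and limits just described, here is a hypothetical .dtdx fragment. Only the POLYGON element comes from this article; the file name, the other element names, and the attribute names are assumptions made up for the sketch, and the exact syntax autoXml accepts may differ in detail.

```
<!-- Hypothetical poly.dtdx; every name except POLYGON is an assumption. -->
<!-- Each comment starts on a line by itself, and each declaration fits on one line. -->
<!ELEMENT POLYGON (DESCRIPTION?, SIDES?, POINT+)>
<!ATTLIST POLYGON id ID #REQUIRED>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!-- #INT in place of #PCDATA: the text of SIDES is read as an integer. -->
<!ELEMENT SIDES (#INT)>
<!-- FLOAT in the ATTLIST type field: x and y are numbers, not strings. -->
<!ELEMENT POINT EMPTY>
<!ATTLIST POINT x FLOAT #REQUIRED>
<!ATTLIST POINT y FLOAT #REQUIRED>
```

Because INT, FLOAT, #INT, and #FLOAT are not part of standard DTD syntax, a file like this would be rejected by an ordinary validating parser, which is presumably why the separate .dtdx suffix is requested.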

Listing 3. autoXml Code Generation

Refer to Listing 3 for a complete example of the source code autoXml generates. In addition to the .h file shown in Listing 3, autoXml generates a corresponding .c file. Each XML file has to have a root object. In this case the root object is POLYGON (our DTD, as is, won't let us have more than one polygon per file). You can read an XML file that respects this DTD using the polyPolygonLoad() function, and save it back out using the polyPolygonSave() function.

autoSql and autoXml work well on a range of data, as you've seen, anywhere from an address book to gene tracks. We hope you'll find these tools useful on your own projects.


Jim Kent, PhD, and his work on the Human Genome Project have been profiled in the New York Times, the San Francisco Chronicle, Software Development magazine and other publications. He is currently working on cross-species genomic comparisons and Parasol, a job controller for his kilocluster.

Heidi Brumbaugh has been a writer and editor in the computer publishing industry since the late eighties. Visit links to her projects and read some of her fiction at

