
Self-Diagnostic APIs: Software Quality's Next Frontier

Delivering a data-aware API by using the ability of ANSI C/C++ compilers to type-check function arguments.

With embedded software adding intelligence to so many everyday objects, it seems remarkable that the tools used to create these programs aren't smarter when it comes to catching highly destructive bugs. One culprit is the design of the application programming interfaces (APIs) provided by software publishers. Developers have long chosen libraries of pre-built software for communication, data management, messaging and other purposes rather than creating this functionality from scratch.

But while middleware libraries offer benefits including convenience, portability and productivity, the manner in which they are constructed and used invites bugs. The functions in these APIs are nearly always ignorant of data structure: they handle data without knowing its type. This severely limits the ability of the compiler and the middleware runtime to perform any validation, greatly increasing the likelihood that programming mistakes slip through QA.

The potential for a new kind of API that helps to catch and fix such bugs is built into C and C++. API vendors can deliver a programming interface that is data-aware and self-diagnostic by taking advantage of the function argument type-checking ability of every ANSI C/C++ compiler. This article explores the idea by looking at the API of McObject's eXtremeDB, an in-memory database available on Linux—but the idea of a self-diagnostic API applies as well to other middleware categories.

The concept requires us to abandon the old idea that an API must be a static library of functions that is applied in every situation. Instead, the programming interface is generated for each project or implementation of the middleware and therefore is aware of that project's data types.
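
As a rough illustration of the idea, the sketch below shows what a generated, project-specific interface might look like in C for a hypothetical record type named reading. All of the names (reading, reading_id_put, reading_label_put, reading_demo) are invented for this example and are not taken from any vendor's API; a real generator would presumably emit something analogous from the project's own data definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type, as a generator might emit it from a
   project's data definition. */
typedef struct {
    uint32_t id;
    char     label[21];   /* 20 characters plus the terminating NUL */
} reading;

/* Generated, type-specific accessors: every parameter has a concrete
   type, so the compiler can check each call site. */
int reading_id_put(reading *r, uint32_t id)
{
    if (r == NULL) return -1;
    r->id = id;
    return 0;
}

int reading_label_put(reading *r, const char *label)
{
    if (r == NULL || label == NULL) return -1;
    if (strlen(label) >= sizeof r->label) return -1;  /* reject oversize input */
    strcpy(r->label, label);
    return 0;
}

/* Small usage example; returns 0 when both fields were stored as given. */
int reading_demo(void)
{
    reading r = {0};
    if (reading_id_put(&r, 42u) != 0) return -1;
    if (reading_label_put(&r, "inlet") != 0) return -1;
    /* reading_label_put(&r, &r.id);
       A C compiler must diagnose this call, because uint32_t * is not
       compatible with const char *. */
    return (r.id == 42u && strcmp(r.label, "inlet") == 0) ? 0 : -1;
}
```

Because each accessor names the type it expects, a mismatched argument surfaces as a compiler diagnostic rather than as bad data at runtime.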

Database APIs

APIs for database software development kits (SDKs) fall into two categories: interfaces for SQL and navigational interfaces. With navigational interfaces, developers interact with the contents of a database one record at a time. SQL, in contrast, is a set-oriented programming interface: the user submits SQL statements to the database in order to select, filter, sort and join together rows (records) from many tables. The query produces a result set of tuples (rows), and the API then is used to fetch the results, either one row at a time or in batches. There is no industry or de facto standard for navigational interfaces, so database vendors offering them provide proprietary APIs.
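
To make the contrast concrete, here is a minimal sketch in C over a small in-memory table. It is not any product's API: the cursor functions, select_by_make and the sample rows are all assumptions made up for illustration. The navigational half steps through records one at a time, while the set-oriented half takes a condition and hands back every matching row.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned make_id; const char *model_name; } model_row;

/* Sample data, invented for this sketch. */
static const model_row model_table[] = {
    { 1, "900" }, { 2, "Golf" }, { 1, "9-3" }, { 2, "Polo" },
};
#define MODEL_COUNT (sizeof model_table / sizeof model_table[0])

/* Navigational style: the caller positions a cursor and steps through
   the records one at a time. */
typedef struct { size_t pos; } cursor;

const model_row *cursor_first(cursor *c)
{
    c->pos = 0;
    return MODEL_COUNT > 0 ? &model_table[0] : NULL;
}

const model_row *cursor_next(cursor *c)
{
    if (c->pos + 1 >= MODEL_COUNT) return NULL;   /* already at the last record */
    return &model_table[++c->pos];
}

/* Set-oriented style: the caller states a condition and receives all
   matching rows (up to max) in one call. */
size_t select_by_make(unsigned make_id, const model_row **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < MODEL_COUNT && n < max; i++)
        if (model_table[i].make_id == make_id)
            out[n++] = &model_table[i];
    return n;
}

/* The same question answered navigationally: the filtering logic lives
   in the application's loop rather than in the library. */
size_t count_navigational(unsigned make_id)
{
    cursor c;
    size_t n = 0;
    for (const model_row *r = cursor_first(&c); r != NULL; r = cursor_next(&c))
        if (r->make_id == make_id) n++;
    return n;
}
```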

The Safety Issue

Yet pre-defined database APIs, whether SQL or navigational, carry a significant downside: for an interface library to be able to manage data of any database definition, its programming interface must ignore the type of all data. In other words, the database programming interface must treat the data as untyped, or opaque.

To accomplish this, database libraries use void pointers to pass data between the library and the application program. A void pointer is a C/C++ pointer that legally can point to any type of data. Because a void pointer carries no type information, neither the C/C++ compiler nor the database runtime can validate the data it points to. This opens the possibility of passing a pointer to the wrong type of data, with consequences ranging from nonsense data in the database to a corrupted, unusable database to a crashed program.
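
A minimal sketch of the hazard follows. The function store_put and the fixed-size slot it writes to are hypothetical, invented for this example rather than drawn from any real database library. Because the parameter is declared const void *, the compiler accepts a pointer to any object, and the function can only trust the size it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic "put": copies size bytes into a fixed slot that
   the application intends to hold a 20-character name. */
enum { SLOT_SIZE = 20 };
static unsigned char slot[SLOT_SIZE];

int store_put(const void *data, size_t size)
{
    if (data == NULL || size > SLOT_SIZE) return -1;
    memset(slot, 0, SLOT_SIZE);
    memcpy(slot, data, size);
    return 0;
}

/* Returns 1 if the slot ends up holding the raw bytes of an integer
   instead of the text that was meant to be stored. */
int void_pointer_demo(void)
{
    char name[SLOT_SIZE] = "Saab";
    long id = 7;

    /* Intended call: store the name. */
    if (store_put(name, sizeof name) != 0) return -1;

    /* Mistaken call: a pointer to an integer where text was expected.
       It compiles without a diagnostic, because long * converts
       implicitly to const void *. */
    if (store_put(&id, sizeof id) != 0) return -1;

    return memcmp(slot, &id, sizeof id) == 0 ? 1 : 0;
}
```

Neither the compiler nor store_put has any way to notice the second call is wrong; the only check available is on the byte count.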

Let's look at three API examples, one each from Berkeley DB, SQL/ODBC and eXtremeDB. First, let's define a simple database in SQL:

create table make (
   make_id   integer,
   make_name char(20),

   primary key (make_id)
);

create table model (
   make_id    integer references make,
   model_name char(20)
);

Here's how the same database would be defined in eXtremeDB:


declare database cars;

class make
{
   unsigned<4> make_id;
   char<20>    make_name;

   hash <make_id> by_make_id[10000];
};

class model
{
   unsigned<4> make_id; // foreign key of class make
   char<20>    model_name;

   tree <make_id> by_make_id;
};

Berkeley DB does not have a data definition language. Instead, a host program stores name/value pairs, and it is up to the development team to express the organization and interrelationships of the data through source code comments and system documentation.
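
The sketch below illustrates that situation with a toy in-memory name/value store. It deliberately does not use Berkeley DB's actual API: kv_put, kv_get and the make_value layout are assumptions made up for this example. The point is only that the store sees opaque bytes, while the meaning of those bytes is defined solely by application code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory name/value store (not the Berkeley DB API).
   Keys and values are opaque byte strings; duplicate keys are not
   detected in this sketch. */
enum { KV_MAX = 8, KV_KEY_MAX = 16, KV_VAL_MAX = 32 };

typedef struct {
    unsigned char key[KV_KEY_MAX];
    size_t        key_len;
    unsigned char val[KV_VAL_MAX];
    size_t        val_len;
} kv_pair;

static kv_pair kv_table[KV_MAX];
static size_t  kv_count;

int kv_put(const void *key, size_t key_len, const void *val, size_t val_len)
{
    if (kv_count == KV_MAX || key_len > KV_KEY_MAX || val_len > KV_VAL_MAX)
        return -1;
    kv_pair *p = &kv_table[kv_count++];
    memcpy(p->key, key, key_len);  p->key_len = key_len;
    memcpy(p->val, val, val_len);  p->val_len = val_len;
    return 0;
}

/* Copies the stored value into out if it fits; returns its length or -1. */
int kv_get(const void *key, size_t key_len, void *out, size_t out_size)
{
    for (size_t i = 0; i < kv_count; i++) {
        const kv_pair *p = &kv_table[i];
        if (p->key_len == key_len && memcmp(p->key, key, key_len) == 0) {
            if (p->val_len > out_size) return -1;
            memcpy(out, p->val, p->val_len);
            return (int)p->val_len;
        }
    }
    return -1;
}

/* The record layout exists only here, in application code and its
   comments: the value stored under a make_id key is a make_name. */
typedef struct { char make_name[20]; } make_value;

int kv_demo(void)
{
    unsigned int make_id = 1;
    make_value in = { "Saab" }, out;
    if (kv_put(&make_id, sizeof make_id, &in, sizeof in) != 0) return -1;
    if (kv_get(&make_id, sizeof make_id, &out, sizeof out) != (int)sizeof out)
        return -1;
    return strcmp(out.make_name, "Saab") == 0 ? 0 : -1;
}
```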

Next, we write code to populate these databases, first with SQL and ODBC:


#include <string.h>
#include <sql.h>
#include <sqlext.h>

void insert_make( long make_id, char *make_name )
{
   SQLCHAR *sql = (SQLCHAR *)
      "insert into make values(?,?)";
   // SQL_C_LONG binds an SQLINTEGER, which may be narrower than long
   SQLINTEGER id = (SQLINTEGER) make_id;

   // housekeeping omitted (handle must be a statement handle prepared with sql)

   SQLBindParameter( handle, // statement handle
      1,      // parameter number
      SQL_PARAM_INPUT, // InputOutputType
      SQL_C_LONG, // ValueType
      SQL_INTEGER, // ParameterType
      0, // ColumnSize ignored for SQL_INTEGER
      0, // DecimalDigits ignored for SQL_INTEGER
      &id, // ParameterValuePtr
      sizeof(id), // BufferLength
      NULL // StrLen_or_IndPtr: NULL means the value is not SQL NULL
   );
   SQLBindParameter( handle, // statement handle
      2,      // parameter number
      SQL_PARAM_INPUT, // InputOutputType
      SQL_C_CHAR, // ValueType
      SQL_VARCHAR, // ParameterType
      strlen(make_name), // ColumnSize
      0, // DecimalDigits ignored for SQL_VARCHAR
      make_name, // ParameterValuePtr
      strlen(make_name), // BufferLength
      NULL // StrLen_or_IndPtr: NULL means a null-terminated string
   );
   SQLExecute( handle );
}

______________________

Comments

If you must wear the static typing straightjacket...

Anonymous

There is a deeper issue here of static versus dynamic typing. Languages like C++, Java etc. try to carefully set out the types of everything in advance, while Python, Perl, etc. do not require any pre-declaration.

The problem is that SQL is stuck in between. Given an arbitrary SELECT statement, it is not trivial to mechanically determine the type of the returned fields without knowing the current schema AND all the functions built into the particular database product we are talking to. The programmer knows, but the compiler won't. And if you restrict the kind of SELECTs you allow, you throw out half the power of an RDBMS.

Conversely, you cannot throw any value into any field, because SQL tables do have defined types.

Finally, the schema definition language (DDL) is distinct from the code in which you are accessing the database, so it is very difficult to bring them together to check any of this.

The problem with the article is that it assumes we want to extend the static typing straightjacket out into the DBMS. If you must use C++, this makes a certain amount of sense. I can certainly see advantages in this, but it also makes a lot of work.

Just remember, folks, that the impedance mismatch is much less when using SQL from a language that can go the other way and take the dynamic types as they come, particularly for rapid development where performance is not critical - and many databases will have upgrade or maintenance procedures that are in this category. The painful, error-prone ODBC binding the author describes vanishes when using Python with ODBC.

SQL is a raw itch for statically typed languages, and I'm not sure that eXtremeDB magics the mismatch away as much as the author would like you to think.

F-G

Re: Self-Diagnostic APIs: Software Quality's Next Frontier

Anonymous

If you want to write an ad for your proprietary product, I think it is best to just write an ad, not pretend you are inventing new programming techniques. The tone taken in this article is simply insulting.

Not so new...

Anonymous

I think the following could be considered "prior-art", if you will...

MFC's database bindings create a type-correct source code interface to tables, based on the tables' interface.

CORBA's IDL will create application-specific client-side classes that are structured according to the messaging requirements. I think these are type-correct.

ObjectStore is an object-oriented database. You create a C++ class, and ObjectStore will figure out how to store that thingy in a database. Again, the compiler ensures typesafety.

So while eXtremeDB's technique might be nice, it's probably not an entirely new approach.

Re: Not so new...

Anonymous

If you use EJB you have strongly typed functions to perform changes in the underlying DB. The only difference is that the MFC wrapper is generated from an existing DB schema, while EJB uses type information to generate the schema for the DB. So there is nothing new in this technique at all. The whole article sounds like a product advertisement, not a technical article.