Self-Diagnostic APIs: Software Quality's Next Frontier
With embedded software adding intelligence to so many everyday objects, it seems remarkable that the tools used to create these programs aren't smarter when it comes to catching highly destructive bugs. One culprit is the application programming interfaces (APIs) provided by software publishers. Developers have long chosen libraries of pre-built software for communication, data management, messaging and other purposes rather than creating this functionality from scratch.
But while middleware libraries offer benefits including convenience, portability and productivity, the manner in which they are constructed and used leads to bugs. This stems from the fact that software functions in APIs are nearly always data structure ignorant—they handle data without knowing its type. This severely limits the compiler's and middleware runtime's abilities to perform any validation, greatly increasing the likelihood of programming mistakes slipping through QA.
The potential for a new kind of API that helps to catch and fix such bugs is built into C and C++. API vendors can deliver a programming interface that is data-aware and self-diagnostic by taking advantage of the function argument type-checking ability of every ANSI C/C++ compiler. This article explores the idea by looking at the API of McObject's eXtremeDB, an in-memory database available on Linux—but the idea of a self-diagnostic API applies as well to other middleware categories.
The concept requires us to abandon the old idea that an API must be a static library of functions that is applied in every situation. Instead, the programming interface is generated for each project or implementation of the middleware and therefore is aware of that project's data types.
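As a rough illustration of the idea, here is a minimal sketch in C of what a generated, data-aware interface could look like for a record with a 4-byte ID and a 20-character name. Every name in it (make_t, model_t, make_make_id_put, make_make_name_put) is hypothetical and chosen for illustration only; it is not the output of any particular product's schema compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output of a schema compiler: one C type per class. */
typedef struct { uint32_t make_id; char make_name[21]; } make_t;
typedef struct { uint32_t make_id; char model_name[21]; } model_t;

/* One setter per field. The parameter types are taken from the schema,
   so the compiler can check every call site. */
static int make_make_id_put(make_t *obj, uint32_t value)
{
    obj->make_id = value;
    return 0;
}

static int make_make_name_put(make_t *obj, const char *value)
{
    size_t len = strlen(value);
    if (len > 20)
        return -1;                      /* longer than the declared field */
    memcpy(obj->make_name, value, len + 1);
    return 0;
}
```

Because each function names the exact record type it operates on, passing a model_t * to make_make_name_put draws an incompatible-pointer-type diagnostic from the compiler instead of failing at runtime.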
APIs for database software development kits (SDKs) fall into two categories: interfaces for SQL and navigational interfaces. With navigational interfaces, developers interact with the contents of a database one record at a time. SQL, in contrast, is a set-oriented programming interface. With this API, the user submits SQL statements to the database in order to select, filter, sort and join rows (records) from many tables. The query results in a set of tuples (rows). The API then is used to fetch the results, either one row at a time or in batches. There is no industry or de facto standard for navigational APIs, so database vendors offering navigational interfaces provide proprietary APIs.
Yet pre-defined database APIs, whether SQL or navigational, carry a significant downside: for an interface library to be able to manage data of any database definition, it must have a programming interface that ignores the type of all data. In other words, the database programming interface must treat the data as un-typed, or opaque.
To accomplish this, databases use void pointers to pass data between the database library and the application program. A void pointer is a C/C++ language variable that legally can point to any type of data. Because the pointer carries no type information, neither the C/C++ compiler nor the database runtime can validate the data it points to. This opens the possibility of passing a pointer to the wrong type of data, with consequences ranging from nonsense data in the database to a corrupted, unusable database to a crashed program.
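The hazard can be sketched in a few lines of C. The make_store function and the two record structs below are hypothetical stand-ins, not part of any real database library; they only mimic the shape of a type-blind interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record layouts; note that the field order differs. */
typedef struct { unsigned int make_id; char make_name[20]; } make_rec;
typedef struct { char model_name[20]; unsigned int make_id; } model_rec;

/* Stand-in for storage that is meant to hold one make record. */
static unsigned char make_slot[sizeof(make_rec)];

/* Type-blind interface: it sees only an address and a byte count. */
int make_store(const void *data, size_t len)
{
    if (len > sizeof make_slot)
        return -1;
    memcpy(make_slot, data, len);
    return 0;
}

/* Reads the stored bytes back as a make record and returns its ID. */
unsigned int make_slot_id(void)
{
    make_rec r;
    memcpy(&r, make_slot, sizeof r);
    return r.make_id;
}

/* Stores the wrong record type, then returns the ID that comes back. */
unsigned int store_wrong_type(void)
{
    model_rec m;
    memset(&m, 0, sizeof m);
    strcpy(m.model_name, "Mustang");
    m.make_id = 7;
    make_store(&m, sizeof m);   /* no diagnostic: any object pointer converts to const void * */
    return make_slot_id();
}
```

The call in store_wrong_type compiles cleanly, yet the ID read back is built from the first bytes of the model name rather than the value 7 that was stored.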
Let's look at three API examples, one each from Berkeley DB, SQL/ODBC and eXtremeDB. First, let's define a simple database in SQL:
create table make
(
    make_id integer,
    make_name char(20),
    primary key (make_id)
);
create table model
(
    make_id integer,
    model_name char(20),
    foreign key (make_id) references make (make_id)
);
Here's how the same database would be defined in eXtremeDB:
declare database cars;

class make
{
    unsigned<4> make_id;
    char<20> make_name;
    hash <make_id> by_make_id[10000];
};

class model
{
    unsigned<4> make_id; // foreign key of class make
    char<20> model_name;
    tree <make_id> by_make_id;
};
Berkeley DB does not have a data definition language. Instead, a host program stores name/value pairs, and it is up to the development team to express the organization and interrelationships of the data through source code comments and system documentation.
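To show what this means in practice, here is a minimal sketch in C of a name/value store. It is a hypothetical in-memory stand-in (kv_put, kv_get and store_make are invented for illustration) and does not use the Berkeley DB API; the point is only that the record layout lives in application code and comments, not in the store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory name/value store with small fixed limits. */
enum { KV_MAX = 16, KV_BYTES = 32 };

typedef struct {
    unsigned char key[KV_BYTES]; size_t klen;
    unsigned char val[KV_BYTES]; size_t vlen;
} kv_pair;

static kv_pair kv_table[KV_MAX];
static size_t kv_count;

/* The store sees only bytes and lengths, never field names or types. */
int kv_put(const void *key, size_t klen, const void *val, size_t vlen)
{
    if (kv_count == KV_MAX || klen > KV_BYTES || vlen > KV_BYTES)
        return -1;
    kv_pair *p = &kv_table[kv_count++];
    memcpy(p->key, key, klen); p->klen = klen;
    memcpy(p->val, val, vlen); p->vlen = vlen;
    return 0;
}

/* Copies the value into out and returns its length, or returns 0 if no
   matching pair fits in out. */
size_t kv_get(const void *key, size_t klen, void *out, size_t outlen)
{
    for (size_t i = 0; i < kv_count; i++) {
        const kv_pair *p = &kv_table[i];
        if (p->klen == klen && memcmp(p->key, key, klen) == 0 && p->vlen <= outlen) {
            memcpy(out, p->val, p->vlen);
            return p->vlen;
        }
    }
    return 0;
}

/* Application convention: key = make_id, value = 20-byte, NUL-padded name.
   Only this comment records that convention; the store cannot enforce it. */
int store_make(unsigned int make_id, const char *make_name)
{
    char name[20] = {0};
    strncpy(name, make_name, sizeof name - 1);
    return kv_put(&make_id, sizeof make_id, name, sizeof name);
}
```

Any caller that reads the pair back must repeat the same convention by hand, which is exactly the knowledge a data definition language would otherwise capture.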
Next, we write code to populate these databases, first with SQL and ODBC:
void insert_make( long make_id, char *make_name )
{
    unsigned char *sql = (unsigned char *)
        "insert into make values(?,?)";
    // housekeeping omitted
    // Note: SQL_C_LONG describes a 32-bit SQLINTEGER buffer, so binding
    // &make_id directly assumes that long is 32 bits on this platform.
    SQLBindParameter( handle,   // statement handle
        1,                      // parameter number
        SQL_PARAM_INPUT,        // InputOutputType
        SQL_C_LONG,             // ValueType
        SQL_INTEGER,            // ParameterType
        0,                      // ColumnSize ignored for SQL_INTEGER
        0,                      // DecimalDigits ignored for SQL_INTEGER
        &make_id,               // ParameterValuePtr
        sizeof(make_id),        // BufferLength
        NULL                    // StrLen_or_IndPtr: NULL means the value is not NULL
    );
    SQLBindParameter( handle,   // statement handle
        2,                      // parameter number
        SQL_PARAM_INPUT,        // InputOutputType
        SQL_C_CHAR,             // ValueType
        SQL_VARCHAR,            // ParameterType
        strlen(make_name),      // ColumnSize
        0,                      // DecimalDigits ignored for SQL_VARCHAR
        make_name,              // ParameterValuePtr
        strlen(make_name),      // BufferLength
        NULL                    // StrLen_or_IndPtr: NULL means a null-terminated string
    );
    SQLExecute( handle );
}
Comments
If you must wear the static typing straightjacket...
There is a deeper issue here of static versus dynamic typing. Languages like C++, Java etc. try to carefully set out the types of everything in advance, while Python, Perl, etc. do not require any pre-declaration.
The problem is that SQL is stuck in between. Given an arbitrary SELECT statement, it is not trivial to mechanically determine the type of the returned fields without knowing the current schema AND all the functions built into the particular database product we are talking to. The programmer knows, but the compiler won't. And if you restrict the kind of SELECTs you allow, you throw out half the power of an RDBMS.
Conversely, you cannot throw any value into any field, because SQL tables do have defined types.
Finally, the schema definition language (DDL) is distinct from the code in which you are accessing the database, so it is very difficult to bring them together to check any of this.
The problem with the article is that it assumes we want to extend the static typing straightjacket out into the DBMS. If you must use C++, this makes a certain amount of sense. I can certainly see advantages in this, but it also makes a lot of work.
Just remember, folks, that the impedance mismatch is much less when using SQL from a language that can go the other way and take the dynamic types as they come, particularly for rapid development where performance is not critical - and many databases will have upgrade or maintenance procedures that are in this category. The painful and error-prone ODBC binding the author describes vanishes when using Python with ODBC.
SQL is a raw itch for statically typed languages, and I'm not sure that eXtremeDB magics the mismatch away as much as the author would like you to think.
F-G
Re: Self-Diagnostic APIs: Software Quality's Next Frontier
If you want to write an ad for your proprietary product, I think it is best to just write an ad, not pretend you are inventing new programming techniques. The tone taken in this article is simply insulting.
Not so new...
I think the following could be considered "prior-art", if you will...
MFC's database bindings create type-correct source code interface to tables, based on the tables' interface.
CORBA's IDL will create application-specific client-side classes that are structured according to the messaging requirements. I think these are type-correct.
ObjectStore is an object-oriented database. You create a C++ class, and ObjectStore will figure out how to store that thingy in a database. Again, the compiler ensures type safety.
So while eXtremeDB's technique might be nice, it's probably not an entirely new approach.
Re: Not so new...
If you use EJB, you have strongly typed functions to perform changes in the underlying DB. The only difference is that the MFC wrapper is generated from an existing DB schema, while EJB uses type information to generate the schema for the DB. So there is nothing new in this technique at all. The whole article sounds like a product advertisement, not a technical article.