Embedding Perl in MySQL

Add your own functionality to MySQL with MyPerl, which brings the powerful, versatile Perl interpreter into the heart of the relational database.

Despite the fact that MySQL comes with a rich set of functions, at some point you will find yourself wishing for some additional function or needing an advanced regular expression engine. To solve this problem, MySQL supports user-defined functions (UDFs). Through the UDF interface, you can load new functions into your database dynamically. Although this is a powerful feature, it does mean time spent debugging C or C++ code. As much as I like C, sometimes I don't have the time to write and debug applications written in it, and at other times I want a much faster development cycle. Enter Perl, the Swiss Army knife of languages, with a cast of thousands thanks to the Comprehensive Perl Archive Network (CPAN). Embedding Perl into MySQL gave me a lot of flexibility in being able to extend my database quickly. For these reasons, the embedded MySQL Perl interpreter, MyPerl, was written.

Setting Up Perl

The first step in putting Perl into the database is to get the right setup for Perl. Perl by default is not threadsafe, and MySQL, on the other hand, uses a thread for each user connection. So for Perl to live inside the database, you must compile a threadsafe version of Perl:

./Configure -Dusethreads -Duseithreads

Once this is finished and built, you will have a Perl that is threadsafe. This does not mean your code or any Perl modules you use will be threadsafe, it simply means that Perl itself will be. Building Perl to be threadsafe is a necessary step, because I know of no vendors shipping a threadsafe Perl at this time. Don't be fooled by the fact that MyPerl will build with a nonthreaded Perl; it can, but at some point it will crash your database. I suspect that with the advent of Apache 2.0 and a final release of mod_perl 2, some vendors will consider shipping their Perl binaries with threads enabled. While completing this article, I finally upgraded to Red Hat 9 and saw that they have begun shipping Perl with threads enabled.

Writing the Embedded Perl Interpreter

UDFs have three stages: init, request and deinit. The init stage is called once at the beginning of a query; the request stage is called once per row returned, and the deinit stage is called after the data is sent to the client. Both the init and deinit stages can be skipped, although for all but the most simple UDF you need to create and clean up memory you will use to return data to the client.

MyPerl starts off with the following init function:


my_bool
myperl_init(UDF_INIT *initid, UDF_ARGS *args,
            char *message)
{
    myperl_passable *pass = NULL;
    int exitstatus = 0;
    char *embedding[] = { "perl",
                          "-MMyPerl::Request",
                          "-e", "1" };
    PerlInterpreter *my_perl;
    uint i =0;
    initid->max_length = 256;
    initid->maybe_null=1;

Three parameters are passed into the init method, and it returns either success or failure. The UDF_INIT structure holds information that controls the behavior of how UDF responds. It also is the only structure that is passed between all three stages. First, MySQL is told the UDF will be sending data back that is greater than the size of a VARCHAR. By telling the server it needs more space then a VARCHAR, the server assumes it will be returning blobs.

Although MyPerl does not know that it actually will be doing this, at this point it has no way of knowing how much data it will be returning, so telling the server that it is returning blobs is the safer bet. Next, it sets maybe_null to 1, because there is always the possibility it will be returning NULL values. MyPerl returns NULL for both empty results and upon compilation errors that occur from code that you eval().

The next thing is to check the rows being passed in:


if (args->arg_count == 0 || i
    args->arg_type[0] != STRING_RESULT) {
    strncpy(message,USAGE, MYSQL_ERRMSG_SIZE);
    return 1;
}
for (i=0 ; i < args->arg_count; i++)
        args->arg_type[i]=STRING_RESULT;

MyPerl expects that the first row being passed in is the code it will execute. Therefore, if no rows are passed in or if the first row is not a string, an error should occur. Error messages must be at most the size of MYSQL_ERRMSG_SIZE, and they must be copied into the message string. To save some time, MyPerl walks through the arguments that MySQL will be passing in and tells it to convert them to strings.

Before the Perl interpreter is set up, a structure must be created to store the interpreter and track a memory block that it will use to return data for each request:


pass = (myperl_passable *)
    malloc(sizeof(myperl_passable));;
if (!pass) {
    strncpy(message, "Could not allocate memory",
            MYSQL_ERRMSG_SIZE);
    return 1;
}

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Memory leak

Anonymous's picture

Hi sir,

I'm Nguyen Ngoc Tam. After reading your this topic, i found that the memory of your MYSQL server would increase so that it could cause the memory leak, when it handled a lot of perl queries. Have you controlled this situation ?.

Regards
Ngoc Tam

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix