Embedding Perl in MySQL

Add your own functionality to MySQL with MyPerl, which brings the powerful, versatile Perl interpreter into the heart of the relational database.

Despite the fact that MySQL comes with a rich set of functions, at some point you will find yourself wishing for some additional function or needing an advanced regular expression engine. To solve this problem, MySQL supports user-defined functions (UDFs). Through the UDF interface, you can load new functions into your database dynamically. Although this is a powerful feature, it does mean time spent debugging C or C++ code. As much as I like C, sometimes I don't have the time to write and debug applications written in it, and at other times I want a much faster development cycle. Enter Perl, the Swiss Army knife of languages, with a cast of thousands thanks to the Comprehensive Perl Archive Network (CPAN). Embedding Perl into MySQL gave me a lot of flexibility in being able to extend my database quickly. For these reasons, the embedded MySQL Perl interpreter, MyPerl, was written.

Setting Up Perl

The first step in putting Perl into the database is to get the right setup for Perl. Perl by default is not threadsafe, and MySQL, on the other hand, uses a thread for each user connection. So for Perl to live inside the database, you must compile a threadsafe version of Perl:

./Configure -Dusethreads -Duseithreads

Once this is finished and built, you will have a Perl that is threadsafe. This does not mean your code or any Perl modules you use will be threadsafe, it simply means that Perl itself will be. Building Perl to be threadsafe is a necessary step, because I know of no vendors shipping a threadsafe Perl at this time. Don't be fooled by the fact that MyPerl will build with a nonthreaded Perl; it can, but at some point it will crash your database. I suspect that with the advent of Apache 2.0 and a final release of mod_perl 2, some vendors will consider shipping their Perl binaries with threads enabled. While completing this article, I finally upgraded to Red Hat 9 and saw that they have begun shipping Perl with threads enabled.

Writing the Embedded Perl Interpreter

UDFs have three stages: init, request and deinit. The init stage is called once at the beginning of a query; the request stage is called once per row returned, and the deinit stage is called after the data is sent to the client. Both the init and deinit stages can be skipped, although for all but the most simple UDF you need to create and clean up memory you will use to return data to the client.

MyPerl starts off with the following init function:


my_bool
myperl_init(UDF_INIT *initid, UDF_ARGS *args,
            char *message)
{
    myperl_passable *pass = NULL;
    int exitstatus = 0;
    char *embedding[] = { "perl",
                          "-MMyPerl::Request",
                          "-e", "1" };
    PerlInterpreter *my_perl;
    uint i =0;
    initid->max_length = 256;
    initid->maybe_null=1;

Three parameters are passed into the init method, and it returns either success or failure. The UDF_INIT structure holds information that controls the behavior of how UDF responds. It also is the only structure that is passed between all three stages. First, MySQL is told the UDF will be sending data back that is greater than the size of a VARCHAR. By telling the server it needs more space then a VARCHAR, the server assumes it will be returning blobs.

Although MyPerl does not know that it actually will be doing this, at this point it has no way of knowing how much data it will be returning, so telling the server that it is returning blobs is the safer bet. Next, it sets maybe_null to 1, because there is always the possibility it will be returning NULL values. MyPerl returns NULL for both empty results and upon compilation errors that occur from code that you eval().

The next thing is to check the rows being passed in:


if (args->arg_count == 0 || i
    args->arg_type[0] != STRING_RESULT) {
    strncpy(message,USAGE, MYSQL_ERRMSG_SIZE);
    return 1;
}
for (i=0 ; i < args->arg_count; i++)
        args->arg_type[i]=STRING_RESULT;

MyPerl expects that the first row being passed in is the code it will execute. Therefore, if no rows are passed in or if the first row is not a string, an error should occur. Error messages must be at most the size of MYSQL_ERRMSG_SIZE, and they must be copied into the message string. To save some time, MyPerl walks through the arguments that MySQL will be passing in and tells it to convert them to strings.

Before the Perl interpreter is set up, a structure must be created to store the interpreter and track a memory block that it will use to return data for each request:


pass = (myperl_passable *)
    malloc(sizeof(myperl_passable));;
if (!pass) {
    strncpy(message, "Could not allocate memory",
            MYSQL_ERRMSG_SIZE);
    return 1;
}

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Memory leak

Anonymous's picture

Hi sir,

I'm Nguyen Ngoc Tam. After reading your this topic, i found that the memory of your MYSQL server would increase so that it could cause the memory leak, when it handled a lot of perl queries. Have you controlled this situation ?.

Regards
Ngoc Tam

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState