Embedding Python in Multi-Threaded C/C++ Applications

by Ivan Pulleyn

Developers often make use of high-level scripting languages as a way of quickly writing flexible code. Various shell scripting languages have long been used to automate processes on UNIX systems. More recently, software applications have begun to provide scripting layers that allow the user to automate common tasks or even extend the feature set. Think of all the well-known applications you use: GIMP, Emacs, Word, Photoshop, etc. It seems as though all can be scripted in some way.

In this article, I will describe how you can embed the Python language within your C applications. There are many reasons you would want to do this. For instance, you may want to provide your more advanced users with the ability to alter or customize the program. Or maybe you want to take advantage of a Python capability, rather than implement it yourself. Python is a good choice for this task because it provides a clean, intuitive C API. Since many complex applications are written using threads, I will also show you how to create a thread-safe interface to the Python interpreter.

All the examples assume you are using Python version 1.5.2, which comes pre-installed on most recent Linux distributions. The API to access the Python interpreter is the same for both C and C++. There are no special C++ constructs used, and all functions are declared extern “C”. For this reason, the concepts described and the example code given here should work equally well when using either C or C++.

Overview of the Python C/C++ API

There are two ways that C and Python code can work together within the same process. Simply put, Python code can call C code or C code can call Python code. These two methods are called “extending” and “embedding”, respectively. When extending, you create a new Python module implemented in C/C++. This allows you to provide new functionality to the Python language that cannot be implemented in Python. For instance, several core Python modules such as “time” and “nis” are implemented as C extensions, while others are written in Python. You never notice the difference between C and Python modules, because the act of importing and using these modules is the same. If you look around in your /usr/lib/python1.5 directory, you may see some shared library files (extension .so). These are Python module extensions written in C. You will also see various Python files (extension .py) which are modules written in Python.

Typically, when you embed Python, you will develop a C/C++ application that has the ability to load and execute Python scripts. The application will be linked against the Python interpreter library, called libpython1.5.a, which provides all functionality related to evaluating Python code. There is no Python executable involved, only an API for your application to use.

Embedded Python

Listing 1

Embedding Python is a relatively straightforward process. If your goal is merely to execute vanilla Python code from within a C program, it's actually quite easy. Listing 1 is the complete source to a program that embeds the Python interpreter. This illustrates one of the simplest programs you could write making use of the Python interpreter.

Listing 1 uses three Python-specific function calls. Py_Initialize starts up the Python interpreter library, causing it to allocate whatever internal resources it needs. You must call this function before calling most other functions in the Python API. PyRun_SimpleString provides a quick, no-frills way to execute arbitrary Python code. Interpretation of the code is immediate. In Listing 1, for instance, the import sys line causes Python to import the sys module before returning control to the C/C++ program. Each string passed to PyRun_SimpleString must be a complete Python statement of some kind. In other words, half statements are illegal, even if they are completed with another call to PyRun_SimpleString. For example, the following code will not work properly:

// Python will print first error here
PyRun_SimpleString("import ");
// Python will print second error here
PyRun_SimpleString("sys\n");

Py_Finalize is the last Python function that any application embedding Python must call. This function shuts down the interpreter and frees any resources it allocated during its lifetime. You should call this when you are completely finished using the Python library. When you call Py_Finalize, Python will unload all imported modules one by one. Many modules must execute their own clean-up code when they are unloaded in order to free any global resources they may have allocated. For this reason, calling Py_Finalize can have the side effect of causing quite a bit of other code to run.

PyRun_SimpleString is just one way to execute Python code from within your C applications. In fact, there is a whole collection of similar high-level functions. PyRun_SimpleFile is just like PyRun_SimpleString, except it reads its input from a FILE pointer rather than a character buffer. See the Python documentation at www.python.org/docs/api/veryhigh.html for complete documentation on these high-level functions.

In addition to evaluating Python scripts, you can also manipulate Python objects and call Python functions directly from your C code. While this involves more complex C code than using PyRun_SimpleString, it also gives you access to more detailed information. For example, you can inspect the objects returned from Python functions or determine whether an exception has been raised.

Extending Python

When you embed Python within your application, it is often desirable to provide a small module that exposes an API related to your application so that scripts executing within the embedded interpreter have a way to call back into the application. This is done by providing your own Python module, written in C, and is exactly the same as writing normal Python modules. The only difference is your module will function properly only within the embedded interpreter.

Extending Python requires some understanding of how the Python interpreter manipulates objects from C. All function arguments and return values are pointers to PyObject structures, which are the C representation of real Python objects. You can make use of various function calls to manipulate PyObjects. Listing 2 is a simple example of a Python module extension written in C. This is the source to the Python crypt module, which provides one-way hashing used in password authentication.

Listing 2

All C implementations of Python-callable functions take two arguments of type PyObject *. The first argument is always “self”, the object whose method is being called (similar to the infamous “this” pointer in C++). The second contains all the arguments passed to the function. PyArg_Parse extracts C values from such an argument object. You pass it the PyObject holding the values, a format string describing the data types you expect to find there, and one pointer per expected value, each to be filled in from the PyObject. In Listing 2, the function takes two strings, represented by "(ss)". PyArg_Parse is similar to the C function sscanf, except it operates on a PyObject rather than a character buffer. To return a string value from the function, call PyString_FromString. This helper function takes a char* value and converts it into a new Python string object.

Python, C and Threads

C programs can easily create new threads of execution. Under Linux, this is most commonly done using the POSIX Threads (pthreads) API and the function call pthread_create. For an overview of how to use pthreads, see “POSIX Thread Libraries” by Felix Garcia and Javier Fernandez at http://www.linuxjournal.com/lj-issues/issue70/3184.html in the “Strictly On-line” section of LJ, February 2000. In order to support multi-threading, Python uses a mutex to serialize access to its internal data structures. I will refer to this mutex as the “global interpreter lock”. Before a given thread can make use of the Python C API, it must hold the global interpreter lock. This avoids race conditions that could lead to corruption of the interpreter state.

The act of locking and releasing this mutex is abstracted by the Python functions PyEval_AcquireLock and PyEval_ReleaseLock. After calling PyEval_AcquireLock, you can safely assume your thread holds the lock; all other cooperating threads are either blocked or executing code unrelated to the internals of the Python interpreter, and you may now call arbitrary Python functions. Once you have acquired the lock, however, you must be certain to release it later by calling PyEval_ReleaseLock. Failure to do so will deadlock every other thread that tries to use Python.

To complicate matters further, each thread running Python maintains its own state information. This thread-specific data is stored in an object called PyThreadState. When calling Python API functions from C in a multi-threaded application, you must maintain your own PyThreadState objects in order to safely execute concurrent Python code.

If you are experienced in developing threaded applications, you might find the idea of a global interpreter lock rather unpleasant. Well, it's not as bad as it first appears. While Python is interpreting scripts, it periodically yields control to other threads by swapping out the current PyThreadState object and releasing the global interpreter lock. Threads previously blocked while attempting to lock the global interpreter lock will now be able to run. At some point, the original thread will regain control of the global interpreter lock and swap itself back in.

This means that when you call PyRun_SimpleString, other threads will unavoidably get a chance to execute, even though you hold the global interpreter lock. In addition, making calls to Python modules written in C (including many of the built-in modules) opens the possibility of yielding control to other threads. For this reason, two C threads that execute computationally intensive Python scripts will indeed appear to share CPU time and run concurrently. The downside is that, due to the existence of the global interpreter lock, Python cannot fully utilize CPUs on multi-processor machines using threads.

Enabling Thread Support

Before your threaded C program is able to make use of the Python API, it must call some initialization routines. If the interpreter library is compiled with thread support enabled (as is usually the case), you have the runtime option of enabling threads or not. Do not enable runtime threading support unless you plan on using threads. If runtime support is not enabled, Python will be able to avoid the overhead associated with mutex locking its internal data structures. If you are using Python to extend a threaded application, you will need to enable thread support when you initialize the interpreter. I recommend initializing Python from within your main thread of execution, preferably during application startup, using the following two lines of code:

// initialize Python
Py_Initialize();
// initialize thread support
PyEval_InitThreads();

Both functions return void, so there are no error codes to check. You can now assume the Python interpreter is ready to execute Python code. Py_Initialize allocates global resources used by the interpreter library. Calling PyEval_InitThreads turns on the runtime thread support. This causes Python to enable its internal mutex lock mechanism, used to serialize access to critical sections of code within the interpreter. This function also has the side effect of locking the global interpreter lock. Once the function completes, you are responsible for releasing the lock. Before releasing the lock, however, you should grab a pointer to the current PyThreadState object. You will need this later in order to create new Python threads and to shut down the interpreter properly when you are finished using Python. Use the following bit of code to do this:

PyThreadState * mainThreadState = NULL;
// save a pointer to the main PyThreadState object
mainThreadState = PyThreadState_Get();
// release the lock
PyEval_ReleaseLock();

Creating a New Thread of Execution

Python requires a PyThreadState object for each thread that is executing Python code. The interpreter uses this object to manage a separate interpreter data space for each thread. In theory, this means that actions taken in one thread should not interfere with the state of another thread. For instance, if you raise an exception in one thread, the other snippets of Python code keep running as if nothing happened. You must help Python to manage per-thread data. To do this, manually create a PyThreadState object for each C thread that will execute Python code. In order to create a new PyThreadState object, you need a pre-existing PyInterpreterState object. The PyInterpreterState object holds information that is shared across all cooperating threads. When you initialized Python, it created a PyInterpreterState object and attached it to the main PyThreadState object. You can use this interpreter object to create a new PyThreadState for your own C thread. Here's some example code that does just that:

// get the global lock
PyEval_AcquireLock();
// get a reference to the PyInterpreterState
PyInterpreterState * mainInterpreterState = mainThreadState->interp;
// create a thread state object for this thread
PyThreadState * myThreadState = PyThreadState_New(mainInterpreterState);
// free the lock
PyEval_ReleaseLock();

Executing Python Code

Now that you have created a PyThreadState object, your C thread can begin to use the Python API to execute Python scripts. You must adhere to a few simple rules when executing Python code from a C thread. First, you must hold the global interpreter lock before doing anything that changes the interpreter's current thread state. Second, you must load your thread-specific PyThreadState object into the interpreter before executing any Python code. Once you have satisfied these constraints, you can execute arbitrary Python code by using functions such as PyRun_SimpleString. Remember to swap out your PyThreadState object and release the global interpreter lock when done. Note the symmetry of “lock, swap, execute, swap, unlock” in the code:

// grab the global interpreter lock
PyEval_AcquireLock();
// swap in my thread state
PyThreadState_Swap(myThreadState);
// execute some python code
PyRun_SimpleString("import sys\n");
PyRun_SimpleString("sys.stdout.write('Hello from a C thread!\\n')\n");
// clear the thread state
PyThreadState_Swap(NULL);
// release our hold on the global interpreter
PyEval_ReleaseLock();

Cleaning Up a Thread

Once your C thread is no longer using the Python interpreter, you must dispose of its resources. To do this, delete your PyThreadState object. This is accomplished with the following code:

// grab the lock
PyEval_AcquireLock();
// swap my thread state out of the interpreter
PyThreadState_Swap(NULL);
// clear out any cruft from thread state object
PyThreadState_Clear(myThreadState);
// delete my thread state object
PyThreadState_Delete(myThreadState);
// release the lock
PyEval_ReleaseLock();

This thread is now effectively done using the Python API. You may safely call pthread_exit at this point to halt execution of the thread.

Shutting Down the Interpreter

Once your application has finished using the Python interpreter, you can shut down Python support with the following code:

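// shut down the interpreter
PyEval_AcquireLock();
// Py_Finalize needs a current thread state, so swap the main one back in
PyThreadState_Swap(mainThreadState);
Py_Finalize();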

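Note there is no reason to release the lock, because Python has been shut down. Be certain to delete all your thread-state objects with PyThreadState_Clear and PyThreadState_Delete before calling Py_Finalize. The main thread state saved during initialization is the exception, since Py_Finalize cleans it up itself.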

Conclusion

Python is a good choice for use as an embedded language. The interpreter provides support for both embedding and extending, which allows two-way communication between C application code and embedded Python scripts. In addition, the threading support facilitates integration with multi-threaded applications, although the global interpreter lock means that Python code running in different threads cannot use multiple CPUs at once.

You can download example source code at ftp.linuxjournal.com/pub/lj/listings/issue73/3641.tgz. This includes an example implementation of a multi-threaded HTTP server with an embedded Python interpreter. In order to learn more about the implementation details, I recommend reading the Python C API documentation at http://www.python.org/docs/api/. In addition, I have found the Python interpreter code itself to be an invaluable reference.


Ivan Pulleyn can be reached via e-mail at ivan@torpid.com.
