Embedding Python in Your C Programs
The language of choice for large, high-performance applications in Linux is almost always C, or somewhat less often C++. Both are powerful languages that allow you to create high-performance natively compiled programs. However, they are not languages that lend themselves to runtime flexibility. Once a C/C++ application is compiled, its code is pretty much static. At times, that can be a real hindrance. For example, if you want to allow users of a program to create plugins easily that extend the application's functionality, you have to deal with complex dynamic linking issues that can cause no end of headaches. Additionally, your users will have to know C/C++ in order to extend the application, which severely limits the number of people capable of writing extensions.
A much better solution is to provide your users with a scripting language they can use to extend your application. With a scripting language, you will tend to have much more runtime flexibility, as well as shorter development times and a lower learning curve that will extend the base of users capable of creating extensions.
Unfortunately, creating a scripting language is very much a nontrivial task that easily could become a major portion of your program. Fortunately, you don't need to create a scripting language. With Python, you can embed the interpreter directly into your application and expose the full power and flexibility of Python without adding very much code at all to your application.
Including the Python interpreter in your program is extremely simple. Python provides a single header file for including all of the definitions you need when embedding the interpreter into your application, aptly named Python.h. This contains a lot of stuff, including several of the standard headers. For compiling efficiency, it might be nice if you could include only those parts of the interface that you actually intend to use, but unfortunately Python doesn't really give you that option. If you take a look at the Python.h file, you'll see that it defines several important macros and includes a number of common headers that are required by the individual components included later in the file.
To link your application to the Python interpreter at compile time, you should run the python-config program to get a list of the linking options that should be passed to the compiler. On my system, those are:
-lpython2.3 -lm -L/usr/lib/python2.3/config
So, how much code does it take to run the Python interpreter from a C app? As it turns out, very little. In fact, if you look at Listing 1, you'll see that it can be done in as little as three lines of code, which initialize the interpreter, send it a string of Python code to execute and then shut the interpreter back down.
Listing 1. Embedding Python in Three Lines
void exec_pycode(const char* code)
{
Py_Initialize();
PyRun_SimpleString(code);
Py_Finalize();
}
Or, you could embed an interactive Python terminal in your program by calling Py_Main() instead, as in Listing 2. This brings up the interpreter just as if you'd run Python directly from the command line. Control is returned to your application after the user exits from the interpreter shell.
Listing 2. Embedding an Interactive Python
void exec_interactive_interpreter(int arg, char** argv)
{
Py_Initialize();
Py_Main(argc, argv);
Py_Finalize();
}
Embedding the interpreter in three lines of code is easy enough, but let's face it, just executing arbitrary strings of Python code inside a program is neither interesting nor all that useful. Fortunately, it's also far from the extent of what Python allows. Before I get too deep into what it can do though, let's take a look at initializing the environment that Python executes within.
When you run the Python interpreter, the main environment context is stored in the __main__ module's namespace dictionary. All functions, classes and variables that are defined globally can be found in this dictionary. When running Python interactively or on a script file, you rarely need to care about this global namespace. However, when running the embedded interpreter, you'll often need to access this dictionary to get references to functions or classes in order to call or construct them. You also may find that you occasionally want to copy the global dictionary so that different bits of code can be run in distinct environments. For instance, you might want to create a new environment for each plugin that you load.
To get at the __main__ module's dictionary, you first need to get a reference to the module. You can do this by calling the PyImport_AddModule() function, which looks up the module name you supply and returns a PyObject pointer to that object. Why a PyObject? All Python data types derive from PyObject, which makes it a handy lowest-common denominator. Therefore, almost all of the functions that you'll deal with when interacting with the Python interpreter will take or return pointers to PyObjects rather than another more specific Python data type.
Once you have the __main__ module referenced by a PyObject, you can use the PyModule_GetDict() function to get a reference to the main module's dictionary, which again is returned as a PyObject pointer. You can then pass the dictionary reference when you execute other Python commands. For example, Listing 3 shows how you could duplicate the global environment and execute two different Python files in separate environments.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- New Products
- Linux Systems Administrator
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Designing Electronics with Linux
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Nice article, thanks for the
7 hours 10 min ago - I once had a better way I
12 hours 56 min ago - Not only you I too assumed
13 hours 14 min ago - another very interesting
15 hours 7 min ago - Reply to comment | Linux Journal
17 hours 43 sec ago - Reply to comment | Linux Journal
23 hours 54 min ago - Reply to comment | Linux Journal
1 day 10 min ago - Favorite (and easily brute-forced) pw's
1 day 2 hours ago - Have you tried Boxen? It's a
1 day 7 hours ago - seo services in india
1 day 12 hours ago




Comments
Python Conference: PyCon 2006
For people that want to learn more about Python, especially those near Dallas Texas, I want to mention the upcoming community-organized annual Python conference, PyCon 2006.
Thanks!
Higher level interfaces
The Python/C API is very low level, verbose, painful to work with, and highly error prone. With C++ code in particular it just sucks - you'll spend half your time writing const_cast("blah") to work around "interesting" APIs or writing piles of "extern C" wrapper functions, and the rest of your time writing verbose argument encoding/decoding or reference counting code. It's immensely frustraing to use plain C to write a Python object to wrap a C++ object.
Do yourself a favour, and once you've got embedding working, expose the Python interface to your program using a higher level tool. I hear SWIG is pretty good, especially for plain C code, but haven't used it myself (I work with heavily templated C++ with Qt). SIP (used to make PyQt) from Riverbank computing has its advantages also, and is good if you want your Qt app's API to integrate cleanly into PyQt. Otherwise, I'd suggest the amazing Boost::Python for C++ users, as it's ability to almost transparently wrap your C++ interfaces, mapping them to quite sensible Python semantics, is pretty impressive.
Boost::Python has the added advantage that you can write some very nice C++ code that integrates cleanly with Python. For example, you can iterate over a Python list much like a C++ list, throw exceptions between C++ and Python, build Python lists as easily as (oversimplified example):
#include <boost/python.hpp>
using namespace boost::python;
boost::python::list make_list()
{
list pylist;
for (int i = 0; i != 10; ++i)
pylist.append( make_tuple( "fred", 10 ) );
return pylist;
}
The equivalent Python/C API code is longer, filled with dangerous reference count juggling, contains a lot of manual error checking that's often ignored, and is a lot uglier.
With regards to the article above, it's good to see things like this written. I had real trouble getting started with embedding Python, and I think this is a pretty well written intro. I do take issue with one point, though, and that's duplicating the environment. Cloning the main dict does not provide separate program environments - very far from it. It only gives them different global namespaces. Interpreter-wide state changes still affect both programs. For example, if one program imports a module, the other one can see it in sys.modules ; if one program changes a setting in a module, the other one is affected. Locale settings come to mind. Most well designed modules will be fine, but you'll run into the odd one that thinks that module-wide globals are a good idea and consequently chokes.
Unfortunately, the alternative is to use sub-interpreters. Sub-interpreters are a less than well documented part of Python's API, and as they rely on thread local storage they're hopeless in single threaded programs. They can be made to work (see Scribus, for example) but it's not overly safe, and will abort if you use a Python debug build.
When you combine this with a GUI toolkit like Qt3 that only permits GUI operations from the main thread (thankfully, the limitiation is relieved by Qt4), this becomes very frustrating. If you're not stuck with this limitation, you can just spawn off a thread for your users' scripts, and should consider designing your interface that way right from the start.
Embedding Python is handy. Have fun.
P.S: LJ staff, please fix your comment engine's braindeath about leading whitespace, < and > chars in <code> sections, and blank lines in <code> sections. Thankyou. Repeated entities are not a fun way to format text.
--
Craig Ringer
craig@postnewspapers.com.au