Extending GlusterFS with Python
Are you a Python programmer who wishes your storage could do more for you? Here's an easy way to add functionality to a real distributed filesystem, in your favorite language.
Programming languages are usually not good neighbors. Even mixing languages as closely related as C and C++ often can lead to a morass of conflicting conventions with respect to symbol names, initialization orders and memory management strategies. As the distance between languages increases, the difficulty of integrating them increases as well. This is particularly true when attempting to mix compiled and interpreted languages. Most interpreted languages have ways to call functions and access symbols in compiled libraries, but these facilities often are far from convenient, and calling back the other way—from compiled code to interpreted—is less convenient still. Integration between interpreted languages is even less feasible—the one notable exception being the several languages that share the Java Virtual Machine (JVM). Interoperability between interpreted languages using different virtual machines usually is limited to message passing between separate processes.
In this context, Python's facilities for integrating with code written in other languages are like a breath of fresh air. One option is Jython, which exists quite comfortably within the aforementioned JVM ecosystem. For compiled code, Python offers not one but two methods of integration. The first is the "extension API", which allows you to write Python modules in C. ("C" is used here as shorthand for any compiled code that adheres to the initialization and calling conventions originally defined for C.) Using this interface, it is possible to create compiled modules that offer the full functionality of native Python modules with the full performance of compiled code. There are even projects like Cython that will generate most of the necessary boilerplate for you.
The Python ctypes module offers an even more convenient option for integration with compiled code, with only a very small decrease in functionality. Using ctypes, Python code can call functions and access symbols even in C libraries whose authors never thought about Python at all. Python programmers also can use ctypes to interpret C data structures (overlapping somewhat with the functionality provided by the struct module) and even define Python callbacks that can be passed to C functions. Although it is not possible to do absolutely everything with ctypes that you can do with the extension interface, combining the two approaches can lead to very powerful results.
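To make this concrete, here is a small sketch that uses the C standard library, which certainly was not written with Python in mind. It assumes a Linux-like system where `ctypes.util.find_library("c")` locates libc. It calls `strlen` directly, then passes a Python comparison function to `qsort` as a C callback.

```python
import ctypes
import ctypes.util

# Load the C standard library (e.g. "libc.so.6" on Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare strlen's C signature so ctypes converts arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# C function-pointer type: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))


@CMPFUNC
def compare_ints(a, b):
    # Called from inside qsort(); a and b point at two array elements.
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def sort_with_qsort(values):
    # Copy the Python list into a C int array and let libc sort it in place.
    arr = (ctypes.c_int * len(values))(*values)
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), compare_ints)
    return list(arr)


print(libc.strlen(b"GlusterFS"))        # 9
print(sort_with_qsort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

Note that `compare_ints` is a module-level object, so it stays alive for as long as C code might call it. That matters for any callback handed to C.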
As a case study in combining Python code with an existing compiled program or language, this article focuses on the implementation of a Python "translator" interface for GlusterFS. GlusterFS is a modern distributed filesystem based on the principle of horizontal scaling: adding capacity or performance to a system by adding more servers based on commodity hardware, instead of paying an ever-increasing premium to make existing servers more powerful. Development is sponsored by Red Hat, but it's completely open source, so anyone can contribute. In addition to horizontal scaling, another core principle of GlusterFS is modularity. Most of the functionality within GlusterFS actually is provided by translators, so called because they translate I/O calls (such as read or write) coming from the user into the same or other calls that are passed on toward storage. These calls are passed from one translator to another, arranged in an arbitrarily complex hierarchy, until eventually the lowest-level calls are executed on servers' local filesystems. I call the interface between translators TXAPI here for the sake of brevity, even though that's not an official term. TXAPI has been used to implement internal GlusterFS functionality, such as replication and caching, and also external functionality, such as on-disk encryption.
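The translator hierarchy can be modeled in a few lines of Python. This is a toy sketch, not TXAPI: the classes and the two-method read/write interface are hypothetical, and real translators work asynchronously, handing results back through completion callbacks rather than return values. It does show the key idea, though: each layer receives a call, possibly changes it, and passes it on toward storage.

```python
# Hypothetical toy model of a translator stack; none of these are GlusterFS APIs.

class StorageTranslator:
    """Bottom of the stack: 'executes' calls against an in-memory dict."""
    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]


class Translator:
    """Base layer: by default, pass every call unchanged to the child."""
    def __init__(self, child):
        self.child = child

    def write(self, path, data):
        return self.child.write(path, data)

    def read(self, path):
        return self.child.read(path)


class ReverseTranslator(Translator):
    """Transforms data on the way down and back up (a stand-in for encryption)."""
    def write(self, path, data):
        return self.child.write(path, data[::-1])

    def read(self, path):
        return self.child.read(path)[::-1]


class CountingTranslator(Translator):
    """Passes calls through untouched but records how many it saw."""
    def __init__(self, child):
        super().__init__(child)
        self.calls = 0

    def write(self, path, data):
        self.calls += 1
        return super().write(path, data)

    def read(self, path):
        self.calls += 1
        return super().read(path)


storage = StorageTranslator()
stack = CountingTranslator(ReverseTranslator(storage))
stack.write("/a", b"hello")
print(stack.read("/a"))     # b'hello'
print(storage.files["/a"])  # b'olleh'
print(stack.calls)          # 2
```

Because every layer exposes the same interface, layers can be stacked in any order, which is what lets functionality such as caching or encryption be added without touching the layers around it.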
This article is not primarily about GlusterFS, however. Even though I use GlusterFS to illustrate techniques for integrating Python and C code and show results to illustrate the potential benefits of such integration, most of the techniques are equally applicable to other programs with a similar set of characteristics. Those characteristics include a C "top level" calling into Python instead of the other way around, a fundamentally multithreaded execution model, and the presence of a well-defined plugin interface (TXAPI) that makes extensive use of callbacks in both directions.
The fact that GlusterFS is primarily a C program (filesystems are, after all, system software) means that you can't use ctypes for everything. To bootstrap your integration, you need to use Python's "embedding API", which is a close cousin of the previously mentioned extension API and allows C code to call into the Python interpreter. You need to invoke this API at least once to create an interpreter and invoke an initialization function in a Python module. For this purpose, you use a single C-based "meta translator" that can be loaded just like translators always have been. This translator is called glupy, from GLUster and PYthon. (The preferred pronunciation is "gloopy" even though "glup-pie" might make more sense given those origins.) Most of what glupy does is provide the generic embedding-API glue to load the actual Python translator, which is specified as an option. This loading is a fairly simple matter of calling PyImport_Import to load the module, followed by PyObject_CallObject to initialize it, as shown below (error handling has been left out for clarity):
priv->py_module = PyImport_Import(py_mod_name);
Py_DECREF(py_mod_name);
py_init_func = PyObject_GetAttrString(priv->py_module, "xlator");
py_args = PyTuple_New(1);
/* "this" is the C pointer to this glupy instance; PyLong_FromVoidPtr
   converts it portably, and PyTuple_SetItem steals the new reference */
PyTuple_SetItem(py_args, 0, PyLong_FromVoidPtr(this));
priv->py_xlator = PyObject_CallObject(py_init_func, py_args);
Py_DECREF(py_args);
Py_DECREF(py_init_func); /* GetAttrString returned a new reference */
The user's Python init function is then responsible for registering TXAPI callbacks for later use, in addition to performing its own domain-specific initialization. Glupy also includes a Python/ctypes module that encapsulates the GlusterFS types and some functions that glupy users can invoke (in the example, these functions are reached through the "dl" handle).
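For Python readers, the C loading sequence maps onto familiar Python operations. The sketch below mimics it in pure Python. The module name `my_translator` and the pointer value are made up, and the in-memory module stands in for the real translator file that glupy would import.

```python
import importlib
import sys
import types


class xlator:
    """Stand-in for the class a user's translator module would define."""
    def __init__(self, c_this):
        # c_this arrives as an ordinary Python int holding the C pointer value.
        self.c_this = c_this


# Register an in-memory module under a hypothetical name so import_module finds it.
user_module = types.ModuleType("my_translator")
user_module.xlator = xlator
sys.modules["my_translator"] = user_module


def load_translator(mod_name, c_pointer):
    module = importlib.import_module(mod_name)  # like PyImport_Import
    init_func = getattr(module, "xlator")       # like PyObject_GetAttrString
    args = (c_pointer,)                         # like PyTuple_New + PyTuple_SetItem
    return init_func(*args)                     # like PyObject_CallObject


instance = load_translator("my_translator", 0x7F00DEAD0000)
print(type(instance).__name__, hex(instance.c_this))  # xlator 0x7f00dead0000
```

Calling the class object is what creates the translator instance, so the user's `__init__` is where registration and other setup naturally live.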
At this point, you reach a fork in the road. If you're already using the embedding API, why not continue using it for almost everything? In this approach, a glupy dispatch function would use Py_BuildValue to construct an argument list and then use PyObject_CallObject to call the appropriate Python function or method from a table. This is pretty tedious code to write by hand, but much of the process could be automated. The bigger problem with this approach is that TXAPI involves many pointers to GlusterFS-specific structures, which must be passed through the embedding API as opaque integers. The Python code receiving such a value must then explicitly use ctypes' from_address to convert it into a real Python object. Clutter within glupy itself is not a problem, but clutter within glupy users' code makes this approach less appealing.
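A short sketch shows where that clutter comes from. Here `FakeLoc` is a hypothetical structure standing in for a GlusterFS type, and the integer produced by `ctypes.addressof` plays the role of the opaque value that would arrive through the embedding API. Every handler has to rebuild a usable object with `from_address` before doing any real work.

```python
import ctypes


# Hypothetical C structure; real GlusterFS structures have other names and fields.
class FakeLoc(ctypes.Structure):
    _fields_ = [
        ("path", ctypes.c_char_p),
        ("flags", ctypes.c_int),
    ]


# Simulate the C side: a structure living in C memory...
c_side = FakeLoc(b"/var/www/index.php", 0o644)
# ...whose address reaches Python only as a bare integer.
opaque = ctypes.addressof(c_side)


def handle_lookup(loc_addr):
    # The conversion step every handler would repeat under this approach.
    loc = FakeLoc.from_address(loc_addr)
    return loc.path.decode(), loc.flags


print(handle_lookup(opaque))  # ('/var/www/index.php', 420)
```

`from_address` creates a view onto existing memory without copying or owning it, so the underlying C object must outlive every such view.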
The approach actually used in glupy involves less C code and more Python code, with a greater emphasis on ctypes. In this approach, the user's Python code is presented not as Python functions but as C functions, using ctypes to define function types that can then be used as decorators. Unfortunately, details of the platform-specific foreign function interfaces that ctypes uses to implement such callbacks mean that there's no way to get the actual function pointer as C code sees it, other than by passing it to a C function. Accordingly, you pass the Python callback object to a glupy registration function that can see the result of this conversion. For each type of operation, there are two corresponding registration functions: one for the dispatch function that initiates the operation and one for the callback that handles completion. The glupy meta-translator then stores pointers to the registered functions in a table for fast access later. One side effect of this approach is that glupy functions are strongly typed. This might seem rather un-Pythonic, but TXAPI itself is strongly typed, and the consequences of mixing types could be a hung filesystem, so this seems like a reasonable safety measure. Although this might all seem rather complicated, the net result is Python code that's relatively free of type-conversion clutter and requires very little initialization code. For instance, here is the init function for the example translator I'll be using; it registers dispatch functions and callbacks for two types of operations:
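def __init__(self, xl):
    # Register a dispatch function (fop) and a completion callback (cbk)
    # for each operation this translator handles; dl is glupy's ctypes handle.
    dl.set_lookup_fop(xl, lookup_fop)
    dl.set_lookup_cbk(xl, lookup_cbk)
    dl.set_create_fop(xl, create_fop)
    dl.set_create_cbk(xl, create_cbk)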
The next problem to solve is multithreading. The Python interpreter still is essentially single-threaded, so C code that calls into Python must be sure to take the Global Interpreter Lock (GIL) and do other things to keep the interpreter sane. Fortunately, current versions of Python make this much easier than it used to be. The first thing you need to do is enable multithreading by calling PyEval_InitThreads after Py_Initialize. What a surprising number of people seem to miss, even though it's fairly well documented, is that part of what PyEval_InitThreads does is acquire the Global Interpreter Lock on behalf of the calling thread. This lock must be released explicitly at the end of initialization, or else any other code that tries to acquire it will deadlock. In glupy, those later acquisitions happen implicitly in calls to PyGILState_Ensure, which is the recommended way to set up interpreter state before calling into Python from multithreaded C code. Each glupy dispatch function and callback does this, with a matching call to PyGILState_Release after the Python function returns.
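Python code can't usefully call PyGILState_Ensure on its own behalf, but CPython's ctypes wraps each callback invocation in the same Ensure/Release pairing, so the pattern can be observed from Python. This sketch assumes Linux, CPython 3.4 or later (for `PyGILState_Check`), and a libc that `ctypes.util` can find. Several threads call `qsort` through `CDLL`, which releases the GIL for the duration of the C call; the comparison callback then runs with the GIL re-acquired, which `PyGILState_Check` confirms.

```python
import ctypes
import ctypes.util
import threading

# CDLL releases the GIL around each foreign call, like C code running outside Python.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# pythonapi is a PyDLL, so this call keeps the GIL; returns 1 if the caller holds it.
gil_check = ctypes.pythonapi.PyGILState_Check
gil_check.restype = ctypes.c_int

CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))
held_in_callback = []


@CMPFUNC
def compare_ints(a, b):
    # qsort() is running in C without the GIL; ctypes acquired it (Ensure)
    # before entering this function and releases it (Release) after we return.
    held_in_callback.append(gil_check())
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

results = {}


def worker(n):
    arr = (ctypes.c_int * 50)(*range(50, 0, -1))
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), compare_ints)
    results[n] = list(arr)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(r == list(range(1, 51)) for r in results.values()))  # True
print(set(held_in_callback))                                  # {1}
```

Glupy does in C what ctypes does here internally: every entry from a GlusterFS thread into Python is bracketed by PyGILState_Ensure and PyGILState_Release.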
Before moving on from what's inside glupy to what glupy code looks like, you need to know what this example glupy-based translator actually does. The problem this example tries to solve is one that occurs frequently when using GlusterFS to store the code for PHP Web applications. Often, such applications try to load literally hundreds of include files every time a page is requested. Each include file might exist in any of several include directories along a search path. The example caches information about "positive lookups" (that is, those that succeeded) but not about "negative lookups" (which failed).
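A quick count shows why the mix of successful and failed lookups matters for this workload. In the simplified model below (the directory names and file layout are made up), each include is tried in every directory along the search path in order, so it is found only after failing in every earlier directory.

```python
# Hypothetical include search path, for illustration only.
SEARCH_PATH = ["/app/lib", "/app/vendor", "/usr/share/php", "/usr/share/pear"]


def lookups_for(filename, existing):
    """Walk the search path in order; return (successful, failed) lookup counts."""
    failed = 0
    for directory in SEARCH_PATH:
        if f"{directory}/{filename}" in existing:
            return 1, failed
        failed += 1
    return 0, failed


# 100 hypothetical include files, all living in the third directory.
existing = {f"/usr/share/php/inc{i}.php" for i in range(100)}
totals = [lookups_for(f"inc{i}.php", existing) for i in range(100)]
ok = sum(s for s, _ in totals)
bad = sum(f for _, f in totals)
print(ok, bad)  # 100 200
```

In this model, a page that includes 100 files generates 300 filesystem lookups, two-thirds of which fail, even though every include eventually succeeds.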
Jeff Darcy has been working on network and distributed storage since that meant DECnet and NFS version 2 in the early 1990s. He is currently at Red Hat where he serves on the GlusterFS architecture team.