UpFront

The UpFront Section

UpFront

diff -u: What's New in Kernel Development

Ahmed S. Darwish recently initiated an abortive effort to improve the way system crashes were logged. When a system, particularly a laptop, crashes very early in the boot process, typically no record is kept. Ahmed's idea was to rely on BIOS routines to write logging information to disk even if a crash occurs almost immediately. There was a lot of interest in Ahmed's work, since it could be a significant aid to debugging. But, Linus Torvalds torpedoed the idea, saying, “No way in hell do I want the situation of 'the system is screwed, so let's overwrite the disk' to be something the kernel I release might do. It's crazy. That disk is a lot more important than the kernel.” So clearly, this type of feature almost certainly will remain a third-party patch for a long time to come.

The ARM architecture has quite a few implementations of the clk struct, which annoyed Jeremy Kerr and inspired him to rewrite them all into a single generic implementation. A lot of folks pitched in with suggestions. Some of the trickier points involved identifying the proper way to tell when multiple devices were using a single clock. It also was important to make sure that the generic code have appropriate boundaries, so it would be easier to write special case code for the systems that needed it.

Oren Weil of Intel has submitted a patch to support Intel's new management engine. Unfortunately, there was a variety of problems with the patch submission process itself, including leaving off the crucial Signed-Off-By headers, among other things. So, much of the discussion centered around that instead of the code. One key problem was that patches are never supposed to break the kernel build. This is essential even if subsequent patches would fix the build. The reason is that developers need to be able to rely on “git bisect” to track down bugs, and this is possible only if every version of the kernel builds properly. This also makes for a cleaner development process in general, in which no single project is stuck waiting for another project to fix the build before it can continue doing its own tests and development.

Thomas Gleixner has revamped the generic interrupt handling code in the kernel. This was partly done to clean up naming conventions so as to be easier for developers to understand, but it also was done to encapsulate the various features of the system, so developers couldn't abuse them anymore—or at least so developers who abused them would be easy to identify and stop. One of the goals is to extend the IRQ handling code gradually to accommodate the special needs of all the various projects out there, and that's more difficult to do when folks are reaching their misshapen prongs into the guts of the code.

Eric Paris wanted to re-introduce a global capabilities bounding set to make it possible to remove a capability from the system in such a way that it could not be gotten back. The idea would be to remove some key capabilities and then hand root access over to untrusted entities. Eric's idea was that this would keep the system secure. Unfortunately, Linux security is implemented primarily in terms of ways to prevent root access, rather than ways to mitigate the effect of granting root access. The assumption made by virtually all kernel developers is that once a user has gained root access, that's the ball game. So something like Eric's subtle manipulation of root privileges does not ring true to most kernel folks.

Dropbox Tips and Tricks

Dropbox, or one of the alternatives like Ubuntu One or SparkleShare, are great tools for keeping computers in sync. They offer some unique abilities as well. Here are a few of our favorites:

  • Keep config folders, like Pidgin's .purple directory in your Dropbox, and symlink to it in your home directory. It saves entering the same information on your computers.

  • Have your home computer monitor a specific folder inside Dropbox for torrent files. Then, whenever you want to download a torrent at home, just put the torrent file into your folder on any computer, and it will start downloading at home.

  • Keep your favorite desktop wallpaper photos in Dropbox, so you have access to all your NASA astronomy pictures on your various desktop and laptop computers. (This works for non-NASA photos too, but why would you want non-space-based wallpaper?)

  • Use Dropbox in combination with Calibre and Calibre2opds to keep your e-book library on-line and always accessible. It makes downloading a book from your collection simple when you're on vacation and forget to pack enough books.

Do you have more Dropbox tips? Send them to me at shawn@linuxjournal.com, and I'll publish the good ones on LinuxJournal.com.

Non-Linux FOSS

If you're a fan of Quicksilver on OS X or GNOME Do on Linux, Launchy might be just the thing your Windows install needs. An open-source application launcher, Launchy stays out of the way until you need it. A simple keystroke brings it to the foreground and helps you launch whatever application you desire. Check it out at www.launchy.net.

Adding More Awesome to Your Office

Whether you prefer OpenOffice.org or LibreOffice, which currently still are pretty similar, out of the box, they are missing some of the conveniences installed by their commercial counterparts. Granted, they are fully functional, but if you want a robust clip-art library and a decent selection of document templates, you'll want to add some extensions and templates.

Browse on over to extensions.services.openoffice.org and templates.services.openoffice.org to pick up some add-ons that will make your office suite shine. Whether you want to add a few graphics to your document or spice up your next presentation, the options are extensive.

Also, if your user has write access to the system files, you'll get the option to install extensions for all users or just the current user—an awesome boon for sysadmins like myself!

Parallel Programming Crash Course

I've been covering various scientific programs the past few months, but sometimes it's hard to find a package that does what you need. In those cases, you need to go ahead and write your own code. When you are involved with heavy-duty scientific computing, you usually need to go to parallel computing in order to get the runtimes down to something reasonable. This month, I give a crash course in parallel programming so you can get a feel for what is involved.

There are two broad categories of parallel programs: shared memory and message passing. You likely will see both types being used in various scientific arenas. Shared-memory programming is when all of the processors you are using are on a single box. This limits you as to how big your problem can be. When you use message passing, you can link together as many machines as you have access to over some interconnection network.

Let's start by looking at message-passing parallel programming. The most common version in use today is MPI (Message Passing Interface). MPI is actually a specification, so many different implementations are available, including Open MPI, MPICH and LAM, among others. These implementations are available for C, C++ and FORTRAN. Implementations also are available for Python, OCaml and .NET.

An MPI program consists of multiple processes (called slots), running on one or more machines. Each of these processes can communicate with all other processes. Essentially, they are in a fully connected network. Each process runs a full copy of your program as its executable content and runs independently of the others. The parallelism comes into play when these processes start sending messages to each other.

Assuming you already have some MPI code, the first step in using it is to compile it. MPI implementations include a set of wrapper scripts that handle all of the compiler and linker options for you. They are called mpicc, mpiCC, mpif77 and mpif90, for C, C++, FORTRAN 77 and FORTRAN 90, respectively. You can add extra options for your compiler as options to the wrapper scripts. One very useful option is -showme. This option simply prints out the full command line that would be used to invoke your compiler. This is useful if you have multiple compilers and/or libraries on your system, and you need to verify that the wrapper is doing the right thing.

Once your code is compiled, you need to run it. You don't actually run your program directly. A support program called mpirun takes care of setting up the system and running your code. You need to tell mpirun how many processors you want to run and where they are located. If you are running on one machine, you can hand in the number of processors with the option -np X. If you are running over several machines, you can hand in a list of hostnames either on the command line or in a text file. If this list of hostnames has repeats, mpirun assumes you want to start one process for each repeat.

Now that you know how to compile and run your code, how do you actually write an MPI program? The first step needs to initialize the MPI subsystem. There is a function to do this, which in C is this:


int MPI_Init(&argc, &argv);

Until you call this function, your program is running a single thread of execution. Also, you can't call any other MPI functions before this, except for MPI_Initialized. Once you run MPI_Init, MPI starts up all of the parallel processes and sets up the communication network. After this initialization work is finished, you are running in parallel, with each process running a copy of your code.

When you've finished all of your work, you need to shut down all of this infrastructure cleanly. The function that does this is:

int MPI_Finalize();

Once this finishes, you are back to running a single thread of execution. After calling this function, the only MPI functions that you can call are MPI_Get_version, MPI_Initialized and MPI_Finalized.

Remember that once your code goes parallel, each processor is running a copy of your code. If so, how does each copy know what it should be doing? In order to have each process do something unique, you need some way to identify different processes. This can be done with the function:

int MPI_Comm_rank(MPI_Comm comm, int *rank);

This function will give a unique identifier, called the rank, of the process calling it. Ranks are simply integers, starting from 0 to N–1, where N is the number of parallel processes.

You also may need to know how many processes are running. To get this, you would need to call the function:

int MPI_Comm_size(MPI_Comm comm, int *size);

Now, you've initialized the MPI subsystem and found out who you are and how many processes are running. The next thing you likely will need to do is to send and receive messages. The most basic method for sending a message is:

int MPI_Send(void *buf, int count, MPI_Datatype type, 
 ↪int dest, int tag, MPI_Comm comm);

In this case, you need a buffer (buf) containing count elements of type type. The parameter dest is the rank of the process that you are sending the message to. You also can label a message with the parameter tag. Your code can decide to do something different based on the tag value you set. The last parameter is the communicator, which I'll look at a little later. On the receiving end, you would need to call:

int MPI_Recv(void *buf, int count, MPI_Datatype type, 
 ↪int source, int tag, MPI_Comm comm, MPI_Status *status);

When you are receiving a message, you may not necessarily care who sent it or what the tag value is. In those cases, you can set these parameters to the special values MPI_ANY_SOURCE and MPI_ANY_TAG. You then can check what the actual values were after the fact by looking at the status struct. The status contains the values:

status->MPI_source
status->MPI_tag
status->MPI_ERROR

Both of these functions are blocking. This means that when you send a message, you end up being blocked until the message has finished being sent. Alternatively, if you try to receive a message, you will block until the message has been received completely. Because these calls block until they complete, it is very easy to cause deadlocks where, for example, two processes are both waiting for a message to arrive before any messages get sent. They end up waiting forever. So if you have problems with your code, these calls usually are the first places to look.

These functions are point-to-point calls. But, what if you want to talk to a group of other processes? MPI has a broadcast function:

int MPI_Bcast(void *buf, int count, MPI_Datatype type, 
 ↪int root, MPI_Comm comm);

This function takes a buffer containing count elements of type type and broadcasts to all of the processors, including the root process. The root process (from the parameter root) is the process that actually has the data. All the others receive the data. They all call MPI_Bcast, and the MPI subsystem is responsible for sorting out who has the data and who is receiving. This call also sends the entire contents of the buffer to all the processes, but sometimes you want each process to work on a chunk of the data. In these cases, it doesn't make sense to send the entire data buffer to all of them. There is an MPI function to handle this:

int MPI_Scatter(void *send, int sendcnt, MPI_Datatype type,
                void *recv, int recvcnt, MPI_Datatype type, int root,
                MPI_Comm comm);

In this case, they all call the same function, and the MPI subsystem is responsible for sorting out which is root (the process with the data) and which are receiving data. MPI then divides the send buffer into even-size chunks and sends it out to all of the processes, including the root process. Then, each process can work away on its chunk. When they're done, you can gather up all the results with:

int MPI_Gather(void *send, int sendcnt, MPI_Datatype type,
               void *recv, int recvcnt, MPI_Datatype type, int root,
               MPI_Comm comm);

This is a complete reversal of MPI_Scatter. In this case, all the processes send their little chunks, and the root process gathers them all up and puts them in its receive buffer.

Taking all of the information from above and combining it together, you can put together a basic boilerplate example:


#include <mpi.h>
// Any other include files

int main(int argc, char **argv){
   int id,size;
   // all of your serial code would
   // go here
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &id);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   // all of your parallel code would
   // go here
   MPI_Finalize();
   // any single-threaded cleanup code
   // goes here
   exit(0);
}

Hopefully, you now feel more comfortable with MPI programs. I looked at the most basic elements here, but if you feel inspired, you should grab a good textbook and see what other functions are available to you. If not, you at least should be able to read existing MPI code and have a good idea of what it's trying to do. As always, if you'd like to see a certain area covered in this space, feel free to let me know.

Scrivener, Now for Linux!

The folks over at www.literatureandlatte.com have a rather nifty writer's tool called Scrivener. For years, it's been an OS X-only program for novelists and screenwriters that acts like a project management tool for big writing projects. Linux users may be familiar with Writer's Café from www.writerscafe.co.uk, which is a similar program. Although Scrivener is a little more expensive ($45 vs. $40 for Writer's Café), its features make it something any novelist should check out. And, if you try it during the beta period, Scrivener is free.

Unfortunately, users with existing licenses for the OS X version of Scrivener cannot transfer that license to the Linux version. Perhaps once the final version is released, the Literature and Latte folks will change their minds. Either way, if you're a writer, you'll want to check out Scrivener or Writer's Café. Both are neat packages, and now both are Linux-compatible!

They Said It

All sorts of computer errors are now turning up. You'd be surprised to know the number of doctors who claim they are treating pregnant men.

—Isaac Asimov

Humanity has the stars in its future, and that future is too important to be lost under the burden of juvenile folly and ignorant superstition.

—Isaac Asimov

I am not a speed reader. I am a speed understander.

—Isaac Asimov

I do not fear computers. I fear the lack of them.

—Isaac Asimov

If my doctor told me I had only six minutes to live, I wouldn't brood. I'd type a little faster.

—Isaac Asimov

We Want to Hear from YOU at LinuxJournal.com

The Internet is a marvelous thing. I know, “Welcome to 1995, Katherine”, but hear me out. You hold in your hands a marvelous source of information about Linux and open-source software. I think it's a pretty fantastic use of paper and ink, but it's a one-way street. LinuxJournal.com on the other hand, allows for numerous opportunities to communicate both ways. I'd love to see each of you take advantage of the on-line community to exchange ideas in the comments sections of your favorite articles or communicate with our authors and staff directly via our LinuxJournal.com profiles (I'm www.linuxjournal.com/users/webmistress). Visit often to check out the latest poll and vote. Wondering which office software Linux Journal readers prefer? Now you know: www.linuxjournal.com/content/whats-your-favorite-office-program.

We want to hear from you so we can continue to provide you with the content you enjoy most and to get to know you a little better. Don't be shy! Visit LinuxJournal.com and say hi.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState