/var/opinion - Parallel Is Coming into Its Own
I started writing about computing back in the 1980s. I don't want to say which year, or do the math for how long I've been doing this. It makes me feel old.
I've made a plethora of predictions since then. Some of them left me red-faced and embarrassed. Some of them were spot-on. Some of them have not yet been fulfilled, but I still think my predictions are on target.
One of my earliest predictions was rather easy, but it was considered controversial back in the 1980s. I said that it was only a matter of time before we bumped up against the limits of Moore's Law, and the only viable answer would be parallel processing. Lo and behold, dual-core processors are now common, and it won't be long before we see quad-core processors, and the multicore cell processor in the PlayStation 3 is around the corner.
Naturally, the next logical step is clustering or other means of distributed processing. Here's where I begin to get nervous. When “grid computing” became a buzzword, my knee-jerk reaction was, “no, thanks”. I don't work in a company office anymore, but if I did, I wouldn't want the company off-loading processing to my desktop workstation unless I was certain that everything ran in a completely isolated sandbox. Put the grid processes in a chroot environment on Linux, for example. Even then, I'm not sure I'd be happy about the idea. What if I want to do something compute-intensive, and the grid process decides it wants my CPU cycles more than I do? This isn't supposed to happen, but since it's all in the hands of some administrator with his or her own agenda, why should I trust that it won't happen?
It's the lack of control and fear of security breaches that make me nervous. I've got four computers in my home that nobody ever turns off, and two more for special purposes that I turn on as needed. The two hand-me-down computers my kids use sit idle much of the time, unless my daughter is browsing the Web, or my son is playing World of Warcraft. I use a server as a centralized provider of resources such as printers, files and e-mail. It's a very old machine, but it never breaks a sweat given its purpose. All this represents a tremendous amount of wasted processing power. I'd love to tap in to that unused power at home. This is a safe environment, because I'm not talking about exposing my processing power to everyone on the Internet. I'm talking about distributing workloads across local machines.
In principle, however, Sun was right all along when it said, “the network is the computer”. Other companies, such as IBM, worked along the same lines before Sun did, but I don't know of any company that said it better than Sun. “The network is the computer” is a powerful phrase. As long as there is adequate security built in to every aspect of distributed processing, it makes perfect sense to provide common services as remote procedure calls and distribute every conceivable workload across as many computers as you want to make available to the system. If someone could make me feel comfortable about security and control, I'd buy into distributed processing in a big way.
Here are the challenges as I see them. First, there's the problem of heterogeneous platforms. How do you distribute a workload across machines with different processors and different operating systems? ProActive is one of several good platform-agnostic distributed computing platforms (see www-sop.inria.fr/oasis/ProActive). It is 100% pure Java, so it runs on any platform that supports Java. It has a great graphical interface that lets you manage the way you distribute the load of a job. You can literally drag a process from one computer and drop it onto another.
The problem is that a tool like ProActive doesn't lend itself to the way I want to distribute computing. I want it to be as transparent as plugging a dual-core processor in to my machine. Unfortunately, you can't get this kind of transparency even if you run Linux on all your boxes. The closest thing to it that I can think of is distcc, which lets you distribute the workload when you compile programs. Even this requires you to have the same version of compiler (and perhaps some other tools) on all your boxes. If you want this to be a no-brainer, you pretty much have to install the same distro of Linux on all your machines.
The bottom line here is that I smell an opportunity for Linux. I would love to see a project that makes distributed computing on Linux brainlessly transparent and distribution-agnostic. I'm talking about the ability to start up any computation-intensive application and have it automatically distribute the work across other machines on the network configured to accept the role as yet another “processor core”. You can make this transparent to the application by building it into the core user-space APIs. You manage it like you would any other network service. Is this too pie in the sky? I'd love to hear your opinions.
Nicholas Petreley is Editor in Chief of Linux Journal and a former programmer, teacher, analyst and consultant who has been working with and writing about Linux for more than ten years.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Build a Skype Server for Your Home Phone System
- Why Python?
- A Topic for Discussion - Open Source Feature-Richness?
- Tech Tip: Really Simple HTTP Server with Python
- Not free anymore
57 sec ago - Great
3 hours 48 min ago - Reply to comment | Linux Journal
3 hours 56 min ago - Understanding the Linux Kernel
6 hours 10 min ago - General
8 hours 40 min ago - Kernel Problem
18 hours 43 min ago - BASH script to log IPs on public web server
23 hours 10 min ago - DynDNS
1 day 2 hours ago - Reply to comment | Linux Journal
1 day 3 hours ago - All the articles you talked
1 day 5 hours ago






Comments
some thoughts on parallelism
I don't know if your comment system accepts trackbacks.
http://bitratchet.prweblogs.com/2006/08/09/linux-thots-on-parallelism/
OpenMosix?
Have you taken a look at OpenMosix? There are several openMosix-enabled liveCDs around: KlusTriX, ClusterKnoppix and Clusterix to name a few. Seems 2.6 support is around the corner as well.