Getting Started with Condor
To test our new Condor setup, let's create a simple “Hello Condor” job:
    #include <stdio.h>

    int main(void)
    {
        printf("Hello World!\n");
        return 0;
    }
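Compile the application with gcc, naming the binary hello so that it matches the Executable entry in the submit file (for example, gcc -o hello hello.c, assuming the source is saved as hello.c).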
Now, to submit a job to Condor, we need to write a submit file. A submit file describes what Condor needs to do with the job: where it gets the input for the application, where it writes the output and where it stores any error messages:
    Universe   = Vanilla
    Executable = hello
    Output     = hello.out
    Input      = hello.in
    Error      = hello.err
    Log        = hello.log
    Queue
The first entry, Universe, defines the runtime environment under which Condor should run the job. Two Universes are noteworthy. For long jobs, such as those that run for weeks or months, the Standard Universe is recommended: it can save a job's partial execution state and automatically relocate the job to another machine if the first machine crashes, which saves a lot of vital processing effort. However, to use the Standard Universe, the application must be “condor compiled” (relinked against Condor's libraries with condor_compile), so the source code, or at least the object files, is required. The Vanilla Universe is intended for short-lived jobs, but long jobs also can be executed if the stability of the machines is guaranteed. Vanilla jobs can run unmodified binaries.
Other Universes in Condor include PVM, MPI and Java, for PVM, MPI and Java applications, respectively. For more detail on Condor Universes, consult the documentation.
In this example, our executable file is called hello (the traditional “Hello Condor” program), and we're using the Vanilla Universe. The Input, Output and Error directives tell Condor which files to use for stdin, stdout and stderr, and the Log directive names the file in which Condor records the job's execution. Finally, the Queue directive specifies how many copies of the program to run; with no number after it, as here, Condor queues a single copy.
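As a sketch of how Queue can run several copies, the submit file below queues three instances of hello. It assumes Condor's built-in $(Process) macro, which expands to 0, 1 and 2 for the three copies, so that each copy writes to its own files. The per-copy file names are illustrative and do not come from the example above, and the Input line is left out because hello reads nothing from stdin.

```
# Sketch only: queue three copies of hello in one submission.
# $(Process) is replaced by each copy's number (0, 1, 2), so the
# copies do not overwrite one another's output and error files.
# The file names below are illustrative.
Universe   = Vanilla
Executable = hello
Output     = hello.$(Process).out
Error      = hello.$(Process).err
Log        = hello.log
Queue 3
```

All three copies share one log file here, which keeps the record of the whole submission in a single place.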
After you have the submit file ready (saved here as hello.sub), run condor_submit hello.sub to submit it to Condor. You can check on the status of your job using condor_q, which tells you how many jobs are in the queue, their IDs and whether they're running or idle, along with some statistics.
Condor has many other features; so far we have covered only the basics of getting it up and running. A number of tutorials are available on-line, along with the Condor Manual (www.cs.wisc.edu/condor/manual), that will teach you the basic and advanced capabilities of Condor. When reading the Condor Manual, pay particular attention to the Standard Universe, which allows you to checkpoint your job, and the Java Universe, which allows you to run Java jobs seamlessly.
You also can add Condor to the boot sequence of your central manager and other machines. You can shut down cluster machines, and their jobs will continue or restart on a different machine (depending on whether it's a Standard Universe job or a Vanilla job). This allows for a lot of flexibility in managing a system.
Condor is not only about clusters. An extension to Condor allows jobs submitted within one pool of machines to execute on another (separate) Condor pool. Condor calls this flocking. If a machine within the pool where a job is submitted is not available to run the job, the job makes its way to another pool. This is enabled by special configuration of the pools.
The simplest flocking configuration sets a few configuration variables in the condor_config file. For example, let's set up an environment where we have two clusters, A and B, and we want jobs submitted in A to be executed in B. Let's say cluster A has its central manager at a.condor.org and B at b.condor.org. Here's the sample configuration:
    FLOCK_TO = b.condor.org
    FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
    FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
The FLOCK_TO variable can specify multiple pools as a comma-separated list of central managers. The other two variables usually point to the same hosts that FLOCK_TO does. Pool B, in turn, must be configured to authorize jobs from pool A to flock to it. The following sample of configuration macros, set in pool B, allows the flocking of jobs from A to B. Like FLOCK_TO, FLOCK_FROM lets you authorize incoming jobs from specific pools:
    FLOCK_FROM = a.condor.org
    HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
    HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
The above settings enable flocking from pool A to pool B, but not the reverse. To enable flocking in both directions, each direction needs to be configured separately. That is, in pool B you would need to set FLOCK_TO, FLOCK_COLLECTOR_HOSTS and FLOCK_NEGOTIATOR_HOSTS to point to pool A, and set up the authorization macros in pool A for B.
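The reverse direction can be sketched by mirroring the two samples, using the same host names as before. This is a minimal sketch rather than a complete configuration: it adds only the flocking-related macros and assumes the rest of each pool's condor_config is already in place.

```
# In pool B's condor_config: let B's jobs flock to pool A.
FLOCK_TO = a.condor.org
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)

# In pool A's condor_config: authorize jobs arriving from pool B.
FLOCK_FROM = b.condor.org
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
```

With both directions configured, each pool's condor_config ends up containing a FLOCK_TO group and a FLOCK_FROM group, each naming the other pool's central manager.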
Be careful with HOSTALLOW_WRITE and HOSTALLOW_READ. These settings let you define the hosts that are allowed to join your pool, or those that can view the status of your pool but are not allowed to join it, respectively.
Condor provides flexible ways to define the hosts. It is possible, for example, to allow read access only to the hosts that belong to a specific subnet, like this:
    HOSTALLOW_READ = 127.6.45.*
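The two settings can also be combined. The sketch below is illustrative only: the *.condor.org host name pattern is an assumed domain, chosen to match the central managers used earlier, and the subnet is the one from the example above. Hosts in the domain may join the pool, while hosts on the subnet may only view its status. The domain is repeated under read access through a macro reference, on the assumption that hosts joining the pool should also be able to query it.

```
# Illustrative values; *.condor.org is an assumed domain.
# Hosts that are allowed to join the pool:
HOSTALLOW_WRITE = *.condor.org

# Hosts that may view the pool's status: the joining hosts
# plus every host on one subnet.
HOSTALLOW_READ = $(HOSTALLOW_WRITE), 127.6.45.*
```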
Comments
Open source?
The Condor FAQ is pretty clear that source is not distributed freely.
How is Condor in any way 'Open Source'?
You just send a request email, and they give you access to a website with the source.
And the license allows you to do anything you want with the source (redistribute, make derivative works), ala BSD style.
Right, source code is NOT available from the website; however, it is STILL open source, because if you request it from them and have a good reason to do so, like extending it for something, they will not deny the request.