Python Scripts as a Replacement for Bash Utility Scripts
Often in Python scripts that are used on the command line, arguments
are used to give users options when they run a certain command. For
head command takes a
-n argument that takes the number
following it and prints only that number of lines. Each argument that
is provided to a Python script is exposed through the
which can be accessed by first importing
code below shows how to take
single words as arguments. This program is a simple adder, which takes two
number arguments and adds them, and prints that out to the user. However,
this format of taking in command-line arguments is rather basic. It is
easy to make mistakes—for instance, pass two strings, such as
and world, to this command, and you will start to get
#!/usr/bin/env python import sys if __name__ == "__main__": # The first argument of sys.argv is always the filename, # meaning that the length of system arguments will be # more than one, when command-line arguments exist. if len(sys.argv) > 2: num1 = long(sys.argv) num2 = long(sys.argv) else: print "This command takes two arguments and adds them" print "Less than two arguments given." sys.exit(1) print "%s" % str(num1 + num2)
Thankfully, Python has a number of modules to deal with command-line arguments. My personal favorite is OptionParser. OptionParser is part of the optparse module that is provided by the standard library. OptionParser allows you to do a range of very useful things with command-line arguments:
Specify a default if a certain argument is not provided.
It supports both argument flags (either present or not) and arguments with values (-n 10000).
It supports different formats of passing arguments—for example, the difference between -n=100000 and -n 100000.
Let's use the OptionParser to enhance the sending-mail script. The original script had a lot of variables hard-coded into place, such as the SMTP details and the users' login credentials. In the code provided below, command-line arguments are used to pass in these variables:
#!/usr/bin/env python import smtplib import sys from optparse import OptionParser def initialize_smtp_server(smtpserver, smtpport, email, pwd): ''' This function initializes and greets the SMTP server. It logs in using the provided credentials and returns the SMTP server object as a result. ''' smtpserver = smtplib.SMTP(smtpserver, smtpport) smtpserver.ehlo() smtpserver.starttls() smtpserver.ehlo() smtpserver.login(email, pwd) return smtpserver def send_thank_you_mail(email, smtpserver): to_email = email from_email = GMAIL_EMAIL subj = "Thanks for being an active commenter" # The header consists of the To and From and Subject lines # separated using a newline character. header = "To:%s\nFrom:%s\nSubject:%s \n" % (to_email, from_email, subj) # Hard-coded templates are not best practice. msg_body = """ Hi %s, Thank you very much for your repeated comments on our service. The interaction is much appreciated. Thank You.""" % email content = header + "\n" + msg_body smtpserver.sendmail(from_email, to_email, content) if __name__ == "__main__": usage = "usage: %prog [options]" parser = OptionParser(usage=usage) parser.add_option("--email", dest="email", help="email to login to smtp server") parser.add_option("--pwd", dest="pwd", help="password to login to smtp server") parser.add_option("--smtp-server", dest="smtpserver", help="smtp server url", default="smtp.gmail.com") parser.add_option("--smtp-port", dest="smtpserverport", help="smtp server port", default=587) options, args = parser.parse_args() if not (options.email or options.pwd): parser.error("Must provide both an email and a password") smtpserver = initialize_smtp_server(options.stmpserver, options.smtpserverport, options.email, options.pwd) # for every line of input. for email in sys.stdin.readlines(): send_thank_you_mail(email, smtpserver) smtpserver.close()
This script shows the usefulness of OptionParser. It provides a simple, easy-to-use interface for command-line arguments, allowing you to define certain properties for each command-line option. It also allows you to specify default values. If certain arguments are not provided, it allows you to throw specific errors.
So what have you learned? Instead of replacing a series of bash commands with one Python script, it often is better to have Python do only the heavy lifting in the middle. This allows for more modular and reusable scripts, while also tapping into the power of all that Python offers. Using stdin as a file object allows Python to read input, which is piped to it from other commands, and writing to stdout allows it to continue passing the information through the piping system. Combining information like this can make for some very powerful programs. The examples I have given here are all for a fictional service that logs to a file.
As a real-world example, recently I have been working with gigabytes of CSV files that I have been converting using a Python script to a file that contains SQL commands to insert the information. To understand the sort of data I'm concerned with here, I ran the data for a single table, and the script took 23 hours to execute and generated an SQL file that was 20GB in size. The advantage of using a Python script in the fashion described in this article is that the whole file does not need to be read into memory. This means that an entire 20GB+ file can be processed one line at a time. Also it is easier to think about a problem when each step (reading, sorting, manipulation and writing) is separated into these logical steps. The guarantee that each of these commands, which are part of the core utilities of UNIX-like environment, is efficient and stable helps the entire experience to be more stable and secure.
The other benefit is that there is no hard-coded file that is read
in. Often having the flexibility to pass it strings rather than the
concept of files is very powerful. For instance, if 20,000
lines through a certain file, the script breaks, instead of re-running
the script from the start,
tail can be used to read only from the line
on which the script failed.
There are a lot of aspects to Python in the shell that go beyond the scope of this article, such as the os module and the subprocess module. The os module is a standard library function that holds a lot of key operating system-level operations, such as listing directories and stating files, along with an excellent submodule os.path that deals with normalizing directories paths. The subprocess module allows Python programs to run system commands and other advanced operations, such as handling piping as described above within Python code between spawned processes. Both of these libraries are worth checking out if you intend to do any Python shell scripting.
Richard Delaney is a software engineer with Demonware Ireland. Richard works on back-end Web services using Python and the Django Web framework. He has been an avid Linux user and evangelist for the past five years.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Building a Multisourced Infrastructure Using OpenVPN
- Happy GPL Birthday VLC!
- Unikernels, Docker, and Why You Should Care
- diff -u: What's New in Kernel Development
- What's New in 3D Printing, Part III: the Software
- Giving Silos Their Due
- Controversy at the Linux Foundation
- Don't Burn Your Android Yet
- Non-Linux FOSS: Snk
- Firefox OS