PHP as a General-Purpose Language
If PHP is your scripting language of choice
when it comes to developing dynamic Web sites, you probably have grown
to love its immediacy and power. An estimated ten million Web sites
use at least some PHP scripting to generate their pages.
Although most people use PHP primarily as a Web development scripting
system, it possesses all the characteristics of a proper general-purpose
language that can be useful in a variety of other environments. In
this article, I illustrate how it's possible to use the
command-line version of PHP to perform complex shell operations, such
as manipulating data files, reading and parsing remote XML documents
and scheduling important tasks through cron.
The contents of this article are based on the latest version of PHP at
the time of this writing,
4.3.0, which was released at the end of 2002. However, you should be
able to use older versions of PHP 4 without many problems. I
explain the differences you may encounter as necessary.
PHP-CLI
With the release of PHP 4.3, a new version of the interpreter called command-line interface (or PHP-CLI) is available. PHP-CLI is
not a shell as the name implies but, rather, a version of
PHP designed to run from the shell. As far as software
development is concerned, only a few differences exist between
PHP-CLI and its CGI or server API (SAPI) counterparts. For one thing,
traditional Apache server variables are not available, as Apache
isn't even in the picture, and the HTTP headers are not output
when a script is executed. Also, the engine does not use output buffering,
because it would be of no benefit in a non-Web environment.
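A script that might be run by either binary can check which interface it is running under. The following is a minimal sketch that relies only on php_sapi_name(), which returns "cli" for the command-line interpreter; describe_sapi() is a made-up helper name used purely for illustration.

```php
<?php
// describe_sapi() is a hypothetical helper, not part of PHP.
// It turns a SAPI name into a short human-readable note.
function describe_sapi($sapi)
{
    if ($sapi == 'cli') {
        return 'command line: no HTTP headers are sent';
    }
    if (substr($sapi, 0, 3) == 'cgi') {
        return 'CGI: headers are sent unless -q is used';
    }
    return 'server module: ' . $sapi;
}

// php_sapi_name() reports the interface used by the running interpreter
echo describe_sapi(php_sapi_name()), "\n";
```

Keeping the decision in a function that takes the name as a parameter makes it easy to try out without switching binaries.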
PHP-CLI is created by default when you compile your version of PHP,
unless you use the --disable-cli switch when you execute the configuration
script. It is not, however, installed by default. But, you can force
make to compile it and install it by using a special command:
make install-cli
To verify whether the CLI version of PHP is installed on your server,
all you need to do is execute this command:
php -v
The resulting version information should specify whether the CLI or
CGI version of PHP is being executed. If you have only the CGI version
and don't want to install the CLI, you still can use PHP as a
shell-scripting language. The differences between the two are mostly cosmetic, and
their effect can be toned down somewhat by using the right command-line
switches when invoking the interpreter.
Parsing an RSS Feed
Being a lover of weblogging, I routinely visit a certain number
of blogs on the Net. This is a somewhat tedious process, because I
don't like the idea of a news aggregator running on my machine on
a continuous basis, and I do not see the need to pay for one. It seemed,
though, that an RSS aggregator might be a great way to
show how some of PHP's powerful features, such as the fopen()
wrappers and the built-in XML parsing engine, could be used to create a
script that runs from the command line.
An RSS feed is, essentially, a simple XML document that contains
information about items published by a news source, such as Linux
Journal. Its format consists of a channel container that includes several
optional elements, such as a title and description, in addition to a
set of item subcontainers. Each of these, in turn, contains a title,
a description and a link to the news story it represents.
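To make the structure concrete, here is a small sketch that prints the outline of a feed. The feed text is invented for the example, and outline_feed() is a hypothetical helper; it uses xml_parse_into_struct(), a different call from the handler-based approach Feeder takes, simply because it shows the nesting in a few lines.

```php
<?php
// A made-up RSS document; titles and URLs are placeholders.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>Text of the first story.</description>
    </item>
  </channel>
</rss>
XML;

// outline_feed() is a hypothetical helper: it returns one line per
// element, indented by nesting level, or false on a parse error.
function outline_feed($xml)
{
    $parser = xml_parser_create();
    $values = array();
    $index = array();
    if (!xml_parse_into_struct($parser, $xml, $values, $index)) {
        return false;
    }
    $lines = array();
    foreach ($values as $node) {
        // Skip closing tags and stray text between elements
        if ($node['type'] != 'open' && $node['type'] != 'complete') {
            continue;
        }
        $text = '';
        if (isset($node['value']) && trim($node['value']) !== '') {
            $text = ': ' . trim($node['value']);
        }
        $lines[] = str_repeat('  ', $node['level'] - 1)
                 . $node['tag'] . $text;
    }
    return $lines;
}

echo implode("\n", outline_feed($rss)), "\n";
```

Element names come back in uppercase because the parser folds case by default, which is also why Feeder compares against names such as ITEM and TITLE.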
Typically, a news aggregator loads the information from an arbitrary
number of news feeds and presents everything together in a given format,
such as HTML. For users, a news aggregator represents a convenient way
to create a single point of information for all the news sources
of interest.
My PHP-based news aggregator, called Feeder and shown in Listing 1,
presents its results in a plain-text e-mail that is sent to the user who
runs the script. Feeder loads a list of RSS feeds from a file located
in ~/.feeder.rc (Listing 2). The first line of this file contains
the e-mail address to which the news feed data should be sent; each
remaining line holds the URL of one feed. The
contents of the configuration file are loaded using a simple trick:
the back-tick operator, which performs exactly the same function as it
does in the shell, is used to call the cat command. The output is then
split into an array of individual lines using the explode function.
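Once the text of the file is in a string, separating the address from the feed list takes only a few lines. The sketch below is one possible way to do it, not the code Feeder uses: parse_feeder_config() is a hypothetical helper, and the address and URLs are placeholders. Unlike a plain explode, it also ignores blank lines.

```php
<?php
// parse_feeder_config() is a hypothetical helper; the address and
// URLs used below are placeholders, not real feeds.
function parse_feeder_config($text)
{
    $lines = array();
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $lines[] = $line;   // keep only non-blank lines
        }
    }
    if (count($lines) == 0) {
        return false;           // nothing usable in the file
    }
    return array(
        'address' => $lines[0],
        'feeds'   => array_slice($lines, 1)
    );
}

$sample = "user@example.com\n"
        . "http://example.com/news.rss\n"
        . "\n"
        . "http://example.org/blog.rss\n";

$config = parse_feeder_config($sample);
echo $config['address'], ': ', count($config['feeds']), " feeds\n";
```

The string passed in could just as well be the output of the back-tick call to cat.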
Listing 1. Feeder, an RSS Aggregator
<?php
// Classes used internally to parse the XML
// data
class CItem
{
    var $title;
    var $description;
    var $url;
}

class CFeed
{
    var $title;
    var $description;   // filled in by DataHandler()
    var $url;
    var $items;
    var $currentitem;
}
// XML handlers
function ElementStarter($parser, $name, $attrs)
{
    global $currentelement;
    global $elements;

    // Push the element name onto the stack of open elements
    $elements[$currentelement ++] = $name;
}

function ElementEnder($parser, $name)
{
    global $elements;
    global $currentelement;
    global $currentfeed;

    // A closing item tag completes the current item
    if ($name == 'ITEM')
    {
        $currentfeed->items[] =
            $currentfeed->currentitem;
        $currentfeed->currentitem = new CItem;
    }
    $currentelement--;
}
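function DataHandler ($parser, $data)
{
    global $elements;
    global $currentelement;
    global $currentfeed;

    // Name of the enclosing element, if there is one
    $parent = ($currentelement > 1)
        ? $elements[$currentelement - 2] : '';

    // The parser may deliver the text of one element in several
    // pieces, so the data is always appended. Text that belongs to
    // neither an item nor the channel (an image title, for
    // example) is ignored.
    switch ($elements[$currentelement - 1])
    {
        case 'TITLE' :
            if ($parent == 'ITEM')
                $currentfeed->currentitem->title .= $data;
            elseif ($parent == 'CHANNEL')
                $currentfeed->title .= $data;
            break;
        case 'LINK' :
            if ($parent == 'ITEM')
                $currentfeed->currentitem->url .= $data;
            elseif ($parent == 'CHANNEL')
                $currentfeed->url .= $data;
            break;
        case 'DESCRIPTION' :
            if ($parent == 'ITEM')
                $currentfeed->currentitem->description
                    .= $data;
            elseif ($parent == 'CHANNEL')
                $currentfeed->description .= $data;
            break;
    }
}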
// Feed loading function
function get_feed ($location)
{
    global $elements;
    global $currentelement;
    global $currentfeed;

    $xml_parser = xml_parser_create();
    $elements = array();
    $currentelement = 0;
    $currentfeed = new CFeed;
    $currentfeed->items = array();
    $currentfeed->currentitem = new CItem;

    xml_parser_set_option
        ($xml_parser, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler
        ($xml_parser, "ElementStarter", "ElementEnder");
    xml_set_character_data_handler
        ($xml_parser, "DataHandler");

    if (!($fp = fopen($location, "r")))
    {
        xml_parser_free($xml_parser);
        return 'Unable to open location';
    }

    // Read the feed in 4KB chunks; feof() tells the parser
    // when the last chunk has been passed in
    while ($data = fread($fp, 4096))
    {
        if (!xml_parse($xml_parser, $data, feof($fp)))
        {
            fclose($fp);
            xml_parser_free($xml_parser);
            return 'XML PARSE ERROR';
        }
    }

    fclose($fp);
    xml_parser_free($xml_parser);
    return $currentfeed;
}
// Feed formatting function
function format_feed ($feed, $url)
{
    if (!is_object ($feed))
    {
        // get_feed() returned an error message instead of a feed
        $res = "Error loading feed at: $url.\n" .
               "$feed\n\n";
    }
    else
    {
        $res = "{$feed->title}\n[{$feed->url}]\n\n";
        foreach ($feed->items as $item)
        {
            $res .= "{$item->title}\n[{$item->url}]\n\n" .
                    wordwrap ($item->description, 70) . "\n\n" .
                    str_repeat ('-', 70) . "\n\n";
        }
    }
    return $res;
}
// Load up configuration file
$data = explode ("\n", trim (`cat ~/.feeder.rc`));

// The first line is the address, so skip it.
// Start from an empty string, as the feeds are appended to it
$result = '';

// Cycle through and get all the feeds
for ($i = 1; $i < count ($data); $i++)
    $result .= format_feed
        (get_feed ($data[$i]), $data[$i]);

// Mail them out to the user
mail ($data[0], 'Feeder update', $result);
?>
Listing 2. The Configuration File for Feeder
The file is plain text: the first line holds the e-mail address that
receives the update, and every following line holds the URL of one
RSS feed.
The parsing of the XML feed happens in two phases. First, the
get_feed function uses the fopen() wrappers to download the feed in
4KB chunks. These are then passed on to an instance of the built-in
PHP XML parser, which proceeds to interpret their contents and call
ElementStarter(), ElementEnder() and DataHandler(), as needed. These
three functions, in turn, parse the contents of the XML file and create
a structure of CFeed and CItem instances that represents the feed itself.
The script then calls the format_feed function, which scans feed objects
and produces a textual version of their contents. Once all the feeds have been parsed and formatted, the resulting message
is e-mailed to the intended recipient.
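The reason the feed can be handed to the parser in arbitrary chunks is that the parser keeps its state between calls to xml_parse(); only the last call needs to be flagged as final. The sketch below illustrates this with a string instead of a network connection. The function names and the sample document are invented for the example, and the chunk size is deliberately tiny so that tags are split in the middle.

```php
<?php
// Hypothetical handlers that simply count item elements.
$GLOBALS['item_count'] = 0;

function count_start($parser, $name, $attrs)
{
    if ($name == 'ITEM') {
        $GLOBALS['item_count']++;
    }
}

function count_end($parser, $name)
{
    // Nothing to do when an element closes
}

// Feeds $xml to the parser $chunk_size bytes at a time and returns
// the number of items seen, or false if the document is malformed.
function count_items_in_chunks($xml, $chunk_size)
{
    $GLOBALS['item_count'] = 0;
    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'count_start', 'count_end');

    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunk_size) {
        $chunk = substr($xml, $offset, $chunk_size);
        // Tell the parser when it is receiving the last piece
        $is_final = ($offset + $chunk_size >= $length);
        if (!xml_parse($parser, $chunk, $is_final)) {
            return false;
        }
    }
    return $GLOBALS['item_count'];
}

$xml = '<rss><channel><item><title>A</title></item>'
     . '<item><title>B</title></item></channel></rss>';
echo count_items_in_chunks($xml, 7), "\n";
```

The count is the same whatever chunk size is used, which is why Feeder's choice of 4KB is a matter of convenience rather than correctness.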
As a security note, format_feed() uses the wordwrap function to format
the description of a news item so it doesn't span more than
70 columns. This helps enhance the readability of the news feed by
presenting the user with a more compact look. Prior to PHP 4.3.0, the
source code for wordwrap() included an unchecked data buffer that could,
in theory, be exploited to execute arbitrary code, thus presenting a
security issue. If you're not using the latest version of PHP,
you probably should either avoid using wordwrap() or replace it with
your home-grown version.
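A home-grown replacement does not need to be elaborate. The following is a minimal sketch, assuming that it is acceptable to collapse runs of whitespace and to leave words longer than the line width unbroken; simple_wrap() is a hypothetical name, and the function is not a drop-in equivalent of wordwrap().

```php
<?php
// simple_wrap() is a hypothetical stand-in for wordwrap().
// It collapses whitespace and never splits a single long word.
function simple_wrap($text, $width = 70)
{
    $words = preg_split('/\s+/', trim($text));
    $lines = array();
    $line = '';
    foreach ($words as $word) {
        if ($line === '') {
            $line = $word;                  // first word on the line
        } elseif (strlen($line) + 1 + strlen($word) <= $width) {
            $line .= ' ' . $word;           // word still fits
        } else {
            $lines[] = $line;               // start a new line
            $line = $word;
        }
    }
    if ($line !== '') {
        $lines[] = $line;
    }
    return implode("\n", $lines);
}

echo simple_wrap('The quick brown fox jumps over the lazy dog', 15), "\n";
```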
Executing the Script
The easiest way to execute a script from the shell is to invoke
the PHP interpreter explicitly:
marcot ~# php feeder.php
If you have the CGI version of PHP, you may want to use the -q switch,
which causes the interpreter to omit any HTTP headers that are normally
required during a Web transaction.
This explicit method, however, is not very practical if you want your users to
access the scripts you write conveniently. A better solution
consists of making the scripts executable, so they can be invoked
directly, as if they were autonomous programs. To do this,
first determine the exact location of your
PHP executable:
marcot ~# which php
/usr/local/bin/php
The next step consists of creating a shebang: an
initial line that tells the system to run the
remainder of an executable file through a specific application (the PHP
engine in our case). The shebang must be the first line of your
script, and there can't be any whitespace before it. It starts
with the character # and the character !,
followed by the full path of the executable that must run the remainder of the
file. For example, if you're using the CLI version
of PHP, your shebang may look like this:
#!/usr/local/bin/php
If you're using the CGI version of the PHP interpreter, you
also can pass additional options to it in order to keep it quiet and
prevent the usual HTTP headers from being printed out:
#!/usr/local/bin/php -q
The final step consists of making your script executable:
marcot ~# chmod a+x feeder.php
At this stage, you can run the script without explicitly invoking
the PHP interpreter; the shell will take care of that for you:
marcot ~# ./feeder.php
As you may have noticed, I have not renamed the
script to remove the .php extension. Even though
the extension itself is not necessary when running
scripts from the shell, its presence makes it easy
for text editors such as vim to recognize the file and
highlight the source's syntax.
Running PHP Scripts through Cron
A news aggregator that must be invoked explicitly every time you
want to read your news page is not very useful. Therefore, you may want to
have your system run it automatically on a specific schedule. The cron
dæmon generally is used for this purpose. cron is a simple dæmon that runs in the background and, at fixed
intervals, reads through a special file, called crontab, that contains
schedule specifications for each of the users on the server. Based on the
information contained in the crontab file, cron executes an arbitrary
number of shell commands and, optionally, sends an e-mail notification
of their results to the user. The crontab file contains entries in the following format:
minute hour day month weekday command
The first five fields indicate the time or times at which a command
must be executed. For example:
5 9 13 9 1 /usr/bin/feeder.php
means that at 9:05 AM the command /usr/bin/feeder.php will be
executed on September 13 and also on every Monday (weekday 1) in
September: when both the day and the weekday fields are restricted,
cron runs the command if either one matches. This may sound complicated, but it's an
extreme example. Most likely, you want to execute commands on a
simpler schedule, like the beginning of every hour. This
is accomplished by using the * wild card, which means
any. So, for once an hour, on the hour, you would enter:
0 * * * * /usr/bin/feeder.php
And for once a day, at midnight, enter:
0 0 * * * /usr/bin/feeder.php
The time fields allow for even more complex specifications. For
example, you can create a list of specific times by separating them with
a comma:
0,30 * * * * /usr/bin/feeder.php
This crontab specification causes the command /usr/bin/feeder.php
to be run twice an hour, on the hour and at half past. Similarly, you
can specify inclusive ranges of times by separating the endpoints with a dash. For
example, the following crontab command:
0 0 * * 1-3 /usr/bin/feeder.php
causes the script to be executed at midnight, Monday through
Wednesday.
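The list and range notations can be combined with the wild card, and it may help to see how a single field expands into concrete values. The sketch below is a toy interpreter for one time field, written only to illustrate the syntax described above; expand_cron_field() is a hypothetical function, and it handles only the wild card, comma lists and dash ranges, not the other extensions that some cron versions accept.

```php
<?php
// expand_cron_field() is a hypothetical helper that lists the values
// matched by one crontab time field. $min and $max are the limits of
// the field (0 and 59 for minutes, for example).
function expand_cron_field($field, $min, $max)
{
    $values = array();
    foreach (explode(',', $field) as $part) {
        if ($part === '*') {
            $range = range($min, $max);             // any value
        } elseif (strpos($part, '-') !== false) {
            list($from, $to) = explode('-', $part, 2);
            $range = range((int) $from, (int) $to); // inclusive range
        } else {
            $range = array((int) $part);            // single value
        }
        foreach ($range as $value) {
            $values[$value] = true;                 // drop duplicates
        }
    }
    ksort($values);
    return array_keys($values);
}

echo implode(' ', expand_cron_field('0,30', 0, 59)), "\n";
echo implode(' ', expand_cron_field('1-3', 0, 7)), "\n";
```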
In order to change the contents of your crontab file, you need
to use the crontab utility (crontab -e opens your schedule in an
editor), which edits the correct file and notifies the dæmon that your schedule has
changed. There aren't any special requirements to run a PHP
script as a cron job, as long as it does not expect any input
from a user.
Manipulating HTML Code
Even though your PHP-CLI scripts are not outputting HTML through
a Web server, you still can use them to manipulate and produce HTML code.
Because the script is written rather modularly, converting its output
to HTML format involves changing only the format_feed function and
modifying the call to mail(). This is done so the e-mail message can be recognized
as a valid HTML document by the user's e-mail application.
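One way to change the call to mail() is to pass a fourth argument with the extra headers that mark the message as HTML. The sketch below assumes that approach; build_html_headers() and send_html_update() are hypothetical helpers, and the character set is an assumption that should match the encoding of the feeds you read.

```php
<?php
// build_html_headers() and send_html_update() are hypothetical helpers.
// The default charset is an assumption; adjust it to your feeds.
function build_html_headers($charset = 'iso-8859-1')
{
    return "MIME-Version: 1.0\r\n" .
           "Content-Type: text/html; charset=$charset";
}

function send_html_update($to, $html)
{
    // Wrap the formatted feeds in a minimal HTML document
    $body = "<html><body>\n$html\n</body></html>";

    // The fourth argument of mail() carries additional headers
    return mail($to, 'Feeder update', $body, build_html_headers());
}

echo build_html_headers(), "\n";
```

In Feeder, a call along the lines of send_html_update($data[0], $result) would then take the place of the plain call to mail().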
One of the greatest advantages of scripting Web pages with PHP is the
ability to mix dynamic statements directly with the static HTML
code. As you can see from Listing 3, which shows an updated version of
format_feed, this concept still works perfectly even when the script is
not outputting to a Web page.
Listing 3. A Version of the format_feed Function that
Produces HTML
// Feed formatting function
function format_feed ($feed, $url)
{
    // Capture everything that is output below in a buffer
    ob_start();
    if (!is_object ($feed))
    {
?>
<p>
<b>Unable to load feed at
<a href="<?= $url ?>">
<?= htmlentities($url) ?></a></b></p>
<?php
    }
    else
    {
?>
<h1><a href="<?= $feed->url ?>">
<?= $feed->title ?></a></h1>
<p />
<?php
        foreach ($feed->items as $item)
        {
?>
<h2><a href="<?= $item->url ?>">
<?= htmlentities ($item->title) ?></a></h2>
<div width=500>
<?= htmlentities ($item->description) ?>
<hr>
</div>
<?php
        }
    }
    // Fetch the captured HTML, then discard the buffer and
    // turn output buffering off again
    $res = ob_get_contents();
    ob_end_clean();
    return $res;
}
The trick that makes it possible to capture PHP's output in a
variable essentially consists of engaging the interpreter's output
buffer (disabled by default) by calling ob_start(). Once the
appropriate information has been output, the script retrieves the
contents of the buffer with ob_get_contents(), then erases it and turns
output buffering off with a call to ob_end_clean().
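Stripped of the feed-specific parts, the technique looks like the sketch below. It is only an illustration: render_greeting() is a made-up function, and it uses the long form of the echo tag so that it does not depend on short tags being enabled.

```php
<?php
// render_greeting() is a hypothetical example: it captures a
// fragment of static HTML, mixed with PHP, into a string.
function render_greeting($name)
{
    ob_start();                 // start capturing output
    ?>
<p>Hello, <b><?php echo htmlentities($name); ?></b>!</p>
<?php
    $html = ob_get_contents();  // read what was captured
    ob_end_clean();             // discard it and stop buffering
    return $html;
}

echo render_greeting('Tom & Jerry');
```

Because the buffer is closed before the function returns, nothing reaches the terminal until the caller decides what to do with the string.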
Where to Go from Here
Although the news aggregator script I present in this article
performs a rather complex set of functions—from grabbing content
off the Web to parsing XML and formatting it in HTML—it
requires only about 200 lines of code, including all the comments
and blank lines. It is possible to write the same script in Perl or even
as a shell script, with the help of some external applications such as
wget, expat and sendmail. The latter approach, in my opinion,
results in a complicated code base with plenty of opportunities
for mistakes.
PHP-CLI rarely is installed by default on a machine running Linux,
although you can count on Perl being readily available. Thus, if you have
control over the make-up of the server on which you're running scripts
and you're comfortable with PHP, there's no reason why
you need to learn another language to write most of your shell
applications. If, on the other hand, you're writing code
to run on a separate machine over which you have no control, you
may find PHP a slightly more problematic choice.
Marco Tabini is an author and software consultant
based in Toronto, Canada. His company, Marco Tabini & Associates,
Inc., specializes in the introduction of open-source software in
enterprise environments. You can reach Marco through his weblog at
blogs.phparch.com.














Comments
PHP
PHP is very easy to use. If you have some experience of C you won't have any problems getting started with PHP. Even HTML coders can start integrating PHP into their pages straight away. But maybe PHP is too simple. What do I mean by that? The simplicity of PHP means that almost anyone can write some scripts and as a result there is a lot of badly designed code out there. This gives PHP a bad name it does not deserve because it is a very powerful tool. PHP is designed for building Web applications that are scalable up to a very large number of users. With PHP 5 many developers finally got the robust support for object oriented programming they were waiting for, and its XML and MySQL support was much improved as well. There is much discussion about whether PHP is "enterprise ready" - I truly believe it is since it reached version five.
Not able to connect to mysql in crontab through PHP
hi there
I am not able to connect to mysql in a PHP script that is running through crontab. When it is time for crontab to run the PHP script, I get an error message in a mail: Fatal Error - Call to undefined function mysql_connect() in the script. Otherwise the script runs OK in explorer or through the linux command line, i.e. php filename.php.
I am using like - mysql_connect("localhost",$uname,$pass);
to make the connection.
plz help
Undefined mysql_connect...
For the last half hour I was trying to deal with the same problem,
And I just remembered that the "dl" function might do the job.. and looky here.. it did!!!! :)
NOW TO THE SUBJECT:
When executed from Cron, or command line, a PHP script might not recognize the MySql functions, this is because the required libraries are not loaded. It can be simply fixed by the "dl" function in the following manner:
dl("mysql.so"); //loads the mysql library/module or perhaps you might be using a different library than mysql.so
After this line.. one can use the mysql functions as one pleases.. hopefully :)
THANK YOU
dl("mysql.so"); was exactly what i was looking for, thanks so much!
Thank you so much.
This solution was a great help to me. Thanks so much. You rock!
Re: PHP as a General-Purpose Language
this is really a very good solution to general shell scripting. I was doing this to export data to xml files, do straightforward oracle database extraction and mysql backups.
the problem i ran into is that, like mentioned in the article, php is not readily available on all boxes and cannot always be installed...this can really become a problem.
otherwise, an excellent solution, to which php is really well suited.
Re: PHP as a General-Purpose Language
PHP makes me sad. :(
Re: PHP as a General-Purpose Language
Listing 2 doesn't seem to be a configuration file.
SimpleXML
Have a look at SimpleXML. It is available since PHP5 and it makes life much easier for simple xml-files like RSS.
Re: SimpleXML
Check out this article that shows how to use SimpleXML to parse a RSS feed from php-planet.net