Linux Programing Hints

in
Perl is often considered a scripting language for systems administrators. Jim demonstrates that it is useful to applications and scientific programmers as well—as a prototyping tool.
Point In Polygon

Next let us consider a more difficult problem. How do we test whether a point is inside a polygon? This problem is not as simple as it may first appear, especially when you take into account the special cases—such as when the point is on the boundary of the polygon. Is that inside, outside, or do you want to count it as a separate case—on the boundary? Let's break the problem down and start by writing a Perl subroutine to test whether a point is on a line.

A not-too-efficient but easy-to-code routine works as follows. If a point is on a line the sum of the distances from the point to each of the line endpoints is the same as the distance between the endpoints, i.e. the length of the line. You might want to reread the previous sentence and make a little sketch to convince yourself that this is indeed the case. We will first need a routine (in Perl they are called subroutines) to find the distance between two points. C has a built-in function, hypot, but the one-liner in Figure 2 is the Perl equivalent. The sub keyword denotes a subroutine and the subroutine's name follows. There is no parameter list in the definition of a subroutine. Those are comments following the subroutine name in Figure 2. We will supply the parameter list when we call the subroutine, as you will see shortly.

```sub hypot # (x, y) returns sqrt(x * x + y * y)
{
sqrt \$_[0] * \$_[0] + \$_[1] * \$_[1];
}
```

Figure 2. Perl Subroutine to Find Distance Between Points

When you call a subroutine in Perl all of the parameters are automatically put into an array @_ (The @ denotes a standard array). In this case the array has two elements, the differences in the x and y coordinates of the two points. Note that the elements are referenced by address so that we need to be careful. Any modifications to these variables would change the values outside this subroutine. In Perl you can also create local variables within a subroutine, or inside of any block for that matter. Normally I would do so, but since this subroutine is so short and is likely to be called many times, I avoided the overhead of local variables. Note again, that a Perl function, sqrt, was called without any parentheses. The return value of a subroutine is the last thing calculated (or you can overrride this behavior with a specific return value). Here the Pythagorean result is the last value calculated, so no explicit return is necessary. You will probably find, as I have, that explicit return statements are rarely needed.

Now we need to tell our program what points and areas are. In C we seem to have the advantage of structures at our disposal. We would probably set up something like the typedef and declaration in Figure 3.

```typedef struct
{
double x,
y;
} POINT;
POINT *polygon_p;
```

Figure 3. Typedef and Declaration

A point has x and y coordinates and a polygon has three or more points. Every time we need a polygon we can malloc the space for it and fill it with points—and free the space when we are done with it. This is where Perl shines.

Perl has neither explicit memory allocation nor structures. The good news is that we do not have to concern ourselves with memory—it is there when we need it. The bad news is that we have to come up with some system like C's structures. This turns out to be trivial. Let us put everything into strings. Perl takes care of strings of any length, relieving us of the pesky problem of counting characters. You do not even have to add one for the terminating null character—Perl does not use a terminator.

Our point will simply be a string containing two doubles joined by a comma; our line will be a string containing two points joined by a colon; and our polygon will be a string containing three or more points joined by colons, like so:

```\$point = "1.0,2.0";
\$line = "0.0,0.0:0.0,1.0";
\$polygon = "1.0,2.0:6.0,5.0:0.0,3.1";
```

These “structures” turn out to be easy to scan visually and very easy to manage, thanks to Perl's join and split functions. Figure 4 is the point_on_line subroutine I developed using the above scheme.

```
\$epsilon = 1.0e-10;
sub point_on_line #(point, line) returns 0 or 1
{ # if on - sum of distance to ends should be distance between ends
local(\$point, \$line) = @_;
local(\$p_x, \$p_y) = split(",", \$point);
local(\$l_b, \$l_e) = split(":", \$line);
local(\$l_b_x, \$l_b_y) = split(",", \$l_b);
local(\$l_e_x, \$l_e_y) = split(",", \$l_e);
&fabs(&hypot(\$p_x - \$l_b_x, \$p_y - \$l_b_y) +
&hypot(\$p_x - \$l_e_x, \$p_y - \$l_e_y) -
&hypot(\$l_e_x - \$l_b_x, \$l_e_y - \$l_b_y)) < \$epsilon;
}
sub fabs # (x) returns absolute value of x
{
local(\$rv) = @_;
\$rv = -\$rv if \$rv < 0.0;
\$rv;
}
```

Figure 4. point-on-line Subroutine

This subroutine is called with two strings, i.e.

```&point_on_line(\$point, \$line);
```

and returns one if the point is on the line, and zero otherwise. Note that in this case I put the calling parameters into local arrays for safety and ease of maintenance—\$point and \$line are easier to remember next week than \$_[0] and \$_[1]. The split function's use should be obvious. The point string gets split into two coordinate strings by the comma, and the line string gets broken into two point strings by the colon. Then the points at the beginning and end of the line, \$l_b and \$l_e, get broken down into their respective coordinates. Finally the coordinates are passed to our hypot function. Note how the & sign is used to prefix a subroutine for the call. Perl has no equivalent to C's fabs so I quickly rolled my own in Figure 4. I used the \$epsilon variable to avoid floating point roundoff problems.

We have been building this thing from the bottom up. So far we have a routine to test whether a point is on a line. Our goal is a routine to test whether a point is within a polygon. Let me give you an overview of how we are going to solve the problem, provide you with two more Perl routines (which you should be able to read now), and then show you the external “workhorse” routine that does the final inside/on/outside determination.

Our polygon is a closed figure and every point is either inside, outside, or on the boundary (there is actually a mathematical statement called Jordan's Curve Theorem that proves this!). If we can establish a point that is guaranteed to be outside of our polygon, we can test the “insideness” of any point by connecting this test point to our “outside” point by a straight line and counting the number of times the line crosses the polygon. Clearly if the line never crosses the polygon, the point is outside; if it crosses once the point is inside; and, in general, if the line crosses the polygon's boundary an odd number of times, the point is inside. An even number of crossings means the point is outside. To put it another way, every time the line crosses the polygon it changes from being either inside to outside or vice-versa. Starting from the outside point our first crossing (if any) puts us inside; the next crossing (if any) puts us back outside; etc., odd-inside, even-outside. So, it looks like we will need a routine to test whether two lines intersect.

One way of testing for line intersection is to make sure both endpoints of each line are on opposite sides of the other line. (Reread and make a few pictures, as before.) This leads to our final routine. Now we will test whether, when we walk along a line from the beginning point to the opposite end and then turn to go straight to another point, we turn counterclockwise or clockwise. This discussion and the resulting routines are modeled after Sedgewick's in “Algorithms in C”, pages 349-354 (Robert Sedgewick, Addison Wesley, 1990). Let us start with the counterclockwise subroutine, ccw in Figure 5.

```sub ccw # (three points) return -1, 0, or 1
{
local(@points) = @_;
local(\$rv) = 0;
local(\$dx1, \$dx2, \$dy1, \$dy2, \$p0x, \$p0y, \$p1x, \$p1y, \$p2x, \$p2y);
(\$p0x, \$p0y) = split(",", \$points[0]);
(\$p1x, \$p1y) = split(",", \$points[1]);
(\$p2x, \$p2y) = split(",", \$points[2]);
\$dx1 = \$p1x - \$p0x;
\$dy1 = \$p1y - \$p0y;
\$dx2 = \$p2x - \$p0x;
\$dy2 = \$p2y - \$p0y;
switch:
{
\$rv =  1, last if \$dx1 * \$dy2 > \$dy1 * \$dx2;
\$rv = -1, last if \$dx1 * \$dy2 < \$dy1 * \$dx2;
\$rv = -1, last if (\$dx1 * \$dx2 < 0.0) || (\$dy1 * \$dy2 < 0.0);
\$rv =  1, last if (\$dx1 * \$dx1 + \$dy1 * \$dy1) < (\$dx2 * \$dx2 + \$dy2 * \$dy2);
}
\$rv;
}
```

Figure 5. Counterclockwise (ccw Subroutine)

The ccw routine accepts three points and compares the slope of the line from the second to the third with the slope of the line from the first to the second. It is carefully constructed to handle vertical lines and even the case of collinear points. Note how the input points are collected into a local array, @points, avoiding side effects and making the program easy to maintain and understand. The points each get split into their respective coordinates with the test being carried out in a block labeled switch (for obvious reasons). By now you have probably guessed that Perl allows labels, just like C. There is no specific switch construct in Perl, however. The block in ccw above is Perl's way of building a switch. The last keyword immediately exits the block and sometimes the Perl interpretor is smart enough to convert the block to a C switch statement internally (although not in this case).

```sub intersect # (two lines) returns 0 or 1
{
local(\$l1, \$l2) = @_;
local(\$l1_b, \$l1_e) = split(":", \$l1);
local(\$l2_b, \$l2_e) = split(":", \$l2);
&ccw(\$l1_b, \$l1_e, \$l2_b) * &ccw(\$l1_b, \$l1_e, \$l2_e) <= 0 &&
&ccw(\$l2_b, \$l2_e, \$l1_b) * &ccw(\$l2_b, \$l2_e, \$l1_e) <= 0;
}
```

Figure 6. Intersection Subroutine

Figure 6 is the intersection routine. It is very straightforward. The lines are split into endpoints and the endpoints are tested for “sideness”.

We now have all the building blocks to construct our “inside” subroutine. First we connect our test point with an “outside” point via a straight line. Then we walk around the polygon, testing each polygon side in turn, accumulating the intersections of the sides with the test line. If the number of intersections is odd the point is inside, and vice-versa.

```\$big_float = 1.0e7;
sub lower_left_index # (polygon) returns index of lower left corner
{
local(\$polygon) = @_;
local(\$index) = 0;
local(@vertices) = split(":", \$polygon);
local(\$x_min, \$y_min) = split(",", \$vertices[0]);
local(\$i, \$x, \$y);
for(\$i = 1; \$i <= \$#vertices; \$i++)
{
(\$x, \$y) = split(",", \$vertices[\$i]);
if((\$y < \$y_min) || ((\$y == \$y_min) && (\$x < \$x_min)))
{
\$x_min = \$x;
\$y_min = \$y;
\$index = \$i;
}
}
\$index;
}
sub inside # (point, polygon) returns 0 or 1
{
local(\$point, \$polygon) = @_;
local(@vertices) = split(":", \$polygon);
local(\$index) = &lower_left_index(\$polygon);
local(\$last_index) = \$index ? \$index - 1 : \$#vertices;
local(\$count, \$holding_point) = (0, 0);
local(\$i, \$lp, \$lt, \$vertex, \$x, \$y, \$big_x_point);
local(\$check_index) = \$index; # true if index is not zero
OUTER: for(;;)
{ # one pass loop
for(\$i = 0, \$vertex = \$vertices[\$#vertices]; \$i <= \$#vertices; \$i++)
{
\$lp = join(":", \$vertex, \$vertices[\$i]);
\$vertex = \$vertices[\$i];
if(&point_on_line_2(\$point, \$lp))
{
\$count = 1;
print "Point on boundary\n" if defined \$verbose;
last OUTER;
}
}
(\$x, \$y) = split(",", \$point);
\$big_x_point = join(",", \$big_float, \$y);
\$lt = join(":", \$point, \$big_x_point);
for(\$i = 0; \$i <= \$#vertices; \$i++)
{
if(&point_on_line_2(\$vertices[\$index], \$lt))
{
\$holding_point = 1;
}
else
{
if(!\$holding_point)
{
\$lp = join(":", \$vertices[\$last_index], \$vertices[\$index]);
\$count++ if &intersect(\$lp, \$lt);
}
elsif(\$holding_point &&
(&ccw(\$point, \$big_x_point, \$vertices[\$index]) !=
&ccw(\$point, \$big_x_point, \$vertices[\$last_index])))
{
\$count++;
}
\$last_index = \$index;
\$holding_point = 0;
}
\$index++;
if(\$check_index && (\$index == @vertices))
{
\$check_index = 0;
\$index = 0;
}
}
last;
} # one pass "loop"
\$count & 1;
}
```

Figure 7. lower-left-index and inside Subroutines

The pair of subroutines in Figure 7 are Perl implementations of functions suggested by Sedgewick in the section “Inclusion in a Polygon” (pages 353-355), although more complete. The lower_left_index subroutine returns the index of the polygon point with the smallest x coordinate among all points with the smallest y coordinate. This is necessary to account for the special cases when a polygon vertex is on the test line. Note how, in the third line within the block, the @vertices array is automatically constructed by splitting the \$polygon string with colons. Each element of the @vertices array is a pair of coordinates separated by a comma, one for each vertex. Whenever we need individual x, y pairs the split function is called, as we have seen before. This occurs before the for loop to initialize \$x_min and \$y_min, and inside the loop to generate a new test pair \$x, \$y. The upper limit in the loop variable \$i is \$#vertices, which is the index of the last member of @vertices. Perl automatically keeps one of these variables for each array. The last statement in this routine, \$index; simply establishes the return value.

The inside subroutine is our “workhorse” function. It is admittedly fairly complicated, but you should be able to follow the logic if you have read this far. Here is some help. The variable \$index is used to walk from vertex to vertex around the polygon starting from the index returned by lower_left_index. If this index is anything other than zero, it will need to be reset after the vertex with the largest index is encountered. The variable \$check_index keeps track of both whether this resetting will be necessary and, if so, whether it has been done yet. The variable \$last_index is the index of the last vertex that was not on the test line. Generally this is the index of the vertex “behind” \$index.

The OUTER label takes advantage of one of Perl's handiest features, the ability to exit effortlessly from nested loops. You can do this in C if you like goto. In the present example the polygon sides are created via the join function using the colon as the separator. The sides are stored in \$lp. If the test point is on a polygon edge, there is no need to test further, so the OUTER loop is exited immediately. Note that setting \$count to one in this case is equivalent to counting the boundary as inside the polygon, since one is an odd number. It would be trivial to count the boundary as outside or even as a special case by modifying the if block containing the statement: last OUTER;

If the point is not on an edge, a horizontal line from the test point to an outside point with an x coordinate equal to \$big_float is constructed and stored in \$lt. The remainder of this function tests for either line intersection or “sideness” depending on whether the previous vertex was on the test line, incrementing \$count as appropriate. The return value of the inside function is 1, if the point is inside, and 0 if the point is outside.

```
#!/usr/local/bin/perl
while(<DATA>)
{
chop;
\$polygon .= \$_ . ":";
}
chop \$polygon;
for(;;)
{
print "Enter x and y separated by a comma (q to quit): ";
chop(\$point = <STDIN>);
last if \$point =~ /^[qQ]/;
print("No comma!  Try again.\n"), redo if \$point !~ /,/;
\$point =~ s/ +//g;
print "Checking point: \$point\n";
printf "%s\n", &inside(\$point, \$polygon) ? "inside" : "outside";
}
__END__
5.0,4.0
0.0,0.0
10.0,5.0
0.0,10.0
```

Figure 8. Driver Routine for Inside Subroutine

A simple driver routine for the inside subroutine is presented in Figure 8. This routine reads its data from the end of the perl program itself. Anything following the line:

```__END__
```

is considered data and is accessed through the (automatically opened) DATA file handle. Two new operators introduced in this driver are “.” which concatenates two strings and “.=” which appends one string to another. That is, “.=” is to “.” as “+=” is to “+”. The chop function removes the last character from each element of its argument list. Note how the line:

```chop \$polygon;
```

trims the final colon from the polygon string, so that it is a legitimate polygon. Replace my data with your own if you want to run this driver, but be sure to put a comma between the x and y coordinates.

______________________