# Linux Programing Hints

Next let us consider a more difficult problem. How do we test whether a point is inside a polygon? This problem is not as simple as it may first appear, especially when you take into account the special cases—such as when the point is on the boundary of the polygon. Is that inside, outside, or do you want to count it as a separate case—on the boundary? Let's break the problem down and start by writing a Perl subroutine to test whether a point is on a line.

A not-too-efficient but easy-to-code routine works as
follows. If a point is on a line the sum of the distances from the
point to each of the line endpoints is the same as the distance
between the endpoints, i.e. the length of the line. You might want
to reread the previous sentence and make a little sketch to
convince yourself that this is indeed the case. We will first need
a routine (in Perl they are called subroutines) to find the
distance between two points. C has a built-in function, hypot, but
the one-liner in Figure 2 is the Perl equivalent. The
**sub** keyword denotes a subroutine and the
subroutine's name follows. There is no parameter list in the
definition of a subroutine. Those are comments following the
subroutine name in Figure 2. We will supply the parameter list when
we call the subroutine, as you will see shortly.

sub hypot # (x, y) returns sqrt(x * x + y * y) { sqrt $_[0] * $_[0] + $_[1] * $_[1]; }

**Figure 2. Perl Subroutine to Find
Distance Between Points**

When you call a subroutine in Perl all of the parameters are
automatically put into an array **@_** (The
**@** denotes a standard array). In this case the
array has two elements, the differences in the **x**
and **y** coordinates of the two points. Note that
the elements are referenced by address so that we need to be
careful. Any modifications to these variables would change the
values outside this subroutine. In Perl you can also create local
variables within a subroutine, or inside of any block for that
matter. Normally I would do so, but since this subroutine is so
short and is likely to be called many times, I avoided the overhead
of local variables. Note again, that a Perl function,
**sqrt**, was called without any parentheses. The
return value of a subroutine is the last thing calculated (or you
can overrride this behavior with a specific return value). Here the
Pythagorean result is the last value calculated, so no explicit
**return** is necessary. You will probably find, as
I have, that explicit **return** statements are
rarely needed.

Now we need to tell our program what points and areas are. In
C we seem to have the advantage of structures at our disposal. We
would probably set up something like the **typedef**
and declaration in Figure 3.

typedef struct { double x, y; } POINT; POINT *polygon_p;

**Figure 3. Typedef and
Declaration**

A point has **x** and **y**
coordinates and a polygon has three or more points. Every time we
need a polygon we can malloc the space for it and fill it with
points—and free the space when we are done with it. This is where
Perl shines.

Perl has neither explicit memory allocation nor structures. The good news is that we do not have to concern ourselves with memory—it is there when we need it. The bad news is that we have to come up with some system like C's structures. This turns out to be trivial. Let us put everything into strings. Perl takes care of strings of any length, relieving us of the pesky problem of counting characters. You do not even have to add one for the terminating null character—Perl does not use a terminator.

Our point will simply be a string containing two doubles joined by a comma; our line will be a string containing two points joined by a colon; and our polygon will be a string containing three or more points joined by colons, like so:

$point = "1.0,2.0"; $line = "0.0,0.0:0.0,1.0"; $polygon = "1.0,2.0:6.0,5.0:0.0,3.1";

These “structures” turn out to be easy to scan visually and
very easy to manage, thanks to Perl's join and split functions.
Figure 4 is the **point_on_line** subroutine I
developed using the above scheme.

$epsilon = 1.0e-10; sub point_on_line #(point, line) returns 0 or 1 { # if on - sum of distance to ends should be distance between ends local($point, $line) = @_; local($p_x, $p_y) = split(",", $point); local($l_b, $l_e) = split(":", $line); local($l_b_x, $l_b_y) = split(",", $l_b); local($l_e_x, $l_e_y) = split(",", $l_e); &fabs(&hypot($p_x - $l_b_x, $p_y - $l_b_y) + &hypot($p_x - $l_e_x, $p_y - $l_e_y) - &hypot($l_e_x - $l_b_x, $l_e_y - $l_b_y)) < $epsilon; } sub fabs # (x) returns absolute value of x { local($rv) = @_; $rv = -$rv if $rv < 0.0; $rv; }

**Figure 4. point-on-line
Subroutine**

This subroutine is called with two strings, i.e.

&point_on_line($point, $line);

and returns one if the point is on the line, and zero
otherwise. Note that in this case I put the calling parameters into
local arrays for safety and ease of
maintenance—**$point** and **$line**
are easier to remember next week than **$_[0]** and
**$_[1]**. The **split** function's
use should be obvious. The **point** string gets
split into two coordinate strings by the comma, and the
**line** string gets broken into two point strings
by the colon. Then the points at the beginning and end of the line,
**$l_b** and **$l_e**, get broken
down into their respective coordinates. Finally the coordinates are
passed to our **hypot** function. Note how the
**&** sign is used to prefix a subroutine for
the call. Perl has no equivalent to C's **fabs** so
I quickly rolled my own in Figure 4. I used the
**$epsilon** variable to avoid floating point
roundoff problems.

We have been building this thing from the bottom up. So far we have a routine to test whether a point is on a line. Our goal is a routine to test whether a point is within a polygon. Let me give you an overview of how we are going to solve the problem, provide you with two more Perl routines (which you should be able to read now), and then show you the external “workhorse” routine that does the final inside/on/outside determination.

Our polygon is a closed figure and every point is either inside, outside, or on the boundary (there is actually a mathematical statement called Jordan's Curve Theorem that proves this!). If we can establish a point that is guaranteed to be outside of our polygon, we can test the “insideness” of any point by connecting this test point to our “outside” point by a straight line and counting the number of times the line crosses the polygon. Clearly if the line never crosses the polygon, the point is outside; if it crosses once the point is inside; and, in general, if the line crosses the polygon's boundary an odd number of times, the point is inside. An even number of crossings means the point is outside. To put it another way, every time the line crosses the polygon it changes from being either inside to outside or vice-versa. Starting from the outside point our first crossing (if any) puts us inside; the next crossing (if any) puts us back outside; etc., odd-inside, even-outside. So, it looks like we will need a routine to test whether two lines intersect.

One way of testing for line intersection is to make sure both
endpoints of each line are on opposite sides of the other line.
(Reread and make a few pictures, as before.) This leads to our
final routine. Now we will test whether, when we walk along a line
from the beginning point to the opposite end and then turn to go
straight to another point, we turn counterclockwise or clockwise.
This discussion and the resulting routines are modeled after
Sedgewick's in “Algorithms in C”, pages 349-354 (Robert Sedgewick,
Addison Wesley, 1990). Let us start with the counterclockwise
subroutine, **ccw** in Figure 5.

sub ccw # (three points) return -1, 0, or 1 { local(@points) = @_; local($rv) = 0; local($dx1, $dx2, $dy1, $dy2, $p0x, $p0y, $p1x, $p1y, $p2x, $p2y); ($p0x, $p0y) = split(",", $points[0]); ($p1x, $p1y) = split(",", $points[1]); ($p2x, $p2y) = split(",", $points[2]); $dx1 = $p1x - $p0x; $dy1 = $p1y - $p0y; $dx2 = $p2x - $p0x; $dy2 = $p2y - $p0y; switch: { $rv = 1, last if $dx1 * $dy2 > $dy1 * $dx2; $rv = -1, last if $dx1 * $dy2 < $dy1 * $dx2; $rv = -1, last if ($dx1 * $dx2 < 0.0) || ($dy1 * $dy2 < 0.0); $rv = 1, last if ($dx1 * $dx1 + $dy1 * $dy1) < ($dx2 * $dx2 + $dy2 * $dy2); } $rv; }

**Figure 5. Counterclockwise (ccw
Subroutine)**

The **ccw** routine accepts three points and
compares the slope of the line from the second to the third with
the slope of the line from the first to the second. It is carefully
constructed to handle vertical lines and even the case of collinear
points. Note how the input points are collected into a local array,
**@points**, avoiding side effects and making the
program easy to maintain and understand. The points each get split
into their respective coordinates with the test being carried out
in a block labeled **switch** (for obvious reasons).
By now you have probably guessed that Perl allows labels, just like
C. There is no specific **switch** construct in
Perl, however. The block in **ccw** above is Perl's
way of building a switch. The **last** keyword
immediately exits the block and sometimes the Perl interpretor is
smart enough to convert the block to a C switch statement
internally (although not in this case).

sub intersect # (two lines) returns 0 or 1 { local($l1, $l2) = @_; local($l1_b, $l1_e) = split(":", $l1); local($l2_b, $l2_e) = split(":", $l2); &ccw($l1_b, $l1_e, $l2_b) * &ccw($l1_b, $l1_e, $l2_e) <= 0 && &ccw($l2_b, $l2_e, $l1_b) * &ccw($l2_b, $l2_e, $l1_e) <= 0; }

**Figure 6. Intersection
Subroutine**

Figure 6 is the intersection routine. It is very straightforward. The lines are split into endpoints and the endpoints are tested for “sideness”.

We now have all the building blocks to construct our “inside” subroutine. First we connect our test point with an “outside” point via a straight line. Then we walk around the polygon, testing each polygon side in turn, accumulating the intersections of the sides with the test line. If the number of intersections is odd the point is inside, and vice-versa.

$big_float = 1.0e7; sub lower_left_index # (polygon) returns index of lower left corner { local($polygon) = @_; local($index) = 0; local(@vertices) = split(":", $polygon); local($x_min, $y_min) = split(",", $vertices[0]); local($i, $x, $y); for($i = 1; $i <= $#vertices; $i++) { ($x, $y) = split(",", $vertices[$i]); if(($y < $y_min) || (($y == $y_min) && ($x < $x_min))) { $x_min = $x; $y_min = $y; $index = $i; } } $index; } sub inside # (point, polygon) returns 0 or 1 { local($point, $polygon) = @_; local(@vertices) = split(":", $polygon); local($index) = &lower_left_index($polygon); local($last_index) = $index ? $index - 1 : $#vertices; local($count, $holding_point) = (0, 0); local($i, $lp, $lt, $vertex, $x, $y, $big_x_point); local($check_index) = $index; # true if index is not zero OUTER: for(;;) { # one pass loop for($i = 0, $vertex = $vertices[$#vertices]; $i <= $#vertices; $i++) { $lp = join(":", $vertex, $vertices[$i]); $vertex = $vertices[$i]; if(&point_on_line_2($point, $lp)) { $count = 1; print "Point on boundary\n" if defined $verbose; last OUTER; } } ($x, $y) = split(",", $point); $big_x_point = join(",", $big_float, $y); $lt = join(":", $point, $big_x_point); for($i = 0; $i <= $#vertices; $i++) { if(&point_on_line_2($vertices[$index], $lt)) { $holding_point = 1; } else { if(!$holding_point) { $lp = join(":", $vertices[$last_index], $vertices[$index]); $count++ if &intersect($lp, $lt); } elsif($holding_point && (&ccw($point, $big_x_point, $vertices[$index]) != &ccw($point, $big_x_point, $vertices[$last_index]))) { $count++; } $last_index = $index; $holding_point = 0; } $index++; if($check_index && ($index == @vertices)) { $check_index = 0; $index = 0; } } last; } # one pass "loop" $count & 1; }

**Figure 7. lower-left-index and inside
Subroutines**

The pair of subroutines in Figure 7 are Perl implementations
of functions suggested by Sedgewick in the section “Inclusion in a
Polygon” (pages 353-355), although more complete. The
**lower_left_index** subroutine returns the index of
the polygon point with the smallest x coordinate among all points
with the smallest y coordinate. This is necessary to account for
the special cases when a polygon vertex is on the test line. Note
how, in the third line within the block, the
**@vertices** array is automatically constructed by
splitting the **$polygon** string with colons. Each
element of the **@vertices** array is a pair of
coordinates separated by a comma, one for each vertex. Whenever we
need individual x, y pairs the split function is called, as we have
seen before. This occurs before the **for** loop to
initialize **$x_min** and **$y_min**,
and inside the loop to generate a new test pair
**$x**, **$y**. The upper limit in
the loop variable **$i** is
**$#vertices**, which is the index of the last
member of **@vertices**. Perl automatically keeps
one of these variables for each array. The last statement in this
routine, **$index;** simply establishes the return
value.

The inside subroutine is our “workhorse” function. It is
admittedly fairly complicated, but you should be able to follow the
logic if you have read this far. Here is some help. The variable
**$index** is used to walk from vertex to vertex
around the polygon starting from the index returned by
**lower_left_index**. If this index is anything
other than zero, it will need to be reset after the vertex with the
largest index is encountered. The variable
**$check_index** keeps track of both whether this
resetting will be necessary and, if so, whether it has been done
yet. The variable **$last_index** is the index of
the last vertex that was not on the test line. Generally this is
the index of the vertex “behind” **$index**.

The **OUTER** label takes advantage of one of
Perl's handiest features, the ability to exit effortlessly from
nested loops. You can do this in C if you like
**goto**. In the present example the polygon sides
are created via the **join** function using the
colon as the separator. The sides are stored in
**$lp**. If the test point is on a polygon edge,
there is no need to test further, so the **OUTER**
loop is exited immediately. Note that setting
**$count** to one in this case is equivalent to
counting the boundary as inside the polygon, since one is an odd
number. It would be trivial to count the boundary as outside or
even as a special case by modifying the if block containing the
statement: **last OUTER;**

If the point is not on an edge, a horizontal line from the
test point to an outside point with an x coordinate equal to
**$big_float** is constructed and stored in
**$lt**. The remainder of this function tests for
either line intersection or “sideness” depending on whether the
previous vertex was on the test line, incrementing
**$count** as appropriate. The return value of the
inside function is 1, if the point is inside, and 0 if the point is
outside.

#!/usr/local/bin/perl while(<DATA>) { chop; $polygon .= $_ . ":"; } chop $polygon; for(;;) { print "Enter x and y separated by a comma (q to quit): "; chop($point = <STDIN>); last if $point =~ /^[qQ]/; print("No comma! Try again.\n"), redo if $point !~ /,/; $point =~ s/ +//g; print "Checking point: $point\n"; printf "%s\n", &inside($point, $polygon) ? "inside" : "outside"; } __END__ 5.0,4.0 0.0,0.0 10.0,5.0 0.0,10.0

**Figure 8. Driver Routine for Inside
Subroutine**

A simple driver routine for the inside subroutine is presented in Figure 8. This routine reads its data from the end of the perl program itself. Anything following the line:

__END__

is considered data and is accessed through the (automatically
opened) **DATA** file handle. Two new operators
introduced in this driver are “**.**” which
concatenates two strings and “**.=**” which appends
one string to another. That is, “**.=**” is to
“**.**” as “**+=**” is to
“**+**”. The **chop** function
removes the last character from each element of its argument list.
Note how the line:

chop $polygon;

trims the final colon from the polygon string, so that it is a legitimate polygon. Replace my data with your own if you want to run this driver, but be sure to put a comma between the x and y coordinates.