Quantcast
Username/Email:  Password: 

Why Not Python?, Part 4

In this final installment of the old C hacker's foray into Python, he teaches his problem-solving program how to guess.


To solve the puzzle presented and discussed in
Part 3 of this
series
, as well as similar puzzles, I have to figure
out how to code Step 2.6 in Python. (See Part 3 for an outline of all the
steps in this problem-solving exercise.) The code should be
straightforward, except for saving (Step 2.6.1) and restoring (Step 2.6.4)
values of compound data structures.

In fact, this part of the code turned out to be harder to write than I
expected. A list of my mistakes can be seen below in Appendix B, but
the important lesson is this: simply assigning a list doesn't copy it, as
Learning Python says. Instead, another reference
is created to the same list. Here's where we bump into that issue:


% python
Python 2.4 (#1, Mar 22 2005, 21:42:42)
[GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> old = [ 1, 2, 3 ]
>>> new = old
>>> old.append(4)
>>> old
[1, 2, 3, 4]
>>> new
[1, 2, 3, 4]
>>>

Here, I saved a copy of "old", but it's useless as a backup, because
modifying old causes "new" to change too. This is true because only was
ever one list, with both old and new pointing to it. One easy way to
prevent this occurrence is to copy a slice of the old list, like this:


>>> old = [ 1, 2, 3 ]
>>> new = old[:]            # copy a "slice" of "old"
>>> old.append(4)           # modify "old"
>>> old
[1, 2, 3, 4]
>>> new                     # "new" is unaffected
[1, 2, 3]
>>>

Here I modified old, but new kept its original value.

Ultimately I decided to create a few new functions within the
class itself--I think Java and C++ people call them "methods". The
functions deal with "opaque" objects--opaque to the caller, that is--and
are responsible for knowing what information has to go into a copy and what
must be restored from it. The getbackup() and restore() member
functions serve this purpose. Let's add them to cell_class.py:


61      def getpos(self): return self.XYZpos
62      def getbackup(self):
63         return [ self.XYZvalue, self.XYZset_of_possibles[:] ]
64      def restore(self,bkp):
65         self.XYZvalue = bkp[0]
66         self.XYZset_of_possibles = bkp[1][:]

Now cells[N].getbackup() returns a two-element
list. The first element contains the current value of cell N, which
may be 0 for unknown. The second element is itself a list, the
list of cell N's possible values. But because we can modify the
"XYZset_of_possibles" piece of the cell, we need a copy of the list
instead of another reference to it. Thus, we have to copy the slice, as
shown in the example above.

On the restore() side, if bkp is a value returned from
cells[N].getbackup(), restore() puts those
values back in. Until very recently, I had this coded as:


   def restore(self,bkp):
      self.value = bkp[0]
      self.set_of_possibles = bkp[1]

But this code contained a subtle problem. I explain it below in Appendix C.

In case getpos() isn't self-explanatory, let me explain it here. As with
the other accessor routines, getpos() is there so my only access to the
"XYZpos" field of a cell is through this read-only function. I use this
to keep track of what the program is doing, for debug messages. Its
use will be apparent below.

Why am I so hung up on not using field names outside of the class?
The field "pos"--actually "XYZpos", as I've changed it--is something
that shouldn't be written after it's created. I've accidentally changed
fields/variables unintentionally before, and this time I wanted to protect
myself from doing so. I'm not sure it's worth the effort, though;
it's not a complete solution, because getrow(), for example, gives
a reference to a list. And it's harder to shoot yourself in the foot
in Python than it is in C. Think, for example, of the classic mistake
if (x=1) {.

With that under our belts, let's take a look at s3.py, the complete
program. I've snuck in a few more changes, because this old C hacker
still is learning


1   #!/usr/bin/python
2   import cell_class
3   import sys
4   import string
5
6   # Constants
7   YEA = 1
8   NAY = None
9
10   # Rather than using "ord()" all over the place, do this:
11   MyAlphabet = '-123456789'
12
13   # becomes YEA if "-v" specified.
14   verbose = NAY
15
16   # We have to define this somewhere.
17   Inconsistent = "formerly 'Unknown_with_no_hope'"
18
19   cells = []
20
21   # step 1
22   def loadit(input):
23      global cells
24      for i in range(0,81): cells.insert(i,cell_class.Cell(i)) # 1.1
25
26      # Here, any cell's set_of_possibles should be full
27
28      which_cell = 0
29      for one_input_line in input.readlines():
30         for char in one_input_line:
31            if char in '\t\n ': continue
32            if char in '-.':
33               which_cell = which_cell + 1
34            elif char in MyAlphabet:
35 cells[which_cell].setvalue(string.index(MyAlphabet,char))
36               which_cell = which_cell + 1
37            if which_cell > 81:
38               raise too_much_input
39      pass   # so the debugger can break here
40      input.close()
41
42
43   def solve_simply():
44      global cells
45      something_was_set = 1
46      while something_was_set:
47         something_was_set = 0
48         unknown_cells = filter(lambda a: not a.known(), cells)
49
50         # 2.3 quit when all cells are known
51         if len(unknown_cells) == 0: break         ## Done!  All known.
52
53         for acell in unknown_cells:
54
55            # 2.1 only one member in set_of_possibles?
56            possible_digits = acell.possibles()
57            if len(possible_digits) == 1:
58               acell.setvalue(possible_digits[0])
59               something_was_set = 1
60               continue
61
62            # 2.4 unknown but no possibilities?
63            if len(possible_digits) == 0:
64               raise Inconsistent
65
66            # 2.2 any digit in only one cell in a row, column or submatrix?
67            global dig
68            for dig in possible_digits:
69               whohas = filter(lambda a: a.could_be(dig), acell.getrow())
70               if len(whohas) == 1: break # unique in row
71               whohas = filter(lambda a: a.could_be(dig), acell.getcolumn())
72               if len(whohas) == 1: break # unique in col
73             whohas = filter(lambda a: a.could_be(dig), acell.getsubmatrix())
74               if len(whohas) == 1: break # unique in sub
75            else: # "acell" has no unique digit (no "break" hit)
76               continue
77
78            # Unique!  Set that value
79            acell.setvalue(dig)
80            something_was_set = 1
81            pass # end of "for acell in unknown_cells"
82         pass # end of "while something_was_set"
83
84      # print len(unknown_cells), "unknown cells"
85      return unknown_cells
86
87
88   # Code for step 3
89   def dumpit():
90      for bor in range(0,81,9):
91         for i in range(bor,bor+9):
92            print MyAlphabet[cells[i].getvalue()], # '-' if unknown
93         print                                     # end of this row
94      pass # end of dumpit()
95
96
97   # return YEA if solved.  Stop at first solution.
98   def solve_completely(level):
99      try:
100         unknown_cells = solve_simply()
101      except Inconsistent:                 # part of 2.6.4
102         return NAY
103
104      if len(unknown_cells) == 0: return YEA  # declare victory
105
106      # 2.6, create a guess
107      if verbose:
108         print "level =", level, ";", len(unknown_cells), "cells unknown"
109         dumpit()
110
111      # 2.6.1: Make a copy: a list of objects returned from 
		cell_class.getbackup()
112      mycopy = []
113      for acell in cells:
114         mycopy.append(acell.getbackup())
115
116   #  unknown_cells.sort(
117   #     lambda c1, c2: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]            # ... shortest set_of_possibles
119      myset = mycell.possibles()
120      if verbose: print "L=", level, "cell", mycell.getpos(), myset
121      for myval in myset:
122      if verbose: print "L=", level, "; cell[", mycell.getpos(), "] =", myval
123         mycell.setvalue(myval)            # 2.6.2
124         if solve_completely(level + 1):   # 2.6.3: call myself
125            return YEA                     # 2.6.5: It worked; we're done!
126
127         # 2.6.4: It didn't work.  Restore my backup copy.
128         for i in range(81):
129            cells[i].restore(mycopy[i])
130
131         pass # Now try the next value in myset
132      return NAY                  # None of the values worked; tell caller
133
134
135   def doit(argv):
136      global verbose              # 'cause we're going to modify it
137      if len(argv) > 1 and argv[1] and argv[1] == '-v':
138         verbose = YEA
139         del argv[1]
140
141      if len(argv) > 1 and argv[1]:
142         input = open(argv[1], 'r')
143      else:
144         input = sys.stdin
145
146      loadit(input)                # step 1
147      solve_completely(0)          # step 2
148      dumpit()                     # step 3
149
150
151   # main begins here
152   if __name__ == '__main__': doit(sys.argv)

In lines 7-8 I create YEA and NAY, because Python, like C, doesn't have
intuitive names for "yes" and "no". Python does have "1" and "None", but
their meanings aren't as obvious to me as YEA and NAY are. I suppose I
could have used TRUE and FALSE, but I've run into header file problems in
C with them, so I got out of the habit.

A reader of Part 2 pointed out that I was making a lot of calls to
ord(). This version uses the string MyAlphabet; see lines 11, 34-35, 90.
The change in line 90 will help if we want to enhance the program to
deal with 16x16 Sudoku puzzles, in which they use digits and letters, not
only digits.

In lines 34-35, I want to do the equivalent of
MyAlphabet.index(char), which would work if MyAlphabet were a list.
But I have to use string.index() instead, so I need to import Python's
built-in string package, which is done in line 4. I prefer using a
string here, because Python strings are "immutable". That is, I can't change the
contents in place, as I could with a list.

I added a "verbose" flag (line 14) that can be, you guessed it, YEA
or NAY. If YEA, the program gives a little extra output so you can see
what's going on.

In Step 2.4, I want to raise the "Inconsistent" signal and catch it
in certain cases. But to do that, Python requires me to define it,
which I do in line 17.

Now we come to the major restructuring. Where s2.py had one large
function, doit(), that contained Steps 1, 2 and 3, s3.py breaks this up
into functions. The functions are loadit() for Step 1, solve_simply()
for Steps 2.1-2.5 and dumpit() for Step 3.

Why did I choose to do this breakup, which incidentally wasn't hard to do?
Because we might have to make multiple guesses, and each guess could cause
a recursive call into that same routine in order to (possibly) make another
guess. I explain further below.

The code for loadit() is basically the same as before, except for the
use of string.index, mentioned above. Also, we now read the input lines
using more typical Python idioms.

The code for Steps 2.1-2.5 now is contained in solve_simply(), lines 43-85.
It's basically unchanged, except that I renamed the signal we raise
when we discover that the puzzle is inconsistent, usually because
of an incorrect guess. Also, solve_simply() returns unknown_cells, line
85. That is, it returns a list of Cell instances whose values are not
yet known.

Step 3 now is implemented by dumpit() in lines 89-94.

The next routine, solve_completely(), basically embodies all the parts
of Step 2.6, It takes a single parameter, "level", which basically says
how many guesses we've had to make on this path through the puzzle. The
first thing it does is call solve_simply(). If solve_simply()
runs into an inconsistency, it raises the Inconsistent signal.
This causes execution to resume at line 102, and solve_completely()
returns right there.

Typically, however, execution proceeds to line 104. For many
puzzles, solve_simply() returns an empty list, so solve_completely()
returns YEA. In other words, except for very hard puzzles, s3.py
acts exactly like s2.py.

But for those harder puzzles, we get to line 104 only to discover that
there are 54 unknown cells. This is where the fun starts. In lines
111-114, we execute Step 2.6.1. That means we we create a backup copy
of each cell's current value and set_of_possibles, using getbackup(),
as described above.

In line 118, we choose a cell from unknown_cells[]. We're going to make
a guess at the correct value to assign to "mycell".

Before that, lines 116-117 sorted unknown_cells[], based on the length of
each cell's set of possible values. This seemed like a good idea at the
time I wrote it, but only because it assumes that sorting is free. Doing
the sort makes the program take nearly three times as long to solve the
hardest-known puzzle. This is true even if I change line 117 to use
".XYZset_of_possibles" instead of ".possibles()".

The loop in lines 121-131 embodies the core of the guessing algorithm,
Steps 2.6.2 through 2.6.5. Basically it plows through each value in
myset[] (line 121), assigning it to "mycell" (123) and calling itself
(124) with an incremented value for "level". When that happens, the new
instance of solve_completely() calls solve_simply at line 100. If an
inconsistency is found, line 102 is hit. We resume execution in the old
instance at line 127, where we restore the backup copy in lines 128-129.
We then try the next value in myset[] by running the next pass of the
loop, starting at line 122.

If no inconsistency is found, however, the new instance proceeds as the
old one did. It chooses a cell from unknown_cells[], saves a copy of
each cell's value and set_of_possibles, tries each value in the set
by calling itself and so on.

I should mention here, in case it's not obvious, that each time
solve_completely() calls itself, it gets a new set of local
variables--unknown_cells, mycell, myset and so on. These local variables do
not interact across calls. Each instance or each level of recursion is
independent of the others.

As with all recursive routines, we need to be careful of a couple of
things. First, we need to make sure the "base cases" are covered, which
are those cases where you don't want the routine to call itself at all.
On this score, we're okay because, as mentioned above, for the simple case, s3.py behaves
exactly like s2.py did. It calls solve_simply(), and if that solves the
puzzle, the puzzle is solved.

The second thing to do in recursion is to make sure that you really do get
to a base case. In other words, make sure the recursion terminates so you
don't have an infinite descent. In this aspect too, solve_completely() is safe,
because on each pass, we assign a value to a formerly unknown cell (Step 2.6.2,
line 123) before calling solve_completely() again. Thus, we are guaranteed that
solve_simply (at line 100) returns a strictly shorter list than it did before.
So those aspects of the recursion look reasonable.

If, at any level, we try all the values in myset and never get a
solution, when we get to line 132, it returns NAY. On a predecessor
level, this should result in another value being tried. So the program
eventually should get to a solution, provided that a solution exists--which
it may not, if you mis-type the puzzle.

The doit function begins at line 135, where you can see how this all of
fits together. The new doit() function:

  • decides if we're going to run with verbose YEA or
    NAY (137-139)
  • figures out where the puzzle input is coming from
    (141-144)
  • pulls the puzzle into the cells[] array
    (146)
  • calls the complete solver and prints the result
    (147-148)

So, does it work?


   % cat hardest-known
    - - -  - 7 -  9 4 -
    - 7 -  - 9 -  - - 5
    3 - -  - - 5  - 7 -
    - 8 7  4 - -  1 - -
    4 6 3  - - -  - - -
    - - -  - - 7  - 8 -
    8 - -  7 - -  - - -
    7 - -  - - -  - 2 8
    - 5 -  2 6 8  - - -
   % ./s3.py hardest-known
   2 1 5 8 7 6 9 4 3
   6 7 8 3 9 4 2 1 5
   3 4 9 1 2 5 8 7 6
   5 8 7 4 3 2 1 6 9
   4 6 3 9 8 1 7 5 2
   1 9 2 6 5 7 3 8 4
   8 2 6 7 4 3 5 9 1
   7 3 4 5 1 9 6 2 8
   9 5 1 2 6 8 4 3 7
   %

Sure enough, it does.

Now this program is quite short and does only a small part of the
Sudoku problem. That is, it only solves puzzles; it doesn't create them.
It assumes that a puzzle has only one solution; it doesn't try to verify
that by looking for multiple solutions. And it takes a "dumb" approach
to the puzzle; it starts with any old cell and assigns any old value.
But it does accomplish what we set out to do--solve any
Sudoku puzzle under the sun.

Although the program isn't long--less than 200 lines, not
counting blank and all-comment lines--we used a number of Python's
features, included:

  • defining and using a class
  • elements in a class--data structures and member
    functions or methods
  • the __init__ method
  • some pitfalls of copying lists
  • importing modules
  • operations on data structures: strings,
    lists
  • loops: "for" and "while"
  • using "global"
  • some functional programming features: filter and
    lambda
  • opening, reading and closing a file
  • raising and catching an exception
  • elementary argument (sys.argv) handling
  • recursion
  • elementary debugger operations

This experience has been a lot of fun. Thanks for joining me on this
ride.
Appendix B: Debugging Step 2.6
The code here is rather long, and some of my debugging required a lot
of printouts. I'm not going to give you a complete code listing so
you can repeat the exact same mistakes on your Linux box. Instead I'll
give you a summary. My most recent mistake, or "bug" if you will,
is detailed in Appendix C. Also, this part is based on an older version,
before cell_class.py was separated from s3.py. So the line numbers
may not make much sense.

a. When I tried to run my original version, I got


    % python s3_orig.py 1109.puz
      File "s3_orig.py", line 141
        def save_a_copy()
                        ^
    SyntaxError: invalid syntax
    %

I forgot the : at the end. I had the same problem a few lines later.
D'oh!

b. Next, I hit this:


    % python s3_orig.py 1109.puz
    Traceback (most recent call last):
      File "s3_orig.py", line 197, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3_orig.py", line 193, in doit
        solve_completely(0)          # step 2
      File "s3_orig.py", line 159, in solve_completely
        except Inconsistent:                 # part of 2.6.4
    NameError: global name 'Inconsistent' is not defined
    %

I should have remembered this from my work on the Coconuts problem. I added
this earlier in the program:


    # We have to define this somewhere.
    Inconsistent = "formerly 'Unknown_with_no_hope'"

c. With that fixed, I got


    % python s3.py 1109.puz
    Traceback (most recent call last):
      File "s3.py", line 199, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 195, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 160, in solve_completely
        unknown_cells = solve_simply()
    TypeError: solve_simply() takes exactly 1 argument (0 given)
    %

That's dumb! At some point I thought solve_simply was going to take
a "lev" parameter, but there's really no point. So, I fixed it like
this:


    % diff s3_orig.py s3.py
    58a59,60
    > # We have to define this somewhere.
    > Inconsistent = "formerly 'Unknown_with_no_hope'"
    86c88
    < def solve_simply(lev):
    ---
    > def solve_simply():
    %

d. Then, I got a name problem with cells:


    % python s3.py 1109.puz
    Traceback (most recent call last):
      File "s3.py", line 199, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 195, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 160, in solve_completely
        unknown_cells = solve_simply()
      File "s3.py", line 92, in solve_simply
        unknown_cells = filter(lambda a: not a.known(), cells)
    NameError: global name 'cells' is not defined
    %

What caused that? Formerly, "cells" was defined in the doit()
function. When I split doit() into three parts, I forgot to
declare cells in the global space and to say "global cells" in
every function that modifies it.

e. With that fix, it worked for the simple case.


    % python s3.py 1109.puz
    1 5 2 3 4 8 7 6 9
    8 9 4 5 6 7 3 2 1
    3 6 7 2 1 9 5 8 4
    6 3 1 7 9 4 2 5 8
    4 8 5 6 3 2 1 9 7
    7 2 9 1 8 5 6 4 3
    5 7 8 4 2 1 9 3 6
    2 4 6 9 7 3 8 1 5
    9 1 3 8 5 6 4 7 2
    %

f. Now what happens when we get to a harder puzzle?


    % python s3.py hardest-known
    Traceback (most recent call last):
      File "s3.py", line 203, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 199, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 180, in solve_completely
        mycell.setvalue(myval)                    # 2.6.2
      File "s3.py", line 35, in setvalue
        if val not in self.set_of_possibles: raise setting_impossible_value
    NameError: global name 'setting_impossible_value' is not defined
    %

That didn't work so well. I tried running verbose, which didn't help
too much. So I added one more output:


        myset = mycell.set_of_possibles[:]
    +   if verbose: print "L=", level, "cell", mycell.pos, myset
        for myval in myset:
           if verbose: print "L=", level, "; cell[", mycell.pos, "] =", myval
           mycell.setvalue(myval)                    # 2.6.2

That didn't help either, so I had to run the debugger. First, I commented out
"doit". Then, in order to catch the signal, I put the "raise" statements on
their own lines, like this:


32      # setvalue is used for 1.2.2
33      def setvalue(self, val):
34         # a couple of sanity checks
35         if val not in range(1,9+1):
36            raise val_must_be_between_1_and_9
37         if val not in self.set_of_possibles:
38            raise setting_impossible_value       ### BREAKPOINT HERE ###
39         if self.known():
40            raise setvalue_called_but_already_known

Let's see how it goes:


    % python
    Python 2.4 (#1, Mar 22 2005, 21:42:42)
    [GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import s3
    >>> import pdb
    >>> pdb.run('s3.doit(["whatever", "-v", "hardest-known"])')
    > <string>(1)?()
    (Pdb) b s3.doit
    Breakpoint 1 at /home/collin/sudoku-article/s3.py:194
    (Pdb) n
    > /home/collin/sudoku-article/s3.py(196)doit()
    -> if len(argv) > 1 and argv[1] and argv[1] == '-v':
    (Pdb) b 38
    Breakpoint 2 at /home/collin/sudoku-article/s3.py:38
    (Pdb) 

Much debugging output later, I got this:


    L= 6 cell 24 [2, 6]
    L= 6 ; cell[ 24 ] = 2
    L= 6 ; cell[ 24 ] = 6

This is the first time in execution that we are trying to set
the same cell to a second value.


    > /home/collin/sudoku-article/s3.py(38)setvalue()
    -> raise setting_impossible_value
    (Pdb) p self
    <s3.Cell instance at 0x4049398c>
    (Pdb) p self.pos
    24
    (Pdb) p self.set_of_possibles
    []

Wait a minute, in that new line I added for the verbose
output, it printed "myset" as containing [2, 6]. Why is
it now saying that cell 24 can't be that?


    (Pdb) where
      /usr/lib/python2.4/bdb.py(366)run()
    -> exec cmd in globals, locals
      <string>(1)?()
      /home/collin/sudoku-article/s3.py(206)doit()
    -> solve_completely(0)          # step 2
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(187)solve_completely()
    -> mycell.setvalue(myval)                    # 2.6.2
    > /home/collin/sudoku-article/s3.py(38)setvalue()
    -> raise setting_impossible_value
    (Pdb) up
    > /home/collin/sudoku-article/s3.py(187)solve_completely()
    -> mycell.setvalue(myval)                    # 2.6.2
    (Pdb) l
    182        mycell = unknown_cells[0]         # ... shortest set_of_possibles
    183        myset = mycell.set_of_possibles[:]
    184        if verbose: print "L=", level, "cell", mycell.pos, myset
    185        for myval in myset:
    186        if verbose: print "L=", level, "; cell[", mycell.pos, "] =", myval
    187  ->       mycell.setvalue(myval)                    # 2.6.2
    188           if solve_completely(level + 1):           # 2.6.3: call myself 
    189              return YEA                             # 2.6.5: Done.
    190           # 2.6.4: It didn't work.  Restore and try the next value
    191           restore_the_copy(mycopy)
    192        return NAY                        # Propagate "didn't work" back
    (Pdb) p mycopy[24]
    <s3.Cell instance at 0x4049398c>
    (Pdb) p mycopy[24].pos
    24
    (Pdb) p mycopy[24].set_of_possibles
    []
    (Pdb) p cells[24].set_of_possibles
    []
    (Pdb)

My copying code is wrong. How wrong is it?


    (Pdb) p cells[24].value
    2
    (Pdb) p mycopy[24].value
    2
    (Pdb) p cells[24]
    <s3.Cell instance at 0x4049398c>
    (Pdb) p mycopy[24]
    <s3.Cell instance at 0x4049398c>

It's totally and completely wrong! See how cells[24]
and mycopy[24] refer to the exact same address?
This
tutorial
suggests that maybe a shallow copy could work:


    (Pdb) import copy
    (Pdb) foo = copy.copy(cells[1])
    (Pdb) foo.set_of_possibles = foo.set_of_possibles[:]
    (Pdb) print cells[1]
    <s3.Cell instance at 0x404934ac>
    (Pdb) p foo
    <s3.Cell instance at 0x4049cc0c>
    (Pdb) p foo.set_of_possibles
    [1, 2]
    (Pdb) p cells[1].set_of_possibles
    [1, 2]
    (Pdb) cells[1].setvalue(1)
    (Pdb) p cells[1].set_of_possibles
    []
    (Pdb) p foo.set_of_possibles
    [1, 2]
    (Pdb)

Yes, that should work fine. So try this:


    # for 2.6.1: maybe overkill since we need only value and
    # set_of_possibles
    def save_a_copy():
       the_copy = []
       for acell in cells: # have to copy the set of possibles, not just a ref.
          acopy = copy.copy(acell)
          acopy.set_of_possibles = acell.set_of_possibles[:]
          the_copy.append(acopy)
       return the_copy

Fools rush in where wise men fear to tread; uncomment the last
line and let 'er rip:


% python s3.py hardest-known
    2 1 5 8 7 6 9 4 3
    6 7 8 3 9 4 2 1 5
    3 4 9 1 2 5 8 7 6
    5 8 7 4 3 2 1 6 9
    4 6 3 9 8 1 7 5 2
    1 9 2 6 5 7 3 8 4
    8 2 6 7 4 3 5 9 1
    7 3 4 5 1 9 6 2 8
    9 5 1 2 6 8 4 3 7
    %

Although that worked, it's not so nice, because both
save_a_copy() and restore_the_copy() refer to fields in the
class. It's violating the object-oriented principle of
information hiding. Instead, let's hide the information
within the class and treat the backup copy as an opaque
object. We only need the value and the set_of_possibles,
so that's what we'll put into each opaque object. Add
these lines to s3cell_class.py:


       def getbackup(self):
          return [ self.value, self.set_of_possibles[:] ]
       def restore(self,bkp):
          self.value = bkp[0]
          self.set_of_possibles = bkp[1]   # See Appendix C on this.

Now, code save_a_copy and restore_the_copy to use the above routines:


    # for 2.6.1: cell.getbackup() returns an opaque (to us) object for
    # later use
    def save_a_copy():
       the_copy = []
       for acell in cells:
          the_copy.append(acell.getbackup())
       return the_copy

    # for 2.6.4: restore, using the object given by cell.getbackup()
    def restore_the_copy(acopy):
       for i in range(81):
          cells[i].restore(acopy[i])
       pass # end of restore_the_copy

Appendix C: Why cell_class.restore() Must Restore a Slice
We were discussing these methods:


62      def getbackup(self):
63         return [ self.XYZvalue, self.XYZset_of_possibles[:] ]
64      def restore(self,bkp):
65         self.XYZvalue = bkp[0]
66         self.XYZset_of_possibles = bkp[1][:]

I mentioned that restore originally was coded as


   def restore(self,bkp):
      self.value = bkp[0]
      self.set_of_possibles = bkp[1]

which caused some subtle problems. What kind of problems might this
cause? My original thinking was this: for getbackup(), the "original"
list--the one in cells[]--is about to change, hence we need to make
a copy of the list. But we don't need to do that on restore(),
because after all, what's going to happen to the backup copy?
We're simply going to restore from it again, aren't we?

That thinking has a fatal flaw. Recall that the backup copy was saved
only once (lines 112-114). The copy, and in particular the copy of
the set_of_possibles, is completely separate from the data structures in
cells[]. But what happens if the loop (lines 121-131) is executed
three or four times? Suppose, in other words, that at line 121,
myset contains [1,2,4]. We execute lines 123-124 with myval==1. Assuming
that we didn't find a solution, we execute the loop in
128-129. When restore() was coded the old way, the set_of_possibles
(XYZset_of_possibles) was set to reference the same list that's
in mycopy. We don't have a problem yet, because what's in cells[] does
match what was there before.

Now we execute 123-124 with myval==2. During the course of that
execution, the set_of_possibles[] in various cells is modified.
Because the old version of restore() copied references, at
this point we were modifying the lists stored in mycopy[]. Let's assume
that line 124 returns NAY a second time, and we execute lines 128-129.
The restore() call restores the old value that any cell had, but
it does absolutely nothing with a cell's XYZset_of_possibles, because
the cell's XYZset_of_possibles is a second reference to the
same list that mycopy[] has!

So, when we execute lines 123-124 with myval==4, each cell's
XYZset_of_possibles is set to the exact same set that it was after
the failed attempt that occurred when myval was 2. This results in spurious
executions of line 64--Inconsistent is raised--and the puzzle is not solved.

What hid this problem was the sort operation in s3.py:


116   #  unknown_cells.sort(
117   #     lambda c1, c2: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]             # ... shortest set_of_possibles
119      myset = mycell.possibles()
120      if verbose: print "L=", level, "cell", mycell.getpos(), myset
121      for myval in myset:
122      if verbose: print "L=", level, "; cell[", mycell.getpos(), "] =", myval

When lines 116-117 were uncommented, line 118 would set mycell
to a cell with a short set_of_possibles. In the case of the
hardest-known puzzle--the only puzzle I know of that actually requires
guessing--this means len(myset)==2 always. Therefore, the loop
(lines 122-131) is executed twice at the most. Because the loop never
is executed a third time in the same instance of solve_completely(),
we don't run into the case where a cell's XYZset_of_possibles[] loses
some values erroneously. Thus, the defect in restore() remained hidden.

So how did I come to find out about the defect? Well, the original
program executed the sort operation, and the program took some 25
seconds on my old Toshiba 460CDX laptop (Pentium MMX, bogomips 163).
I tried commenting out lines 116-117 and found that the hardest-known
puzzle was solved in about eight seconds. So I wondered what would
happen if I ran the recursion with the longest list first. I uncommented
lines 116-117 and changed the sort to arrange unknown_cells in descending
order, like this:


116      unknown_cells.sort(
117         lambda c2, c1: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]

Notice that c1 and c2 are reversed in line 117. Then, line 118 gets
a cell with the longest set of possibles--six possible values,
actually. That's how I discovered the defect with the original version of
restore().

Collin Park works for Network Appliance, where he uses Linux on his
desktop and laptop computers. He does data recovery and other
Linux-related stuff at home, where he lives with his wife and their two
teenage daughters. All use Linux to meet their computing needs.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Great article!

OldAl's picture

Collin Park deserves commendation for this great article. I sure hope it will be published. When I read part1 and 2 of this article, I started buying the Linux Journal. (At my ripe age I like to read in bed, besides I spend too much time at the monitor already).

It is a courageous account of a C haker exploring the intricacies of Python. And an instructive account. I sure want to see it printed!

From down-under Australia,

With sincere best wishes to Collin, his better half and their two brighter-than-your daughters (but I would have to compare them to our granddaughters...)

OldAl.

I hope this doesn't get

Anonymous's picture

I hope this doesn't get printed.

Pure code

Mikael S. H.'s picture

Is it possible to get a pure code copy of you program. It's not as straghtforwards as I thought to remove the numbering when copy-pasting it directly from here.

unnumbered code

collin's picture

OK, I've put a copy here with a pointer back to part4 of the article.

Yes and No constants.

Coachher's picture

Python does actually have yes and no constants, they're True and False. None is closer to null rather than no.

Yes/No, YEA/NAY, True/False

collin's picture

I have rather older pythons. In Part 2 I mentioned using 1.5.x, which is the Python version on my Toshiba 460CDX laptop that I bought used in Japan in 1998 -- I wrote most of these articles on that old reliable laptop. But even 2.2 doesn't grok "False":

Python 2.2 (#1, Mar 26 2002, 15:46:04)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo(urk):
... if urk==17: return True
... return False
...
>>> print foo(14)
Traceback (most recent call last):
File "", line 1, in ?
File "", line 3, in foo
NameError: global name 'False' is not defined
>>>

Almost there ...

ucntcme's picture

That is because they were introduced in 2.2.1, not 2.2.0. ;)

Post new comment

  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.
  • Use to create page breaks.

More information about formatting options