Why Not Python?, Part 4

In this final installment of the old C hacker's foray into Python, he teaches his problem-solving program how to guess.

To solve the puzzle presented and discussed in Part 3 of this series, as well as similar puzzles, I have to figure out how to code Step 2.6 in Python. (See Part 3 for an outline of all the steps in this problem-solving exercise.) The code should be straightforward, except for saving (Step 2.6.1) and restoring (Step 2.6.4) values of compound data structures.

In fact, this part of the code turned out to be harder to write than I expected. A list of my mistakes can be seen below in Appendix B, but the important lesson is this: simply assigning a list doesn't copy it, as Learning Python says. Instead, another reference is created to the same list. Here's where we bump into that issue:


% python
Python 2.4 (#1, Mar 22 2005, 21:42:42)
[GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> old = [ 1, 2, 3 ]
>>> new = old
>>> old.append(4)
>>> old
[1, 2, 3, 4]
>>> new
[1, 2, 3, 4]
>>>

Here, I saved a copy of "old", but it's useless as a backup, because modifying old causes "new" to change too. This is true because only was ever one list, with both old and new pointing to it. One easy way to prevent this occurrence is to copy a slice of the old list, like this:


>>> old = [ 1, 2, 3 ]
>>> new = old[:]            # copy a "slice" of "old"
>>> old.append(4)           # modify "old"
>>> old
[1, 2, 3, 4]
>>> new                     # "new" is unaffected
[1, 2, 3]
>>>

Here I modified old, but new kept its original value.

Ultimately I decided to create a few new functions within the class itself--I think Java and C++ people call them "methods". The functions deal with "opaque" objects--opaque to the caller, that is--and are responsible for knowing what information has to go into a copy and what must be restored from it. The getbackup() and restore() member functions serve this purpose. Let's add them to cell_class.py:


61      def getpos(self): return self.XYZpos
62      def getbackup(self):
63         return [ self.XYZvalue, self.XYZset_of_possibles[:] ]
64      def restore(self,bkp):
65         self.XYZvalue = bkp[0]
66         self.XYZset_of_possibles = bkp[1][:]

Now cells[N].getbackup() returns a two-element list. The first element contains the current value of cell N, which may be 0 for unknown. The second element is itself a list, the list of cell N's possible values. But because we can modify the "XYZset_of_possibles" piece of the cell, we need a copy of the list instead of another reference to it. Thus, we have to copy the slice, as shown in the example above.

On the restore() side, if bkp is a value returned from cells[N].getbackup(), restore() puts those values back in. Until very recently, I had this coded as:


   def restore(self,bkp):
      self.value = bkp[0]
      self.set_of_possibles = bkp[1]

But this code contained a subtle problem. I explain it below in Appendix C.

In case getpos() isn't self-explanatory, let me explain it here. As with the other accessor routines, getpos() is there so my only access to the "XYZpos" field of a cell is through this read-only function. I use this to keep track of what the program is doing, for debug messages. Its use will be apparent below.

Why am I so hung up on not using field names outside of the class? The field "pos"--actually "XYZpos", as I've changed it--is something that shouldn't be written after it's created. I've accidentally changed fields/variables unintentionally before, and this time I wanted to protect myself from doing so. I'm not sure it's worth the effort, though; it's not a complete solution, because getrow(), for example, gives a reference to a list. And it's harder to shoot yourself in the foot in Python than it is in C. Think, for example, of the classic mistake if (x=1) {.

With that under our belts, let's take a look at s3.py, the complete program. I've snuck in a few more changes, because this old C hacker still is learning


1   #!/usr/bin/python
2   import cell_class
3   import sys
4   import string
5
6   # Constants
7   YEA = 1
8   NAY = None
9
10   # Rather than using "ord()" all over the place, do this:
11   MyAlphabet = '-123456789'
12
13   # becomes YEA if "-v" specified.
14   verbose = NAY
15
16   # We have to define this somewhere.
17   Inconsistent = "formerly 'Unknown_with_no_hope'"
18
19   cells = []
20
21   # step 1
22   def loadit(input):
23      global cells
24      for i in range(0,81): cells.insert(i,cell_class.Cell(i)) # 1.1
25
26      # Here, any cell's set_of_possibles should be full
27
28      which_cell = 0
29      for one_input_line in input.readlines():
30         for char in one_input_line:
31            if char in '\t\n ': continue
32            if char in '-.':
33               which_cell = which_cell + 1
34            elif char in MyAlphabet:
35 cells[which_cell].setvalue(string.index(MyAlphabet,char))
36               which_cell = which_cell + 1
37            if which_cell > 81:
38               raise too_much_input
39      pass   # so the debugger can break here
40      input.close()
41
42
43   def solve_simply():
44      global cells
45      something_was_set = 1
46      while something_was_set:
47         something_was_set = 0
48         unknown_cells = filter(lambda a: not a.known(), cells)
49
50         # 2.3 quit when all cells are known
51         if len(unknown_cells) == 0: break         ## Done!  All known.
52
53         for acell in unknown_cells:
54
55            # 2.1 only one member in set_of_possibles?
56            possible_digits = acell.possibles()
57            if len(possible_digits) == 1:
58               acell.setvalue(possible_digits[0])
59               something_was_set = 1
60               continue
61
62            # 2.4 unknown but no possibilities?
63            if len(possible_digits) == 0:
64               raise Inconsistent
65
66            # 2.2 any digit in only one cell in a row, column or submatrix?
67            global dig
68            for dig in possible_digits:
69               whohas = filter(lambda a: a.could_be(dig), acell.getrow())
70               if len(whohas) == 1: break # unique in row
71               whohas = filter(lambda a: a.could_be(dig), acell.getcolumn())
72               if len(whohas) == 1: break # unique in col
73             whohas = filter(lambda a: a.could_be(dig), acell.getsubmatrix())
74               if len(whohas) == 1: break # unique in sub
75            else: # "acell" has no unique digit (no "break" hit)
76               continue
77
78            # Unique!  Set that value
79            acell.setvalue(dig)
80            something_was_set = 1
81            pass # end of "for acell in unknown_cells"
82         pass # end of "while something_was_set"
83
84      # print len(unknown_cells), "unknown cells"
85      return unknown_cells
86
87
88   # Code for step 3
89   def dumpit():
90      for bor in range(0,81,9):
91         for i in range(bor,bor+9):
92            print MyAlphabet[cells[i].getvalue()], # '-' if unknown
93         print                                     # end of this row
94      pass # end of dumpit()
95
96
97   # return YEA if solved.  Stop at first solution.
98   def solve_completely(level):
99      try:
100         unknown_cells = solve_simply()
101      except Inconsistent:                 # part of 2.6.4
102         return NAY
103
104      if len(unknown_cells) == 0: return YEA  # declare victory
105
106      # 2.6, create a guess
107      if verbose:
108         print "level =", level, ";", len(unknown_cells), "cells unknown"
109         dumpit()
110
111      # 2.6.1: Make a copy: a list of objects returned from 
		cell_class.getbackup()
112      mycopy = []
113      for acell in cells:
114         mycopy.append(acell.getbackup())
115
116   #  unknown_cells.sort(
117   #     lambda c1, c2: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]            # ... shortest set_of_possibles
119      myset = mycell.possibles()
120      if verbose: print "L=", level, "cell", mycell.getpos(), myset
121      for myval in myset:
122      if verbose: print "L=", level, "; cell[", mycell.getpos(), "] =", myval
123         mycell.setvalue(myval)            # 2.6.2
124         if solve_completely(level + 1):   # 2.6.3: call myself
125            return YEA                     # 2.6.5: It worked; we're done!
126
127         # 2.6.4: It didn't work.  Restore my backup copy.
128         for i in range(81):
129            cells[i].restore(mycopy[i])
130
131         pass # Now try the next value in myset
132      return NAY                  # None of the values worked; tell caller
133
134
135   def doit(argv):
136      global verbose              # 'cause we're going to modify it
137      if len(argv) > 1 and argv[1] and argv[1] == '-v':
138         verbose = YEA
139         del argv[1]
140
141      if len(argv) > 1 and argv[1]:
142         input = open(argv[1], 'r')
143      else:
144         input = sys.stdin
145
146      loadit(input)                # step 1
147      solve_completely(0)          # step 2
148      dumpit()                     # step 3
149
150
151   # main begins here
152   if __name__ == '__main__': doit(sys.argv)

In lines 7-8 I create YEA and NAY, because Python, like C, doesn't have intuitive names for "yes" and "no". Python does have "1" and "None", but their meanings aren't as obvious to me as YEA and NAY are. I suppose I could have used TRUE and FALSE, but I've run into header file problems in C with them, so I got out of the habit.

A reader of Part 2 pointed out that I was making a lot of calls to ord(). This version uses the string MyAlphabet; see lines 11, 34-35, 90. The change in line 90 will help if we want to enhance the program to deal with 16x16 Sudoku puzzles, in which they use digits and letters, not only digits.

In lines 34-35, I want to do the equivalent of MyAlphabet.index(char), which would work if MyAlphabet were a list. But I have to use string.index() instead, so I need to import Python's built-in string package, which is done in line 4. I prefer using a string here, because Python strings are "immutable". That is, I can't change the contents in place, as I could with a list.

I added a "verbose" flag (line 14) that can be, you guessed it, YEA or NAY. If YEA, the program gives a little extra output so you can see what's going on.

In Step 2.4, I want to raise the "Inconsistent" signal and catch it in certain cases. But to do that, Python requires me to define it, which I do in line 17.

Now we come to the major restructuring. Where s2.py had one large function, doit(), that contained Steps 1, 2 and 3, s3.py breaks this up into functions. The functions are loadit() for Step 1, solve_simply() for Steps 2.1-2.5 and dumpit() for Step 3.

Why did I choose to do this breakup, which incidentally wasn't hard to do? Because we might have to make multiple guesses, and each guess could cause a recursive call into that same routine in order to (possibly) make another guess. I explain further below.

The code for loadit() is basically the same as before, except for the use of string.index, mentioned above. Also, we now read the input lines using more typical Python idioms.

The code for Steps 2.1-2.5 now is contained in solve_simply(), lines 43-85. It's basically unchanged, except that I renamed the signal we raise when we discover that the puzzle is inconsistent, usually because of an incorrect guess. Also, solve_simply() returns unknown_cells, line 85. That is, it returns a list of Cell instances whose values are not yet known.

Step 3 now is implemented by dumpit() in lines 89-94.

The next routine, solve_completely(), basically embodies all the parts of Step 2.6, It takes a single parameter, "level", which basically says how many guesses we've had to make on this path through the puzzle. The first thing it does is call solve_simply(). If solve_simply() runs into an inconsistency, it raises the Inconsistent signal. This causes execution to resume at line 102, and solve_completely() returns right there.

Typically, however, execution proceeds to line 104. For many puzzles, solve_simply() returns an empty list, so solve_completely() returns YEA. In other words, except for very hard puzzles, s3.py acts exactly like s2.py.

But for those harder puzzles, we get to line 104 only to discover that there are 54 unknown cells. This is where the fun starts. In lines 111-114, we execute Step 2.6.1. That means we we create a backup copy of each cell's current value and set_of_possibles, using getbackup(), as described above.

In line 118, we choose a cell from unknown_cells[]. We're going to make a guess at the correct value to assign to "mycell".

Before that, lines 116-117 sorted unknown_cells[], based on the length of each cell's set of possible values. This seemed like a good idea at the time I wrote it, but only because it assumes that sorting is free. Doing the sort makes the program take nearly three times as long to solve the hardest-known puzzle. This is true even if I change line 117 to use ".XYZset_of_possibles" instead of ".possibles()".

The loop in lines 121-131 embodies the core of the guessing algorithm, Steps 2.6.2 through 2.6.5. Basically it plows through each value in myset[] (line 121), assigning it to "mycell" (123) and calling itself (124) with an incremented value for "level". When that happens, the new instance of solve_completely() calls solve_simply at line 100. If an inconsistency is found, line 102 is hit. We resume execution in the old instance at line 127, where we restore the backup copy in lines 128-129. We then try the next value in myset[] by running the next pass of the loop, starting at line 122.

If no inconsistency is found, however, the new instance proceeds as the old one did. It chooses a cell from unknown_cells[], saves a copy of each cell's value and set_of_possibles, tries each value in the set by calling itself and so on.

I should mention here, in case it's not obvious, that each time solve_completely() calls itself, it gets a new set of local variables--unknown_cells, mycell, myset and so on. These local variables do not interact across calls. Each instance or each level of recursion is independent of the others.

As with all recursive routines, we need to be careful of a couple of things. First, we need to make sure the "base cases" are covered, which are those cases where you don't want the routine to call itself at all. On this score, we're okay because, as mentioned above, for the simple case, s3.py behaves exactly like s2.py did. It calls solve_simply(), and if that solves the puzzle, the puzzle is solved.

The second thing to do in recursion is to make sure that you really do get to a base case. In other words, make sure the recursion terminates so you don't have an infinite descent. In this aspect too, solve_completely() is safe, because on each pass, we assign a value to a formerly unknown cell (Step 2.6.2, line 123) before calling solve_completely() again. Thus, we are guaranteed that solve_simply (at line 100) returns a strictly shorter list than it did before. So those aspects of the recursion look reasonable.

If, at any level, we try all the values in myset and never get a solution, when we get to line 132, it returns NAY. On a predecessor level, this should result in another value being tried. So the program eventually should get to a solution, provided that a solution exists--which it may not, if you mis-type the puzzle.

The doit function begins at line 135, where you can see how this all of fits together. The new doit() function:

  • decides if we're going to run with verbose YEA or NAY (137-139)

  • figures out where the puzzle input is coming from (141-144)

  • pulls the puzzle into the cells[] array (146)

  • calls the complete solver and prints the result (147-148)

So, does it work?


   % cat hardest-known
    - - -  - 7 -  9 4 -
    - 7 -  - 9 -  - - 5
    3 - -  - - 5  - 7 -
    - 8 7  4 - -  1 - -
    4 6 3  - - -  - - -
    - - -  - - 7  - 8 -
    8 - -  7 - -  - - -
    7 - -  - - -  - 2 8
    - 5 -  2 6 8  - - -
   % ./s3.py hardest-known
   2 1 5 8 7 6 9 4 3
   6 7 8 3 9 4 2 1 5
   3 4 9 1 2 5 8 7 6
   5 8 7 4 3 2 1 6 9
   4 6 3 9 8 1 7 5 2
   1 9 2 6 5 7 3 8 4
   8 2 6 7 4 3 5 9 1
   7 3 4 5 1 9 6 2 8
   9 5 1 2 6 8 4 3 7
   %

Sure enough, it does.

Now this program is quite short and does only a small part of the Sudoku problem. That is, it only solves puzzles; it doesn't create them. It assumes that a puzzle has only one solution; it doesn't try to verify that by looking for multiple solutions. And it takes a "dumb" approach to the puzzle; it starts with any old cell and assigns any old value. But it does accomplish what we set out to do--solve any Sudoku puzzle under the sun.

Although the program isn't long--less than 200 lines, not counting blank and all-comment lines--we used a number of Python's features, included:

  • defining and using a class

  • elements in a class--data structures and member functions or methods

  • the __init__ method

  • some pitfalls of copying lists

  • importing modules

  • operations on data structures: strings, lists

  • loops: "for" and "while"

  • using "global"

  • some functional programming features: filter and lambda

  • opening, reading and closing a file

  • raising and catching an exception

  • elementary argument (sys.argv) handling

  • recursion

  • elementary debugger operations

This experience has been a lot of fun. Thanks for joining me on this ride.

Appendix B: Debugging Step 2.6

The code here is rather long, and some of my debugging required a lot of printouts. I'm not going to give you a complete code listing so you can repeat the exact same mistakes on your Linux box. Instead I'll give you a summary. My most recent mistake, or "bug" if you will, is detailed in Appendix C. Also, this part is based on an older version, before cell_class.py was separated from s3.py. So the line numbers may not make much sense.

a. When I tried to run my original version, I got


    % python s3_orig.py 1109.puz
      File "s3_orig.py", line 141
        def save_a_copy()
                        ^
    SyntaxError: invalid syntax
    %

I forgot the : at the end. I had the same problem a few lines later. D'oh!

b. Next, I hit this:


    % python s3_orig.py 1109.puz
    Traceback (most recent call last):
      File "s3_orig.py", line 197, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3_orig.py", line 193, in doit
        solve_completely(0)          # step 2
      File "s3_orig.py", line 159, in solve_completely
        except Inconsistent:                 # part of 2.6.4
    NameError: global name 'Inconsistent' is not defined
    %

I should have remembered this from my work on the Coconuts problem. I added this earlier in the program:


    # We have to define this somewhere.
    Inconsistent = "formerly 'Unknown_with_no_hope'"

c. With that fixed, I got


    % python s3.py 1109.puz
    Traceback (most recent call last):
      File "s3.py", line 199, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 195, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 160, in solve_completely
        unknown_cells = solve_simply()
    TypeError: solve_simply() takes exactly 1 argument (0 given)
    %

That's dumb! At some point I thought solve_simply was going to take a "lev" parameter, but there's really no point. So, I fixed it like this:


    % diff s3_orig.py s3.py
    58a59,60
    > # We have to define this somewhere.
    > Inconsistent = "formerly 'Unknown_with_no_hope'"
    86c88
    < def solve_simply(lev):
    ---
    > def solve_simply():
    %

d. Then, I got a name problem with cells:


    % python s3.py 1109.puz
    Traceback (most recent call last):
      File "s3.py", line 199, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 195, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 160, in solve_completely
        unknown_cells = solve_simply()
      File "s3.py", line 92, in solve_simply
        unknown_cells = filter(lambda a: not a.known(), cells)
    NameError: global name 'cells' is not defined
    %

What caused that? Formerly, "cells" was defined in the doit() function. When I split doit() into three parts, I forgot to declare cells in the global space and to say "global cells" in every function that modifies it.

e. With that fix, it worked for the simple case.


    % python s3.py 1109.puz
    1 5 2 3 4 8 7 6 9
    8 9 4 5 6 7 3 2 1
    3 6 7 2 1 9 5 8 4
    6 3 1 7 9 4 2 5 8
    4 8 5 6 3 2 1 9 7
    7 2 9 1 8 5 6 4 3
    5 7 8 4 2 1 9 3 6
    2 4 6 9 7 3 8 1 5
    9 1 3 8 5 6 4 7 2
    %

f. Now what happens when we get to a harder puzzle?


    % python s3.py hardest-known
    Traceback (most recent call last):
      File "s3.py", line 203, in ?
        doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
      File "s3.py", line 199, in doit
        solve_completely(0)          # step 2
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 181, in solve_completely
        if solve_completely(level + 1):           # 2.6.3: call myself
      File "s3.py", line 180, in solve_completely
        mycell.setvalue(myval)                    # 2.6.2
      File "s3.py", line 35, in setvalue
        if val not in self.set_of_possibles: raise setting_impossible_value
    NameError: global name 'setting_impossible_value' is not defined
    %

That didn't work so well. I tried running verbose, which didn't help too much. So I added one more output:


        myset = mycell.set_of_possibles[:]
    +   if verbose: print "L=", level, "cell", mycell.pos, myset
        for myval in myset:
           if verbose: print "L=", level, "; cell[", mycell.pos, "] =", myval
           mycell.setvalue(myval)                    # 2.6.2

That didn't help either, so I had to run the debugger. First, I commented out "doit". Then, in order to catch the signal, I put the "raise" statements on their own lines, like this:


32      # setvalue is used for 1.2.2
33      def setvalue(self, val):
34         # a couple of sanity checks
35         if val not in range(1,9+1):
36            raise val_must_be_between_1_and_9
37         if val not in self.set_of_possibles:
38            raise setting_impossible_value       ### BREAKPOINT HERE ###
39         if self.known():
40            raise setvalue_called_but_already_known

Let's see how it goes:


    % python
    Python 2.4 (#1, Mar 22 2005, 21:42:42)
    [GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import s3
    >>> import pdb
    >>> pdb.run('s3.doit(["whatever", "-v", "hardest-known"])')
    > <string>(1)?()
    (Pdb) b s3.doit
    Breakpoint 1 at /home/collin/sudoku-article/s3.py:194
    (Pdb) n
    > /home/collin/sudoku-article/s3.py(196)doit()
    -> if len(argv) > 1 and argv[1] and argv[1] == '-v':
    (Pdb) b 38
    Breakpoint 2 at /home/collin/sudoku-article/s3.py:38
    (Pdb) 

Much debugging output later, I got this:


    L= 6 cell 24 [2, 6]
    L= 6 ; cell[ 24 ] = 2
    L= 6 ; cell[ 24 ] = 6

This is the first time in execution that we are trying to set the same cell to a second value.


    > /home/collin/sudoku-article/s3.py(38)setvalue()
    -> raise setting_impossible_value
    (Pdb) p self
    <s3.Cell instance at 0x4049398c>
    (Pdb) p self.pos
    24
    (Pdb) p self.set_of_possibles
    []

Wait a minute, in that new line I added for the verbose output, it printed "myset" as containing [2, 6]. Why is it now saying that cell 24 can't be that?


    (Pdb) where
      /usr/lib/python2.4/bdb.py(366)run()
    -> exec cmd in globals, locals
      <string>(1)?()
      /home/collin/sudoku-article/s3.py(206)doit()
    -> solve_completely(0)          # step 2
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(188)solve_completely()
    -> if solve_completely(level + 1):           # 2.6.3: call myself
      /home/collin/sudoku-article/s3.py(187)solve_completely()
    -> mycell.setvalue(myval)                    # 2.6.2
    > /home/collin/sudoku-article/s3.py(38)setvalue()
    -> raise setting_impossible_value
    (Pdb) up
    > /home/collin/sudoku-article/s3.py(187)solve_completely()
    -> mycell.setvalue(myval)                    # 2.6.2
    (Pdb) l
    182        mycell = unknown_cells[0]         # ... shortest set_of_possibles
    183        myset = mycell.set_of_possibles[:]
    184        if verbose: print "L=", level, "cell", mycell.pos, myset
    185        for myval in myset:
    186        if verbose: print "L=", level, "; cell[", mycell.pos, "] =", myval
    187  ->       mycell.setvalue(myval)                    # 2.6.2
    188           if solve_completely(level + 1):           # 2.6.3: call myself 
    189              return YEA                             # 2.6.5: Done.
    190           # 2.6.4: It didn't work.  Restore and try the next value
    191           restore_the_copy(mycopy)
    192        return NAY                        # Propagate "didn't work" back
    (Pdb) p mycopy[24]
    <s3.Cell instance at 0x4049398c>
    (Pdb) p mycopy[24].pos
    24
    (Pdb) p mycopy[24].set_of_possibles
    []
    (Pdb) p cells[24].set_of_possibles
    []
    (Pdb)

My copying code is wrong. How wrong is it?


    (Pdb) p cells[24].value
    2
    (Pdb) p mycopy[24].value
    2
    (Pdb) p cells[24]
    <s3.Cell instance at 0x4049398c>
    (Pdb) p mycopy[24]
    <s3.Cell instance at 0x4049398c>

It's totally and completely wrong! See how cells[24] and mycopy[24] refer to the exact same address? This tutorial suggests that maybe a shallow copy could work:


    (Pdb) import copy
    (Pdb) foo = copy.copy(cells[1])
    (Pdb) foo.set_of_possibles = foo.set_of_possibles[:]
    (Pdb) print cells[1]
    <s3.Cell instance at 0x404934ac>
    (Pdb) p foo
    <s3.Cell instance at 0x4049cc0c>
    (Pdb) p foo.set_of_possibles
    [1, 2]
    (Pdb) p cells[1].set_of_possibles
    [1, 2]
    (Pdb) cells[1].setvalue(1)
    (Pdb) p cells[1].set_of_possibles
    []
    (Pdb) p foo.set_of_possibles
    [1, 2]
    (Pdb)

Yes, that should work fine. So try this:


    # for 2.6.1: maybe overkill since we need only value and
    # set_of_possibles
    def save_a_copy():
       the_copy = []
       for acell in cells: # have to copy the set of possibles, not just a ref.
          acopy = copy.copy(acell)
          acopy.set_of_possibles = acell.set_of_possibles[:]
          the_copy.append(acopy)
       return the_copy

Fools rush in where wise men fear to tread; uncomment the last line and let 'er rip:


% python s3.py hardest-known
    2 1 5 8 7 6 9 4 3
    6 7 8 3 9 4 2 1 5
    3 4 9 1 2 5 8 7 6
    5 8 7 4 3 2 1 6 9
    4 6 3 9 8 1 7 5 2
    1 9 2 6 5 7 3 8 4
    8 2 6 7 4 3 5 9 1
    7 3 4 5 1 9 6 2 8
    9 5 1 2 6 8 4 3 7
    %

Although that worked, it's not so nice, because both save_a_copy() and restore_the_copy() refer to fields in the class. It's violating the object-oriented principle of information hiding. Instead, let's hide the information within the class and treat the backup copy as an opaque object. We only need the value and the set_of_possibles, so that's what we'll put into each opaque object. Add these lines to s3cell_class.py:


       def getbackup(self):
          return [ self.value, self.set_of_possibles[:] ]
       def restore(self,bkp):
          self.value = bkp[0]
          self.set_of_possibles = bkp[1]   # See Appendix C on this.

Now, code save_a_copy and restore_the_copy to use the above routines:


    # for 2.6.1: cell.getbackup() returns an opaque (to us) object for
    # later use
    def save_a_copy():
       the_copy = []
       for acell in cells:
          the_copy.append(acell.getbackup())
       return the_copy

    # for 2.6.4: restore, using the object given by cell.getbackup()
    def restore_the_copy(acopy):
       for i in range(81):
          cells[i].restore(acopy[i])
       pass # end of restore_the_copy

Appendix C: Why cell_class.restore() Must Restore a Slice

We were discussing these methods:


62      def getbackup(self):
63         return [ self.XYZvalue, self.XYZset_of_possibles[:] ]
64      def restore(self,bkp):
65         self.XYZvalue = bkp[0]
66         self.XYZset_of_possibles = bkp[1][:]

I mentioned that restore originally was coded as


   def restore(self,bkp):
      self.value = bkp[0]
      self.set_of_possibles = bkp[1]

which caused some subtle problems. What kind of problems might this cause? My original thinking was this: for getbackup(), the "original" list--the one in cells[]--is about to change, hence we need to make a copy of the list. But we don't need to do that on restore(), because after all, what's going to happen to the backup copy? We're simply going to restore from it again, aren't we?

That thinking has a fatal flaw. Recall that the backup copy was saved only once (lines 112-114). The copy, and in particular the copy of the set_of_possibles, is completely separate from the data structures in cells[]. But what happens if the loop (lines 121-131) is executed three or four times? Suppose, in other words, that at line 121, myset contains [1,2,4]. We execute lines 123-124 with myval==1. Assuming that we didn't find a solution, we execute the loop in 128-129. When restore() was coded the old way, the set_of_possibles (XYZset_of_possibles) was set to reference the same list that's in mycopy. We don't have a problem yet, because what's in cells[] does match what was there before.

Now we execute 123-124 with myval==2. During the course of that execution, the set_of_possibles[] in various cells is modified. Because the old version of restore() copied references, at this point we were modifying the lists stored in mycopy[]. Let's assume that line 124 returns NAY a second time, and we execute lines 128-129. The restore() call restores the old value that any cell had, but it does absolutely nothing with a cell's XYZset_of_possibles, because the cell's XYZset_of_possibles is a second reference to the same list that mycopy[] has!

So, when we execute lines 123-124 with myval==4, each cell's XYZset_of_possibles is set to the exact same set that it was after the failed attempt that occurred when myval was 2. This results in spurious executions of line 64--Inconsistent is raised--and the puzzle is not solved.

What hid this problem was the sort operation in s3.py:


116   #  unknown_cells.sort(
117   #     lambda c1, c2: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]             # ... shortest set_of_possibles
119      myset = mycell.possibles()
120      if verbose: print "L=", level, "cell", mycell.getpos(), myset
121      for myval in myset:
122      if verbose: print "L=", level, "; cell[", mycell.getpos(), "] =", myval

When lines 116-117 were uncommented, line 118 would set mycell to a cell with a short set_of_possibles. In the case of the hardest-known puzzle--the only puzzle I know of that actually requires guessing--this means len(myset)==2 always. Therefore, the loop (lines 122-131) is executed twice at the most. Because the loop never is executed a third time in the same instance of solve_completely(), we don't run into the case where a cell's XYZset_of_possibles[] loses some values erroneously. Thus, the defect in restore() remained hidden.

So how did I come to find out about the defect? Well, the original program executed the sort operation, and the program took some 25 seconds on my old Toshiba 460CDX laptop (Pentium MMX, bogomips 163). I tried commenting out lines 116-117 and found that the hardest-known puzzle was solved in about eight seconds. So I wondered what would happen if I ran the recursion with the longest list first. I uncommented lines 116-117 and changed the sort to arrange unknown_cells in descending order, like this:


116      unknown_cells.sort(
117         lambda c2, c1: cmp(len(c1.possibles()), len(c2.possibles())))
118      mycell = unknown_cells[0]

Notice that c1 and c2 are reversed in line 117. Then, line 118 gets a cell with the longest set of possibles--six possible values, actually. That's how I discovered the defect with the original version of restore().

Collin Park works for Network Appliance, where he uses Linux on his desktop and laptop computers. He does data recovery and other Linux-related stuff at home, where he lives with his wife and their two teenage daughters. All use Linux to meet their computing needs.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Great article!

OldAl's picture

Collin Park deserves commendation for this great article. I sure hope it will be published. When I read part1 and 2 of this article, I started buying the Linux Journal. (At my ripe age I like to read in bed, besides I spend too much time at the monitor already).

It is a courageous account of a C haker exploring the intricacies of Python. And an instructive account. I sure want to see it printed!

From down-under Australia,

With sincere best wishes to Collin, his better half and their two brighter-than-your daughters (but I would have to compare them to our granddaughters...)

OldAl.

I hope this doesn't get

Anonymous's picture

I hope this doesn't get printed.

Pure code

Mikael S. H.'s picture

Is it possible to get a pure code copy of you program. It's not as straghtforwards as I thought to remove the numbering when copy-pasting it directly from here.

unnumbered code

collin's picture

OK, I've put a copy here with a pointer back to part4 of the article.

Yes and No constants.

Coachher's picture

Python does actually have yes and no constants, they're True and False. None is closer to null rather than no.

Yes/No, YEA/NAY, True/False

collin's picture

I have rather older pythons. In Part 2 I mentioned using 1.5.x, which is the Python version on my Toshiba 460CDX laptop that I bought used in Japan in 1998 -- I wrote most of these articles on that old reliable laptop. But even 2.2 doesn't grok "False":

Python 2.2 (#1, Mar 26 2002, 15:46:04)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo(urk):
... if urk==17: return True
... return False
...
>>> print foo(14)
Traceback (most recent call last):
File "", line 1, in ?
File "", line 3, in foo
NameError: global name 'False' is not defined
>>>

Almost there ...

ucntcme's picture

That is because they were introduced in 2.2.1, not 2.2.0. ;)

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState