Why Not Python?, Part 2

This time out, the old C hacker drags himself into the 1990s to solve Sudoku puzzles.

Before I get into the next program, I should mention the Python home page, which offers recent versions of the interpreter and a lot of helpful information about the language. In particular, this tutorial is excellent. Still, I highly recommend getting a hold of Learning Python by Mark Lutz and David Ascher (O'Reilly 1999), which explains things way more thoroughly than the on-line tutorial. You also may have the tutorial and other Python documentation on your GNU/Linux box. My laptop has the tutorial located at


/usr/doc/packages/pyth_doc/html/tut/tut.html

as part of the Python documentation package, pyth_doc-1.5.1-11 in my case. Recent distributions may include both PDF and HTML versions of the documentation, and you also can download them here or from a mirror.

Speaking of distributions, we need to think about the issue of compatibility. I'm writing most of this article on a laptop running SuSE's Office'99 distribution, which includes this version of Python:


% python
Python 1.5.1 (#1, Sep 24 1998, 04:14:53)  [GCC 2.7.2.1] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>>

The on-line tutorial I mention above is much newer; it identifies itself as part of Python 2.4.2. All the programs in this article have been run on Python 2.4 (SuSE 9.3) and Python 1.5.1 (SuSE 5.3 - Office'99).

Okay, on to the next program.

Have you ever see a puzzle that made you want to write a program to solve it? I had that feeling a few months ago, when I noticed Sudoku in our local paper.

As you can see from the image above, the puzzle consists of a 9x9 matrix that is divided like a tic-tac-toe board into nine 3x3 sub-matrices. Each sub-matrix, each row and each column must contain one of each of the digits from 1-9.

Data Structure

Writing a program to address this puzzle shouldn't be all that hard. First, you need a data structure. You could number the cells from 0-80, like this:


   0   1   2   3   4   5   6   7   8
   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23  24  25  26
  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44
  45  46  47  48  49  50  51  52  53
  54  55  56  57  58  59  60  61  62
  63  64  65  66  67  68  69  70  71
  72  73  74  75  76  77  78  79  80

Or, you could make an array for each row, with an array containing all of these row-arrays. You also could do it with columns. Or, you could group each sub-matrix into a one- or two-dimensional array and form a one- or two-dimensional array containing these sub-matrices.

All of these options are possible, but because we have to deal with rows, columns and sub-matrices, I decided to stick with a one-dimensional array of cells, numbered 0-80. I also decided to use some other data structures to handle the rows.

For each cell, we need to track:

  • the digit in the cell, if known

  • the set of possible digits that could be in the cell, if the exact digit is unknown

We could have a data structure containing the known digits of all the cells and another one containing the set of possible digits of all the cells. Or, we could have a single array of data structures, one data structure for each cell. Each one would contain all the information about its cell, regardless of whether the digit is known or unknown.

Something--call it programmer's intuition--gives me the feeling that the latter option will make the coding easier.

So we're going to have an array, numbered 0-80. Each element in the array is a data structure to tell us the digit, if known. If the exact digit is unknown, the array tells us the set of possible digits. i

Algorithm

A data structure by itself isn't a program, however; we need to operate on it. What the program must do, essentially, is:

  1. read in the values of the cells specified and draw some elementary conclusions about what the blanks will be

  2. fill in the empty cells; that is, solve the puzzle)

  3. print out the answer

Back in the 1970s, this new-fangled thing called "structured programming" talked about top-down design and stepwise refinement. The idea was to break the big steps into smaller pieces that could be coded confidently.

I like this idea, but I also like the new new-fangled thing called "extreme programming", which says to write the test cases before you write the code. So I decided to practice this idea on step #1. I fed a puzzle to the code and checked a few of the blanks to make sure we restricted the possibilities appropriately.

Step 1: Refining, Coding and Testing

Start by reading in the values of the cells specified and drawing some elementary conclusions about what the blanks will be. It might be cool if the program could read the newspaper and do optical character recognition (OCR) to figure out the digits. But even if it could, the OCR part still needs to communicate the numbers to the problem-solving part. Let's restrict ourselves to the problem-solving part and leave the OCR to the person behind the keyboard.

So, let's have the program read the digits from a file. Rather than doing anything fancy, let it ignore whitespace. Let each digit stand for itself, and use - to represent a blank cell. We also could allow . for blank cells, which is what I've seen on a Sudoku message board. So the following would be valid input:


        1-234----
        ---567---
        --7---584
        63-7-4---
        48-----97
        ---1-5-43
        578---9--
        ---973---
        ----564-2

as would this:


        1.234.......567.....7...58463.7.4...48.....97
        ...1.5.43578...9.....973.......564.2

or this:


         1 - 2 3 4 - - - -
         - - - 5 6 7 - - -
         - - 7 - - - 5 8 4
         6 3 - 7 - 4 - - -
         4 8 - - - - - 9 7
         - - - 1 - 5 - 4 3
         5 7 8 - - - 9 - -
         - - - 9 7 3 - - -
         - - - - 5 6 4 - 2

All of these input variations mean exactly the same thing. That is, all of them represent the puzzle shown in the image scanned from the paper.

Now, besides reading the characters, the program should fill in the appropriate values in the data structures. Actually, before we even start, we should initialize each cell's structure to say the value is not known but could be anything. In other words, the set of possible digits would be {1,2,3,4,5,6,7,8,9}.

Then, as we read in the values, when we find a known one, we would set the "if known" part to that value. We then would remove that value from the set_of_possibles from the rest of that cell's row, column and 3x3 sub-matrix.

The rows are numbered in the range (0,9), with cells 0-8 in row 0, cells 9-17 in row 1 and so on. We can calculate the row number by taking int(pos/9), where pos is the cell position.

Columns are numbered in the range (0,9) with cells 0,9,18,27... in column 0; cells 1,10,19,28... in column 1 and so on. Column numbers are calculated by taking (pos%9).

Submatrices are numbered 0-8. A cell with a row number in the range (0,3) and a column number in the range (0,3) is in submatrix 0; column numbers in the range (3,6) are in submatrix 1 and so on. These submatrices are laid out as

Therefore, submatrix 0 consists of cells 0,1,2,9,10,11,18,19,20. Submatrix 1 is comprised of cells 3,4,5,12,13,14,21,22,23. To figure out the submatrix number, we don't need the exact row and column number; we simply need int(myrow/3) and int(mycol/3):

    

        mysub = int(myrow/3) * 3 + int(mycol/3)

To test this portion of the code, we feed in the above example and use the Python debugger (pdb) to check it:

  • although cell 8 (upper right-hand corner) isn't yet "known", it could be 6 or 9.

  • cell 72 (lower left) can be 3 or 9.

  • submatrix 2 contains cell 8.

  • submatrix 6 contains cell 72.

Now, to do the "stepwise refinement" portion of step 1, we initialize each cell to contain:


        value=unknown
        set_of_possibles={1,2,3,4,5,6,7,8,9}

We also read in each value from the input file. If the value is unknown--either "-" or "."--leave the cell alone. If known, set the data structure's digit (the "if known" part) to that known value. Then, zap the set_of_possibles for that cell. Finally, remove that value from the set of possible digits in the rest of that cell's row, column and 3x3 sub-matrix.

Okay, so let's code it! We have an array (a "list" in Python) of cells. For each cell, we need a notion of what rows, columns and submatrices it belongs to.

The Python book talks about "classes", which would give me a chance to try out this object-oriented programming thing.

All right, after a little puzzling, here is what I came up with. Note: unlike the "Coconuts" program in Part 1, which was short and easy, I ran into a few errors with this one. Details are outlined in Appendix A.


    1   class Cell:
    2      rows = [ [], [], [], [], [], [], [], [], [] ]    # nine each...
    3      columns = [ [], [], [], [], [], [], [], [], [] ]
    4      submatrices = [ [], [], [], [], [], [], [], [], [] ]
    5   
    6      # for step 1.1, do "cells[i]=Cell(i)" for i in range(0,81)
    7      def __init__(self,pos):
    8         # "pos" = position in the puzzle.  Valid values: range (0,81)
    9         global rows, columns, submatrices
   10         if pos not in range(0,81): raise Illegal_pos_in_Cells_initializer
   11         self.pos = pos
   12         self.value = 0
   13         self.set_of_possibles = range(1, 10)  # 1-9 INclusive
   14   
   15         # For step 1.2.2b, track which row, col, sub that I belong to.
   16         myrow = int(pos / 9)
   17         mycol = pos % 9
   18         mysub = int(myrow/3) * 3 + int(mycol/3)
   19   
   20         self.row = Cell.rows[myrow]
   21         self.col = Cell.columns[mycol]
   22         self.sub = Cell.submatrices[mysub]
   23   
   24         self.row.append(self)
   25         self.col.append(self)
   26         self.sub.append(self)
   27   
   28      def known(self):
   29         return (self.value != 0)
   30   
   31      # setvalue is used for 1.2.2
   32      def setvalue(self, val):
   33         # a couple of sanity checks
   34         if val not in range(1,9+1): raise val_must_be_between_1_and_9
   35         if val not in self.set_of_possibles: raise setting_impossible_value
   36         if self.known(): raise setvalue_called_but_already_known
   37   
   38         self.value = val              # 1.2.2a
   39   
   40         # Now do 1.2.2b
   41         for other in self.row + self.col + self.sub:
   42            if other is self: continue
   43            if other.known(): continue
   44            if val in other.set_of_possibles:
   45               # Python 1.5.1 had "remove", which isn't in the book.
   46               # Deprecated?
   47               x = other.set_of_possibles.index(val)
   48               del other.set_of_possibles[x]
   49   
   50   import sys
   51   def doit(argv):
   52      cells = []
   53      for i in range(0,81): cells.insert(i,Cell(i))          # 1.1
   54   
   55      # Here, any cell's set_of_possibles should be full
   56   
   57      if len(argv) > 1 and argv[1]:
   58         input = open(argv[1], 'r')
   59      else:
   60         input = sys.stdin
   61   
   62      all_input_lines = input.readlines()
   63      input.close()
   64   
   65      which_cell = 0
   66      for one_input_line in all_input_lines:
   67         for char in one_input_line:
   68            if char in '\t\n ': continue
   69            if char in '-.': 
   70               which_cell = which_cell + 1
   71            elif ord(char) in range (ord('1'), ord('9')+1):
   72               cells[which_cell].setvalue(ord(char)-ord('0'))
   73               which_cell = which_cell + 1
   74            if which_cell > 81: raise too_much_input
   75      pass   # so the debugger can break here
   76   
   77   # main begins here
   78   doit(sys.argv)

I'll try to explain this. Lines 1-48 describe a class called "Cell", which has:

  • three lists: rows, columns and submatrices (lines 2-4)

  • an initialization function (lines 7-26)

  • a function to say if a given cell's value is known (lines 28-29)

  • a function that implements the algorithm (lines 32-48)

Every instance of the class "Cell" represents one cell in the puzzle and includes the following attributes:

  • pos: position in the puzzle. 0 for the upper left-hand corner, 80 in the lower right-hand corner.

  • value: zero if not yet known; otherwise, a number from 1 to 9 inclusive.

  • set_of_possibles: a Python list of values that--as far as we know--the cell could be. The set_of_possibles is zapped (set to an empty list) once the value is known.

  • row: a list of cells that are in the same row that this cell is in; it's actually a reference to an element of Cell.rows.

  • col, sub: analogous to "row" but corresponding to the cell's column and submatrix, respectively.

The calculations for which row, column and submatrix are carried out in lines 16-18. This __init__ function assumes that it will be called exactly once for each value of "pos" in the range(0,81).

For setting the value of a cell, the "setvalue" function is called. Some simple checks are done--Is the value legal? Are you setting the cell to a value we already know it can't be?--in lines 34-36, and the value itself is set in line 38. Lines 41-48 find all cells in the same row, column or submatrix and remove "val" from the set_of_possibles.

Lines 51-75 describe the "doit" function. My original version didn't have this as a function, but it's now in a function to make debugging easier. It initializes the cells[] list and then decides (lines 57-60) whether to get the puzzle from a file or from stdin.

Lines 66-74 interpret the input and call the "setvalue" function for the appropriate cell.

Line 78 simply tells Python that when run, it should execute the "doit" function.

The next problem is: does the code really work? It turns out that pdb isn't exactly like, say, gdb, because I had to modify the module in order to debug it. Here's what I mean: you import the module, import pdb and then execute the pdb.run function. But when you import the module, it executes everything in the module. That's why I used the function "doit", which does everything at line 51. To debug s0.py, I also have to comment out line 78, like this:


     % vi s0.py              ## could have used any other editor...
     % pr -tn s0.py|tail
        69            if char in '-.': 
        70               which_cell = which_cell + 1
        71            elif ord(char) in range (ord('1'), ord('9')+1):
        72               cells[which_cell].setvalue(ord(char)-ord('0'))
        73               which_cell = which_cell + 1
        74            if which_cell > 81: raise too_much_input
        75      pass   # so the debugger can break here
        76   
        77   # main begins here
        78   # doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING
     % 

With that out of the way, recall the four test cases:

  • although cell 8 (upper right-hand corner) isn't yet "known," it could be 6 or 9.

  • cell 72 (lower left-hand corner) can be 3 or 9.

  • submatrix 2 contains cell 8.

  • submatrix 6 contains cell 72.

So let's try it!


     % python
     Python 1.5.1 (#1, Sep 24 1998, 04:14:53)  [GCC 2.7.2.1] on linux2
     Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
     >>> import s0

    ### The file is "s0.py" but when "import"ing, you don't type ".py"

     >>> import pdb
     >>> pdb.run('s0.doit(["s0.py", "1109.puz"])')

    ### For pdb.run, you have to qualify the name of the function with
    ### the module name.  

     > <string>(0)?()
     (Pdb) break s0.doit
    (Pdb) n
     > s0.py(51)doit()
     -> def doit(argv):
     (Pdb) l 70
      65        which_cell = 0
      66        for one_input_line in all_input_lines:
      67           for char in one_input_line:
      68              if char in '\t\n ': continue
      69              if char in '-.': 
      70                 which_cell = which_cell + 1
      71              elif ord(char) in range (ord('1'), ord('9')+1):
      72                 cells[which_cell].setvalue(ord(char)-ord('0'))
      73                 which_cell = which_cell + 1
      74              if which_cell > 81: raise too_much_input
      75        pass   # so the debugger can break here
     (Pdb) b 75

    ### I wanted to have a breakpoint at the end of the function doit,
    ### so I put in an artificial "pass" statement in line 75

     (Pdb) c
     > s0.py(75)doit()
     -> pass   # so the debugger can break here
     (Pdb) p cells[8].set_of_possibles
     [6, 9]
     (Pdb) p cells[8].known()
     0

    ### My first test case: cell 8 is not known, and its possible values
    ### are 6 and 9.  Whee!

     (Pdb) p cells[72].set_of_possibles
     [3, 9]
     (Pdb) p cells[72].known()
     0
     (Pdb) for x in Cell.submatrices[2]: print x.pos,
     6 7 8 15 16 17 24 25 26(Pdb) print 

    ### Sorry for the ugly formatting.  I wanted to print out the 
    ### "cell numbers" of each cell in the "submatrices" array and 
    ### do it in a single line.  It printed the cells in submatrices[2]
    ### and didn't start a new line -- so the Pdb prompt was stuck on
    ### the end of the line.  Saying "print" got us to a new line.

    ### Bottom line, though: submatrices[2] does contain cell 8.
    ### And submatrices[6] contains cell 72:

     (Pdb) for x in Cell.submatrices[6]: print x.pos,
     54 55 56 63 64 65 72 73 74(Pdb) print

     (Pdb) quit
     >>> 
     % 

So that's the code for Step 1 of the algorithm.

Running without the Debugger

Having a debugger is better than not having one, but the debugger requires several manual steps--it's labor intensive. I believe it was Larry Wall, the inventor of Perl, who listed "laziness" as an attribute of good programmers. Larry is especially right when it comes to testing. It has to be easy to run tests, otherwise people--meaning me in this case and probably you too--won't do them correctly. The results will be wrong, either false positives or false negatives. So it's better not to have to use the debugger to test our program. Therefore, let's code Step 3, "glue" it onto Step 1 and test both parts together.

Coding Step 3

Now we need to print out the answer. What we should do here is print the digit that's in each cell. If we run it immediately after Step 1, some cells will be unknown. Let's print those out as "-", that is, in a form that this program can read later.

We can add this to the end of the program from last time.


       for bor in range(0,81,9):
          for i in range(bor,bor+9):
             if cells[i].value > 0: print cells[i].value,        # usual case
             elif len(cells[i].set_of_possibles) > 0: print '-', # unknown
             else: print '?',                                    # inconsistent!!
          print                        # end of this row

With part 1 and part 3, we have:


        % python
        Python 1.5.1 (#1, Sep 24 1998, 04:14:53)  [GCC 2.7.2.1] on linux2
        Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
        >>> import s1  
        >>> import pdb
        >>> pdb.run('s1.doit(["p1", "1109.puz"])')
        > <string>(0)?()
        (Pdb) n
        > <string>(1)?()
        (Pdb) n
        1 - 2 3 4 - - - -
        - - - 5 6 7 - - -
        - - 7 - - - 5 8 4
        6 3 - 7 - 4 - - -
        4 8 - - - - - 9 7
        - - - 1 - 5 - 4 3
        5 7 8 - - - 9 - -
        - - - 9 7 3 - - -
        - - - - 5 6 4 - 2
        --Return--
        > <string>(1)?()->None
        (Pdb) quit
        >>> 
        % 

That worked, so I changed the program to print the output at the end and made the doit function run without intervention. That way, I don't have to run the debugger every time. The end of the program now looks like this:



    % pr -tn s1.py | tail -12
       77      # Step 2 should go here
       78   
       79      # Code for step 3
       80      for bor in range(0,81,9):
       81         for i in range(bor,bor+9):
       82            if cells[i].value > 0: print cells[i].value,        # usual case
       83            elif len(cells[i].set_of_possibles) > 0: print '-', # unknown
       84            else: print '?',                                    # inconsistent!!
       85         print                        # end of this row
       86   
       87   # main begins here
       88   doit(sys.argv)     ## COMMENT OUT FOR DEBUGGING

And it can process the input like this:


        % python s1.py 1109.puz 
        1 - 2 3 4 - - - -
        - - - 5 6 7 - - -
        - - 7 - - - 5 8 4
        6 3 - 7 - 4 - - -
        4 8 - - - - - 9 7
        - - - 1 - 5 - 4 3
        5 7 8 - - - 9 - -
        - - - 9 7 3 - - -
        - - - - 5 6 4 - 2
        %
 

Looking at the above, I realized that my old C-hacker self had violated one of the principles of object-oriented programming--information hiding. The class Cell should completely encapsulate the internals of the cell's data structure, and users should use accessor functions (such as setvalue) to access the internals. Lines 82 and 83 "peek" inside the data structure, so let me add a "getvalue" accessor function and change the Step 3 code to use the "known" function. Here are the changed parts, with "*" indicating a line with new or changed code:


  47               x = other.set_of_possibles.index(val)
  48               del other.set_of_possibles[x]
  49   
* 50      def getvalue(self): return self.value
  51   
  52   import sys
  53   def doit(argv):
  ...
  81      # Code for step 3
  82      for bor in range(0,81,9):
  83         for i in range(bor,bor+9):
* 84            if cells[i].known(): print cells[i].getvalue(),     # usual case
* 85            else: print '-',                                    # unknown
  86         print                        # end of this row
  87   
  88   # main begins here

This version is cleaner, and it still runs without the debugger.

We'll handle Step 2 of the algorithm in the next part of this article.

Appendix A: How the Code Was Generated

Here is my original version of the above code. It contains several errors:


    1   class Cell:
    2      rows = [ [], [], [], [], [], [], [], [], [] ]    # nine each...
    3      columns = [ [], [], [], [], [], [], [], [], [] ]
    4      submatrices = [ [], [], [], [], [], [], [], [], [] ]
    5   
    6      # for step 1.1, do "cells[i]=Cell(i)" for i in range(0,81)
    7      def __init__(self,pos):
    8         # "pos" = position in the puzzle.  Valid values: range (0,81)
    9         global rows, columns, submatrices
   10         if pos not in range(0,81): throw Illegal_pos_in_Cells_initializer
   11         self.pos = pos
   12         self.value = 0
   13         self.set_of_possibles = range(1, 10)  # 1-9 INclusive
   14   
   15         # For step 1.2.2b, track which row, col, sub that I belong to.
   16         myrow = int(pos / 9)
   17         mycol = pos % 9
   18         mysub = int(myrow/3) * 3 + int(mycol/3)
   19         # The above calculation of "mysub" may be a little obscure, so
   20         # let me explain.  
   21         # 
   22   
   23         self.row = rows[myrow]
   24         self.col = columns[mycol]
   25         self.sub = submatrices[mysub]
   26   
   27         rows[myrow].append(self)
   28         columns[mycol].append(self)
   29         submatrices[mysub].append(self)
   30   
   31      def known(self):
   32         return (self.value != 0)
   33   
   34      # setvalue is used for 1.2.2
   35      def setvalue(self, val):
   36         # a couple of sanity checks
   37         if val not in range(1,9+1): throw val_must_be_between_1_and_9
   38         if val not in self.set_of_possibles: throw setting_impossible_value
   39         if self.known(): throw setvalue_called_but_already_known
   40   
   41         self.value = val              # 1.2.2a
   42   
   43         # Now do 1.2.2b
   44         for other in self.row + self.col + self.sub:
   45            if other is self: continue
   46            if other.known(): continue
   47            if val in other.set_of_possibles:
   48               # Python 1.5.1 had "remove", which isn't in the book.
   49               # Deprecated?
   50               x = other.set_of_possibles.index(val)
   51               other.set_of_possibles.del(x)
   52   
   53   import sys
   54   # main begins here
   55   cells = []
   56   for i in range(0,81): cells.insert(i,Cells(i))          # 1.1
   57   
   58   # Here, any cell's set_of_possibles should be full
   59   
   60   if sys.argv[1]:
   61      input = open(sys.argv[1], 'r')
   62   else:
   63      input = sys.stdin
   64   
   65   all_input_lines = input.readlines()
   66   input.close()
   67   
   68   which_cell = 0
   69   for one_input_line in all_input_lines:
   70      for char in one_input_line:
   71         if char in '\t\n ': continue
   72         if char in '-.': 
   73            which_cell = which_cell + 1
   74         elif ord(char) in range (ord('1'), ord('9')+1):
   75            cells[which_cell].setval(ord(char)-ord('0'))
   76            which_cell = which_cell + 1
   77         if which_cell >= 81: throw too_much_input

It took a half-dozen tries and changes before this code actually ran. Here's what happened. When I wanted to raise an exception, I wrote "throw" instead of "raise". Although I had read the fine manual, I had forgotten the magic word, and Python didn't like that:


        % python s0.py 
          File "s0.py", line 10
            if pos not in range(0,81): throw Illegal_pos_in_Cells_initializer
                                                                            ^
        SyntaxError: invalid syntax
        %

So I changed all "throw"s to "raise"s.

Because appending to a list is done with LIST.append(), I somehow thought deletions should be done the same way.


        % python s0.py
          File "s0.py", line 51
            other.set_of_possibles.del(x)
                                     ^
        SyntaxError: invalid syntax
        %
 

The correct syntax is


        del other.set_of_possibles[x]

Then, I had a typing error. I typed "Cells" rather than "Cell" in line 56. "Cell" is the name of the class, "cells" is an array, or a "list" in Python:


        % python s0.py
        Traceback (innermost last):
          File "s0.py", line 56, in ?
            for i in range(0,81): cells.insert(i,Cells(i))          # 1.1
        NameError: Cells
        %
 

Changing that to "Cell(i)" solved the problem.

This next error was harder to fix; it was a name scoping problem:


        % python s0.py
        Traceback (innermost last):
          File "s0.py", line 56, in ?
            for i in range(0,81): cells.insert(i,Cell(i))          # 1.1
          File "s0.py", line 23, in __init__
            self.row = rows[myrow]
        NameError: rows
        % 

After referring to the book, I realized that I had to use "Cell.rows" here. All references to Class data needs to be qualified by the class name, that is "Cell". So after fixing that, as well as "Cell.columns" and "Cell.submatrices", things were looking better.

With the syntax errors under control, there were some other dumb mistakes to fix, such as this one:


        % python s0.py
        Traceback (innermost last):
          File "s0.py", line 60, in ?
            if sys.argv[1]:
        IndexError: list index out of range
        %
 

Without first checking the length of sys.argv, I couldn't be sure that sys.argv[1] even existed. So now I check the length of sys.argv before accessing it.

After fixing that, I tried running the code. It didn't complain right away but waited for input. Now I was getting somewhere. I used the mouse to copy-and-paste the puzzle from above:


        % python s0.4.py
                 1 - 2 3 4 - - - -
                 - - - 5 6 7 - - -
                 - - 7 - - - 5 8 4
                 6 3 - 7 - 4 - - -
                 4 8 - - - - - 9 7
                 - - - 1 - 5 - 4 3
                 5 7 8 - - - 9 - -
                 - - - 9 7 3 - - -
                 - - - - 5 6 4 - 2
        Traceback (innermost last):
          File "s0.4.py", line 75, in ?
            cells[which_cell].setval(ord(char)-ord('0'))
        AttributeError: setval
        %
 

Another typing problem! I had used "setval" instead of "setvalue", which was easy to fix.

I put the puzzle data into a file called 1109.puz and ran the program that way; I don't like using the mouse.

The next problem was this:


        % python s0.5.py 1109.puz 
        Traceback (innermost last):
          File "s0.5.py", line 77, in ?
            if which_cell >= 81: raise too_much_input
        NameError: too_much_input
        %
 

which_cell always tells me how many cells I have processed so far. So after reading information for the last cell, number 80 in the array, which_cell will be 81. It was an off-by-one error, in other words. I changed ">=" to ">", after which I got:


        % python s0.6.py 1109.puz 
        %
 

Hooray! No complaints. This final version is what appears in the main part of this article.

Collin Park works for Network Appliance, where he uses Linux on his desktop and laptop computers. He does data recovery and other Linux-related stuff at home, where he lives with his wife and their two teenage daughters. All use Linux to meet their computing needs.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Simplest solver...

Swaroop's picture

that I've seen is this one.

Your "Simplest Solver"

Al. Kabaila's picture

Very nice solver, indeed. I was particularly impressed by your simple way of keeping the input of "problem" within the code and your use of "sets" module.

However, neither the original article, nor your alternative is able to solve sudoku puzzles that are more complex than the simplest ones. Just try the following sudoku from today's (2006-02-06) Canberra Times:

problem = [

0, 0, 2, 0, 0, 9, 6, 0, 0,
8, 3, 9, 7, 0, 0, 0, 0, 0,
4, 0, 0, 2, 0, 0, 0, 0, 0,

1, 9, 0, 0, 0, 7, 0, 0, 0,
7, 0, 0, 0, 9, 0, 0, 0, 4,
0, 0, 0, 3, 0, 0, 0, 9, 8,

0, 0, 0, 0, 0, 2, 0, 0, 6,
0, 0, 0, 0, 0, 8, 3, 1, 7,
0, 0, 7, 6, 0, 0, 2, 0, 0

]

You see, the algorithm that you use and the article discusses, actually relies on there being at least one "possibles" list that has one element only (yes, when this occurs, the cell _must_ be filled with that element). However, more complex sudoku puzzles, lead to positions where choices are not restricted to one. Then some of the choices lead to "death" - generation of zero length of possibles, whilst at least one other leads to a solution. Much, much more complex situation, actually. Just try it with the above problem data!

Kind regards,

OldAl.

PS: I am a (very) old man, but fairly new to Python. Where can I get hold of guidelines about "sets" module? TIA.
A.

>> PS: I am a (very) old

Anonymous's picture

>> PS: I am a (very) old man, but fairly new to Python. Where can I get hold of guidelines about "sets" module? TIA.

Sets have been introduced as a module in Python 2.3 and as a full-fledged native object in Python 2.4 (they're at the same level of dict or list), collin gave you the 2.3 documentation, the documentation for 2.4 sets would be at the Set Types page in the Lib Ref (http://python.org/doc/2.4.2/lib/types-set.html)

simple solver

collin's picture

However, neither the original article, nor your alternative is able to solve sudoku puzzles that are more complex than the simplest ones.

Absolutely right. To solve puzzles like that you need something that is either smarter (knows about locked candidates, "naked pairs" and the like) -- or can guess. Part 4 has my approach. It seems to solve your puzzle OK...

% ./s3.py cbtimes.20060206
5 7 2 4 8 9 6 3 1
8 3 9 7 6 1 4 5 2
4 6 1 2 3 5 8 7 9
1 9 4 8 2 7 5 6 3
7 8 3 5 9 6 1 2 4
6 2 5 3 1 4 7 9 8
3 5 8 1 7 2 9 4 6
2 4 6 9 5 8 3 1 7
9 1 7 6 4 3 2 8 5
%

About sets -- apparently they're new in 2.3. I've never used them (I've older Pythons mostly) but they look pretty cool:
http://python.org/doc/2.3.5/lib/module-sets.html

Collin, thanks for your

Rainer Klein's picture

Collin,
thanks for your great article.

Here is an alternative way to print the matrix:

print "The Puzzle is:"
count = 0
for item in cells:
print item.known() and item.getvalue() or " -",
count += 1
if (count % 9) == 0: print

Cheers,

Rainer

thanks, that would work fine too!

collin's picture

it looks a little nicer than mine too....

Changing code for debugging not necessary

Michael Wood's picture

Instead of:

doit(sys.argv)

use:

if __name__ == "__main__":
    doit(sys.argv)

That way doit() will not be run when you import the module.

misc comments

Pádraig Brady's picture

lists do have the remove method:
dir([]), help([].remove)
Online references are always more
up to date than the paper one you referred to:
pydoc

Python allows nice shortcuts like:
rows = [ [] ]*9

Reading lines from a file is a very common operation.
The pythonic way to do it is:

for line in input:
print line,

See also: python methods for reading lines

You've too many ord() calls. Just convert using int() once.

cheers.
Pádraig.

thanks for your misc. comments

collin's picture

I'm getting rid of the ord() calls and modifying the file reading loop per your suggestion. Thanks!

I saw .remove() somewhere ... on the website maybe? but wondered if it was deprecated or something because the book didn't mention it. Seeing no indication of that, though, I decided to go ahead and use the remove() ... in the next part of the article, coming up soon.

beware repeating mutables!

David Goodger's picture

Pádraig Brady wrote:

"""
Python allows nice shortcuts like:
rows = [ [] ]*9
"""

Danger Will Robinson! This is a common mistake. The above example gives you 9 references to THE SAME LIST!

>>> rows = [ [] ] * 9
>>> rows
[[], [], [], [], [], [], [], [], []]
>>> rows[0]
[]
>>> rows[0].append(1)
>>> rows
[[1], [1], [1], [1], [1], [1], [1], [1], [1]]

The correct way to initialize a list of lists is with a for loop, or (in newer Pythons) a list comprehension:

>>> rows = [[] for i in range(9)]
>>> rows
[[], [], [], [], [], [], [], [], []]
>>> rows[0].append(1)
>>> rows
[[1], [], [], [], [], [], [], [], []]

Each of the 9 lists is independent here.

Looking at the above, I

Fredrik Blom's picture

Looking at the above, I realized that my old C-hacker self had violated one of the principles of object-oriented programming--information hiding. The class Cell should completely encapsulate the internals of the cell's data structure, and users should use accessor functions (such as setvalue) to access the internals.

Getters and setters have no useful purpose in Python as you cannot hide the data. It'll only add some extra overhead; and in the end gained nothing.

I disagree. Properties are

Anonymous's picture

I disagree. Properties are extremely useful for virtual members (e.g. output/input data that don't match internal data of the object, proxy for other data, ...), or to validate/protect data.

The pythonic way to say "don't touch this" is to prefix a member value with "_". While it's not enforced by the language, it is by convention an object for internal use only and not part of the object's external interface (don't whine if that breaks in the next version).

Properties more or less deprecated Java style getters/setters.

Cool

Anonymous's picture

Cool

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState