Translate Haskell into English Manually, Part II
This is part two of a two-part series [see linuxjournal.com/article/9096 for part one]. In this
article, I build upon the concept
of a parse pipeline with the State monad. Then, with all the pieces
in place, I show the complete Haskell program. I also take a step
back and consider why Haskell isn't as popular as some other languages
and whether it deserves a place in your programmer's toolbox.
The State Monad
In part one of this series, I showed parse pipelines that looked like:
c $ b $ a ctx
Now, there are two problems with basing the whole program around this
construct. The first is that everything is written backwards. It's not too
hard to work around that by creating an operator that reads left to right,
unlike $. However, it's also a pain to pass the
ParseContext explicitly to and from every function that acts as a ParseContext
transformer. [Note: 1. In an early version of my program, I wrote the
following:
-- |> is like a UNIX pipe. infixl 9 |> x |> f = f x
Hence, c $ b $ a ctx could be written
ctx |> a |> b |> c. I
originally
wrote the whole program around this construct instead of the State
monad,
but I like the State monad better because of Haskell's syntactic sugar
for monads.]
Purely functional means you have to pass everything to the function
(that is, no globals) and you have to return everything from the function
(that is, no side effects allowed, not even printing a message to
the user). Passing everything explicitly can get tedious, but that's
something Haskell has an answer for—monads. Instead of having to
piece a pipeline together using $, the do syntax can transform the
above into:
do a b c
Not only does the do syntax “read the right way”, but (and this is a key
point) it's also syntactic sugar for saying, “the result of 'a' will be needed
to call 'b', and the result of 'b' will be needed to call 'c'.”
It's still a
pipeline, but now it's syntactically convenient.
The State monad is a monad that comes with Haskell, which is perfectly suited
for the current situation. The State monad is a simple monad that simply
encapsulates a bit of state. Looking at the example above, a, b and c
are all functions that take and modify this implicit state. Because each might
modify that state, they must be run in order, which, of course, isn't
necessarily the norm in Haskell.
The notion of a ParseContext transformer dovetails perfectly with how the
State monad works. Each ParseContext transformer will now have the
following signature:
f :: State ParseContext ()
This translates into:
"f" is a function that has type "State ParseContext ()", which is to say that "f" operates within the "State" monad where the "State" monad is encapsulating a "ParseContext" object.
In summary, what does the monad buy us?
-
It allows us to use the do syntactic sugar in order to tie statements
together into a chain that must be executed in order. - It takes care of implicitly passing around ParseContext objects.
Lest my reader get bleary eyed, let's see some code! I've updated the
makeCool.hs program to use the State monad, thus creating a true Rube
Goldberg (en.wikipedia.org/wiki/Rube_Goldberg#Rube_Goldberg_machines)
program:
import Control.Monad.State
data TokenType = Identifier | Qualifier | Type | Symbol Char
deriving (Show, Eq)
data Token = Token {
tokenType :: TokenType,
tokenValue :: String
} deriving Show
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String, -- The output generated so far.
currTok :: Token, -- The current token, if defined.
stack :: [Token] -- A stack of tokens we haven't dealt with yet.
} deriving Show
makeCool :: State ParseContext ()
makeCool = do
ParseContext {input = s} <- get
put (ParseContext {input = "", output = s ++ " is cool!\n"})
return ()
main = do
s <- getContents
let ctx = ParseContext {input = s, output = ""} in
putStrLn $ output $ execState makeCool $ ctx
Running it:
$ echo -n "Haskell" | runhugs -98 makeCoolState.hs Haskell is cool!
A few things deserve note. First, in order to use the State monad, you must
pass -98 to runhugs, which enables special Hugs extensions. Next, what
used
to be:
putStrLn $ output $ makeCool ctx
is now:
putStrLn $ output $ execState makeCool $ ctx
The execState function is used to get into and out of the “State monad
world”. Occasionally, it's easy to get confused by higher order functions
like execState; however, it's easy enough to apply a cookie-cutter approach.
That is, execState makeCool $ ctx takes
ctx as an initial state and returns
a final state. makeCool operates within
“the context” of the State monad.
Easy enough. If you really want to know the nitty-gritty details of how the
State monad works, there's a really fun explanation on Wikipedia
(en.wikibooks.org/wiki/Programming:Haskell_monads#The_State_monad).
Notice that makeCool uses the do syntax to tie multiple statements
together. First it uses the get function to get the current state. Then it
uses the put function to put back an updated state. Then it returns (),
which is to say nothing useful. Looking at get and put, at it's heart,
the makeCool function is still a ParseContext transformer.
Another thing deserves note. As usual, there are a few convenience functions
for interacting with the State monad. For instance, instead of using get
and put explicitly, it's also possible to use the modify function,
combining the two activities. Hence, the makeCool function could have been
written:
makeCool :: State ParseContext ()
makeCool = modify (\ctx ->
ctx {input = "", output = input ctx ++ " is cool!"})
The modify function takes a callback that takes the state as input and
returns state as output. Instead of passing a named function to modify, I'm
passing an anonymous function, hence the \ syntax. Translating:
"makeCool" is a "ParseContext transformer". Modify the "ctx" by setting "input" to "" and setting "output" to the original input plus the string " is cool!".
As in other languages, anonymous functions can be either ultra-convenient or
ultra-inscrutable, depending on both the author and the reader. Nonetheless,
I'll use the convenience functions where, uh, convenient.
Pulling It All Together
Okay, we're getting really close now--one more example. Let's transform
makeCoolState.hs into something slightly more useful and
“pretend” we're
reading tokens. We'll naively treat each character as a Symbol token, and
we'll simply read four tokens.
import Control.Monad.State
data TokenType = Identifier | Qualifier | Type | Symbol Char
deriving (Show, Eq)
data Token = Token {
tokenType :: TokenType,
tokenValue :: String
} deriving Show
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String, -- The output generated so far.
currTok :: Token, -- The current token, if defined.
stack :: [Token] -- A stack of tokens we haven't dealt with yet.
} deriving Show
-- Read one token from input and put it on the stack.
getToken :: State ParseContext ()
getToken = do
ctx@(ParseContext {input = c:input', stack = stack'}) <- get
let tok = Token {tokenType = Symbol c, tokenValue = [c]} in
put (ctx {input = input', currTok = tok, stack = tok:stack'})
-- This is a completely different way of writing getToken.
getTokenAnotherWay :: State ParseContext ()
getTokenAnotherWay = modify getTokenAnotherWay'
where
getTokenAnotherWay' ctx =
ctx {input = input', currTok = tok, stack = tok:stack'}
where
c = head $ input $ ctx
input' = tail $ input $ ctx
tok = Token {tokenType = Symbol c, tokenValue = [c]}
stack' = stack $ ctx
-- Call getToken a few times.
parse :: State ParseContext ()
parse = do
getToken -- I would not normally duplicate lines like this.
getToken -- I'm just trying to illustrate a point.
getTokenAnotherWay
getTokenAnotherWay
main = do
s <- getContents -- currTok is undefined below. It's defined later.
let ctx = ParseContext {input = s, output = "", stack = []} in
print $ execState parse $ ctx
Running it:
$ echo "Haskell" | runhugs -98 monadPipes.hs
ParseContext {
input = "ell\n",
output = "",
currTok = Token {tokenType = Symbol 'k', tokenValue = "k"},
stack = [
Token {tokenType = Symbol 'k', tokenValue = "k"},
Token {tokenType = Symbol 's', tokenValue = "s"},
Token {tokenType = Symbol 'a', tokenValue = "a"},
Token {tokenType = Symbol 'H', tokenValue = "H"}]}
At the risk of cheating, I've manually added newlines and indentation to the
output to make it more legible. Nonetheless, this output is really exciting!
It's taken the string Haskell and treated the first four letters as Symbol
tokens, placing each one on the stack.
This fascinating glimpse into the heart of the ParseContext was made possible
simply by making sure to add deriving Show in the right places and calling
print on the final ParseContext in the line:
print $ execState parse $ ctx
Notice a few other things. The do syntax is used to parse four tokens in a
row:
parse = do getToken -- I would not normally duplicate lines like this. getToken -- I'm just trying to illustrate a point. getTokenAnotherWay getTokenAnotherWay
In getToken, I used get and put, but in getTokenAnotherWay, I used
modify. In general, I've varied how the code was implemented in very minor
ways to show different ways of doing things. After all, even C has three
different constructs for a loop; you pick the most convenient tool for the job
at hand.
As a last note, to experience how incredibly lazy Haskell is, instead of using
echo to enter the value Haskell, try typing the input characters one at a
time, pressing Return after each character. Haskell will start outputting the
output before you are done typing the input. In fact, Haskell will produce as
much output as possible for the input you have provided. In a sense, because
the input is read lazily, the rest of the program can run
“proactively”.
Now, take a deep breath, and prepare for the plunge. Here's what you have at
your disposal:
- You have the C version of the code.
-
There's a fantastic “Tour of the Haskell Syntax” that you
can refer to at
www.cs.uu.nl/%7Eafie/haskell/tourofsyntax.html. -
If you installed Hugs, the Haskell Prelude contains most of the
simple functions that I make use of. I find it helpful to refer to the
source as it's meant to be self documenting. On my system, it's
installed at /usr/lib/hugs/libraries/Hugs/Prelude.hs. -
Similarly, the source for the State monad is
/usr/lib/hugs/packages/mtl/Control/Monad/State.hs. The formal
documentation is at haskell.org/hawiki/MonadState.
At long last, here's the code:
{- Translate C type declarations into English.
This exercise was taken from "Expert C Programming: Deep C Secrets", p. 84.
Example: echo -n "int *p;" | runhugs -98 cdecl.hs
-}
import Char
import Control.Monad.State
data TokenType = Identifier | Qualifier | Type | Symbol Char
deriving (Show, Eq)
data Token = Token {
tokenType :: TokenType,
tokenValue :: String
} deriving Show
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String, -- The output generated so far.
currTok :: Token, -- The current token, if defined.
stack :: [Token] -- A stack of tokens we haven't dealt with yet.
} deriving Show
-- Convenience functions:
currTokType = tokenType . currTok
currTokValue = tokenValue . currTok
stackHead = head . stack
-- "Write" to a ParseContext's output. Use ++ for simplicity.
writeOutput :: String -> State ParseContext ()
writeOutput s = modify (\ctx -> ctx {output = output ctx ++ s})
-- Pop the stack.
pop :: State ParseContext ()
pop = modify (\ctx -> ctx {stack = tail $ stack $ ctx})
-- Write the value of the top of the stack and then pop it.
popAndWrite :: State ParseContext ()
popAndWrite = do
top <- gets stackHead
writeOutput (tokenValue top)
pop
-- Classify a string into a Token.
classifyString :: String -> Token
classifyString "const" = Token Qualifier "read-only"
classifyString "*" = Token (Symbol '*') "pointer to"
classifyString [c]
| not (isAlphaNum c) = Token (Symbol c) [c]
classifyString s = Token tokType s
where
tokType = case s of
"volatile" -> Qualifier
x | x `elem` ["void", "char", "signed", "unsigned", "short",
"int", "long", "float", "double", "struct",
"union", "enum"] -> Type
x -> Identifier
-- Read the next token into currTok.
getToken :: State ParseContext ()
getToken = modify getToken'
where
getToken' ctx@(ParseContext {input = s}) =
ctx {currTok = token, input = theRest}
where
(token, theRest) = lexString $ lstrip $ s
lstrip s = dropWhile isSpace s
-- Read a token. Return it and the left-over portion of the string.
lexString :: String -> (Token, String)
lexString s@(c:cs) | isAlphaNum c = (token, theRest)
where
(tokString, theRest) = span isAlphaNum s
token = classifyString tokString
lexString (c:cs) = (classifyString [c], cs)
-- Put tokens on the stack until we reach the first identifier.
readToFirstIdentifier :: State ParseContext ()
readToFirstIdentifier = do
getToken
pushUntilIdentifier
afterIdentifier <- get
let s = identifier ++ " is "
identifier = currTokValue afterIdentifier in
put (afterIdentifier {output = s})
getToken
-- Keep pushing tokens until we hit an identifier.
pushUntilIdentifier :: State ParseContext ()
pushUntilIdentifier = do
ctx <- get
if currTokType ctx == Identifier
then return () -- Leave things as they are.
else do
put (ctx {stack = (currTok ctx) : (stack ctx)})
getToken
pushUntilIdentifier
return ()
-- Deal with arrays.
dealWithArrays :: State ParseContext ()
dealWithArrays = do
ctx <- get
case currTokType ctx of
Symbol '[' -> do
writeOutput "array "
getToken
writeIfNumber
getToken
writeOutput "of "
dealWithArrays
_ -> return () -- Recurse until we get past the ['s.
where
writeIfNumber = do -- Call writeSize if a number.
tokValue <- gets currTokValue
if isDigit $ head $ tokValue
then do
writeSize
getToken
else return ()
writeSize = do -- Output the array size.
tokValue <- gets currTokValue
let num = show $ (+ -1) $ read $ tokValue
s = "0.." ++ num ++ " " in -- Can't use where instead of let here.
writeOutput s
-- Deal with function arguments.
dealWithFunctionArgs :: State ParseContext ()
dealWithFunctionArgs = do
getUntilParen
getToken
writeOutput "function returning "
where
getUntilParen = do -- Read tokens until we hit ).
ctx <- get
case currTokType ctx of
Symbol ')' -> return ()
_ -> do
getToken
getUntilParen
-- Deal with pointers.
dealWithPointers :: State ParseContext ()
dealWithPointers = do
top <- gets stackHead
case tokenType top of
Symbol '*' -> do
popAndWrite
writeOutput " "
dealWithPointers
_ -> return () -- Recurse until we get past the *'s.
-- Process tokens that we stacked while reading to identifier.
dealWithStack :: State ParseContext ()
dealWithStack = do
stack' <- gets stack
case stack' of
[] -> return ()
(x:xs) ->
case tokenType x of
Symbol '(' -> do
pop
getToken
dealWithDeclarator
_ -> popAndWrite
-- Do all parsing after first identifier.
dealWithDeclarator :: State ParseContext ()
dealWithDeclarator = do
tokType <- gets currTokType
case tokType of
Symbol '[' -> dealWithArrays
Symbol '(' -> dealWithFunctionArgs
_ -> return () -- "Exit" the case, not the function.
dealWithPointers
dealWithStack
-- Do all parsing.
parse :: State ParseContext ()
parse = do
readToFirstIdentifier
dealWithDeclarator
-- Translate a C type declaration into English.
translate :: String -> String
translate s =
output $ execState parse $ ctx -- Change "output" to "show" to debug.
where ctx = ParseContext {input = s, output = "", stack = []}
main :: IO ()
main = do
input <- getContents
putStrLn $ translate $ input
Commentary
I really hate to spoil the tone of such a nice tutorial with opinionated
commentary, but the exceedingly careful reader may have noticed that the C
version is shorter than the Haskell version! In fact, by my count, there are
106 useful lines of code in the C version and 146 lines of useful code in the
Haskell version. What happened? This is the type of thing that can make a
budding article writer lose sleep at night!
Notice a few things. The C version is using static memory allocation. It
pre-allocates space for 100 tokens, each of which can be up to 64 characters
long. Modern C applications must usually manage memory dynamically, and it's a
painful, error-prone chore. Nor does the example do any error handling.
Haskell's exception system went completely unused. Last of all, the example
didn't make use of any abstractions. Haskell's high-level, functional nature
means that libraries, in general, can be higher level in Haskell than in C, but
this is meaningless if you don't even use a parsing library.
Note, however, that the Haskell version already has some benefits over the C
version. For instance, being able to print a ParseContext for free was
really nice. Theoretically, because all parsing state is contained in a
ParseContext, you can pass off the ParseContext to another machine and let
it continue the computation. If, during parsing, for some reason it was
necessary to undo what you had done and go back to some previous parsing
state, it's absolutely possible to throw away your current ParseContext and
continue with some previous ParseContext.
Consider the task of removing a bolt. You have your choice of a crescent
wrench (say, C), a socket set (say, Java) or an impact wrench (say, Haskell).
An impact wrench is incredibly powerful and can make difficult tasks easy.
However, what we've seen is that sometimes it takes longer to retrieve the
compressor from the garage in order to use an impact wrench than it takes to
simply get the job done with a crescent wrench.
If Haskell is so powerful, why isn't it more popular? I think it's because the
authors and many of the users are just too dang intelligent! I've heard Guido
van Rossum, the author of Python, say something like, “ML is a great
language--it's just too bad you have to have an IQ higher than 150 to use
it.”
Consider that in Paul Hudak's book, The Haskell School of
Expression, Hudak
drags the user through writing code to calculate the area of a
concave
polygon [exercise 2.5 on page 33] before he even shows the user the canonical
“Hello world” example [page 37], and I still haven't gotten to the part in the
book that explains how to use Hugs. [Note:
I'm being facetious. I don't think it's covered in the book at all.
Hugs is an implementation of Haskell, and the book is about the language
itself, not any particular implementation. However, my point stands that
unless you actually get an interpreter or compiler running so that you can
actually write some code, you can't truly claim to know a language. I spent
hours trying to figure out why the Hugs shell wouldn't let me define a
function before I found out that you have to store them in a file and load them
as a library.]
Furthermore, as much as I love Haskell's type system, I think there's a real
misconception. Hudak writes:
The best news is that Haskell's type system will tell you if your program
is well-typed before you run it. This is a big advantage because most
programming errors are manifested as typing errors [p. 8].
I don't agree. Although I would agree that most bugs are caused by programmers
typing the wrong code, I would not agree that most bugs are manifested as
typing errors! In fact, I find that most of my bugs are caused by subtly
misunderstanding the fine details of a difficult problem, and I've heard other
well-respected programmers say the same thing. Personally, being a fan of duck
typing, I find that most of the time, types are almost a nuisance!
[Note: Lest Java programmers send me hate mail, this is a topic covered
repeatedly by Bruce Eckel, author of Thinking in
Java, on his blog
(mindview.net/WebLog/ArticleIndex). For instance,
this post: mindview.net/WebLog/log-0025. If you consider tools like
QuickCheck (www.cs.chalmers.se/~rjmh/QuickCheck), which can use
Haskell's type system to generate test cases automatically, you'll see
that a whole article could be written debating whether Bruce Eckel's ideas
apply to Haskell. Nonetheless, I think it's fair to say that Haskell's type
system gives you “more bang for the buck” than Java's.]
I spent a lot of time wondering why Haskell programmers seemed to be so
successful. I've come to the following conclusion:
It's not that good programmers are necessarily drawn to Haskell--as Google
has shown, there are a lot of extremely talented programmers who are content
to code in Java. Rather, it's that you have to be a smart, patient,
disciplined and dedicated programmer to code in Haskell at all.
The truth of the matter is you can be a bad programmer and make a living
coding in Java. The same cannot be said of Haskell.
In summary: I think that it's fair to say that Haskell has some flaws (for
instance, you have to be smart to use it), and it might not ever be as popular
as Java, PHP or Ruby. However, languages are like friends; you take the good
with the bad and enjoy them as a whole (with the exception of C++; almost no
one uses all of C++).
Haskell is like a brilliant kernel programmer I once worked with. Asking him
anything outside his areas of expertise was like trying to dereference a random
location in memory. I use to joke that his user interface was about as
friendly as a Commodore 64--“syntax error”. Nonetheless, you just couldn't
beat him if you needed a new device driver for some random piece of hardware.
Likewise, if I had to reimplement Pascal or write Perl 6, Haskell would be my
first choice--although I'd definitely use a powerful parsing library. In fact,
I'm pleased to say that Haskell is a welcome addition to my programmer's
toolbox!
Credits
I'd like to thank the readers of the haskell-cafe mailing list
www.haskell.org/mailman/listinfo/haskell-cafe) for taking the time to
read and critique my code. I received comments from Benjamin Franksen, Bill
Wood, Brian Hulley, Bulat Ziganshin, Jean-Philippe Bernardy, Malcolm Wallace,
Pete Chown, Scott Turner, Tomasz Zielonka and Udo Stenzel. I received
extensive feedback from Daniel Fischer and Neil Mitchell. I'd also like to
thank my friends for their technical reviews: Anuranjan Shukla, Brandon L. Golm
(who turns my Engrish into English), Brennan Evans, Christopher Golden, Krishna
Srinivasan and Sam Rushing. In particular, Jared Updike was a great source of
help throughout the entire process.










This week 5 lucky Members will receive a copy of The Official Ubuntu Server Book by Benjamin Mako Hill and Linux Journal's very own Kyle Rankin. No entry necessary. Check back here early next week to find out who the lucky Online Members are.




Comments
Good static typing does make a gidfference.
As background: my first five years of full-time programming were Java, the next five years after that, Ruby, and over the last year or so I've moved to nearly-full-time Haskell.
Types in Haskell do often feel like a nuisance when you come from a Ruby background, especially when you're doing quick-and-dirty hacks where you're not so concerned about correctness because you're going to throw out the code after a few uses.
However, these days I just as often find the lack of types in Ruby to be more of a nuisance because I have to work fairly hard to keep silly bugs from creeping into my programs (oops, that was supposed to be an array of arrays, not an array) whereas in Haskell the type checker deals with all of this for me. In Haskell I find myself using far fewer unit tests and doing both small and large refactorings with more confidence. I also find I no longer need to use error checking code as I did on occasion in Ruby, where I'd look at a situation and say to myself, "if the wrong class of object gets out of here, it's going to appear half-way across the program and I'll spend ages debugging it."
Not to mention which, when you really start using monads and combinators, you'll find them extremely powerful.
cjs@cynic.net
it's unfair to blindly translate algorithms
Brandon Moore has sent me an alternate version of the code that is both simple and yet still shorter than the C version. His findings suggest that an algorithm in C may not translate easily or naturally into Haskell; which, of course, is not surprising. I purposely made the Haskell program as close to the C program as possible to aid the reader in understanding it, but perhaps this was an unfair handicap to Haskell.
Alternate solution
I wrote some alternate code which makes a simple parser monad, rather than the simple state monad used in the article. It's like a recursive descent parser, but with all the recursion wrapped up inside the monad. This doesn't quite handle quantifiers like const and static at the beginning of the declaration correctly, but the C and Haskell code are already both wrong in different ways, and the full syntax is really complicated so I don't care. I count 62 lines of code.
Many thanks for a very clear
Many thanks for a very clear tutorial. I'm some way from grokking monads but this should help...As you say, introductory articles on monads are usually a bit deep (The subtitle of the Programmers Guide to Monads, 'Don't Panic', sounds promising but 2 or 3 pages later sure enough I was panicking!) so congratulations on not mentioning Category Theory :-)
Your link to the syntax tour is broken. I suppose it's meant to be http://www.cs.uu.nl/~afie/haskell/tourofsyntax.html but that gives a 404 - another place is http://cs.anu.edu.au/Student/comp1100/haskell/tourofsyntax.html but I don't know if this is it's home.
thanks
Thanks for your comment as well as mentioning the broken link.
Post new comment