Translate Haskell into English Manually, Part II
Okay, we're getting really close now--one more example. Let's transform makeCoolState.hs into something slightly more useful and “pretend” we're reading tokens. We'll naively treat each character as a Symbol token, and we'll simply read four tokens.
import Control.Monad.State
data TokenType = Identifier | Qualifier | Type | Symbol Char
deriving (Show, Eq)
data Token = Token {
tokenType :: TokenType,
tokenValue :: String
} deriving Show
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String, -- The output generated so far.
currTok :: Token, -- The current token, if defined.
stack :: [Token] -- A stack of tokens we haven't dealt with yet.
} deriving Show
-- Read one token from input and put it on the stack.
getToken :: State ParseContext ()
getToken = do
ctx@(ParseContext {input = c:input', stack = stack'}) <- get
let tok = Token {tokenType = Symbol c, tokenValue = [c]} in
put (ctx {input = input', currTok = tok, stack = tok:stack'})
-- This is a completely different way of writing getToken.
getTokenAnotherWay :: State ParseContext ()
getTokenAnotherWay = modify getTokenAnotherWay'
where
getTokenAnotherWay' ctx =
ctx {input = input', currTok = tok, stack = tok:stack'}
where
c = head $ input $ ctx
input' = tail $ input $ ctx
tok = Token {tokenType = Symbol c, tokenValue = [c]}
stack' = stack $ ctx
-- Call getToken a few times.
parse :: State ParseContext ()
parse = do
getToken -- I would not normally duplicate lines like this.
getToken -- I'm just trying to illustrate a point.
getTokenAnotherWay
getTokenAnotherWay
main = do
s <- getContents -- currTok is undefined below. It's defined later.
let ctx = ParseContext {input = s, output = "", stack = []} in
print $ execState parse $ ctx
Running it:
$ echo "Haskell" | runhugs -98 monadPipes.hs
ParseContext {
input = "ell\n",
output = "",
currTok = Token {tokenType = Symbol 'k', tokenValue = "k"},
stack = [
Token {tokenType = Symbol 'k', tokenValue = "k"},
Token {tokenType = Symbol 's', tokenValue = "s"},
Token {tokenType = Symbol 'a', tokenValue = "a"},
Token {tokenType = Symbol 'H', tokenValue = "H"}]}
At the risk of cheating, I've manually added newlines and indentation to the output to make it more legible. Nonetheless, this output is really exciting! It's taken the string Haskell and treated the first four letters as Symbol tokens, placing each one on the stack.
This fascinating glimpse into the heart of the ParseContext was made possible simply by making sure to add deriving Show in the right places and calling print on the final ParseContext in the line:
print $ execState parse $ ctx
Notice a few other things. The do syntax is used to parse four tokens in a row:
parse = do getToken -- I would not normally duplicate lines like this. getToken -- I'm just trying to illustrate a point. getTokenAnotherWay getTokenAnotherWay
In getToken, I used get and put, but in getTokenAnotherWay, I used modify. In general, I've varied how the code was implemented in very minor ways to show different ways of doing things. After all, even C has three different constructs for a loop; you pick the most convenient tool for the job at hand.
As a last note, to experience how incredibly lazy Haskell is, instead of using echo to enter the value Haskell, try typing the input characters one at a time, pressing Return after each character. Haskell will start outputting the output before you are done typing the input. In fact, Haskell will produce as much output as possible for the input you have provided. In a sense, because the input is read lazily, the rest of the program can run “proactively”.
Now, take a deep breath, and prepare for the plunge. Here's what you have at your disposal:
You have the C version of the code.
There's a fantastic “Tour of the Haskell Syntax” that you can refer to at www.cs.uu.nl/%7Eafie/haskell/tourofsyntax.html.
If you installed Hugs, the Haskell Prelude contains most of the simple functions that I make use of. I find it helpful to refer to the source as it's meant to be self documenting. On my system, it's installed at /usr/lib/hugs/libraries/Hugs/Prelude.hs.
Similarly, the source for the State monad is /usr/lib/hugs/packages/mtl/Control/Monad/State.hs. The formal documentation is at haskell.org/hawiki/MonadState.
At long last, here's the code:
{- Translate C type declarations into English.
This exercise was taken from "Expert C Programming: Deep C Secrets", p. 84.
Example: echo -n "int *p;" | runhugs -98 cdecl.hs
-}
import Char
import Control.Monad.State
data TokenType = Identifier | Qualifier | Type | Symbol Char
deriving (Show, Eq)
data Token = Token {
tokenType :: TokenType,
tokenValue :: String
} deriving Show
data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String, -- The output generated so far.
currTok :: Token, -- The current token, if defined.
stack :: [Token] -- A stack of tokens we haven't dealt with yet.
} deriving Show
-- Convenience functions:
currTokType = tokenType . currTok
currTokValue = tokenValue . currTok
stackHead = head . stack
-- "Write" to a ParseContext's output. Use ++ for simplicity.
writeOutput :: String -> State ParseContext ()
writeOutput s = modify (\ctx -> ctx {output = output ctx ++ s})
-- Pop the stack.
pop :: State ParseContext ()
pop = modify (\ctx -> ctx {stack = tail $ stack $ ctx})
-- Write the value of the top of the stack and then pop it.
popAndWrite :: State ParseContext ()
popAndWrite = do
top <- gets stackHead
writeOutput (tokenValue top)
pop
-- Classify a string into a Token.
classifyString :: String -> Token
classifyString "const" = Token Qualifier "read-only"
classifyString "*" = Token (Symbol '*') "pointer to"
classifyString [c]
| not (isAlphaNum c) = Token (Symbol c) [c]
classifyString s = Token tokType s
where
tokType = case s of
"volatile" -> Qualifier
x | x `elem` ["void", "char", "signed", "unsigned", "short",
"int", "long", "float", "double", "struct",
"union", "enum"] -> Type
x -> Identifier
-- Read the next token into currTok.
getToken :: State ParseContext ()
getToken = modify getToken'
where
getToken' ctx@(ParseContext {input = s}) =
ctx {currTok = token, input = theRest}
where
(token, theRest) = lexString $ lstrip $ s
lstrip s = dropWhile isSpace s
-- Read a token. Return it and the left-over portion of the string.
lexString :: String -> (Token, String)
lexString s@(c:cs) | isAlphaNum c = (token, theRest)
where
(tokString, theRest) = span isAlphaNum s
token = classifyString tokString
lexString (c:cs) = (classifyString [c], cs)
-- Put tokens on the stack until we reach the first identifier.
readToFirstIdentifier :: State ParseContext ()
readToFirstIdentifier = do
getToken
pushUntilIdentifier
afterIdentifier <- get
let s = identifier ++ " is "
identifier = currTokValue afterIdentifier in
put (afterIdentifier {output = s})
getToken
-- Keep pushing tokens until we hit an identifier.
pushUntilIdentifier :: State ParseContext ()
pushUntilIdentifier = do
ctx <- get
if currTokType ctx == Identifier
then return () -- Leave things as they are.
else do
put (ctx {stack = (currTok ctx) : (stack ctx)})
getToken
pushUntilIdentifier
return ()
-- Deal with arrays.
dealWithArrays :: State ParseContext ()
dealWithArrays = do
ctx <- get
case currTokType ctx of
Symbol '[' -> do
writeOutput "array "
getToken
writeIfNumber
getToken
writeOutput "of "
dealWithArrays
_ -> return () -- Recurse until we get past the ['s.
where
writeIfNumber = do -- Call writeSize if a number.
tokValue <- gets currTokValue
if isDigit $ head $ tokValue
then do
writeSize
getToken
else return ()
writeSize = do -- Output the array size.
tokValue <- gets currTokValue
let num = show $ (+ -1) $ read $ tokValue
s = "0.." ++ num ++ " " in -- Can't use where instead of let here.
writeOutput s
-- Deal with function arguments.
dealWithFunctionArgs :: State ParseContext ()
dealWithFunctionArgs = do
getUntilParen
getToken
writeOutput "function returning "
where
getUntilParen = do -- Read tokens until we hit ).
ctx <- get
case currTokType ctx of
Symbol ')' -> return ()
_ -> do
getToken
getUntilParen
-- Deal with pointers.
dealWithPointers :: State ParseContext ()
dealWithPointers = do
top <- gets stackHead
case tokenType top of
Symbol '*' -> do
popAndWrite
writeOutput " "
dealWithPointers
_ -> return () -- Recurse until we get past the *'s.
-- Process tokens that we stacked while reading to identifier.
dealWithStack :: State ParseContext ()
dealWithStack = do
stack' <- gets stack
case stack' of
[] -> return ()
(x:xs) ->
case tokenType x of
Symbol '(' -> do
pop
getToken
dealWithDeclarator
_ -> popAndWrite
-- Do all parsing after first identifier.
dealWithDeclarator :: State ParseContext ()
dealWithDeclarator = do
tokType <- gets currTokType
case tokType of
Symbol '[' -> dealWithArrays
Symbol '(' -> dealWithFunctionArgs
_ -> return () -- "Exit" the case, not the function.
dealWithPointers
dealWithStack
-- Do all parsing.
parse :: State ParseContext ()
parse = do
readToFirstIdentifier
dealWithDeclarator
-- Translate a C type declaration into English.
translate :: String -> String
translate s =
output $ execState parse $ ctx -- Change "output" to "show" to debug.
where ctx = ParseContext {input = s, output = "", stack = []}
main :: IO ()
main = do
input <- getContents
putStrLn $ translate $ input
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
- Introduction to Named Pipes
- Red Hat's Dell Deal, Waiting for Cowpland
- SSHFS: Super Easy File Access over SSH
- How to Be Cute on All Desktops with Qt
- How's The Weather?
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Writing a Linux Driver
- Still Searching
- Linux System Administration: A User's Guide
- The Neuros MP3 Digital Audio Computer
- This is the easiest tutorial
56 min 37 sec ago - Ahh, the Koolaid.
6 hours 35 min ago - git-annex assistant
12 hours 34 min ago - direct cable connection
12 hours 57 min ago - Agreed on AirDroid. With my
13 hours 7 min ago - I just learned this
13 hours 11 min ago - enterprise
13 hours 41 min ago - not living upto the mobile revolution
16 hours 33 min ago - Deceptive Advertising and
17 hours 8 min ago - Let\'s declare that you have
17 hours 9 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.



Comments
Good static typing does make a gidfference.
As background: my first five years of full-time programming were Java, the next five years after that, Ruby, and over the last year or so I've moved to nearly-full-time Haskell.
Types in Haskell do often feel like a nuisance when you come from a Ruby background, especially when you're doing quick-and-dirty hacks where you're not so concerned about correctness because you're going to throw out the code after a few uses.
However, these days I just as often find the lack of types in Ruby to be more of a nuisance because I have to work fairly hard to keep silly bugs from creeping into my programs (oops, that was supposed to be an array of arrays, not an array) whereas in Haskell the type checker deals with all of this for me. In Haskell I find myself using far fewer unit tests and doing both small and large refactorings with more confidence. I also find I no longer need to use error checking code as I did on occasion in Ruby, where I'd look at a situation and say to myself, "if the wrong class of object gets out of here, it's going to appear half-way across the program and I'll spend ages debugging it."
Not to mention which, when you really start using monads and combinators, you'll find them extremely powerful.
cjs@cynic.net
it's unfair to blindly translate algorithms
Brandon Moore has sent me an alternate version of the code that is both simple and yet still shorter than the C version. His findings suggest that an algorithm in C may not translate easily or naturally into Haskell; which, of course, is not surprising. I purposely made the Haskell program as close to the C program as possible to aid the reader in understanding it, but perhaps this was an unfair handicap to Haskell.
Alternate solution
I wrote some alternate code which makes a simple parser monad, rather than the simple state monad used in the article. It's like a recursive descent parser, but with all the recursion wrapped up inside the monad. This doesn't quite handle quantifiers like const and static at the beginning of the declaration correctly, but the C and Haskell code are already both wrong in different ways, and the full syntax is really complicated so I don't care. I count 62 lines of code.
Many thanks for a very clear
Many thanks for a very clear tutorial. I'm some way from grokking monads but this should help...As you say, introductory articles on monads are usually a bit deep (The subtitle of the Programmers Guide to Monads, 'Don't Panic', sounds promising but 2 or 3 pages later sure enough I was panicking!) so congratulations on not mentioning Category Theory :-)
Your link to the syntax tour is broken. I suppose it's meant to be http://www.cs.uu.nl/~afie/haskell/tourofsyntax.html but that gives a 404 - another place is http://cs.anu.edu.au/Student/comp1100/haskell/tourofsyntax.html but I don't know if this is it's home.
thanks
Thanks for your comment as well as mentioning the broken link.