Translate Haskell into English Manually, Part II

Write a program in Haskell that translates C type declarations into English.
Pulling It All Together

Okay, we're getting really close now--one more example. Let's transform makeCoolState.hs into something slightly more useful and “pretend” we're reading tokens. We'll naively treat each character as a Symbol token, and we'll simply read four tokens.


import Control.Monad.State

data TokenType = Identifier | Qualifier | Type | Symbol Char
  deriving (Show, Eq)

data Token = Token {
  tokenType :: TokenType,
  tokenValue :: String
} deriving Show

data ParseContext = ParseContext {
  input :: String,    -- The input that has not been parsed yet.
  output :: String,   -- The output generated so far.
  currTok :: Token,   -- The current token, if defined.
  stack :: [Token]    -- A stack of tokens we haven't dealt with yet.
} deriving Show

-- Read one token from input and put it on the stack.
getToken :: State ParseContext ()
getToken = do
  ctx@(ParseContext {input = c:input', stack = stack'}) <- get
  let tok = Token {tokenType = Symbol c, tokenValue = [c]} in
    put (ctx {input = input', currTok = tok, stack = tok:stack'})

-- This is a completely different way of writing getToken.
getTokenAnotherWay :: State ParseContext ()
getTokenAnotherWay = modify getTokenAnotherWay'
  where
    getTokenAnotherWay' ctx =
      ctx {input = input', currTok = tok, stack = tok:stack'}
      where
        c = head $ input $ ctx
        input' = tail $ input $ ctx
        tok = Token {tokenType = Symbol c, tokenValue = [c]}
        stack' = stack $ ctx

-- Call getToken a few times.
parse :: State ParseContext ()
parse = do
  getToken            -- I would not normally duplicate lines like this.
  getToken            -- I'm just trying to illustrate a point.
  getTokenAnotherWay
  getTokenAnotherWay

main = do
  s <- getContents    -- currTok is undefined below.  It's defined later.
  let ctx = ParseContext {input = s, output = "", stack = []} in
    print $ execState parse $ ctx

Running it:

$ echo "Haskell" | runhugs -98 monadPipes.hs
ParseContext {
  input = "ell\n",
  output = "",
  currTok = Token {tokenType = Symbol 'k', tokenValue = "k"},
  stack = [
    Token {tokenType = Symbol 'k', tokenValue = "k"},
    Token {tokenType = Symbol 's', tokenValue = "s"},
    Token {tokenType = Symbol 'a', tokenValue = "a"},
    Token {tokenType = Symbol 'H', tokenValue = "H"}]}

At the risk of cheating, I've manually added newlines and indentation to the output to make it more legible. Nonetheless, this output is really exciting! It's taken the string Haskell and treated the first four letters as Symbol tokens, placing each one on the stack.

This fascinating glimpse into the heart of the ParseContext was made possible simply by making sure to add deriving Show in the right places and calling print on the final ParseContext in the line:

print $ execState parse $ ctx

Notice a few other things. The do syntax is used to parse four tokens in a row:

parse = do
  getToken            -- I would not normally duplicate lines like this.
  getToken            -- I'm just trying to illustrate a point.
  getTokenAnotherWay
  getTokenAnotherWay

In getToken, I used get and put, but in getTokenAnotherWay, I used modify. In general, I've varied how the code was implemented in very minor ways to show different ways of doing things. After all, even C has three different constructs for a loop; you pick the most convenient tool for the job at hand.

As a last note, to experience how incredibly lazy Haskell is, instead of using echo to enter the value Haskell, try typing the input characters one at a time, pressing Return after each character. Haskell will start outputting the output before you are done typing the input. In fact, Haskell will produce as much output as possible for the input you have provided. In a sense, because the input is read lazily, the rest of the program can run “proactively”.

Now, take a deep breath, and prepare for the plunge. Here's what you have at your disposal:

  1. You have the C version of the code.

  2. There's a fantastic “Tour of the Haskell Syntax” that you can refer to at www.cs.uu.nl/%7Eafie/haskell/tourofsyntax.html.

  3. If you installed Hugs, the Haskell Prelude contains most of the simple functions that I make use of. I find it helpful to refer to the source as it's meant to be self documenting. On my system, it's installed at /usr/lib/hugs/libraries/Hugs/Prelude.hs.

  4. Similarly, the source for the State monad is /usr/lib/hugs/packages/mtl/Control/Monad/State.hs. The formal documentation is at haskell.org/hawiki/MonadState.

At long last, here's the code:


{- Translate C type declarations into English.
  This exercise was taken from "Expert C Programming:  Deep C Secrets", p. 84.
   Example: echo -n "int *p;" | runhugs -98 cdecl.hs
-}

import Char
import Control.Monad.State

data TokenType = Identifier | Qualifier | Type | Symbol Char
  deriving (Show, Eq)

data Token = Token {
  tokenType :: TokenType,
  tokenValue :: String
} deriving Show

data ParseContext = ParseContext {
  input :: String,    -- The input that has not been parsed yet.
  output :: String,   -- The output generated so far.
  currTok :: Token,   -- The current token, if defined.
  stack :: [Token]    -- A stack of tokens we haven't dealt with yet.
} deriving Show

-- Convenience functions:
currTokType = tokenType . currTok
currTokValue = tokenValue . currTok
stackHead = head . stack

-- "Write" to a ParseContext's output.  Use ++ for simplicity.
writeOutput :: String -> State ParseContext ()
writeOutput s = modify (\ctx -> ctx {output = output ctx ++ s})

-- Pop the stack.
pop :: State ParseContext ()
pop = modify (\ctx -> ctx {stack = tail $ stack $ ctx})

-- Write the value of the top of the stack and then pop it.
popAndWrite :: State ParseContext ()
popAndWrite = do
  top <- gets stackHead
  writeOutput (tokenValue top)
  pop

-- Classify a string into a Token.
classifyString :: String -> Token
classifyString "const"  = Token Qualifier "read-only"
classifyString "*"      = Token (Symbol '*') "pointer to"
classifyString [c]
  | not (isAlphaNum c)  = Token (Symbol c) [c]
classifyString s        = Token tokType s
  where
    tokType = case s of
      "volatile" -> Qualifier
      x | x `elem` ["void", "char", "signed", "unsigned", "short",
                    "int", "long", "float", "double", "struct",
                    "union", "enum"] -> Type
      x -> Identifier

-- Read the next token into currTok.
getToken :: State ParseContext ()
getToken = modify getToken'
  where
    getToken' ctx@(ParseContext {input = s}) =
      ctx {currTok = token, input = theRest}
      where
        (token, theRest) = lexString $ lstrip $ s
        lstrip s = dropWhile isSpace s

-- Read a token.  Return it and the left-over portion of the string.
lexString :: String -> (Token, String)
lexString s@(c:cs) | isAlphaNum c = (token, theRest)
  where
    (tokString, theRest) = span isAlphaNum s
    token = classifyString tokString
lexString (c:cs) = (classifyString [c], cs)

-- Put tokens on the stack until we reach the first identifier.
readToFirstIdentifier :: State ParseContext ()
readToFirstIdentifier = do
  getToken
  pushUntilIdentifier
  afterIdentifier <- get
  let s = identifier ++ " is "
      identifier = currTokValue afterIdentifier in
    put (afterIdentifier {output = s})
  getToken

-- Keep pushing tokens until we hit an identifier.
pushUntilIdentifier :: State ParseContext ()
pushUntilIdentifier = do
  ctx <- get
  if currTokType ctx == Identifier
    then return ()                      -- Leave things as they are.
    else do
      put (ctx {stack = (currTok ctx) : (stack ctx)})
      getToken
      pushUntilIdentifier
      return ()

-- Deal with arrays.
dealWithArrays :: State ParseContext ()
dealWithArrays = do
  ctx <- get
  case currTokType ctx of
    Symbol '[' -> do
      writeOutput "array "
      getToken
      writeIfNumber
      getToken
      writeOutput "of "
      dealWithArrays
    _ -> return ()                      -- Recurse until we get past the ['s.
  where
    writeIfNumber = do                  -- Call writeSize if a number.
      tokValue <- gets currTokValue
      if isDigit $ head $ tokValue
        then do
          writeSize
          getToken
        else return ()
    writeSize = do                      -- Output the array size.
      tokValue <- gets currTokValue
      let num = show $ (+ -1) $ read $ tokValue
          s = "0.." ++ num ++ " " in    -- Can't use where instead of let here.
        writeOutput s

-- Deal with function arguments.
dealWithFunctionArgs :: State ParseContext ()
dealWithFunctionArgs = do
  getUntilParen
  getToken
  writeOutput "function returning "
  where
    getUntilParen = do                  -- Read tokens until we hit ).
      ctx <- get
      case currTokType ctx of
        Symbol ')' -> return ()
        _ -> do
          getToken
          getUntilParen

-- Deal with pointers.
dealWithPointers :: State ParseContext ()
dealWithPointers = do
  top <- gets stackHead
  case tokenType top of
    Symbol '*' -> do
      popAndWrite
      writeOutput " "
      dealWithPointers
    _ -> return ()                      -- Recurse until we get past the *'s.

-- Process tokens that we stacked while reading to identifier.
dealWithStack :: State ParseContext ()
dealWithStack = do
  stack' <- gets stack
  case stack' of
    [] -> return ()
    (x:xs) ->
      case tokenType x of
        Symbol '(' -> do
          pop
          getToken
          dealWithDeclarator
        _ -> popAndWrite

-- Do all parsing after first identifier.
dealWithDeclarator :: State ParseContext ()
dealWithDeclarator = do
  tokType <- gets currTokType
  case tokType of
    Symbol '[' -> dealWithArrays
    Symbol '(' -> dealWithFunctionArgs
    _ -> return ()                      -- "Exit" the case, not the function.
  dealWithPointers
  dealWithStack

-- Do all parsing.
parse :: State ParseContext ()
parse = do
  readToFirstIdentifier
  dealWithDeclarator

-- Translate a C type declaration into English.
translate :: String -> String
translate s =
  output $ execState parse $ ctx        -- Change "output" to "show" to debug.
  where ctx = ParseContext {input = s, output = "", stack = []}

main :: IO ()
main = do
  input <- getContents
  putStrLn $ translate $ input

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Good static typing does make a gidfference.

Curt Sampson's picture

As background: my first five years of full-time programming were Java, the next five years after that, Ruby, and over the last year or so I've moved to nearly-full-time Haskell.

Types in Haskell do often feel like a nuisance when you come from a Ruby background, especially when you're doing quick-and-dirty hacks where you're not so concerned about correctness because you're going to throw out the code after a few uses.

However, these days I just as often find the lack of types in Ruby to be more of a nuisance because I have to work fairly hard to keep silly bugs from creeping into my programs (oops, that was supposed to be an array of arrays, not an array) whereas in Haskell the type checker deals with all of this for me. In Haskell I find myself using far fewer unit tests and doing both small and large refactorings with more confidence. I also find I no longer need to use error checking code as I did on occasion in Ruby, where I'd look at a situation and say to myself, "if the wrong class of object gets out of here, it's going to appear half-way across the program and I'll spend ages debugging it."

Not to mention which, when you really start using monads and combinators, you'll find them extremely powerful.

cjs@cynic.net

it's unfair to blindly translate algorithms

Shannon -jj Behrens's picture

Brandon Moore has sent me an alternate version of the code that is both simple and yet still shorter than the C version. His findings suggest that an algorithm in C may not translate easily or naturally into Haskell; which, of course, is not surprising. I purposely made the Haskell program as close to the C program as possible to aid the reader in understanding it, but perhaps this was an unfair handicap to Haskell.

Alternate solution

Brandon Moore's picture

I wrote some alternate code which makes a simple parser monad, rather than the simple state monad used in the article. It's like a recursive descent parser, but with all the recursion wrapped up inside the monad. This doesn't quite handle quantifiers like const and static at the beginning of the declaration correctly, but the C and Haskell code are already both wrong in different ways, and the full syntax is really complicated so I don't care. I count 62 lines of code.

Many thanks for a very clear

Jim B's picture

Many thanks for a very clear tutorial. I'm some way from grokking monads but this should help...As you say, introductory articles on monads are usually a bit deep (The subtitle of the Programmers Guide to Monads, 'Don't Panic', sounds promising but 2 or 3 pages later sure enough I was panicking!) so congratulations on not mentioning Category Theory :-)

Your link to the syntax tour is broken. I suppose it's meant to be http://www.cs.uu.nl/~afie/haskell/tourofsyntax.html but that gives a 404 - another place is http://cs.anu.edu.au/Student/comp1100/haskell/tourofsyntax.html but I don't know if this is it's home.

thanks

Shannon -jj Behrens's picture

Thanks for your comment as well as mentioning the broken link.

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState