Translate Haskell into English Manually

Write a program in Haskell that translates C type declarations into English.

Have you ever tried to learn Haskell (haskell.org) and hit a brick wall? Have you tried to read the main tutorial, "A Gentle Introduction to Haskell" (www.haskell.org/tutorial), and found it to be about as gentle as a Pan Galactic Gargle Blaster (en.wikipedia.org/wiki/Pan_Galactic_Gargle_Blaster)? Did you have to learn about monads before you could even write your first non-trivial Haskell program? Have you noticed that unless you already know Haskell, it's even less readable than Shakespeare? Have you searched for an example of a nontrivial Haskell program only to find you can't understand it?

In Expert C Programming: Deep C Secrets (which, by the way, is a must read for anyone who ever uses the letter C), Peter van der Linden teaches the reader how to write a small program that translates C-type declarations into English. It's a great little exercise. It's beautiful because it works without relying on powerful tools like Lex and Yacc, yet it's still small enough to be written in a couple hours. I've read stories about people writing Pascal compilers in Haskell, and there's even a version of Perl 6 written in Haskell (www.pugscode.org).

In my last article (www.linuxjournal.com/article/8850), I gave readers an introduction to monads and why they should care about them as well as Haskell in general. (Note: several readers of my last article wondered why I was interested in Haskell in contrast to other excellent languages such as OCaml, Dylan or Common Lisp. Currently, I'm fascinated by Haskell because it is a purely functional programming language, and because it has lazy evaluation. As strange as it might sound, I'm fascinated by Haskell because it's harder for me to wrap my head around than those other languages.) If you still don't believe that Haskell is worthy of attention, read what Tim Sweeney, the founder of Epic, the video game company that created Unreal, said about it at beust.com/weblog/archives/000375.html. Sweeney feels that Haskell may be the gateway to "The Next Mainstream Programming Language". Because video game developers are often on the cutting edge of our field simply out of necessity, his comments are noteworthy.

In this two-part series, starting with Peter van der Linden's exercise, my hope is to give readers a glimpse of the Zen of Haskell, without requiring that they already be Haskell converts. In this first part, I start by reviewing the C version, and I explain why translating imperative code into Haskell isn't an easy thing to do. I finish by explaining the concept of a parse pipeline.

The C Version

Let's start by reviewing the C version:


/* Translate C type declarations into English.

   This code was taken from "Expert C Programming: Deep C Secrets", p. 84.
   [sic: weird whitespace inconsistencies]

   Compiling: gcc cdecl.c -o cdecl
   Usage: echo -n "int *p;" | ./cdecl */

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#define MAXTOKENS 100
#define MAXTOKENLEN 64

enum type_tag { IDENTIFIER, QUALIFIER, TYPE };

struct token {
    char type;
    char string[MAXTOKENLEN];
};

int top=-1;
struct token stack[MAXTOKENS];
struct token this;

#define pop stack[top--]
#define push(s) stack[++top]=s

enum type_tag classify_string(void)
/* figure out the identifier type */
{
    char *s = this.string;
    if (!strcmp(s,"const")) {
	strcpy(s,"read-only");
	return QUALIFIER;
    }
    if (!strcmp(s,"volatile")) return QUALIFIER;
    if (!strcmp(s,"void")) return TYPE;
    if (!strcmp(s,"char")) return TYPE;
    if (!strcmp(s,"signed")) return TYPE;
    if (!strcmp(s,"unsigned")) return TYPE;
    if (!strcmp(s,"short")) return TYPE;
    if (!strcmp(s,"int")) return TYPE;
    if (!strcmp(s,"long")) return TYPE;
    if (!strcmp(s,"float")) return TYPE;
    if (!strcmp(s,"double")) return TYPE;
    if (!strcmp(s,"struct")) return TYPE;
    if (!strcmp(s,"union")) return TYPE;
    if (!strcmp(s,"enum")) return TYPE;
    return IDENTIFIER;
}

void gettoken(void) /* read next token into "this" */
{
    char *p = this.string;

    /* read past any spaces */
    while ((*p = getchar()) == ' ' ) ;

    if (isalnum(*p)) {
	/* it starts with A-Z,0-9 read in identifier */
	while ( isalnum(*++p=getchar()) );
	ungetc(*p,stdin);
	*p =  '\0';
	this.type=classify_string();
	return;
    }

    if (*p=='*') {
	strcpy(this.string,"pointer to");
	this.type = '*';
	return;
    }
    this.string[1] = '\0';
    this.type = *p;
    return;
}

/* The piece of code that understandeth all parsing. */
read_to_first_identifier() {
    gettoken();
    while (this.type!=IDENTIFIER) {
	push(this);
	gettoken();
    }
    printf("%s is ", this.string);
    gettoken();
}

deal_with_arrays() {
    while (this.type=='[') {
	printf("array ");
	gettoken(); /* an number or ']' */
	if (isdigit(this.string[0])) {
	    printf("0..%d ",atoi(this.string)-1);
	    gettoken();	/* read the ']' */
	}
	gettoken(); /* read next past the ']' */
	printf("of ");
    }
}

deal_with_function_args() {
    while (this.type!=')') {
	gettoken();
    }
    gettoken();
    printf("function returning ");
}

deal_with_pointers() {
    while ( stack[top].type== '*' ) {
	printf("%s ", pop.string );
    }
}

deal_with_declarator() {
    /* deal with possible array/function following the identifier */
    switch (this.type) {
    case '[': deal_with_arrays(); break;
    case '(': deal_with_function_args();
    }

    deal_with_pointers();

    /* process tokens that we stacked while reading to identifier */
    while (top >= 0) {
	if (stack[top].type == '(' ) {
	    pop;
	    gettoken(); /* read past ')' */
	    deal_with_declarator();
	} else {
	    printf("%s ",pop.string);
	}
    }
}

main()
{
    /* put tokens on stack until we reach identifier */
    read_to_first_identifier();
    deal_with_declarator();
    printf("\n");
    return 0;
}

Notice the stack of tokens and what the various functions do. By and large, I'll follow this closely in the Haskell version, so that the two versions can be compared. Again, notice that Lex and Yacc aren't used. Haskell has no shortage of powerful parsing libraries, but I've decided not to use any of them. After all, if I used a powerful parsing library in the Haskell version, trying to compare the C version to the Haskell version would be like trying to compare apples to peach cobbler.

Last of all, notice that the C version uses global variables for the token stack and so forth. It also prints output wherever convenient. This makes it easy to code, but it doesn't make it easy to translate to Haskell. This is the biggest difference between the C version and the Haskell version. After all, in a purely functional programming language like Haskell, a function can only take inputs and produce outputs. It can't update global state (there are no global variables), and it can't just print a string whenever it is convenient. However, staying purely functional has benefits that are hard to match in an imperative language. For instance, Haskell is lazy and non-strict. It evaluates expressions in whatever order it feels is necessary and only as necessary. For instance, it does not evaluate the arguments to a function until they're actually used in the function--you can pass 1/0 to a function, and as long as the function never uses that parameter, there will be no error. This style of programming can be hard to wrap your head around when you're coming from an imperative and/or object-oriented background. (The tutorial that really helped me finally understand "the Haskell way" was the "Haskell Tutorial for C Programmers" (www.haskell.org/~pairwise/intro/intro.html.)

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Part 2

Shannon -jj Behrens's picture

Let me try that again: part 2 is at:

http://www.linuxjournal.com/article/9242

Part 2

Shannon -jj Behrens's picture

output function

Oliver Mooney's picture

The output function isn't defined in the Haskell source code given. If it's operating on your ParseContext type, presumably you'd have to write it yourself? I know it'd be a pretty simple function, but I'm just checking that you do have to write it yourself, and that's it's not generated automatically via some form of Haskell magic?

output function

Shannon -jj Behrens's picture

It comes for free because of the words "deriving Show". There's a whole paragraph on this in the next part of the article.

Thanks for reading!

output != output

Shannon -jj Behrens's picture

Ugh, looking back, I got confused. When you said "output" function, you were literally talking about the function named "output", whereas I instantly jumped to the conclusion that you were talking about the function used to do output. Sorry for the confusion.

output function

Oliver Mooney's picture

So any type deriving from the Show class becomes compatible with the 'output' function which expects instances of the Show class? Or is it that the Show class allows each field of a type to be referred to by field name?

Using the following as a minimal example:

data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String -- The output generated so far.
} deriving Show

executing the following at the Hugs prompt:
output (ParseContext {input = "", output = "some text"})
gives "some text" as a response.

But changing the name of the output field to outputB gives an error when using the function 'output', resolved by using a corresponding function name of 'outputB'. So is it that field names become functions, presumably with the type ParseContext -> String ?

Incidentally, I tried dropping off the 'deriving Show' declaration and accessing fields by name still works, perhaps because String instances already derive Show?

I know I'm probably belabouring the obvious here, I'm just trying to pin the origin of specific functionality down, which has been my biggest bugbear with learning Haskell to date. Looking forward to part 2!

output function

Shannon -jj Behrens's picture

Using the following as a minimal example:

data ParseContext = ParseContext {
input :: String, -- The input that has not been parsed yet.
output :: String -- The output generated so far.
} deriving Show

Haskell automatically creates functions "input" and "output" of type "ParseContext -> String" as you suggested. This comes for free and has nothing to do with "deriving Show". As another commenter commented, what would be "ctx.output" in Java is "output ctx" in Haskell.

What "deriving Show" gives you is the ability to call "show ctx" which has type "something that is showable -> String". Imagine an interface in Java that has one method, "toString()". "deriving show" means that not only should ParseContext implement that interface, but the compiler should automatically figure out a reasonable implementation for what in Java would be the "toString()" method. What Java calls an interface, Haskell calls a type class (which is a horribly confusing re-use of the word class). ParseContext is a member of the Show type class. This is documented here.

My hope is that if you take your time studying my article, you won't have as hard a time reading other Haskell tutorials as I did ;)

By the way, I just asked the editor to hurry up and publish the second half ;)

output function

Oliver Mooney's picture

Thanks Shannon, that's much clearer now.

I'm a Haskell beginner, but

Luke Plant's picture

I'm a Haskell beginner, but I', pretty sure the 'output' function is defined implicitly in ParseContext. In a typical OO language, you would have:

instance_of_ParseContext.output

but in Haskell you have:

output instance_of_ParseContext

output function

Oliver Mooney's picture

That's my understanding too I think! It's funny, I'd implicitly associated field access in that way with OO programming, but I guess there's no reason it shouldn't apply to functional programming too. I'd gotten out of the habit of equating objects with types but it looks like I took it too far.

off to an excellent start

Martin DeMello's picture

looking forward to the second half!

Money Maker

Anonymous's picture

I think this is going to be a big miney maker I am sure he will have all sorts of compaines knockin gon his door asking to patent the program!

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState