Designing and Implementing a Domain-Specific Language

“Like everything metaphysical the harmony between thought and reality is to be found in the grammar of the language.”—Ludwig Wittgenstein

is the same as the Python expression:

min(2, 6, min(10, 15))

Functions that do not require arguments are the exception. They are still called with trailing parentheses just as they are in Python:


Language extensions are written with defop statements. It is possible to add new prefix, postfix and infix operators as well as special mixfix operators like C's conditional expressions. It also is possible to add new keywords.

Operator definitions consist of an associativity specification, a binding value, the operator syntax and the implementation. Associativity is specified with a single letter, either l for left or r for right. If associativity is not specified, Logix automatically makes the new operator left associative. The binding value specifies operator precedence. The binding value syntax is one of the few things I really don't like about Logix. Even in languages with static syntax, operator precedence tends to confuse me. It's much more difficult to keep track of in a structurally dynamic language. Fortunately, precedence and associativity won't be all that important in simple tool languages.
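To make the interplay of binding values and associativity concrete, here is a minimal Python sketch of a precedence-climbing parser. It is not how Logix itself is implemented; the OPERATORS table, its binding numbers and the parse function are hypothetical names chosen purely for illustration.

```python
# Hypothetical operator table: symbol -> (binding value, associativity).
# The numbers are illustrative and are not taken from Logix.
OPERATORS = {
    "+": (60, "l"),
    "-": (60, "l"),
    "*": (65, "l"),
    "**": (70, "r"),
}


def parse(tokens, min_binding=0):
    """Parse a flat token list into nested (op, left, right) tuples."""
    left = tokens.pop(0)  # operands are plain numbers in this sketch
    while tokens and tokens[0] in OPERATORS:
        binding, assoc = OPERATORS[tokens[0]]
        if binding < min_binding:
            break
        op = tokens.pop(0)
        # A left-associative operator demands a strictly higher binding on
        # its right side, so an equal-binding operator groups to the left.
        # A right-associative operator accepts the same binding again.
        next_min = binding + 1 if assoc == "l" else binding
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


left_grouped = parse([2, "-", 3, "-", 4])      # groups as (2 - 3) - 4
right_grouped = parse([2, "**", 3, "**", 2])   # groups as 2 ** (3 ** 2)
mixed = parse([1, "+", 2, "*", 3])             # "*" binds tighter than "+"
```

A higher binding value makes an operator claim its operands before its neighbors do, and associativity only matters when two operators with the same binding value sit side by side.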

The operator syntax consists of variables and constants. Constants are enclosed in quotations, and variables are specified by type: expr (expression), symbol, term, token, block and freetext. The operator implementation can be either a macro or a function. A function is evaluated at runtime, whereas macros perform code replacement at compile time.

Let's have a look at an example from the Logix documentation:

defop 50 expr "isa" expr func ob typ:
  isinstance ob typ

This describes an isa operator. The new isa operator consists of an expression, followed by the constant isa, followed by an expression. The func keyword indicates that the implementation is a function, and the two symbols that follow are the names of the variables. Each variable in the implementation is associated with a variable in the syntax definition. In this case, the first expr is ob, and the second expr is typ. The code within the func block is evaluated when the operator is called.

In the following line of code:

"test" isa str

the string test is the first expression passed to the isa operator, and the type str is the second expression. The operator then passes the arguments to the isinstance function and returns the result, which, in this case, is the Boolean True.
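The difference between a function implementation and a macro implementation can be approximated in plain Python. The sketch below is only an analogy and does not use Logix: the isa function, the IsaMacro class and the expand helper are hypothetical names, and ast.unparse assumes Python 3.9 or later. The function does its work each time it is called, whereas the "macro" rewrites the code into an isinstance call before anything runs.

```python
import ast


def isa(ob, typ):
    # Function style: this body runs every time the operator is called.
    return isinstance(ob, typ)


class IsaMacro(ast.NodeTransformer):
    # Macro style: rewrite isa(a, b) into isinstance(a, b) before it runs.
    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "isa":
            replacement = ast.Name(id="isinstance", ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def expand(source):
    """Parse an expression and apply the rewrite, returning the new tree."""
    tree = IsaMacro().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    return tree


runtime_result = isa("test", str)

tree = expand('isa("test", str)')
expanded_source = ast.unparse(tree)  # the code after "macro expansion"
macro_result = eval(compile(tree, "<macro>", "eval"))
```

Both routes produce the same answer here. What differs is when the translation happens: at call time for the function, and ahead of execution for the macro.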

A Language Is Born

Now that we have worked through the basics, let's try a more complex example. Imagine a company with a veritable fleet of network-enabled printers featuring telnet-accessible administrative interfaces. This hypothetical company maintains a record of the current configuration for all its printers in a text file. When someone wants to change the configuration of a particular printer, they record the change in the text document, and then they connect to the printer and make the change. The company could design a simple DSL that treats the configuration record as a program. So, when someone wants to change the configuration of a printer, they simply could change the document and run it. When run, the text document would connect to all the printers and repopulate the configuration data.

First, let's have a look at the document:


accounting printers:
  - 10 hp5mo1
  - 28 lpt9
  - 29 lpt10
  - 48 lpt6

developer printers:
  - 26 lpt4
  - 27 lpt7

marketing printers:
  - 62 hpcolor5:
  - 154 lpt11

for department in
  [accounting, developer, marketing]:
  for printer in department:
    print ("Configuring %s..."

print "Finished!"

When you design your own DSL, you must consider the implications of the syntax you select. If you want to add more features, will you be able to? Inexperienced DSL developers often monopolize common meta-characters in order to make the syntax as concise as possible. In the long run, that makes the language harder to learn, harder to use and harder to extend.

The default block contains the default configuration options that will be set on all printers. Each of the printers blocks contains a description of all the printers in a single department. Each individual printer definition contains the end of the printer's IP address and the associated hostname. A printer definition optionally can be followed by a block that contains configuration options specific to that printer. Our DSL turns each printer block into a list of Printer objects and assigns that list to a variable bearing the name of the department. It then will be possible to manipulate these lists with code written in the standard Logix dialect.

Now, let's have a look at the implementation:

setlang logix.stdlang

from telnetlib import Telnet

class TelnetDebug:
  def write self txt: print "dbg:%s"%txt

class Printer:
  def __init__ self ip host data:
    self.ip = ip
    self.host = host
    self.data = Printer.default.copy()
    self.data.update data

  def transmit self:
    #tn = Telnet "192.168.0.%s"%self.ip
    tn = TelnetDebug()

    tn.write "printer_password"
    tn.write ("host %s"

    for x,y in
      tn.write ("%s %s"%(x,y))

deflang printerdef:

  defop 50 expr ":" expr macro n v:
    str n, str v

  defop 0 "-" token expr [":" block]/-
    macro ip v *b:
      ["host":str v, "ip":str ip, "block":b]

deflang printlang(logix.stdlang):

  defop 0 expr "printers:" block@printerdef
    macro n *v:
      `\n = [\@.Printer p/ip p/host (dict p/block)
         for p in \v]

  defop 0 "default:" block@printerdef macro *b:
    `\@.Printer.default = dict \b

The implementation starts with a setlang directive that tells the interpreter to use the standard Logix dialect. Next, we define the Printer class. Every printer defined in a printers block eventually becomes an instance of the Printer class. The Printer class contains no code specific to the DSL and easily can be used in another project. The Printer initialization method takes three arguments: the last part of the printer IP address, the printer hostname and a dict that associates option names with option values. The init method also copies the default printer options from a class variable into an instance variable called data and updates it with the printer-specific options passed into the instance via the data argument.
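For readers more comfortable with Python than Logix, the Printer class described above might look roughly like the following. This is a sketch under stated assumptions, not a translation of the original listing: the sent list on TelnetDebug, the optional tn parameter, the sorted iteration order and the option names duplex and tray are all invented for illustration.

```python
class TelnetDebug:
    """Stand-in for a telnet connection that records what would be sent."""

    def __init__(self):
        self.sent = []  # assumption: collect output instead of printing it

    def write(self, txt):
        self.sent.append(txt)


class Printer:
    default = {}  # class-level defaults, meant to be set by the default block

    def __init__(self, ip, host, data):
        self.ip = ip
        self.host = host
        # Copy the defaults first so per-printer options never leak back
        # into the shared class variable.
        self.data = Printer.default.copy()
        self.data.update(data)

    def transmit(self, tn=None):
        if tn is None:
            tn = TelnetDebug()
        tn.write("printer_password")
        tn.write("host %s" % self.host)  # assumed use of the hostname
        for key, value in sorted(self.data.items()):  # assumed iteration
            tn.write("%s %s" % (key, value))
        return tn


# Hypothetical option names, purely for demonstration.
Printer.default = {"duplex": "off", "tray": "1"}
printer = Printer("62", "hpcolor5", {"duplex": "on"})
session = printer.transmit()
```

Copying the defaults before updating them is the important design choice: each printer gets its own dictionary, so an override on one printer cannot change the configuration of another.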

Now we get to the good part, the language definition. In Logix, the deflang statement is used to start a new language block. Each language block contains a sequence of operator definitions. The first language block describes the syntax we will use in the individual printers blocks and the default block. The printerdef language's first operator is the colon, an infix operator that is used to parse individual options. The first expr is the option name, and the second expr is the option value. The colon operator implementation is a macro that converts the expressions into strings and puts them in a tuple.

The second operator in the printerdef language is the hyphen operator, a mixfix operator that is used to define individual printers. This one is a bit more complicated. The operator starts with a literal hyphen, which is followed by a variable token, an expression and an optional block. A token is a single value, in this case a number. A block, as one might guess, is a block of content that is parsed using Python's indentation rules.

In the definition, the literal colon and the block are enclosed in square brackets and followed by a /-. The brackets group syntactic elements, and the /- following the group indicates that it is optional. This makes it possible to omit the block for printers that don't need to specify their own configuration options. The implementation is a macro that takes three arguments. The token is the IP address suffix, the expr is the printer hostname and the block contains the printer options. The asterisk in front of the b indicates that the variable is a sequence. If you don't specify that the block variable is a sequence, blocks with more than one line will not be parsable. The implementation returns a dict containing the hostname, the IP suffix and the block. The block contains options, which get transformed into tuples, so in the implementation, the b variable is a sequence of tuples.
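The data shapes produced by the two printerdef operators can be mimicked in ordinary Python. The functions option and printer_entry below are hypothetical stand-ins for the colon and hyphen operators, and the option names are invented; the sketch only illustrates the values the macros are described as building.

```python
def option(name, value):
    """Colon operator: turn 'name: value' into a tuple of strings."""
    return str(name), str(value)


def printer_entry(ip, host, *block):
    """Hyphen operator: the optional block arrives as zero or more tuples."""
    return {"host": str(host), "ip": str(ip), "block": block}


# A printer without its own options: the block is simply empty.
plain = printer_entry(154, "lpt11")

# A printer with a block of (hypothetical) options.
custom = printer_entry(62, "hpcolor5", option("duplex", "on"), option("tray", 2))

# A sequence of two-item tuples converts directly into a dict of options.
custom_options = dict(custom["block"])
```

The star in the Python signature plays the same role as the asterisk in front of b: it collects however many option lines the block contains, including none at all.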

The second language in the implementation contains the primary syntax for our DSL. After the language name, you can see a reference to the standard Logix dialect enclosed in parentheses. Like classes, Logix languages support inheritance. The stdlang reference within the parentheses indicates that our printlang inherits all the operators of stdlang. Developers now can use standard Logix syntax in addition to the specialized operators defined within the printlang. That is how the for loop at the end of the printer configuration program is possible.

The printers operator starts with an expression, followed by the literal printers: and then a block. In this definition, the block is immediately followed by @printerdef, which tells the interpreter that the contents of the block should be parsed by the printerdef language. The printers implementation is a macro with two arguments: the name of the group and the block, which is a sequence of dicts that contain printer definitions.
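In plain Python terms, the effect of the printers macro can be approximated as building a list of Printer objects and binding it to a name chosen by the document's author. The sketch below is an analogy only: the namedtuple is a simplified stand-in for the real Printer class, and expand_printers and the namespace dictionary are hypothetical.

```python
from collections import namedtuple

# Simplified stand-in for the Printer class, just to keep the sketch short.
Printer = namedtuple("Printer", "ip host data")


def expand_printers(namespace, name, entries):
    """Bind a list of Printer objects to a name supplied at run time."""
    namespace[name] = [
        Printer(p["ip"], p["host"], dict(p["block"])) for p in entries
    ]


namespace = {}
entries = [
    {"host": "lpt4", "ip": "26", "block": ()},
    {"host": "lpt7", "ip": "27", "block": (("duplex", "on"),)},
]
expand_printers(namespace, "developer", entries)
```

Python needs an explicit dictionary to create a variable from a name held in a string, while the Logix macro generates the assignment as code.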

The back tick at the beginning of the implementation macro quotes the expression: it converts the expression into code data and replaces the escaped variables (those prefixed with a backslash) with their values. We want to be able to make a variable that uses a name provided by the user. For instance, we want to assign the value of the first printers block to the variable accounting. If the implementation weren't quoted, it would try to assign the value to a variable literally named n, rather than creating a new variable whose name is the value held in n. Quoting is like Python's exec function:

n = 'test'