Generating Music Notation in Real Time

by Kevin C. Baird

Composers from a variety of backgrounds have been interested in having greater freedom in musical performances. This article describes another attempt in this tradition.

In his Klavierstcke, Karlheinz Stockhausen allowed performers to order prewritten material as they saw fit. Earle Brown allowed conductors to mix and match material in his Available Forms pieces, and in his graphic scores, he allowed performers essentially to improvise whatever the images in the score inspired them to play. Improvisation obviously is critical to jazz, as well.

In all of these examples, the composer gives this increased freedom only to performers or the conductor, not to the audience. My doctoral dissertation piece No Clergy and its associated programs constitute one attempt to give similar freedoms to the audience.

In No Clergy, performers play on-screen notation presented through a Web browser. The audience has Web browsers open to pages with standard CGI forms, allowing them to react to what they're hearing by entering data. The piece's scripts then process the data, affecting the subsequent pages of notation presented to the performers.

No Clergy has specific technical, logistical and esthetic requirements. The processing must occur in real time. The audience's and performers' interfaces for the piece must be familiar and comfortable. The piece must be transportable—able to be performed in any location with minimal setup requirements.

The user, who serves as the “conductor” and is either me or someone filling a similar role in a performance where I am absent, starts the piece by running the bash script The script generates, with Python, a markup file for GNU Lilypond, a music typesetting program described in greater detail later in the article. It then processes that Lilypond file into a PNG image, as shown in Listing 1. Portions of the script that are not shown also perform cleanup actions to remove old data and ending actions to place images in the appropriate Web directories.

Listing 1. Excerpts from

python -O NoClergy/Python/ \
clar > lilypond/ly/
lilypond --png -o lilypond/out/ lilypond/ly/

Figure 1. Sample Initial Output for a Clarinet

Later, when user data are available, a similar bash script called (Listing 2) reads the previous data and generates subsequent pages of notation. The script reads the user data and updates a config file, while applies those changes to the musical material, performing actions similar to the script described above, but this time with a previous example of output from which to work.

Listing 2. Excerpts from

python -O NoClergy/Python/
python -O NoClergy/Python/ 'lilypond/ly/'

I chose an object-oriented paradigm because the musical material contains multiple instances of similar types of data. Initially, the top-level Class was Score, defined as one page of notation for one instrument. Each Score has a transposition level appropriate for its instrument and can contain an arbitrary number of Measures. I chose 20 as good number of Measures to fit comfortably on a single page with reasonable legibility, and as a good duration of musical time between each updated page of notation.

For Non-Musicians


Transposition is a musical term for changing the pitch of a piece of music. Some instruments sound lower or higher than normal and, therefore, require music written for them to be transposed. This practice allows performers to learn fingerings for an instrument family, rather than only one specific instrument.

Pitch Class

A note's pitch class is its pitch mod 12. Middle C is a pitch, whereas C is a pitch class. When musicians talk about a C or a B flat, they are referring to pitch class. A doubling of frequency is a rise of one octave, and differences in octave are what separate notes with different pitches but the same pitch class.

Each Measure has a meter, which determines how long it is and its internal rhythmic organization. It contains however many Notes fill that meter. The total number of Notes also varies based on each Note's duration.

A Note is defined as an individual sound or silence event within a Measure. It has a string called pitch, which is either r for a rest or an indication of its pitch class, such as c, cs for C sharp, d or ef for E flat. It also has an integer called octave, which is meaningful only for non-rests.

Notes also have durations, dynamics, which are variations in amplitude or volume, and articulations, which determine whether the note is accented, detached or sustained into the next note, among other things. Notes also can be tuplets, which are a particular type of rhythmic organization.

Tuplets are notes or rests that occur at a different rate than their note type would indicate. By far, the most common type of tuplet is the triplet. A set of three triplets occupy the same duration as two notes, so three 8th note triplets occur in the span of two normal 8th notes, or one beat in most instances. Quintuplets are five notes in the span of four, septuplets are seven in the span of four and so on. Unless otherwise indicated, a set of tuplets squeezes x notes in the span of y, where y is the highest power of 2 lower than x. Composers generally indicate deviations from this practice with x:y notation on the tuplet set, so a 7:8 tuplet set stretches seven notes across the span of eight.

No Clergy uses a non-unique list of tuplet types—the number indicated by x in the preceding paragraph. This allows me to weight in favor of triplets and quintuplets, as is common. Anyone who wants to use my program to sound like Frank Zappa or Brian Ferneyhough can alter the tuplet list as they wish, although musicians should know that this initial version of the program does not yet support nested tuplets. Non-musician programmers probably can discern from the name that nested tuplets are sets of tuplets that themselves contain one or more sets of tuplets. The program also does not yet support dotted notes, which use a notation convention to show that a note should last half again as long.

When the scripts generate notation, they read a configuration file, updated by the script mentioned above. The various pc variables shown in Listing 3 represent the percentage chance that a given note will have the characteristic in question, such as being a tuplet, being a rest as opposed to a note that sounds, having an explicit dynamic mark and having an explicit articulatory mark.

Listing 3. Configuration Variables

# No Clergy config.txt, written by script
tupletpc = 50
restpc = 25
dynpc = 25
artpc = 25
number_of_measures = 20

The scripts then generate a full Score's worth of musical material, constrained by the variables above. At this point, all of these data still exist only as Lists of Objects within a Python script. I chose MusicXML as the format for external storage.

MusicXML is a subset of XML specifically geared toward musical data. Developed by Recordare LLC, it is largely geared toward being an interchange format for music notation programs. It is also useful, however, as a generic storage format for musical data, as befits a dialect of XML.

Listing 4. Sample MusicXML Fragment for One Note


Musicians probably can figure out that this XML fragment represents a 16th note B flat in the third octave with no dynamic, articulatory or other special alterations.

The script stores the most recent MusicXML file for each instrument using the path convention inst/yyyy_mo_dd-hh_mi_ss.xml, where inst is the instrument name and other letter codes are time units. Upon reading an XML file, the script then moves it into a backup directory and compresses it with bzip2. Storage of old data is useful for documentation of specific performances, learning how audiences react to their role in the piece and debugging. Once the script has read in the XML data, it's ready to output for processing by Lilypond.

GNU Lilypond is a Scheme-based music typesetting program that uses a TeX-like backslash notation for formatting commands and has a particular focus on high-quality music engraving inspired by the best traditional hand engraving. It outputs to several high-resolution formats, including PostScript, DVI and PNG.

Listing 5. Sample of Lilypond Markup

\time 7/8 ef''8-\pp a'4 d'2-\marcato

In Listing 5, we see Lilypond markup for one measure of music. The vertical pipe represents a barline, the % MEASURE 2 is a comment, the \time indicates the 7/8 meter, the ef makes the first note an E flat, the '' raises the note two octaves higher than the baseline used by the program, the 8 gives the note a rhythmic value of an 8th note and the -\pp attaches a pianissimo dynamic indicator to the note. Two other notes follow. The measure is rendered as shown in Figure 2.

Figure 2. Rendering of Lilypond Markup

The storage of semantically meaningful musical data in MusicXML and presentation data in Lilypond markup mirrors the trend in the HTML or DocBook worlds, where structural information is kept independent of the final rendering. As a side benefit, these scripts also are useful in a pinch as a MusicXML to Lilypond to dvi/ps/pdf converter.

During the performance, the audience can and should input data that will direct the future course of the music. I wanted to use a GUI interface for the audience. The piece uses bash scripting as a glue language, but all of the real work is done by Python, making the various Python GUI options obvious candidates. For several reasons, I chose a Web browser interface with CGI forms for data input. What are the advantages of a Web interface over a Python-based GUI?

Web browsers are ubiquitous, among the most commonly used user applications in the world. Rather than requiring an installation of an X client and Python with GUI libraries and then needing to work out OS-specific issues, I simply can require performance sites to have machines with Web browsers—a trivial request.

It is largely due to browsers' ubiquity that they also are familiar. Self-described novice computer users are often far more comfortable operating a browser than some new GUI interface they've never seen before. In fact, even though page layout rendering of HTML often differs a great deal from one instance to another, it feels familiar to the user. This comfort level is especially critical for a performance situation like mine, where the audience needs to get over a natural reluctance to participate actively in a process traditionally reserved for the performers.

In addition, HTML is easy to code. Rob Pike's Rule 4 states that “fancy algorithms are buggier than simple ones”, and straightforward markup usually is even simpler. More information about the benefits of thin-client designs like this can be found in Hugh Williams and David Lane's Web Database Applications, published by O'Reilly and Associates.

Using a browser interface has clear drawbacks, but luckily, none are particularly relevant to my project. HTTP is slow, but I need to update only once per page of music, no faster than about once per minute. HTML/CSS style is relatively inflexible, but my design needs are simple. Video is problematic for pure HTML presentation without plugins that often are proprietary, but I need only still images.

Figure 3. Sample User Interface

Given that we have a browser interface and hopefully a participating audience, we need to process their data. I use a Python CGI script to capture variables. It writes values within a <pre> tag, values that the later processing script reads in order to do the actual modification of musical data. This generic, extensible setup means that the capturing script needs little to no alteration as I improve the piece's other scripts by adding variables or changing how they are interpreted at other stages in the piece.

Listing 6. Script Fragment to Capture User Data

import cgi, re
form = cgi.FieldStorage()
formS = '<pre>\n'
for field in form:
  formS += field + ' = '
  formS += form[field].value + '\n'
formS += '</pre>\n'

The script then inserts the formS string at the appropriate point within the feedback file. The variables are integers representing percentage chances that a given note will have a given characteristic. I chose to list the values within <pre> tags to allow easy observation of changes to variables during a performance using a Web browser. The scripts always write the most recent data immediately after a <!-- begin --> comment.

In addition to the audience data variables, the processing script also incorporates a second-order Markov chain (see the “Markov Chains” sidebar). This allows all pages of music after the first to share characteristics of the first page while still being shaped by the audience feedback.

Markov Chains

Markov Chains, named after Russian Mathematician A.A. Markov, are ordered collections of data based on samples. For each set element x, we find all transitions from it to any element, including a repetition of itself. From those data, we find the probability that x leads to each element that follows it. We then can use those probabilities to reconstruct a randomly generated set that resembles the source set without necessarily being identical to it.

The user then continues to run the script for as long as desired. Another wrapper shell script could keep the piece running until a specific condition is met, such one of the pc variables reaching a given minimum or maximum. It also could have a set number of iterations. It even could run as a long-term installation. No performer relishes the thought of sitting for eight or more hours and playing whatever appears on a screen in front of him or her. Therefore, for very long runs, I should alter the program such that it does not require performers. One option would be to output to a synthesizer such as Csound. Another option would be to display the notation for visual demonstration only.

Future plans for this project include a port to the Ruby language. This is mainly for self education, but there are other reasons. Ruby code tends to be more compact—the advantages of features like the each method add up with enough code. In informal benchmarking, I've also found Ruby faster than Python for some tasks. I'd like to test this more scientifically, with and without Python bytecode optimization. Finally, by porting, I'll learn both Python and Ruby better—always a good thing.

As mentioned previously, the scripts store previous runs of the piece in the MusicXML format. There is no reason that other music in that same format couldn't be run through the script. By using other source music, the script can create Markovian blends of Charlie Parker and Bartok, or whatever suits the user's tastes.

Finally, here's some technical trivia that may interest readers. I achieved much faster XML access when I did “dumb” file/string readline operations with my own convention for placement of \n characters. Conversion to standard DOM-aware XML libraries slowed the program down noticeably but made it both more robust and more usable by others. For a while, I even kept both versions of XML reading available. Eventually, I decided to use only the DOM-based reading, even thought it has become the slowest portion of the script by far.

In the end, I felt that arguments for using an HTTP-based audience interface had merit for the XML access issue. Using the standard DOM library made the script available to a wider user base. The XML read runs fast enough for usability today and will be even faster in the future with faster hardware. Therefore, I decided to comply with existing standards in the interest of robustness.

I plan to have a performance at the University at Buffalo in late 2004. Obviously, the program's collision with a real-world test should provide a great deal of useful information about how to improve it. The specifics of the performance are still in question, but I may try to use wireless browsers for the audience, which I hope will provide a less intimidating interface for them and will improve the overall experience for both performers and audience members.

Resources for this article:

Kevin Baird is pursuing a PhD in Music Composition at the University at Buffalo in Buffalo, New York. He is on-line at

Load Disqus comments