PracTcl Programming Tips

by Stephen Uhler

The other day I wrote a Tcl program, and it was really slow. Even after I got all of the bugs out, it was still slow, so it was time to optimize its performance. My first rule of program optimization is to do nothing; just wait for a faster machine. Unfortunately, in this case, I can't wait that long.

My second rule of optimization is to avoid spending any effort optimizing code that doesn't matter, either because it never gets run (don't laugh, it happens more than you think), or it doesn't take a significant fraction of the total run time of the program.

As an aid to program optimization, Tcl provides a time command that measures and reports the time it takes to execute a Tcl script. It's called as:

time script count

where script is the Tcl script to time, and count is the number of times to run the script, it's run once if no count is given. time reports its result in the form:

3000 microseconds per iteration

The time is the average elapsed time for each time the script is timed.

Occasionally, it is useful to use the time command as-is to time procedures as they are run on various inputs. Once you have pin-pointed the “time hog” procedure in your application, time can be used this way to measure the performance effects of recoding techniques or new algorithms (or faster machines).

However, for any serious program optimization, a more systematic approach is needed. We can write some procedures that use the time command to enable us to compute an execution time profile of an entire application.

One way to profile an application would be to re-write every procedure, surrounding its body with a time command, whose result can be tabulated in some way. Since Tcl lets us ask about the arguments and body of every procedure, the procedures can be rewritten automatically by the profiler.

The disadvantage of this approach is that it only works for procedures and not for the built-in commands. Considering that one of the primary ways of speeding up a Tcl application is recoding some of the procedures in C and turning them into built-in commands, it would be nice to be able to profile commands as well as procedures.

Instead of re-writing procedures, we'll rename them and write a new procedure with the original name that calls (and times) the renamed version. This way, profiling can be done on both commands and procedures.

Gathering the Data

The first step is to write the replacement procedure. It will be the same for all procedures and commands except for its name, and the name of the renamed original it will call. The easiest way to accomplish this is by writing a procedure body template and using the format command to substitute in the appropriate names.

 proc profile_proc {proc} {
    global Template
    rename $proc $proc:OLD
    set body [format $Template $proc]
    proc $proc args $body
 }

The procedure profile_proc takes the name of a Tcl procedure, renames it by adding the suffix :OLD, then uses the template (in Template) to write a new procedure that times the calls to the original. Although different commands and procedures will require differing numbers of arguments, by using args, we can arrange for them to be passed to the original (:OLD) procedure un-changed.

The global variable Template contains the format template for the new procedure body, which substitutes the procedure name in two places, one to call the original and another to log the results. The 1$ in the format specifier indicates the first parameter to format should be used both times.

set Template {
    global Profile_fd
    set time [lindex [time {set result \
            [uplevel [list %1$s:OLD] $args]}] 0]
    set level [expr [info level] - 1]
    if {$level > 0} {
        set caller [lindex [info level $level] 0]
        regsub -all {(.*):OLD} $caller {\1} \
        caller
    } else {
        set caller TopLevel
    }
    puts $Profile_fd [list %1$s $time $caller]
    return $result
}

The timing information is written to a file so that it may be analyzed off-line. We will make sure the variable Profile_fd contains a file handle to which we can write timing information. First, we run the “original” command with time. By using uplevel, we can insure the original command will run in the same stack level it expects, so if it uses any upvar commands they will work properly. Since nothing in Tcl prevents a procedure name from having special characters in it, the %s:OLD needs to be passed though the list command. The original arguments to the procedure, that were gathered up into the single list args, are expanded into their original form by uplevel. The lindex 0 extracts just the time portion of the time command output and saves it in the variable time.

To properly account for the time spent in each procedure, not only is the timed procedure's name required, but its caller's name is needed as well, as will be apparent when it comes time to account for all of the time.

The info level command is used to determine the caller's name (or TopLevel, if called from the top level). The regsub gets rid of the :OLD to make the book-keeping easier.

Finally, the procedure name, time spent, and caller name are all written to the logging file, and the result returned by the renamed procedure, result, is returned to the caller.

The procedure profile_start is used to turn on profiling.

proc profile_start {{pattern *}} {
    global Profile_fd
    set Profile_fd [open /tmp/prof.out.[pid] w]
    foreach i [info procs $pattern] {
        profile_proc $i
    }
}

First, it opens the file that will receive the timing information, and calls profile_proc for the procedures we wish to profile. If commands are to be profiled as well as procedures, the info procs could be changed to info commands. At this point, the application can be run as usual, except that for each profiled command or procedure, a line of timing information is written into the file.

To turn off the profiling, profile_stop is called.

proc profile_stop {} {
    global Profile_fd
    close $Profile_fd
    foreach proc [info procs *:OLD] {
        regsub {(.*):OLD} $proc {\1} restore
        rename $restore {}
        rename $proc $restore
    }
    profile_summarize /tmp/prof.out.[pid]
}

The procedure profile_stop closes the log file, removes the special profiling procedures, restores the names of the original procedures, and calls profile_summarize, which prints a summary of the profile data. Fancier versions of profile_start and profile_stop could check to make sure a procedure isn't profiled twice, or that time isn't wasted profiling the profiling routines.

Analyzing the results

Analyzing the profile data is a bit tricky. The time attributed to a particular procedure consists of not only time spent in that procedure, but the time spent in all of its children (“called” procedures) plus the time taken by the profiling code that times the child procedures.

We can approximate the time spent in the profiling code by running a procedure with—then without—the profiling code and computing the difference. The procedure profile_calibrate does just that.

proc profile_calibrate {}
    global Profile_fd
    proc profile_dummy {args} {return $args}
    set Profile_fd [open /tmp/[pid] w]
    time profile_dummy 10
    set before [lindex [time profile_dummy 10] 0]
    profile_proc profile_dummy<\n>
    set after [lindex [time profile_dummy 10] 0]
    close $Profile_fd
    rename profile_dummy {}
    rename profile_dummy:OLD {}
    return [expr ($after - $before)]
}

A dummy procedure profile_dummy is created and timed. Then profile_proc is called to add the profiling code, and profile_dummy is timed again. The result is an approximation of the timing overhead.

proc Incr {name {value 1}} {
    upvar $name var
    if {[info exists var]}  {
        set var [expr $var + $value]
    } else {
        set var $value
    }
}

The profile_summarize procedure uses a souped-up version of the Tcl incr command (called Incr that permits incring a variable before it is set by automatically initializing it to zero before incrementing it.

proc profile_summarize {file} {
    puts [format "%35s calls   ms   ms/call %%" \
          name]
    set fd [open $file r]
    set total 0
    set overhead [profile_calibrate]
    # read in the data, and accumulate the values
    while {[gets $fd line] > 0} {
        set name [lindex $line 0]
        set time [lindex $line 1]
        set parent [lindex $line 2]
        Incr count($name)
        Incr sum($name) $time
        if {$parent != "TopLevel"} {
            Incr sum($parent) \
                 "- ($time + $overhead)"
        } else {
            Incr total $time
        }
    }
    close $fd
    # sort and print the results
    foreach name [lsort [array names count]] {
        if {$count($name) == 0} continue
        set ms [expr $sum($name)/1000]
        puts [format "%35s %4d %7d %6d %4.1f%%" \
            $name $count($name)  $ms \
            [expr $ms / $count($name)] \
            [expr $sum($name) * 100.0 / $total]]
    }
}

After a bit of initialization, Profile_summarize works in two stages, reading and tabulating the timing information, then sorting and printing the results.

Each line of timing information is read into three variables: the procedure name, the elapsed time in µs, and the parent, or calling, procedure name. Two arrays, count and sum, both indexed by procedure name, are used to keep track of the number of times each procedure is called and the accumulated time for each procedure. Next, the time attributed to each procedure is subtracted from the time credited to its parent, along with the timing overhead. Only procedures that are called from the top level have their times included in the total time counter. Otherwise, the time would be counted twice, once for the procedure and again for its caller.

Once all of the data is tabulated, it is sorted by procedure name. The results are printed in the form of procedure name, number of times the procedure was called, the total elapsed time spent in the procedure, and the percentage of the total program execution time spent in the procedure. The time values are divided by 1000 to convert them from µs into ms. Note that after about 35 minutes, the number of µs overflows a 32 bit integer, so profiling runs shouldn't be much longer than a half hour.

Listing 1 at the bottom of this article shows a subset of the results of a profiling run for an HTML library package. For this test case, it's clear that HMrender accounts for a proportionally large share of the total running time of the program. Even if it could be recoded to run infinitely fast, the overall application would be sped up by, at most, 27%. The negative time attributed to HMtag_a is probably due to the variability of the profiling overhead calculation.

The data in the logging file looks like:

...
HMrender 152083 HMparse_html
HMmap_esc 553 HMrender
HMzap_white 629 HMrender
HMx_font 3056 HMcurrent_tags
HMset_font 2022 HMcurrent_tags
HMcurrent_tags 14424 HMrender
...
Conclusions

Even though there can be considerable variability in the elapsed times of procedures, depending upon the current load on the processor, these simple profiling routines can quickly point out those parts of an application that are consuming most of the running time and would be good candidates for optimization.

Stephen Uhler is a researcher at Sun Microsystems Laboratories, where he works with John Ousterhout improving Tcl and Tk. Stephen is the author of the MGR window system and of PhoneStation, a TCL-based personal telephony environment. He may be reached via e-mail at Stephen.Uhler@Eng.Sun.COM.

Listing 1: Sample Run

% source sample.tcl
Starting sample HTML viewer...
% source ~/column/profile/test
% profile_start
  [Run the application for a bit]
% profile_stop

HMextract_param

name

calls

ms

ms/call

%

HMcurrent_tags

465

1478

3

3.2%

HMdo_map

12

3

0

0.0%

186

773

4

1.7%

 

HMgot_image

12

1199

99

2.6%

HMlink_setup

25

59

2

0.1%

HMmap_esc

467

307

0

0.7%

HMparse_html

3

2970

990

6.4%

HMrender

453

12544

27

27.0%

HMreset_win

3

220

73

0.5%

HMset_font

465

6292

13

13.5%

HMset_image

12

1401

116

3.0%

HMset_state

3

5

1

0.0%

HMstack

306

295

0

0.6%

HMtag_/a

33

124

3

0.3%

HMtag_a

33

-13

-1

-0.0%

  

...

  
Load Disqus comments

Firstwave Cloud