Creating KVTML Files

December 31st, 2008 by Phil Hughes


KWordQuiz, KVocTrain, and other KDE-based programs use the KVTML file format for their data files. The format is just a fairly simple XML format but, unfortunately, there doesn't seem to be anything available to convert a text file to this format. So, once again, AWK to the rescue.

I was actually extracting data from a fairly convoluted HTML file, which required some more awk, but the general idea was to take some easy-to-make text format and build the needed KVTML file from it. I decided to use one line per record with a "|" as the field separator. There is nothing magic about this character; you just need something that does not appear in the data itself.
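As a quick sanity check of the input format, here is a small sketch that builds a two-record sample file and shows how awk splits it once FS is "|". The file name words.txt and the vocabulary in it are made up for illustration.

```shell
# Create a throwaway directory and a sample input file (words.txt is a hypothetical name).
dir=$(mktemp -d)
cat > "$dir/words.txt" <<'EOF'
der Hund|the dog
die Katze|the cat
EOF

# With FS set to "|", each record is split into $1 and $2.
fields=$(awk 'BEGIN { FS = "|" } { print NR ": [" $1 "] [" $2 "]" }' "$dir/words.txt")
echo "$fields"
```

Each line should come back as two bracketed fields, which is all the converter needs.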

My "knowledge" of the KVTML format comes from creating a file with KWordQuiz and then taking a look at it. I don't really understand what rows and columns are used for, but it doesn't seem important. Without further ado, here is the awk program.

# bar2kvtml.as -- convert |-separated lines into a kvtml file
# invoke as follows:
#   awk -f bar2kvtml.as [l1=first_label] [l2=second_label] filename(s)
#       l1 and l2 are optional column labels

BEGIN   {
        l1 = "Column 1"         # default labels
        l2 = "Column 2"
        FS = "|"                # field separator
        print "<?xml version=\"1.0\"?>"
        print "<!DOCTYPE kvtml SYSTEM \"kvoctrain.dtd\">"
        print "<kvtml"
        print "generator=\"bar2kvtml.as\""
        print "cols=\"2\""
        print "lines=\"50\""
        first++
        }

first   {               # output header with first line
        nfn = FILENAME
        sub(/\..*$/, ".kvtml", nfn)
        print "title=\"" nfn "\">"
        first = 0
        print " <e>"
        print "  <o width=\"250\" l=\"" l1 "\">" $1 "</o>"
        print "  <t width=\"250\" l=\"" l2 "\">" $2 "</t>"
        print " </e>"

        next
        }

        {               # all subsequent lines
        print " <e>"
        print " <o>" $1 "</o>"
        print " <t>" $2 "</t>"
        print " </e>"
        }

END     {
        print "</kvtml>"
        }

The BEGIN block in the awk script is executed before the input file is opened. It first sets default column labels, which can be overridden on the command line, then sets the field separator (FS) to "|" and outputs most of the boilerplate. It does not, however, output the line with the filename in it, because I create that filename from the input filename (replacing everything from the first dot onward with ".kvtml"). This has to be done later as FILENAME is not yet set. The variable first is set to indicate that this remaining work is yet to be done.
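The two mechanisms described above, command-line assignments overriding BEGIN defaults and the title being derived from FILENAME, can be tried in isolation. This is a minimal sketch, not the converter itself; the file name spanish.txt and the label Spanish are made up for illustration.

```shell
dir=$(mktemp -d)
printf 'uno|one\n' > "$dir/spanish.txt"

out=$(cd "$dir" && awk '
BEGIN { l1 = "Column 1"; l2 = "Column 2" }   # defaults, set before any operand is read
{
    nfn = FILENAME                   # FILENAME is usable here, in a main rule
    sub(/\..*$/, ".kvtml", nfn)      # replace from the first dot onward
    print l1 " / " l2 " / " nfn
}' l1=Spanish spanish.txt)
echo "$out"
# prints: Spanish / Column 2 / spanish.kvtml
```

The l1=Spanish operand is processed after BEGIN but before spanish.txt is opened, so it has to come before the filename on the command line. Because the regular expression matches from the first dot, a name such as ./spanish.txt or my.word.list.txt would lose more than just its extension.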

The code executed if first is set outputs the remainder of the boilerplate plus the first record data along with the column tags—either those supplied on the command line or the defaults that were set in the BEGIN block. Finally, first is set to zero so this block will not be executed again. The next statement causes awk to skip to the next record rather than continuing processing on the current one.

For each subsequent input line, the default (no condition) block of code is executed. When the end of file is reached, the END block of code is executed.
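The control flow is easier to see with the XML stripped away. The following sketch keeps only the skeleton: a flag set in BEGIN, a block that runs once and ends with next, an unconditional block, and an END block. The strings printed are placeholders.

```shell
out=$(printf 'a\nb\nc\n' | awk '
BEGIN { first = 1 }                                  # work still to be done
first { print "header+" $0; first = 0; next }        # runs for the first record only
      { print "row " $0 }                            # every later record
END   { print "footer" }')
echo "$out"
```

Without the next statement, the first record would fall through to the unconditional block and be output twice.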

Output is sent to standard output, so you can redirect it wherever you want. You can change the values of FS and RS to handle different input formats. If you need to process an input format that requires a bit more processing, just add an unconditional block right after the BEGIN block that massages the data into a reasonable format. Judicious use of getline and next, along with the sub and gsub functions, should be able to handle most anything.
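As one possible example of such a massaging block, the sketch below trims stray whitespace around each field and escapes the characters that are special in XML, which the script as written passes through untouched. It is an assumption about what your data might need, not part of the original program, and the final print rule is only a stand-in for the real output blocks.

```shell
out=$(printf ' salt & pepper | sel et poivre \n' | awk '
BEGIN { FS = "|" }
{                                   # unconditional clean-up block, runs for every record
    for (i = 1; i <= NF; i++) {
        sub(/^[ \t]+/, "", $i)      # trim leading whitespace
        sub(/[ \t]+$/, "", $i)      # trim trailing whitespace
        gsub(/&/, "\\&amp;", $i)    # escape & first so later entities are not re-escaped
        gsub(/</, "\\&lt;", $i)
        gsub(/>/, "\\&gt;", $i)
    }
}
{ print "<o>" $1 "</o><t>" $2 "</t>" }')
echo "$out"
# prints: <o>salt &amp; pepper</o><t>sel et poivre</t>
```

The doubled backslash matters: a bare & in the replacement string of sub or gsub stands for the matched text, so it has to be escaped to get a literal ampersand.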
__________________________
Phil Hughes

