Creating KVTML Files
KWordQuiz, KVocTrain, and other KDE-based programs use the KVTML file format for their data files. The format is fairly simple XML but, unfortunately, there doesn't seem to be a tool available to convert a text file into it. So, once again, AWK to the rescue.
My own data came from a fairly convoluted HTML file, which required some more awk to extract, but the general idea is to take an easy-to-make text format and build the needed KVTML file from it. I decided to use one line per record with "|" as the field separator. There is nothing magic about this character; you just need something that does not appear in the data itself.
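As a quick illustration of the input format described above, here is a minimal sketch showing how awk splits such records once FS is "|". The vocabulary lines and the variable names sample and split_demo are made up for the example; they are not from the original data.

```shell
# Hypothetical sample data: one record per line, "|" between the two fields.
sample='la casa|the house
el perro|the dog'

# Show how awk splits each record once FS is "|".
split_demo=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "|" } { print NR ": [" $1 "] [" $2 "]" }')
printf '%s\n' "$split_demo"
```

Each line should come out with the two fields in separate brackets, which is an easy way to check a data file before converting it.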
My "knowledge" of the KVTML format comes from creating a file with KWordQuiz and then taking a look at it. I don't really understand what the rows and columns are used for, but it doesn't seem important. Without further ado, here is the awk program.
# bar2kvtml.as -- convert |-separated lines into a kvtml file
# invoke as follows:
# awk -f bar2kvtml.as [l1=first_label] [l2=second_label] filename(s)
# l1 and l2 are optional column labels
BEGIN {
    l1 = "Column 1" # default labels
    l2 = "Column 2"
    FS = "|" # field separator
    print "<?xml version=\"1.0\"?>"
    print "<!DOCTYPE kvtml SYSTEM \"kvoctrain.dtd\">"
    print "<kvtml"
    print "generator=\"bar2kvtml.as\""
    print "cols=\"2\""
    print "lines=\"50\""
    first++
}
first { # output header with first line
    nfn = FILENAME
    sub(/\..*$/, ".kvtml", nfn)
    print "title=\"" nfn "\">"
    first = 0
    print " <e>"
    print " <o width=\"250\" l=\"" l1 "\">" $1 "</o>"
    print " <t width=\"250\" l=\"" l2 "\">" $2 "</t>"
    print " </e>"
    next
}
{ # all subsequent lines
    print " <e>"
    print " <o>" $1 "</o>"
    print " <t>" $2 "</t>"
    print " </e>"
}
END {
    print "</kvtml>"
}
The BEGIN block in the awk script is executed before the input file is opened. First it sets the default column labels, which can be overridden on the command line. It then sets the field separator (FS) to "|" and outputs most of the boilerplate. It does not, however, output the line with the filename in it, as I create that filename from the input filename (replacing everything from the first dot onward with ".kvtml"). This has to be done later because FILENAME is not yet set in the BEGIN block. The variable first is set to indicate that this remaining work is yet to be done.
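To see what the sub() call does to a filename, the following sketch applies the same regular expression to a few names. The shell function kvtml_name is a hypothetical helper written for this illustration, not part of the script, and the filenames are invented.

```shell
# Hypothetical helper: derive the title from an input filename,
# using the same sub() call as the script.
kvtml_name() {
    awk -v nfn="$1" 'BEGIN { sub(/\..*$/, ".kvtml", nfn); print nfn }'
}

kvtml_name "spanish.txt"     # prints spanish.kvtml
kvtml_name "verbs.es.txt"    # prints verbs.kvtml (match starts at the FIRST dot)
kvtml_name "wordlist"        # prints wordlist (no dot, so nothing is replaced)
kvtml_name "./words.txt"     # prints .kvtml (the leading "./" contains the first dot)
```

The last two cases suggest passing plain filenames with a single extension, without a leading "./" or directory names that contain dots.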
The code executed if first is set outputs the remainder of the boilerplate plus the first record data along with the column tags—either those supplied on the command line or the defaults that were set in the BEGIN block. Finally, first is set to zero so this block will not be executed again. The next statement causes awk to skip to the next record rather than continuing processing on the current one.
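The flag-plus-next pattern described above can be tried in isolation. This is a stripped-down sketch rather than the script itself; the sample words and the variable name first_demo are assumptions made for the example.

```shell
first_demo=$(printf 'uno|one\ndos|two\ntres|three\n' | awk '
BEGIN { FS = "|"; first = 1 }      # flag: header work still pending
first {                            # true only until the flag is cleared
    print "header + first record: " $1
    first = 0
    next                           # skip the general block for this record
}
{ print "later record: " $1 }      # every remaining record
')
printf '%s\n' "$first_demo"
```

Without the next statement, the first record would also fall through to the general block and be printed twice.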
For each subsequent input line, the default (no condition) block of code is executed. When the end of file is reached, the END block of code is executed.
Output is sent to standard output, so you can redirect it wherever you want. You can change the values of FS and RS to handle different input formats. If you need to process an input format that requires a bit more processing, just add an unconditional block right after the BEGIN block that massages the data into a reasonable format. Judicious use of getline and next along with the sub and gsub functions should be able to handle most anything.
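As one possible shape for such a massaging block, the sketch below skips blank lines, trims whitespace around each field, and escapes the characters & and < so they cannot break the XML. The escaping step is my assumption about what messy data might need; the original script does not do it. The sample input and the variable name massaged are also invented for the example.

```shell
massaged=$(printf '  la casa | the house \n\nsal & pimienta|salt & pepper\n' | awk '
BEGIN { FS = "|" }
{                                     # unconditional massage block
    if ($0 ~ /^[ \t]*$/) next         # skip blank lines entirely
    for (i = 1; i <= NF; i++) {
        sub(/^[ \t]+/, "", $i)        # trim leading whitespace
        sub(/[ \t]+$/, "", $i)        # trim trailing whitespace
        gsub(/&/, "\\&amp;", $i)      # escape & first, then <
        gsub(/</, "\\&lt;", $i)
    }
}
{ print "<o>" $1 "</o><t>" $2 "</t>" }   # stand-in for the real output blocks
')
printf '%s\n' "$massaged"
```

In the full script, the same block would sit between the BEGIN block and the first block, so that both output blocks see the cleaned-up fields.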
Phil Hughes
Comments
Hi,
For some reason I get an error on the line:
sub(/\..*$/, ".kvtml", nfn)
What is that line actually for? Thanks.