NoSQL Tutorial

 in
A comprehensive look at the NoSQL database.
Security

NoSQL works with UNIX, not in addition to it. Set correct permissions to files using groups: create a group for those users who can access these files. This is a great security wall.

Performance Tips

No matter what program you have to deal with, sooner or later you will have to deal with performance. I'm not a developer of the NoSQL Database System, but I can give you some useful advice.

First of all, keep your tables small. Don't keep all the data in a single table, this is a waste of performance. Using the method that best suits your environment, try splitting tables into several files and organizing them into directories. In our examples, we can keep track of our customers creating a single “phone directory” (i.e., customer.rdb), then creating a directory for each order status (received, waiting_for_bill, archive), and last a table for each customer (the file name will be the customer code). For example:

/var/tables/acmeinc/
customer.rdb
                    catalog.rdb
                    received/
                             ACM001.rdb
                             ACM003.rdb
                    w4bill/
ACM002.rdb
                             ACM003.rdb
                    archive/
                             ACM001.rdb
                             ACM002.rdb
                             ACM003.rdb

If you must do everything in a big table and you have to update it frequently, there's a trick for you if it's an indexed file: journaling.

Create a journaling table, say customer.j, with exactly the same header as customer.rdb, but containing only those records which we want to insert into, remove from or append to customer.rdb. The entries in customer.j must be in a format suitable for the nosql merge command.

Whenever we fetch data from customer.rdb we will have to do it in three logical steps. The first step is to use search on bigtable to take advantage of the indices. This will produce an intermediate output that will then be merged into customer.j, and the final output will undergo the original query statement again.

Any updates to customer.rdb will be done to customer.j with the syntax described in the documentation of the merge command (there you'll find how to use the operator to the optimum level). You will also have to make sure that customer.j is kept sorted on its primary key field after each update. For example, if you have an index on the CODE column in the customer.rdb table, you should use:

echo -e "CODE\n----\nACM004" | nosql search -ind customer.x.CODE |\
nosql merge CODE NAME PHONE EMAIL customer.j | \
nosql row 'CODE=="ACM004"'

As you can see, the trick is:

  1. Perform an indexed search on customer.rdb to quickly obtain a much smaller (possibly empty) subset of the table.

  2. Merge the first output with customer.j on the fly during the query.

  3. Do a sequential post-query on the final output.

If that's not enough, another trick to improve speed is to make your AWK use the ash (by Kenneth Almquis) for system()'s and pipes, since this shell is small and fast at startup. Suppose your AWK is /usr/bin/mawk and your shell is /bin/ash (a fairly common case, especially on Debian GNU/Linux), then you can do something like this:
  1. Create the following hard link: ln /bin/ash /bin/ah

  2. Modify AWK to make it use /bin/ah as opposed to /bin/sh, and write the modified AWK to /usr/local/bin/nsq-mawk: sed 's/\/bin\/sh/\/bin\/ah/g' /usr/bin/mawk\> /usr/local/bin/nsq-mawkchmod 755 /usr/local/bin/nsq-mawk

  3. Modify the NSQAWK value in the config file (/usr/local/lib/nosql/nosql.conf or $HOME/.nosql.conf) to /usr/local/bin/nsq-mawk. This will speed up your queries !

Going to the Web

Using NoSQL on the Web is a matter of seconds. Let's suppose that Acme Tools Inc. now has a web site, and you want your customer to search on database for their pending orders. We first create a small input form, the file getname.html show in Listing 2, on which we ask for customer name. I did not use any security at all in this example, but at least password should be asked in a production environment.

Since we are not good at graphics, we create a small template (result.html) that can be modified easily. This template, result.html, is shown in Listing 3.

In this template, some keywords are substituted by the CGI: our standard keywords start and end with a double hash (#) sign, for example ##KEYWORD##.

A special section of the template, named stanza, will be repeated as many times as the number of query rows. The stanza starts and ends with special comments that will be recognized by the CGI.

Now is the time for writing the CGI: Listing 4, result.pl, is a perl script that should perform the queries based on the input name and then creates a resulting page based on the previous template. Most of the queries used in this CGI are those used in previous examples, so we won't repeat them. Just have a look to the main query:

@cusdata = `nosql cat $datafile | nosql join -j PROD - $ctlgfile |
nosql addcol SUBTOTAL | nosql compute 'SUBTOTAL = QTY*PRICE' | nosql
column PROD DESC QTY PRICE SUBTOTAL | nosql body`;

The thing to notice is the array that will contain query rows: each row contains a tab separated list of fields, as the NoSQL table row specification. In fact, in the stanza keyword substitution, a split function against tab is used:

my ($prod, $desc, $qty, $price, $subtotal) = split(/\t/, $data);
All queries run with back ticks that, instead of the exec() and system(), return the STDOUT of the program. This output may be reused in the program using variables. A negative fact of this is security: you cannot run this program in tainted mode (-T switch in the perl command line), but this can be avoided with some tricks such as the ones I used. First of all, you should avoid buffer overruns by using a substring function ($cusname = substr($cusname, 0, 50)), then keeping out some escape characters (such as > Once the queries have been executed and we have all the necessary values, we load the template file and associate it with the default input and pattern-searching space. The template file, in which we transformed new lines and multiple spaces into a single space, is now divided into three parts using pattern matching: the header, the body (aka stanza) and the footer.
/(.+)<\s*!--\s*here\s+starts\s+nosql\s+stanza\s*--->(.+)<\s*!--\s*here
\s+ends\s+nosql\s+stanza\s*--\s*>(.+)/i ;
This search will identify the stanza into the template, using the keyword <!-- here starts nosql stanza --> as the beginning and <--! here ends nosql stanza --> as the end. As you can notice, these are simple HTML comments, so can be introduced easily by our graphic experts. All items before the beginning comment is considered the header, while the rest is the footer.

Before entering the keyword processing in the body, we will do it in the header and footer in order to set proper customer name and total amount due:

$header =~ s/##NAME##/$cusname/;
$footer =~ s/##TOTAL##/$total/;

The final part is the keyword substitution in the stanza. Here we will swap the original variable ($body) with a temporary one ($tmpbody), in order to leave the first unchanged for next loop. Here the fields are split using the method I described earlier, then substituted for the keywords in the template file. Of course, there are thousands of way of writing down this kind of CGI, be it in Perl or other languages. Write one in your favorite language and let your imagination be your guide: databases are plain ASCII files, so you can process them as you like, and you will get great results.

For a REAL example of NoSQL usage on the web, check out http://www.whoswho-sutter.com/, http://annunci-auto.repubblica.it/ and http://www.secondamano.it/ (the first is in English, while the others are in Italian).

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

dbms

srikanth.v's picture

my query is i want the thing as price * qty should give the amount and the total should be calculated for that what is the query in oracle sql(structured query language).

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState