NoSQL Tutorial

 in
A comprehensive look at the NoSQL database.
Security

NoSQL works with UNIX, not in addition to it. Set correct permissions to files using groups: create a group for those users who can access these files. This is a great security wall.

Performance Tips

No matter what program you have to deal with, sooner or later you will have to deal with performance. I'm not a developer of the NoSQL Database System, but I can give you some useful advice.

First of all, keep your tables small. Don't keep all the data in a single table, this is a waste of performance. Using the method that best suits your environment, try splitting tables into several files and organizing them into directories. In our examples, we can keep track of our customers creating a single “phone directory” (i.e., customer.rdb), then creating a directory for each order status (received, waiting_for_bill, archive), and last a table for each customer (the file name will be the customer code). For example:

/var/tables/acmeinc/
customer.rdb
                    catalog.rdb
                    received/
                             ACM001.rdb
                             ACM003.rdb
                    w4bill/
ACM002.rdb
                             ACM003.rdb
                    archive/
                             ACM001.rdb
                             ACM002.rdb
                             ACM003.rdb

If you must do everything in a big table and you have to update it frequently, there's a trick for you if it's an indexed file: journaling.

Create a journaling table, say customer.j, with exactly the same header as customer.rdb, but containing only those records which we want to insert into, remove from or append to customer.rdb. The entries in customer.j must be in a format suitable for the nosql merge command.

Whenever we fetch data from customer.rdb we will have to do it in three logical steps. The first step is to use search on bigtable to take advantage of the indices. This will produce an intermediate output that will then be merged into customer.j, and the final output will undergo the original query statement again.

Any updates to customer.rdb will be done to customer.j with the syntax described in the documentation of the merge command (there you'll find how to use the operator to the optimum level). You will also have to make sure that customer.j is kept sorted on its primary key field after each update. For example, if you have an index on the CODE column in the customer.rdb table, you should use:

echo -e "CODE\n----\nACM004" | nosql search -ind customer.x.CODE |\
nosql merge CODE NAME PHONE EMAIL customer.j | \
nosql row 'CODE=="ACM004"'

As you can see, the trick is:

  1. Perform an indexed search on customer.rdb to quickly obtain a much smaller (possibly empty) subset of the table.

  2. Merge the first output with customer.j on the fly during the query.

  3. Do a sequential post-query on the final output.

If that's not enough, another trick to improve speed is to make your AWK use the ash (by Kenneth Almquis) for system()'s and pipes, since this shell is small and fast at startup. Suppose your AWK is /usr/bin/mawk and your shell is /bin/ash (a fairly common case, especially on Debian GNU/Linux), then you can do something like this:
  1. Create the following hard link: ln /bin/ash /bin/ah

  2. Modify AWK to make it use /bin/ah as opposed to /bin/sh, and write the modified AWK to /usr/local/bin/nsq-mawk: sed 's/\/bin\/sh/\/bin\/ah/g' /usr/bin/mawk\> /usr/local/bin/nsq-mawkchmod 755 /usr/local/bin/nsq-mawk

  3. Modify the NSQAWK value in the config file (/usr/local/lib/nosql/nosql.conf or $HOME/.nosql.conf) to /usr/local/bin/nsq-mawk. This will speed up your queries !

Going to the Web

Using NoSQL on the Web is a matter of seconds. Let's suppose that Acme Tools Inc. now has a web site, and you want your customer to search on database for their pending orders. We first create a small input form, the file getname.html show in Listing 2, on which we ask for customer name. I did not use any security at all in this example, but at least password should be asked in a production environment.

Since we are not good at graphics, we create a small template (result.html) that can be modified easily. This template, result.html, is shown in Listing 3.

In this template, some keywords are substituted by the CGI: our standard keywords start and end with a double hash (#) sign, for example ##KEYWORD##.

A special section of the template, named stanza, will be repeated as many times as the number of query rows. The stanza starts and ends with special comments that will be recognized by the CGI.

Now is the time for writing the CGI: Listing 4, result.pl, is a perl script that should perform the queries based on the input name and then creates a resulting page based on the previous template. Most of the queries used in this CGI are those used in previous examples, so we won't repeat them. Just have a look to the main query:

@cusdata = `nosql cat $datafile | nosql join -j PROD - $ctlgfile |
nosql addcol SUBTOTAL | nosql compute 'SUBTOTAL = QTY*PRICE' | nosql
column PROD DESC QTY PRICE SUBTOTAL | nosql body`;

The thing to notice is the array that will contain query rows: each row contains a tab separated list of fields, as the NoSQL table row specification. In fact, in the stanza keyword substitution, a split function against tab is used:

my ($prod, $desc, $qty, $price, $subtotal) = split(/\t/, $data);
All queries run with back ticks that, instead of the exec() and system(), return the STDOUT of the program. This output may be reused in the program using variables. A negative fact of this is security: you cannot run this program in tainted mode (-T switch in the perl command line), but this can be avoided with some tricks such as the ones I used. First of all, you should avoid buffer overruns by using a substring function ($cusname = substr($cusname, 0, 50)), then keeping out some escape characters (such as > Once the queries have been executed and we have all the necessary values, we load the template file and associate it with the default input and pattern-searching space. The template file, in which we transformed new lines and multiple spaces into a single space, is now divided into three parts using pattern matching: the header, the body (aka stanza) and the footer.
/(.+)<\s*!--\s*here\s+starts\s+nosql\s+stanza\s*--->(.+)<\s*!--\s*here
\s+ends\s+nosql\s+stanza\s*--\s*>(.+)/i ;
This search will identify the stanza into the template, using the keyword <!-- here starts nosql stanza --> as the beginning and <--! here ends nosql stanza --> as the end. As you can notice, these are simple HTML comments, so can be introduced easily by our graphic experts. All items before the beginning comment is considered the header, while the rest is the footer.

Before entering the keyword processing in the body, we will do it in the header and footer in order to set proper customer name and total amount due:

$header =~ s/##NAME##/$cusname/;
$footer =~ s/##TOTAL##/$total/;

The final part is the keyword substitution in the stanza. Here we will swap the original variable ($body) with a temporary one ($tmpbody), in order to leave the first unchanged for next loop. Here the fields are split using the method I described earlier, then substituted for the keywords in the template file. Of course, there are thousands of way of writing down this kind of CGI, be it in Perl or other languages. Write one in your favorite language and let your imagination be your guide: databases are plain ASCII files, so you can process them as you like, and you will get great results.

For a REAL example of NoSQL usage on the web, check out http://www.whoswho-sutter.com/, http://annunci-auto.repubblica.it/ and http://www.secondamano.it/ (the first is in English, while the others are in Italian).

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

dbms

srikanth.v's picture

my query is i want the thing as price * qty should give the amount and the total should be calculated for that what is the query in oracle sql(structured query language).

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix