NoSQL works with UNIX, not in addition to it. Set correct permissions to files using groups: create a group for those users who can access these files. This is a great security wall.
No matter what program you have to deal with, sooner or later you will have to deal with performance. I'm not a developer of the NoSQL Database System, but I can give you some useful advice.
First of all, keep your tables small. Don't keep all the data in a single table, this is a waste of performance. Using the method that best suits your environment, try splitting tables into several files and organizing them into directories. In our examples, we can keep track of our customers creating a single “phone directory” (i.e., customer.rdb), then creating a directory for each order status (received, waiting_for_bill, archive), and last a table for each customer (the file name will be the customer code). For example:
/var/tables/acmeinc/ customer.rdb catalog.rdb received/ ACM001.rdb ACM003.rdb w4bill/ ACM002.rdb ACM003.rdb archive/ ACM001.rdb ACM002.rdb ACM003.rdb
If you must do everything in a big table and you have to update it frequently, there's a trick for you if it's an indexed file: journaling.
Create a journaling table, say customer.j, with exactly the same header as customer.rdb, but containing only those records which we want to insert into, remove from or append to customer.rdb. The entries in customer.j must be in a format suitable for the nosql merge command.
Whenever we fetch data from customer.rdb we will have to do it in three logical steps. The first step is to use search on bigtable to take advantage of the indices. This will produce an intermediate output that will then be merged into customer.j, and the final output will undergo the original query statement again.
Any updates to customer.rdb will be done to customer.j with the syntax described in the documentation of the merge command (there you'll find how to use the operator to the optimum level). You will also have to make sure that customer.j is kept sorted on its primary key field after each update. For example, if you have an index on the CODE column in the customer.rdb table, you should use:
echo -e "CODE\n----\nACM004" | nosql search -ind customer.x.CODE |\ nosql merge CODE NAME PHONE EMAIL customer.j | \ nosql row 'CODE=="ACM004"'
As you can see, the trick is:
Perform an indexed search on customer.rdb to quickly obtain a much smaller (possibly empty) subset of the table.
Merge the first output with customer.j on the fly during the query.
Do a sequential post-query on the final output.
Create the following hard link: ln /bin/ash /bin/ah
Modify AWK to make it use /bin/ah as opposed to /bin/sh, and write the modified AWK to /usr/local/bin/nsq-mawk: sed 's/\/bin\/sh/\/bin\/ah/g' /usr/bin/mawk\> /usr/local/bin/nsq-mawkchmod 755 /usr/local/bin/nsq-mawk
Modify the NSQAWK value in the config file (/usr/local/lib/nosql/nosql.conf or $HOME/.nosql.conf) to /usr/local/bin/nsq-mawk. This will speed up your queries !
Using NoSQL on the Web is a matter of seconds. Let's suppose that Acme Tools Inc. now has a web site, and you want your customer to search on database for their pending orders. We first create a small input form, the file getname.html show in Listing 2, on which we ask for customer name. I did not use any security at all in this example, but at least password should be asked in a production environment.
Since we are not good at graphics, we create a small template (result.html) that can be modified easily. This template, result.html, is shown in Listing 3.
In this template, some keywords are substituted by the CGI: our standard keywords start and end with a double hash (#) sign, for example ##KEYWORD##.
A special section of the template, named stanza, will be repeated as many times as the number of query rows. The stanza starts and ends with special comments that will be recognized by the CGI.
Now is the time for writing the CGI: Listing 4, result.pl, is a perl script that should perform the queries based on the input name and then creates a resulting page based on the previous template. Most of the queries used in this CGI are those used in previous examples, so we won't repeat them. Just have a look to the main query:
@cusdata = `nosql cat $datafile | nosql join -j PROD - $ctlgfile | nosql addcol SUBTOTAL | nosql compute 'SUBTOTAL = QTY*PRICE' | nosql column PROD DESC QTY PRICE SUBTOTAL | nosql body`;
The thing to notice is the array that will contain query rows: each row contains a tab separated list of fields, as the NoSQL table row specification. In fact, in the stanza keyword substitution, a split function against tab is used:
my ($prod, $desc, $qty, $price, $subtotal) = split(/\t/, $data);All queries run with back ticks that, instead of the exec() and system(), return the STDOUT of the program. This output may be reused in the program using variables. A negative fact of this is security: you cannot run this program in tainted mode (-T switch in the perl command line), but this can be avoided with some tricks such as the ones I used. First of all, you should avoid buffer overruns by using a substring function ($cusname = substr($cusname, 0, 50)), then keeping out some escape characters (such as > Once the queries have been executed and we have all the necessary values, we load the template file and associate it with the default input and pattern-searching space. The template file, in which we transformed new lines and multiple spaces into a single space, is now divided into three parts using pattern matching: the header, the body (aka stanza) and the footer.
/(.+)<\s*!--\s*here\s+starts\s+nosql\s+stanza\s*--->(.+)<\s*!--\s*here \s+ends\s+nosql\s+stanza\s*--\s*>(.+)/i ;This search will identify the stanza into the template, using the keyword <!-- here starts nosql stanza --> as the beginning and <--! here ends nosql stanza --> as the end. As you can notice, these are simple HTML comments, so can be introduced easily by our graphic experts. All items before the beginning comment is considered the header, while the rest is the footer.
Before entering the keyword processing in the body, we will do it in the header and footer in order to set proper customer name and total amount due:
$header =~ s/##NAME##/$cusname/; $footer =~ s/##TOTAL##/$total/;
The final part is the keyword substitution in the stanza. Here we will swap the original variable ($body) with a temporary one ($tmpbody), in order to leave the first unchanged for next loop. Here the fields are split using the method I described earlier, then substituted for the keywords in the template file. Of course, there are thousands of way of writing down this kind of CGI, be it in Perl or other languages. Write one in your favorite language and let your imagination be your guide: databases are plain ASCII files, so you can process them as you like, and you will get great results.
For a REAL example of NoSQL usage on the web, check out http://www.whoswho-sutter.com/, http://annunci-auto.repubblica.it/ and http://www.secondamano.it/ (the first is in English, while the others are in Italian).
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Profiles and RC Files
- Astronomy for KDE
- Understanding Ceph and Its Place in the Market
- Maru OS Brings Debian to Your Phone
- OpenSwitch Finds a New Home
- Git 2.9 Released
- SoftMaker FreeOffice
- What's Our Next Fight?
- The Giant Zero, Part 0.x
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide