Beagle SQL, A Client/Server Database for Linux
This is probably the most straightforward piece of the client-server puzzle. The Connection Manager is simply a loop waiting for incoming messages from client processes. First a socket is opened for the “beagled” service (defined in your /etc/services file), so that the Connection Manager can listen for incoming connections. Then an endless loop is entered. Once the Connection Manager receives a signal from a client processes, a call to accept() returns the socket number that the client and server processes will communicate through. At this point, the Connection Manager fork()s the Database Manager, passing the socket number returned by accept(). After the Database Manager is successfully started, the Connection Manager starts listening for the next incoming connection.
The Database Manager does all the work. The basic components of the Database Manager are the expression parser, the query optimizer (currently, no query optimization is done in BSQL), the index manager, the locking manager and the low-level I/O manager. The SELECT statement is the most complex operation performed by the Database Manager. As BSQL supports explicit joins, a single SELECT statement can search through several tables to return the requested information. The expression parser must be intelligent enough to tell which fields in the SELECT list belong to which tables. If you are joining two tables with duplicate field names, the SELECT statement must explicitly state which field belongs to which tables. Wild cards are allowed. When the expression parser sees wild cards in the field list, it inserts the appropriate field names into the list.
There are three examples in the sidebar to give you an idea of how the expression parser does its work. The statement in Example 3 fails because field1 is ambiguous. The expression parser can't tell if it belongs to table A or table B as both have a field called field1.
When joining tables, the SELECT statement's WHERE clause can contain several different parts that need to be treated separately. When joining two tables these parts can include the conditionals for the first table, the conditionals for the second table and the join condition. The Database Manager searches each of the two tables using the appropriate conditionals from the WHERE clause. Next, it joins the two tables into a temporary table using the fields in the SELECT field list as well as the fields used in the join condition. Last, the records in the temporary table are matched with the join condition and the appropriate records are made available for retrieval by the client process.
This operation can get quite complicated and time consuming when dealing with large and multiple tables. This is where the query optimizer comes in. Its purpose is to determine the most efficient order to search and join the tables. BSQL currently doesn't do query optimization and joins the tables from left to right as they appear in the SELECT statement. This leaves it up to the person writing the SQL statement to put some thought into the order the tables appear in the join.
When performing searches, the Database Manager uses a set of low-level I/O routines to retrieve records from the database. Most commercial database vendors use proprietary file systems to house their databases. In the case of BSQL, the Linux file system sufficed. A future enhancement will be a flexible file format that can allow for such things as BLOBs, images, text documents and anything else. (A BLOB is a large binary datatype used to store images, sound bites, programs, etc. in a database table.)
The method used to store these variable-length records will significantly impact the performance of the Database Manager. When a record is written to the database, it is broken down into fixed-size segments. The database administrator can set the size of these segments for each database. If a record containing 850 bytes is written to a table that uses 256 byte segments, it is broken down into four segments that are chained together. If at a later time the record size is changed to 1200 bytes, an additional segment is added to the chain. If the record is reduced to 700 bytes, the unneeded segments are marked for reuse. One drawback here is that over time the database can become fragmented. Routine maintenance using a de-fragmenting utility should be performed on databases that see a lot of UPDATEs and DELETEs. This utility will be provided with the first official release of BSQL v1.0.
Hopefully, I've given you some insight into how client server databases work and the many late nights that go into their development. For more information on Beagle SQL, point your web browser to http://www.beaglesql.org/. Here you can follow its development history from day one to present as well as download the code to try it out for yourself. Also, be sure to look into the other freely available database resources on the Web.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
- Speed Up Your Web Site with Varnish
- Containers—Not Virtual Machines—Are the Future Cloud
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- Technical Support Rep
- Non-Linux FOSS: libnotify, OS X Style
- UX Designer
- Android's Limits
- Reachli - Amplifying your
36 min 57 sec ago
1 hour 25 min ago
- good point!
1 hour 28 min ago
- Varnish works!
1 hour 37 min ago
- Reply to comment | Linux Journal
2 hours 7 min ago
- Reply to comment | Linux Journal
4 hours 33 min ago
- Reply to comment | Linux Journal
8 hours 33 min ago
- Yeah, user namespaces are
9 hours 49 min ago
- Cari Uang
13 hours 20 min ago
- user namespaces
16 hours 14 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?