Linux and Databases
One thing often forgotten about new Linux users is that many of them are also, for all intents and purposes, new to computers. By this, I don't mean that they have never used a computer, or don't know what a hard drive is or what RAM means. What I mean is effective knowledge about computers: what really makes them work, what the different components do to and for each other, and what makes one computer different from another. If there is one drawback in the current rage over web appliances, it is that users are becoming more and more alienated from the inner workings of the device. It is a situation that leads to more power, perhaps, but less empowerment, and a sad combination of increased ignorance and increased dependency.
This situation isn't necessarily bad for most users. In the same way that most people don't really care what is under the hoods of their automobiles (as long as someone will spend a week fixing their car when it breaks down), most people don't want or need to know what's under the gray shell of their PC. But for those who are curious, those who do want some leverage of information against a cold, speechless machine, Linux represents an opportunity to learn not just a new, different operating system, but also, in a sense, to learn about computers all over again. And often from the ground up.
This week on Linux Buzz, I want to talk about databases.
A database is really any set of structured data or information. An address book, for example, or the American League team standings could both be considered "databases". Once we understand what a database is, fundamentally speaking, it becomes easier to scale upward toward more and more complex databases, such as a financial general ledger or the stock charts in the Wall Street Journal. When we consider the immense volumes of information handled by computers, from the days of mainframes to the days of embedded systems, we also need to consider increasingly complex and powerful ways to enter, store and retrieve data. Computerized databases, unlike the "databases" mentioned above, let us arrange and manipulate data in ways far more versatile, efficient and powerful than ever before.
However, computerized databases tend to be only part of an overall system of data storage, management and retrieval. The entire system is often referred to as a Database Management System or DBMS. A DBMS is a set of frequently complex software programs that are responsible for two main tasks. First, a DBMS controls and manages the overall storage, configuration and retrieval of data in a database. Second, the Database Management System is responsible for ensuring the security and integrity of the database, by granting and restricting access based on the guidelines provided by the administrator of the database. Very often, when someone mentions a new database or talks about the open sourcing of a database (which we will soon talk about ourselves), that person is talking about the Database Management System as a whole, not just the database component.
There are three categories of database worth looking at that are, or have been, in common use. Importantly, many contemporary DBMSes may offer one of these methods, or combine two or even all three.
Hierarchical databases. These are the oldest of the modern computer databases (as opposed to the ancient computer databases, right?) and may be what most people, when asked, imagine a database to be. A hierarchical database is arranged something like an inverted family tree; access to the database begins at the top of the hierarchy and moves downward. For example, moving from "author" to "works" or "articles" would represent the access pathway in a hierarchical database. The links between these records tend to be fixed ahead of time, either as data is entered or when the database parameters are being established. This contrasts strongly with the relational databases discussed below, where links or "relationships" among data, fields and records are not determined until the data is retrieved. What hierarchical databases once gained in speed by having fixed connections, they lost in flexibility and overall efficiency. And, as computer processors became faster and programming languages more capable, the speed advantage of hierarchical databases became less and less of an advantage. Hierarchical databases were the first databases run on mainframe computers, and as mainframes have given way to more efficient computers, hierarchical databases have largely been replaced by relational ones.
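The fixed, top-down access path of a hierarchical database can be sketched with a nested structure. This is only an illustration; the record names ("J. Doe", "works", "articles") are invented for the example:

```python
# A toy hierarchical "database": every query starts at the root and
# follows fixed parent-to-child links, as in author -> works.
authors = {
    "J. Doe": {
        "works": ["Kernel Notes", "Shell Basics"],
        "articles": ["Why Pipes Matter"],
    },
}

def lookup(author, branch):
    """Follow the fixed pathway from the root down to one branch."""
    return authors[author][branch]

print(lookup("J. Doe", "works"))  # ['Kernel Notes', 'Shell Basics']
```

Note that the only way to answer a question like "which authors wrote articles?" is to walk the whole tree from the top; the structure itself offers no sideways shortcuts.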
Relational databases. Relational databases are all the rage in terms of database efficiency and power. In fact, most of the databases mentioned toward the end of this discussion - PostgreSQL, Interbase, IBM DB2 - are relational databases. The most important thing to keep in mind about relational databases is that whereas hierarchical databases use fixed relationships between data forms, relational databases are able to compare characteristics of data, fields or records to assemble new sets of data that match the criteria of the data retriever. By way of analogy, if a hierarchical database represents "getting out only what you put in (and only in the form in which the data was entered)", then a relational database offers the ability to get, effectively, information that was not specifically "entered" into the database. While this might sound complicated (which it is, somewhat), from the point of view of the data retriever a relational database can be a godsend. For example, where a hierarchical database will tell you that "Tom" placed X number of orders for Y number of Z products, a relational database would be able to tell you that "Dick" and "Harry" also bought Z products, or that those who purchased Z products tended to make X number of orders over a given period of time. It should be said that, given the range over which relational databases operate, they tend to be slow. Because of this, a variety of indexing techniques are used to target "key fields" that may be consulted frequently. Such indexing can significantly speed up the data delivery of relational databases.
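A minimal sketch of the Tom/Dick/Harry example, using Python's built-in sqlite3 module as a stand-in relational engine. The table layout and names are purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Tom", "Z", 3), ("Dick", "Z", 1), ("Harry", "Z", 2), ("Tom", "Y", 1)],
)

# A relationship that was never "entered" as such: everyone who bought
# product Z, discovered by comparing field values at retrieval time.
buyers = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer FROM orders WHERE product = 'Z' ORDER BY customer")]
print(buyers)  # ['Dick', 'Harry', 'Tom']

# Indexing a frequently consulted "key field" to speed up such lookups.
conn.execute("CREATE INDEX idx_product ON orders (product)")
```

Nobody ever told the database that Dick and Harry share a product with Tom; the relationship is computed from the stored fields when the query runs, which is exactly the flexibility (and the potential slowness) described above.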
Network databases. There are a number of definitions for network databases, but the one that is probably most relevant here is the one that explains network databases as data organization methods in which data relationships have a "net-like" or hyperlinked character. By this, I mean that one item in a network database can link to a number of data elements, and can itself be linked to by a variety of data elements. In some ways, a network database resembles a set of hierarchical databases, save for the ability of information to move in more than one direction.
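The net-like links can be sketched as a simple graph in which each record may point to several others and be pointed to by several others. The record names here are invented for illustration:

```python
# Each record can link outward to many records and be linked to by many,
# so traversal is not restricted to a single top-down path.
links = {
    "article": {"author", "topic"},
    "author": {"article", "publisher"},
    "topic": {"article"},
    "publisher": {"author"},
}

def neighbors(record):
    """Records reachable in one hop, following links in either direction."""
    outgoing = set(links.get(record, set()))
    incoming = {src for src, dsts in links.items() if record in dsts}
    return outgoing | incoming

print(sorted(neighbors("article")))  # ['author', 'topic']
```

Contrast this with the hierarchical case: there, "article" could only be reached by descending from its parent, whereas here it can be entered from, and exited toward, any linked record.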