Linux and Databases

A preview and summary of the May 2 episode of Phil Hughes' weekly radio program featured on Wall Street News Hour.

One of the things often forgotten when new Linux users are considered is the fact that many of these people are also, for all intents and purposes, new to computers. By this, I don't mean that they have never used a computer, or don't know what a hard drive is or what RAM means. What I mean is effective knowledge about computers: what it is that really makes them work, what the different components do to and for each other, and what makes one computer different from another. If there is one drawback in the current rage over web appliances, it is in the fact that users are becoming more and more alienated from the inner workings of the device. A situation that leads to more power, perhaps, but less empowerment, and a sad combination of increased ignorance and increased dependency.

This situation isn't necessarily bad for most users. In the same way that most people don't really care what is under the hoods of their automobiles (as long as someone will spend a week fixing their car when it breaks down), most people don't want or need to know what's under the gray shell of their PC. But for those who are curious, those who do want to have some leverage of information against a cold, speechless machine, Linux represents an opportunity to learn about not just a new, different operating system, but also a chance to, in a sense, learn about computers all over again. And often from the ground up.

This week on Linux Buzz, I want to talk about databases.

What are databases?

A database is really any set of structured data or information. An address book, for example, or the American League team standings could both be considered "databases". Once we understand what a database is, fundamentally speaking, it becomes easier to scale upwards toward more and more complex databases, such as a financial general ledger or the stock charts in the Wall Street Journal. When we consider the immense volumes of information handled by computers, from the days of mainframes to the days of embedded systems, we also need to consider increasingly complex and powerful ways to enter, store and retrieve data. Computerized databases, unlike the "databases" mentioned above, provide this power to arrange and manipulate in ways far more versatile, efficient and powerful than ever before.

However, computerized databases tend to be only part of an overall system of data storage, management and retrieval. The entire system is often referred to as a Database Management System or DBMS. A DBMS is a set of frequently complex software programs that are responsible for two main tasks. First, a DBMS controls and manages the overall storage, configuration and retrieval of data in a database. Second, the Database Management System is responsible for ensuring the security and integrity of the database, by granting and restricting access based on the guidelines provided by the administrator of the database. Very often, when someone mentions a new database or talks about the open sourcing of a database (which we will soon talk about ourselves), that person is talking about the Database Management System as a whole, not just the database component.

There are three categories of database worth looking at that are, or have been, in common usage. Importantly, many contemporary DBMSes may offer one, two or a combination of all three databasing methods.

  • Hierarchical databases. These are the oldest of the modern computer databases (as opposed to the ancient computer databases, right?) and may be what most people, when asked, imagine a database to be. A hierarchical database is arranged something like an inverted family tree; access to the database begins at the top of the hierarchy and moves downward. For example, moving from "author" to "works" or "articles" would represent the access pathway in a hierarchical database. The links between these records tend to be fixed ahead of time, either as data is entered or when the database parameters are being established. This contrasts strongly with the relational databases discussed below, where links or "relationships" among data, fields and records is not determined until the data is retrieved. What hierarchical databases once gained in speed by having fixed connections, they lost in flexibility and overall efficiency. And, as computer processors became faster and programming languages more capable, the speed advantage of hierarchical databases became less and less of an advantage. Hierarchical databases were the first databases run on mainframe computers, and as mainframes have given way to more efficient computers, so have hierarchical databases been replaced largely by relational databases.

  • Relational databases. Relational databases are all the rage in terms of database efficiency and power. In fact, most of the databases mentioned toward the end of this discussion - PostgreSQL, Interbase, IBM DB2 - are relational databases. The most important thing to keep in mind about relational databases is that whereas hierarchical databases use fixed relationships between data forms, relational databases are able to compare characteristics of data, fields or records to make new files that may match the criteria of the data retriever. By way of analogy, if a hierarchical database represents "getting out only what you put in (and only in the form in which the data was entered)", then a relational database offers the ability to get, effectively, information that was not specifically "entered" into the database. While this might sound complicated (which it is, somewhat) from the point of the data retriever, a relational database can be a godsend. For example, where a hierarchical database will tell you that "Tom" placed X number of orders for Y number of Z products, a relational database would be able to tell you that "Dick" and "Harry" also bought Z products, or that those who purchased Z products tended to make X number of orders over a given period of time. It should be said that, given the range over which relational databases operate, relational databases can tend to be slow. Because of this, a variety of indexing techniques are used to target "key fields" that may be consulted frequently. Such indexing can significantly speed up the data delivery of relational databases.

  • Network databases. There are a number of definitions for network databases, but the one that is probably most relevant here is the one that explains network databases as data organization methods in which data relationships have a "net-like" or hyperlinked character. By this, I mean that one item in a network database can link to a number of data elements, and can itself be linked and be a variety of data elements. In some ways, a network database resembles a set of hierarchical databases, save for the ability of information to move in more than one direction.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState