When I first began to use Linux for basic Web development, I saw that my three primary tools—Perl, GNU Emacs and Apache—were already included. But, at least one thing was missing, namely a relational database. I had used databases (mostly Sybase) for about a year when I began working with Linux, and knew that I would need a good database server in order to create sophisticated web sites.
I had heard of PostgreSQL and, after learning a bit more about it, decided to install it. Unfortunately, my installation experience was less than pleasant, and I gave up after several hours. Another reason I gave up was that I found another database server, MySQL. MySQL did not have all of the features I wanted in a database, but it was close enough; I implemented many web applications for clients using it.
Things have changed quite a bit in the last five years: MySQL has been rereleased under the GNU General Public License and is slated to include basic support for transactions, while PostgreSQL is now remarkably easy to install and includes a wealth of features and functionality. Both are well known in the Free Software community as powerful programs that can help get your work done.
I still use MySQL for a variety of tasks and expect to continue doing so, but, increasingly, I find that PostgreSQL is a better fit for my needs. This month, we will look at PostgreSQL, starting with its basic features, to create a small web-based application that uses transactions. Along the way, I will try to compare it with MySQL, describing where one product might be better suited than the other.
As of this writing, the latest version of PostgreSQL is 7.0.2, released in the spring of 2000. As with all open-source software, you can download the PostgreSQL source code from the Internet and compile it yourself. My office uses the Red Hat distribution, and I generally prefer to install software with RPMs for easier maintenance. We downloaded the RPMs from the PostgreSQL web site, installed them, and were up and running in almost no time.
Like all modern database systems, PostgreSQL contains a server that can handle connections from multiple clients. Typically, only one computer in a network is designated to be the database server, with the remaining computers configured as clients. (The server is often configured to be a client as well, in order to facilitate debugging and system configuration.) RPMs for the server typically begin with the name “postgresql-server”, while the client RPMs are named “postgresql” followed by the version number.
PostgreSQL clients exist—in source form and as RPMs—for most of the programming languages that people use in designing web applications, including Perl, Python, Java and PHP. If you intend to build any of these from scratch, you will need to install the PostgreSQL development libraries, either from source code or from the RPMs. I compiled Perl and its modules from the source code but, otherwise, took advantage of the RPMs and installed the precompiled binaries.
Like MySQL and many other relational databases, tables within PostgreSQL are grouped into a single “database”, much as files are grouped within a directory. PostgreSQL offers a great deal of flexibility when it comes to security configurations. Databases can allow or deny access based on IP address or user name, while individual tables and other objects allow various levels of access based on the user name. These configurations are performed in the pg_hba.conf file, which was installed by default in /var/lib/pgsql/data on my system.
By default, only one user, called “postgres”, is authorized to create new users or to create new databases. When you first start to use PostgreSQL, you will probably have to use su to change your identity to root, then su again to change your identity to “postgres”. Once you have become the postgres user, you can use the createuser program to create one or more new users. You can then indicate whether they are allowed to create new databases and/or new users.
The psql database client which comes with PostgreSQL is an excellent interactive tool for working with the database. Like the MySQL client program, it uses GNU ReadLine to provide Emacs-compatible keybindings for entering SQL queries directly. psql also provides an extensive number of help commands that begin with backslashes, such as \h (which displays help for any SQL command), \d (which lists available objects and details about those objects), and \l (which lists available databases). I use psql constantly to double-check that program-generated queries worked correctly; it is easy and quick to use.
PostgreSQL implements a large portion of standard SQL and, therefore, is easy to learn if you have previously worked with a relational database. We can create a simple table, as shown in Listing 1.
The serial data type is similar to MySQL's AUTO_INCREMENT tag. It provides us with a unique number each time we insert a new row into the table. serial columns use a PostgreSQL data type called a “sequence” on which the PostgreSQL nextval and currval functions operate. It is also possible to use a sequence directly (see Listing 2).
PostgreSQL distinguishes between single and double quotes, so be sure to say nextval(`people_id') and not nextval("people_id").
Inside of psql, the semicolon is a synonym for \g, meaning “go and execute this query”. Outside of psql, it has no meaning and may even cause an error in your database driver.
PostgreSQL, unlike MySQL, is case insensitive when it comes to table and column names. However, I prefer to follow Joe Celko's capitalization rules for SQL: keywords in ALL CAPS, table names in LeadingCaps, and column names in all_lowercase_with_underscores.
As in MySQL, PostgreSQL allows us to designate which column is the primary key. PostgreSQL also allows for unique columns (and combinations of columns), as well as the creation of indices with the create index statement.
PostgreSQL's data types are slightly different from those in MySQL, but relatively easy to understand if you have worked with another database. varchar and numeric types are supported. Perhaps the most confusing difference between data types in MySQL and PostgreSQL is timestamp. Under MySQL, a timestamp column is automatically set to the value of the latest insert or update. In PostgreSQL, a timestamp column simply contains a date/time value.
PostgreSQL supports many standard SQL functions and many of the extensions that MySQL contains. The difference, of course, is in how they are implemented. For example, MySQL has a regexp function similar like function. PostgreSQL, by contrast, has implemented regexp functionality with the ~ (tilde) operator, perhaps because of Perl's popularity.
PostgreSQL also makes it possible to create new datatypes with the create type function. Indeed, one of PostgreSQL's claims to fame is that it is a hybrid of object-oriented and relational databases. I have never needed to use this functional, but it does seem to be an intriguing and powerful feature.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
- Speed Up Your Web Site with Varnish
- Containers—Not Virtual Machines—Are the Future Cloud
- RSS Feeds
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Weechat, Irssi's Little Brother
- Senior Perl Developer
- Technical Support Rep
- UX Designer
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?