PostgreSQL—the Linux under the Databases

A close look at the PostgreSQL database, including programming interfaces and using it for WWW applications.

One milestone in the spreading of Linux for general use is the availability of office applications. Besides text-processing, spreadsheets and graphic applications, databases are an important part of this type of application. Until now, the big players in the database market like Sybase and Oracle haven't released a Linux version of their products; nevertheless, there are several database packages available for Linux, mostly commercial applications. There is one database which deserves special attention, because it is now developed similarly to Linux. This database is PostgreSQL, formerly known as Postgres95. Last October, version 6.2.1 was published, and this is a good time to take a closer look at this project.

PostgreSQL has a long history beginning in the year 1986 when Michael Stonebraker began its development as a successor to the Ingres database system. The main goal of his project was to show that a relational database system could cope with modern demands of extensibility as well as an object-oriented system. The resulting product was called an object-relational database because it was a relational database system with some object-oriented features such as inheritance and user-defined functions, operators and access methods. In 1994, with the release of version 4.2, the work on Postgres stopped because Stonebraker's project ended. However, some staff members decided to continue with their work and in 1995 released Postgres95 with SQL as the query language rather than PostQuel. Now the development is carried on under the name PostgreSQL by a team of volunteers on the Internet, coordinated by Marc G. Fournier.

The package is not released under the GPL. It has its own license that allows the distribution of modified releases even without source code, as long as the copyright notice remains untouched.

Features of PostgreSQL
  • Client/Server Database for a multi-user environment

  • Networking with TCP/IP

  • Three Authentication methods: Kerberos, host/user based and username/password authentication

  • SQL as query language (It is not fully ANSI SQL92 compliant; options not available are nested subqueries, HAVING clause in an ORDER BY statement, OUTER JOIN, PRIMARY and FOREIGN KEY statements during table creation.)

  • Multiple index types, unique indexes and multi-column indices

  • User-defined functions (SQL, C), operators and data types

  • User-defined sequences and trigger functions

  • Language interfaces for C, C++, Objective-C, Java, Perl, Tcl/Tk and Python

  • Available third party ODBC driver

  • Ported for Linux/Intel, Linux/Solaris, Linux/Alpha, AIX, DEC Alpha AXP, FreeBSD, BSDI, DG/UX, HP-UX, Nextstep, Solaris x86, Solaris Sparc, SunOS Sparc, SGI Irix, SCO, Intel SVR4, Ultrix

The Client/Server Model

Client/Server databases are different from the monolithic systems familiar to users of MS-DOS and Windows machines. It is not sufficient to start only one program to work with the database. There are at least three different applications running simultaneously. One is the server, which in PostgreSQL is called the postmaster. It is a daemon which observes a TCP port (usually port 5432) for a connection from the second: a user client. This user client could be psql, a command-line tool for sending SQL queries that comes with PostgreSQL, or specialized applications, like a CGI program that wishes to store data gathered from an HTML form. Once a client connects, the postmaster starts the third application: the back-end server. This is the actual database engine, because only the back-end server has direct access to the stored records of the database. The postmaster connects the client with the back-end server and then waits for other connections. In the meantime, the client sends its queries to the back-end and receives its reply, usually a lot of data. When the client has no other query, the connection drops and the back-end server exits.

The consequence of this procedure is that the client, the program that the user works with, has no direct access to the database. It doesn't know how many records the back end sends in response to a query. It doesn't know if the inserted data were successfully processed, unless the back end signals success. Therefore, fancy table editors such as those in dBase or Access are difficult to implement and don't currently exist.

Installation

The installation of PostgreSQL is a little tedious at present, but it becomes easier with every release. Version 6.2.1 comes with configure-support and detailed and comprehensive installation instructions with specific notes for Linux. I will mention some possible pitfalls during the installation process. Make sure you have a recent version of flex installed on your system. In particular v2.5.3 does not work with PostgreSQL whereas v2.5.2 and v2.5.4 work fine. For the client psql, you need the developer libraries from GNU Readline, which are not installed by default on some Linux distributions.

On Linux, PostgreSQL is compiled with a shared library named libpq.so, which is used by all client applications. You need to make sure that the dynamic linker finds this library by adding the directory $POSTGRESDIR/lib to the file ld.so.config. Especially with older versions of ld-linux.so (e.g., v1.8.5), it is necessary to also add this directory to the environment variable LD_LIBRARY_PATH. I do this in my rc.local file, which starts up the postmaster, so it looks like this:

echo "starting PostgreSQL postmaster..."
export LD_LIBRARY_PATH=/opt/lib/pgsql/lib
su postgres -c "postmaster -D/opt/lib/pgsql/data \
-d 1 &> /opt/lib/pgsql/postmaster.log &"

In the contrib/linux directory of the source distribution, a more sophisticated startup script named postgres.init can be found, that integrates the postmaster startup into the RedHat runlevel system.

Before starting the postmaster, edit the file pg_hba.conf in the data directory. This file holds information, indicating which hosts and users are allowed to connect to the database back end. The minimal file necessary for local operation is:

# TYPE  DATABASE  IP_ADDRESS  MASK       USERAUTH  MAP
  host   all      127.0.0.1   255.255.255.255   trust
# The above allows any user on the local system to
# connect to any database under any user name.

As mentioned in the documentation, it is essential not to start the postmaster as root, but rather to start it in a special postgres user ID.

For clients to work, they must know where to find the databases. Therefore, you have to set the environment variables PGDATA and PGLIB for every user. I do it in /etc/profile. Additionally, every user who should have access to PostgreSQL needs to have appropriate access permissions in the system databases. This is done with the createuser program, run by the postgres account. createuser offers you the option of giving the user the right to create databases and/or the right to add other users. It is possible to say no to both questions, thus giving the user access only to the existing databases. If you are running PostgreSQL as a WWW database back end, you must set access permissions for the user ID of the web server.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix