PostgreSQL—the Linux of Databases
One milestone in the spread of Linux for general use is the availability of office applications. Besides word processing, spreadsheets and graphics applications, databases are an important part of this class of software. Until now, the big players in the database market, such as Sybase and Oracle, have not released Linux versions of their products; nevertheless, several database packages are available for Linux, most of them commercial. One database deserves special attention, because it is developed in much the same way as Linux itself: PostgreSQL, formerly known as Postgres95. Last October, version 6.2.1 was released, making this a good time to take a closer look at the project.
PostgreSQL has a long history, beginning in 1986 when Michael Stonebraker started its development as a successor to the Ingres database system. The main goal of his project was to show that a relational database system could cope with modern demands for extensibility as well as an object-oriented system could. The resulting product was called an object-relational database, because it was a relational database system with some object-oriented features such as inheritance and user-defined functions, operators and access methods. In 1994, with the release of version 4.2, work on Postgres stopped because Stonebraker's project ended. However, some staff members decided to continue the work, and in 1995 they released Postgres95, with SQL as the query language rather than PostQuel. Today development is carried on under the name PostgreSQL by a team of volunteers on the Internet, coordinated by Marc G. Fournier.
The package is not released under the GPL. It has its own license that allows the distribution of modified releases even without source code, as long as the copyright notice remains untouched.
- Client/server database for a multi-user environment
- Networking with TCP/IP
- Three authentication methods: Kerberos, host/user-based and username/password authentication
- SQL as the query language (it is not fully ANSI SQL92-compliant; features not available are nested subqueries, the HAVING clause, OUTER JOIN, and PRIMARY KEY and FOREIGN KEY statements during table creation)
- Multiple index types, unique indexes and multi-column indexes
- User-defined functions (SQL, C), operators and data types
- User-defined sequences and trigger functions
- Language interfaces for C, C++, Objective-C, Java, Perl, Tcl/Tk and Python
- Third-party ODBC driver available
- Ported to Linux/Intel, Linux/SPARC, Linux/Alpha, AIX, DEC Alpha AXP, FreeBSD, BSDI, DG/UX, HP-UX, Nextstep, Solaris x86, Solaris Sparc, SunOS Sparc, SGI Irix, SCO, Intel SVR4, Ultrix
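Some of the missing SQL92 features can be worked around, and the extensibility features are easy to try out. As a small sketch (the table and function names are invented for illustration): since PRIMARY KEY is not accepted at table creation time, a unique index provides the same guarantee, and a simple user-defined function can be written in SQL itself.

```sql
-- A unique index substitutes for the missing PRIMARY KEY clause:
CREATE TABLE parts (partno int4, name text);
CREATE UNIQUE INDEX parts_partno_key ON parts (partno);

-- A user-defined function written in SQL (one of the
-- object-relational extensibility features):
CREATE FUNCTION double_price (int4) RETURNS int4
    AS 'SELECT $1 * 2' LANGUAGE 'sql';
```

Once created, such a function can be used in queries like any built-in function, e.g., SELECT double_price(partno) FROM parts.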
Client/server databases differ from the monolithic systems familiar to users of MS-DOS and Windows machines. Starting a single program is not enough to work with the database; at least three different applications run simultaneously. One is the server, which in PostgreSQL is called the postmaster. It is a daemon that listens on a TCP port (usually port 5432) for a connection from the second: a user client. This client could be psql, the command-line tool for sending SQL queries that comes with PostgreSQL, or a specialized application, such as a CGI program that wants to store data gathered from an HTML form. Once a client connects, the postmaster starts the third application: the back-end server. This is the actual database engine; only the back-end server has direct access to the stored records of the database. The postmaster connects the client with the back-end server and then waits for further connections. In the meantime, the client sends its queries to the back end and receives the replies, usually a lot of data. When the client has no more queries, the connection is dropped and the back-end server exits.
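A minimal psql session illustrates this flow; the database and table names below are invented for illustration:

```
$ psql -h localhost test
test=> CREATE TABLE guests (name text, email text);
test=> INSERT INTO guests VALUES ('alice', 'alice@example.com');
test=> SELECT name FROM guests;
```

Each statement travels over the TCP connection to the back-end server, which executes it and returns the result; psql itself never touches the database files.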
The consequence of this procedure is that the client, the program that the user works with, has no direct access to the database. It doesn't know how many records the back end sends in response to a query. It doesn't know if the inserted data were successfully processed, unless the back end signals success. Therefore, fancy table editors such as those in dBase or Access are difficult to implement and don't currently exist.
The installation of PostgreSQL is a little tedious at present, but it becomes easier with every release. Version 6.2.1 comes with configure support and detailed, comprehensive installation instructions, including specific notes for Linux. I will mention some possible pitfalls in the installation process. Make sure you have a recent version of flex installed on your system; in particular, v2.5.3 does not work with PostgreSQL, whereas v2.5.2 and v2.5.4 work fine. For the client psql, you need the developer libraries for GNU Readline, which are not installed by default on some Linux distributions.
On Linux, PostgreSQL is compiled with a shared library named libpq.so, which is used by all client applications. Make sure the dynamic linker finds this library by adding the directory $POSTGRESDIR/lib to the file /etc/ld.so.conf and running ldconfig. With older versions of ld-linux.so (e.g., v1.8.5) in particular, it is also necessary to add this directory to the environment variable LD_LIBRARY_PATH. I do this in my rc.local file, which starts the postmaster, so it looks like this:
echo "starting PostgreSQL postmaster..."
export LD_LIBRARY_PATH=/opt/lib/pgsql/lib
su postgres -c "postmaster -D/opt/lib/pgsql/data \
  -d 1 &> /opt/lib/pgsql/postmaster.log &"
In the contrib/linux directory of the source distribution, you will find a more sophisticated startup script named postgres.init, which integrates the postmaster startup into the Red Hat runlevel system.
Before starting the postmaster, edit the file pg_hba.conf in the data directory. This file specifies which hosts and users are allowed to connect to the database back end. The minimal file necessary for local operation is:
# TYPE  DATABASE  IP_ADDRESS  MASK             USERAUTH  MAP
host    all       127.0.0.1   255.255.255.255  trust
# The above allows any user on the local system to
# connect to any database under any user name.
As mentioned in the documentation, it is essential not to start the postmaster as root, but rather under a dedicated postgres user ID.
For clients to work, they must know where to find the databases, so you have to set the environment variables PGDATA and PGLIB for every user; I do this in /etc/profile. Additionally, every user who is to have access to PostgreSQL needs appropriate permissions in the system databases. These are granted with the createuser program, run from the postgres account. createuser asks whether to give the user the right to create databases and/or the right to add other users; it is possible to answer no to both questions, giving the user access only to existing databases. If you are running PostgreSQL as a web database back end, you must also set access permissions for the user ID of the web server.
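As a sketch, assuming the /opt/lib/pgsql paths used in the startup script earlier, the lines added to /etc/profile would look like this:

```shell
# Added to /etc/profile so every user's clients can find
# the databases and the shared library:
export PGDATA=/opt/lib/pgsql/data
export PGLIB=/opt/lib/pgsql/lib
```

Then, from the postgres account, running createuser with a user name (here "alice" is a made-up example) registers that user; answering no to both of its questions restricts her to the existing databases. For a web server running under the nobody account, register nobody the same way.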