Open-Source Databases, Part III: Choosing a Database
MySQL and PostgreSQL offer a very large number of built-in features, many of which have been added in response to community requests and reactions. Both offer a large number of data types, which can be mixed and matched within a single row without restrictions. (The very limiting row-length restriction that plagued versions of PostgreSQL is now ancient history, I'm happy to say.) Both databases now support Unicode characters; MySQL supports both UCS-2 and UTF-8 encoding, and PostgreSQL supports only the latter.
Both databases also offer a very large number of functions that manipulate data, including strings and dates. It's quite convenient to be able to compare and sort dates or to find all rows whose timestamp was within the last 24 hours. PostgreSQL's interval data type, which describes a length of time (rather than a particular point in time), has proven to be particularly useful. MySQL has a number of different types that database purists like myself frown upon, such as SET and ENUM, but that are undoubtedly popular for many users.
In many areas where PostgreSQL has had an advantage, MySQL is beginning to catch up. PostgreSQL users have long been able to create new data types and functions that operate on those types. Indeed, PostgreSQL offers developers the unusual ability to write server-side functions in a number of languages, including SQL, Perl, Python, Java, Tcl and the R statistical language. MySQL does not allow for the creation of new data types, but recent versions do provide the ability to write server-side functions and stored procedures.
MySQL has offered a built-in solution for full-text search, accomplished by using a special type of index on text fields. However, there are some important restrictions on this index, such as the fact that it works only with MyISAM tables. Given that these tables support neither foreign keys nor transactions, I am a bit nervous about suggesting them as a solution.
PostgreSQL's full-text search solution (tsearch2) has the opposite problem. Although it is robust and works well within PostgreSQL's standard transactional tables, it requires some work to configure and install. Most administrators and programmers will be able to install it successfully within a short period of time, but nonetheless, there is a difference between a built-in capability and one that needs to be added.
PostgreSQL has a number of built-in features that MySQL either has yet to implement or that are scheduled for future releases. Among these are the ability to use subselects anywhere in a query, the use of sequences (rather than simple auto-increment columns), rules that allow users to modify the way queries are interpreted on a given table and CHECK constraints on column values. Recent versions of MySQL now include features that were previously available only in PostgreSQL, such as triggers and views.
In general, the PostgreSQL development group seems to emphasize SQL standards more than MySQL does, although the MySQL developers appear to be increasingly sensitive to this need and now offer an --ansi command-line switch for those people who want to work in a standards-compatible mode all of the time.
Both MySQL and PostgreSQL are extremely easy to use. Each comes with a command-line client program that is packed with features, allowing you to manipulate your database by sending SQL queries. I have become spoiled by some of the features of the PostgreSQL command line, such as the expanded output (\x).
The command-line interfaces for both databases have grown more useful over time. Although the MySQL interface might appear to have fewer commands, that's partly because MySQL has made some data available via SQL queries (for example, SHOW TABLES), which would require more complicated queries in PostgreSQL, leading to the creation of a shorthand command, \dt. Both command-line interfaces use GNU readline, making it easy to edit and re-issue queries. Both also allow users to edit the previous query using the \e command.
Overall, it's probably fair to say that PostgreSQL offers a superset of MySQL's capabilities, aside from a few issues (for example, built-in text indexing). Those capabilities that PostgreSQL does not have, such as new data types and functions, are added into the system easily, without needing to recompile or otherwise modify the core PostgreSQL server. That said, I believe MySQL's capabilities are nothing to sneeze at and are likely more than adequate for most applications you might be writing.
Both MySQL and PostgreSQL are amazingly easy to administer, especially in small- and medium-size cases. You (optionally) change a few configuration options, start the server and then walk away. There's really not much more to do than that. For anyone who has worked with a larger database system, such as Oracle, this is a refreshing change. However, there are slight differences in the ways the two systems operate.
PostgreSQL relies on several external UNIX-level commands to create and manage databases and users, as well as the activity of the PostgreSQL server. There is no central PostgreSQL administrative program. MySQL, by contrast, has a central mysqladmin program that handles most functions having to do with server startup and shutdown, as well as the creation and destruction of databases. The creation and management of users is handled by manipulating tables in the mysql database.
PostgreSQL's counterparts to the mysql database are special system tables and views, all of which begin with the pg_ prefix. These tables, although necessary for the system to run, easily can be ignored by most programmers and come into play only when trying to tune the system or figure out how to optimize queries.
GUI-based administration tools are available for both programs, as well as Web-based tools written in PHP. To be honest, I haven't used these tools much during the years, given my familiarity with (and preference for) command-line systems for working with databases. However, my experience with both sets of GUI programs has been positive, and my impression is that they are both stable and secure, as well as useful.
Another aspect of administration unique to PostgreSQL is the need to “vacuum” dead rows from the database to return them to the operating system or to other rows that could benefit from the space. In addition, PostgreSQL's vacuum function visits the rows of the dead and uses the statistics it collects to inform the optimizer and query planner. Nowadays, the auto-vacuum dæmon takes care of this automatically for most people, removing the long-dreaded need to schedule it in cron.
One administrative area that is particularly hot right now is replication. Many Web sites and other applications are pushing the limits of their database servers, and it would be useful to split the work among multiple servers. Of course, this raises issues of data integrity and synchronization among distributed processes. The simple solution to the problem is to have a master/slave relationship among the different servers, with UPDATEs and INSERTs taking place only on the master server, and SELECTs taking place on the slave servers. Solutions for this exist under both MySQL and PostgreSQL, although the PostgreSQL solution (Slony) is external to the standard package and apparently can be difficult to install and configure.
A more complicated setup involves the use of two master database servers. MySQL appears to have taken the lead on this front with a relatively new clustering tool. But, PostgreSQL users, who have been clamoring for such tools for several years now, appear to be on the verge of getting their wishes fulfilled.
Finally, no database server would be worthwhile if it weren't possible to perform regular backups. pg_dump and mysqldump are command-line programs that turn the current contents of a database into a text file. Such dump files are quite useful and can be used to rebuild the database when necessary.
I would argue that when it comes to administration, the two database products are identical—unless you need replication, in which case you'll probably benefit from MySQL's greater experience and replication integration.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- SuperTuxKart 0.9.2 Released
- Tech Tip: Really Simple HTTP Server with Python
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Google's SwiftShader Released
- Parsing an RSS News Feed with a Bash Script
- SourceClear Open
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide