In a world that appears to be governed by Murphy's Laws, anything can go wrong. Accidents ranging from machine crashes, media failures, operator errors and random data corruption to such catastrophes as floods and earthquakes can result in lost data, both temporarily and permanently. The risk of data accidents cannot be eliminated, so one must plan to minimize it. There are certain known techniques for this.
Users can tolerate varying intervals of loss of data availability. An e-business may not tolerate more than a few minutes of down time, but a user may tolerate lack of access to stored clip art for a week. Data becomes unavailable when any critical component fails, such as power, processor, memory, or disk. If a copy of the data is available, it may be possible to restore access to data from that copy. If the copy is off-line, such as on tape, it may take several minutes or hours to restore it to disk. If the copy is on disk, access can be failed over in seconds or minutes. If no data copy is available, and it is not possible to reconstruct the data, we have permanent data loss.
The key to protecting data is to have more than one copy. As data keeps changing, the copy must change accordingly. Disk mirroring is a technique that keeps the data copy always up to date. Mirroring works well if both disks are equally fast. If one disk is connected over a network, however, the application slows down because it must wait until the data updates are completed on both disks.
Keeping a copy at a distance somewhere off-site helps data survive accidents that cause large-scale damage. This technique is called replication, discussed next.
Replication is a technique of maintaining identical physical copies or replicas of a master set of data at two or more geographically separate locations. The most common data replication techniques are point-in-time copies and real-time copies. Point-in-Time copies involve capturing snapshots of any critical data and storing them safely at a remote location. Snapshots can be taken on tape, and in the event of a disaster, data is restored from the tape. This technique has several drawbacks, though; the recovered data typically is 24 hours to a few days old. Also, the time needed for a snapshot capture, as well as that for recovery is lengthy, causing longer application outages.
Real-time copy propagates updates to the copy immediately or soon after they are applied to the original set of data. These techniques fall into two categories.
Synchronous Replication: duplicates every write over several disks or volumes and blocks the original write until all updates have been completed successfully. Therefore, the performance impact on the application is directly proportional to network speeds and distance. It thus can be used only over small distances and is practical only for fast and reliable local networks. This practice commonly is employed by enterprises that cannot afford any data loss, such as banks and stock exchanges.
Asynchronous Replication: Snapshots and synchronous replication fall on opposite ends of the replication time versus recovery time spectrum. Asynchronous replication compromises some timeliness of data for higher performance and minimal application impact. It decouples application writes from replication writes. Application writes return soon after the replicator has logged them. The main advantage is the imperceptible impact on the application performance; hence, replication can take place over larger distances. A major challenge in asynchronous replication, though, is the write sequencing, write order fidelity problem.
Pratima (meaning reflection or image in Sanskrit) provides block-level, real-time replication of one or more block devices on a client computer. The devices are replicated to a server computer. A local device (say /dev/sda4) is placed under control of Pratima, which then offers access through its own block device (say /dev/srr0). In addition to hard disk partitions, any underlying block device, such as a logical volume manager (LVM) device, can be replicated.
Pratima provides methods for initial synchronization, fast on-line resynchronization and automatic reconnection. The product also supports chaining for higher flexibility and reliability.
Pratima software components run on both client and server computers. A Pratima device driver captures updates on the client computer, and a dæmon on the server computer receives replication data over the network and writes it down to replica devices.
The client module is a stacked device driver interposed below the filesystem and above the storage device driver. The driver exports a block interface and can be accessed through system calls, including open, close, read, write and stat. Additionally, it supports ioctls for such control operations as enable, disable, clean, kill, reconnect and status.
Figure 1: Client Side Data Flow for Asynchronous Replication Mode
The server side listener module basically is a user-space daemon that passively waits for client side packets to arrive. These packets correspond to the different system calls and ioctls the client interface supports. For example, an enable ioctl corresponds to an enable packet. Upon receiving one, the server enables the remote volume for replication. Similarly, if a write packet arrives, it writes it down to the remote volume and returns the success status of this operation to the client.
Figure 2: Server Side Data Flow
A filesystem may be mounted on a replicator device and initially synchronized with the remote volume. Once the local and remote volumes are in sync, reads and writes are directed to the replicator device driver. It treats reads as transparent and passes them on to the underlying device driver. On the other hand, all incoming writes are bifurcated. The block number is recorded in memory and also replicated on the remote server. The write then is passed to its respective driver; if successful, it is queued and sent over to the remote server.
The design for Pratima had to address several interesting issues, which are described below.
1. Write Order Fidelity
Write order fidelity means the writes on the replicated device must be applied in exactly the same order as on the original device. If this ordering is not preserved, the replica may not be usable. FIFO queues containing private data buffers had to be used to provide write order fidelity.
2. Block Number Logging for Fast Resynchronization
What if the network or server fails for a while, but the client computer is functional? The Pratima driver queues some number of block writes, but if the buffers cannot be flushed out to the replica, the queue fills up. Now, it is not desirable to block the application until the server becomes accessible. The driver gives up at this point, allowing the replica to go out of sync.
Bringing the replica back in sync can be painful and generally requires stopping the application. Block number logging can be used to speed up resynchronization. All the block numbers of blocks to be written are logged to disk. Then, resynchronization is accomplished quickly by replicating only the logged blocks. However, logging block numbers consumes local device bandwidth.
My solution is based on the reasonable assumption that the client machine never undergoes transient failures. This solution uses an in-memory list called the block write table (BWT), in which only the block numbers for all in-flight writes are stored. Thus, if a network outage causes the queue to overflow and loose write data, we can read these blocks from the local volume and replicate them as soon as possible.
3. Recovery and Fail Over
If client machine crashes, we lose the queue and the block write table. The client has to undergo a complete synchronization to make all replicas consistent.
If the server machine or the network suffers a transient failure, we then can use the block write table (BWT) on the client side for resynchronization. However if the server or network outage is long enough to overflow the BWT, the situation cannot be saved. Complete synchronization is required before replication can be restarted.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- Stunnel Security for Oracle
- SourceClear Open
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Google's SwiftShader Released
- Doing for User Space What We Did for Kernel Space
- Parsing an RSS News Feed with a Bash Script
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide