VIA PadLock—Wicked Fast Encryption
Probably everyone who has used encryption soon realised that the demand for processor power grew instantly. On older systems, the trade-off for using encrypted filesystems is slower file operations; on newer systems, the trade-off is, at minimum, significantly higher CPU loads. Encrypting network traffic with the IPsec protocol also slows things down, and sometimes you may encounter performance problems even on the standard 100Mbps network.
Options exist, however, for working around these encryption/performance trade-offs:
Don't encrypt: apparently the cheapest solution, but this can become very expensive in the long run.
Accept the slowdown: the typical approach.
Use a standalone cryptography accelerator: a PCI card, for example, doesn't help as much as you might expect, however, because the data must traverse the PCI bus more often than necessary.
Use a CPU with VIA PadLock technology. What's VIA PadLock? Read on.
A while back, VIA introduced a simple but slightly controversial approach: select some cryptographic algorithms and wire them directly in to the CPU. The result was the introduction of an i686 class processor that understands some new instructions dedicated to cryptographic functions. This technology is called VIA PadLock, and the processor is fully compatible with AMD Athlons and Intel Pentiums.
The PadLock features available on your machine's processor are determined by its version. Processor versions usually are written as a family-model-stepping (F/M/S) triplet. Family is always 6 for i686 class CPUs. If the model is 9, your CPU has a Nehemiah core; if the model is 10, it has an Esther core. The stepping denotes a revision of each model. You can find your processor's version in /proc/cpuinfo.
Nehemiah stepping 3 and higher offers an electrical noise-based random number generator (RNG) that produces good random numbers for different purposes. The instruction for accessing the RNG is called xstore. As in Intel and AMD processors, the random number generator in VIA processors is supported by the hw_random device driver.
Nehemiah stepping 8 and higher contains two independent RNGs and the Advanced Cryptography Engine (ACE). The ACE can encrypt and decrypt data using the Advanced Encryption Standard (AES) algorithm with three standard key lengths—128, 192 and 256 bytes—in four different modes of operation: electronic codebook (ECB), cipher block chaining (CBC), cipher feedback (CFB) and output feedback (OFB) modes (see the on-line Resources). The appropriate instructions are called xcryptecb, xcryptcbc and so on. Later in this article, I predominately use their common group name, xcrypt, instead of the mode-specific instruction names.
Esther stepping 0 and higher inherited two RNG units from Nehemiah. ACE was extended with counter (CTR) mode support and MAC (Message Authentication Code) computation. And there are two new acronyms, PHE and PMM. PadLock Hash Engine (PHE) is used for computing a cryptographic hash, also known as a digest, of a given input block, using the SHA1 or SHA256 algorithm. The proposed instruction name is xsha.
The PadLock Montgomery Multiplier (PMM) is responsible for speeding up one of the most time-consuming computations used in asymmetric, or public-key, cryptography: AB mod M, where A, B and M are huge numbers, usually 1,024 or 2,048 bits. This instruction is called montmul.
As noted above, in the rest of this article I mostly speak about the xcrypt instruction. Principles described further mostly are valid for other units as well, and xcrypt serves only as an example. Also, the terms and concepts covered in this encryption discussion apply to decryption as well.
In contrast to the external cryptography accelerators usually plugged in to PCI slots, the PadLock engine is an integral part of the CPU. This fact significantly simplifies its use, because it is not necessary to bother with accessing the bus or with interrupts, asynchronous operations and so on. Encrypting a block of memory with xcrypt is as easy as copying it over with the movs instruction.
At this point, encryption is almost an atomic operation. Before executing the instruction, the buffer contains plain-text input data; a few clock cycles later, when the execution finishes, we have ciphertext. If a task requested processing of a single block, which is 16 bytes in the case of the AES algorithm, the operation is fully atomic. That is, the CPU doesn't interrupt it in the middle and doesn't do anything else until the encryption is finished.
But what if the buffer contains a gigabyte of plain text to be processed? It isn't good to stop all other operations and wait for the encryption to finish when it's this large. In such a case, the CPU can interrupt the encryption after every single block of 16 bytes. The current state is saved, and whatever else can be done is done—interrupts can be handled and processes switched. As soon as the encrypting process is restarted, the instruction continues from the point at which it was suspended. That's why I say this is almost an atomic operation: for the calling process it looks atomic, but it can be interrupted by a higher-priority event. The current processing state then is saved into the memory and registers of the running process, which enables multiple tasks to do encryption simultaneously, without the risk of mixing their data. Again, it is an analogous situation to copying memory blocks with the movs instruction.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Linux Systems Administrator
- Dynamic DNS—an Object Lesson in Problem Solving
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Using Salt Stack and Vagrant for Drupal Development
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




5 hours 10 min ago
5 hours 44 min ago
6 hours 42 min ago
7 hours 32 min ago
11 hours 34 min ago
15 hours 22 min ago
15 hours 30 min ago
17 hours 44 min ago
20 hours 14 min ago
1 day 6 hours ago