VIA PadLock—Wicked Fast Encryption

This inexpensive processor offers support for the Advanced Encryption Standard, so you can do state-of-the-art encryption at wire speed.
How Fast Is It?

According to VIA, the maximum throughput on 1.2GHz processors exceeds 15Gb/s, which is almost 1.9GB/s. The benchmarks I have run confirm that such a speed could be achieved in real-world applications and not only in VIA marketing papers, which definitely was a nice surprise.

The actual encryption speed depends on two factors, cipher mode and data alignment. ECB is the fastest, while the most widely used CBC mode runs at about half of the ECB speed. PadLock requires the data to be aligned at 16-byte boundaries, so unaligned data must be moved to proper addresses first, which takes some time. In some cases, the Esther CPU can realign the data automatically, but this still causes some slowdown.

Table 1 shows some numbers from my testing. The OpenSSL benchmark for VIA Nehemiah 1.2GHz produced the following results in kB/s.

Table 1. The Open SSL Benchmark for VIA Nehemiah 1.2GHz, in kB/s

Type16 bytes64 bytes256 bytes1,024 bytes8,192 bytes
aes-128-ecb (software)11,274.5314,327.7914,608.6414,672.5514,693.72
aes-128-ecb (PadLock)66,892.82346,583.52910,704.211,489,932.591,832,151.72
aes-128-cbc (software)8,276.2712,915.7513,264.1313,313.0213,322.92
aes-128-cbc (PadLock)48,542.30241,898.79523,706.28745,157.61846,402.90

The bigger the blocks are, the better the results are, because the overhead of the OpenSSL library itself is eliminated. Encryption of 8kB blocks in ECB mode can run at about 1.7GB/s; in CBC mode, we get about 800MB/s. In comparison to software encryption, PadLock in ECB mode is 120 times faster on the same processor, and CBC mode is 60 times faster.

Thanks to this speedup, the IPsec on 100Mbps network runs at almost full speed somewhere around 11MB/s. Similar speedups can be seen on encrypted filesystems. The Bonnie benchmark running on a Seagate Barracuda in UDMA100 mode produced plain-text throughput at a rate of 61,543kB/s; with PadLock, it was 49,961kB/s, and a pure software encryption ran at only 10,005kB/s. In other words, PadLock was only about 20% slower, while the pure software was almost 85% slower than the non-encrypted run. See Resources for a link to my benchmark page with more details and more numbers.

Linux Support

So far I have developed Linux support for the following packages only for the AES algorithm provided by the xcrypt instruction, because I haven't used the Esther CPU yet. As soon as I get the new processor, I will add support for the other algorithms where appropriate.

Kernel

When the kernel needs the AES algorithm, it loads by default the aes.ko module, which provides its software implementation. To use PadLock for AES, you must load the padlock.ko module instead. You can do this either by hand with modprobe or by adding a single line to /etc/modprobe.conf:

alias aes padlock

Now, every time the kernel requires AES, it automatically loads padlock.ko too.

Patches are available for kernel version 2.6.5 and above; see the PadLock in Linux home page in Resources. Also, the basic driver will be available in the vanilla 2.6.11 kernel without any patching.

OpenSSL

Those amongst us who are brave enough to use recent CVS versions of OpenSSL already have PadLock support. Users of OpenSSL 0.9.7 have to patch and rebuild the library, or they can use a Linux distribution that has the patch already included in its packages, such as SuSE Linux 9.2.

To see if your OpenSSL build has PadLock support, run this simple command:

$ openssl engine padlock
(padlock) VIA PadLock (RNG, ACE)

If instead of (RNG, ACE) you see (no-RNG, no-ACE), it means that your OpenSSL installation is PadLock-ready, but your processor is not. You also could see an ugly error message saying that there is no such engine. In that case, you should upgrade or patch your OpenSSL library.

For programs that use OpenSSL for their cryptography needs to enjoy PadLock support, they must use the so-called EVP_interface and initialize hardware accelerator support somewhere at the beginning of their runs:


#include <openssl/engine.h>
int main ()
{
    [...]
    ENGINE_load_builtin_engines();
    ENGINE_register_all_complete();
    [...]
}

See the evp(3) man page from the OpenSSL documentation for details.

In SUSE Linux 9.2, for example, OpenSSH has a similar patch to let you experience much faster scp transfers over the network.

______________________

Webcast
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers

Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions