Exploring RSA Encryption in OpenSSL
When sending your credit card number
through a public medium, such as the Internet, your financial
credibility may be compromised if the number is not first
encrypted. It is impossible to tell who may be listening in on your
connection as you shop for new CDs or books. The RSA encryption
method often is used to hide your credit card number from would-be
thiefs on the Internet, because it uses a public key to hide your
information and a private key to reveal it. This article banishes
the mystery surrounding RSA encryption and explains how a realistic
implementation of RSA works in the OpenSSL library. The only files
you need to concern yourself with appear in the openssl/crypto/bn
and openssl/crypto/rsa subdirectories. Feel free to download and
take a look at the code to get a feel for it before
continuing.Symmetric vs. Asymmetric EncryptionThe best way to understand asymmetric encryption is to think
of a box that has two kinds of keys: one key locks it and the other
unlocks it. Anybody who has a copy of the locking key (aka public
key) can put a secret in the box. Because only you have the
unlocking key (aka private key), nobody else can reveal a secret
that someone else locked in the box. This is different from
symmetric key encryption, in which the same key is used for locking
and unlocking.Here is an example of encryption using the same key. First,
we choose a number and call it the key. We want to encrypt a credit
card number, so we subtract each number of the credit card from the
key. For instance, take the number 1424 3135 2435 1556 and choose
the number 12. We could encrypt it like so:
12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 -1 -4 -2 -4 -3 -1 -3 -5 -2 -4 -3 -5 -1 -5 -5 -6 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 11 8 10 8 9 11 9 7 10 8 9 7 11 7 7 6
The bottom list of numbers then would be the encrypted credit
card number. To decrypt the credit card number, we need only to
subtract each number from the key once again:
12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 11 8 10 8 9 11 9 7 10 8 9 7 11 7 7 6 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 1 4 2 4 3 1 3 5 2 4 3 5 1 5 5 6
Modern encryption algorithms such as AES and Triple DES are
much more complicated--not to mention more secure--than this
technique, but the example shows how the same key is used for both
encryption and decryption. Before two parties can use this system,
they both need to know the key 12. The algebra equations that
symbolize this algorithm are:
C = K - M
M = K - C
where M is the number to encrypt, K is the key (12 in the
above example) and C is the encrypted cipher-text number. If two
parties can agree on a key before beginning communications, then
symmetric encryption is enough. This prearrangement, however, is
not always feasible.Say you log on to a bookstore for the first time and decide
to buy something with a credit card. How do you know which key to
use? Either you or the on-line bookstore has to choose the key (K),
but it cannot be sent over the Internet because snoopers may see
it. This is where RSA comes in handy. The equations for encrypting
and decrypting are only a little more complicated than they are for
the subtraction technique:
C = ME mod N
M = CD mod N
If you are not used to seeing mod, it means to take the
remainder if the number is divided by N. In the first equation, we
raise M to the power E and divide the result by N, then store the
remainder in C.Again, M is the number to encrypt and C is the cipher- text
number. There are two keys: E is for encryption, while D is for
decryption. The best way to think of N is its the box in which the
secret is hidden. As an example, say we wish to encrypt the number
6. We choose an encryption key of E=3, a decryption key of D=17 and
a box of N=25. You want to send the message M=6 to the bookstore,
so the encryption happens like this:
C = 63 mod 25 = 16
When you log in to the on-line bookstore, the seller sends
you key E=3 and box N=25 so you can encrypt your credit card
number. However, decryption is impossible without knowing that
D=17:
M = 1617 mod 25 = 6
Because nobody else knows that D=17, it is impossible for
anybody except the bookstore to decrypt messages. Hence, you can
contact anybody on the Internet and feel safe that your sensitive
info is secure from theft.This technique is so simple that encoding and decoding for
32-bit numbers can be implemented in a single line of C code. The
real complications arise when you ask such questions as "How do I
generate an RSA key pair?" or "How large do the numbers need to be
for security?". The answers to these questions complicate RSA
implementations a hundred times over. The rest of this article
illustrates how these obstacles are tackled in the OpenSSL library.
You can download the latest version from
www.openssl.org. Click
on the tab labeled Source to download the latest version (0.9.7b at
the time of this writing).RSA Key GenerationGenerating the numbers N, E and D requires the generation of
primes and relative primes. To generate N, you need to find two
large primes and multiply them together. We can define N as a math
equation:
N = P * Q
where P and Q are both random, prime numbers. The encryption
key E and decryption key D both must be relative prime to
(P-1)*(Q-1). This means they must satisfy the equation:
(E * D) mod (P-1)*(Q-1) = 1
If two numbers are relative primes, it means their prime
factorizations share nothing in common. For instance, 9 and 40 are
relative primes to each other because they have no common
factors:
9 = 3 * 3
40 = 2 * 2 * 2 * 5
Generating a random prime number for an RSA key is nothing
more than a guess-and-check process. The files
openssl/bn/bn_prime.* implement a quick sieve testing approach to
choosing primes. The quick sieve approach asks the question "Is
this number prime?" but only receives the answers:
- No
- Maybe
OpenSSL generates random numbers and then runs a test-prime
function multiple times to weed out any false positives. If the
test fails, the random number is discarded and the process begins
anew. There are no guarantees that the generated number really is
prime, but math gurus have shown that a number can be shown prime
with 99.9% probability by using this test.The encryption key E usually is chosen to be a constant,
prime number for a protocol (such as SSL). Therefore, only the
number N needs to be sent across the Internet. Common choices are
the numbers 3 or 65537. Finding the decryption key D is a little
more difficult because you need to find the modulo inverse. This
involves calculating the greatest common divisor by using Euclid's
algorithm as detailed in Knuth's book (see Resources). If you look
in the file openssl/crypto/rsa/rsa_gen.c, you can get an idea of
how it works in OpenSSL.Basic Math for Big NumbersA secure RSA implementation requires rather large numbers.
Most secure protocol standards recommend key sizes of 1,024-bit
numbers, which 32-bit processors, such as the Pentium III or
PowerPC, cannot handle natively. To get around this barrier,
OpenSSL implements a new data type called a BIGNUM. The BIGNUM data
struct is implemented in file openssl/crypto/bn/bn.h and is defined
in Listing 1.Listing 1. BIGNUM Data
StructThe generic computer can handle 1,024-bit numbers only if
they are stored as arrays of smaller 32-bit numbers. The variable d
in the BIGNUM structure stores a pointer to a dynamically allocated
array. The length of the array is stored in dmax so that routines
for handling BIGNUM variables can avoid buffer overrun
errors.Basic arithmetic on BIGNUMs is as simple as elementary math
class, but with an added twist. Let us look at simple addition of
two numbers:
1 1 <== carry
1 4 7
+ 2 6 3
-------
4 1 0
Humans break addition of multidigit numbers into smaller
pieces by adding together single digits at a time, from right to
left. Computers can add large numbers in the same manner. If we
separate each digit into a separate array element, we can implement
elementary addition like this:
#define BASE 10
void add(int *a, int *b, int *c, int top) {
int i, carry;
for (i=carry=0; i < top; i++) {
c[i] = a[i] + b[i] + carry;
if (c[i] >= BASE) {
carry = 1;
c[i] -= BASE;
} else {
carry = 0;
}
}
}
The variable top represents the number of
digits to add. By using this technique to break up numbers, we can
define a number that is thousands of digits in size if we so
desired. It is not memory efficient to define BASE as ten, so why
not use base sixteen or even base four billion? The answer is we
can use any base that we like up to the maximum value of a 32-bit
integer:
void add(unsigned int *a, unsigned int *b,
unsigned int *c, int top) {
int i, carry;
for (i=carry=0; i < top; i++) {
c[i] = a[i] + b[i] + carry;
/* is there an overflow? */
if ((c[i] < a[i]) || (c[i] < b[i])) {
carry = 1; /* yes overflow, so carry */
} else {
carry = 0; /* no overflow, no carry */
}
}
}
Because the largest number that a 32-bit integer can store is
232-1, anything larger wraps around with an overflow error. We can
use this "error" to our advantage, because it has the same effect
as subtracting the base. This is the principle that OpenSSL uses to
implement arithmetic on the BIGNUM structure.If you look in openssl/crypto/bn/bn_add.c and in
openssl/crypto/bn/bn_asm.c, you can find hyper-optimized addition
and subtraction functions. These functions use fancy pointer
techniques to reference array elements in the BIGNUM variables, but
the principles are the same. Some other details they keep track of
include:
- negative numbers, necessary with subtraction
- different values for the top variable
- buffer overrun errors
- microprocessor independence
All the error checks and minor optimizations that address
these issues may make BN_add() and BN_sub() daunting at first, but
they simply are fancy C code wrappers for the old tried and true
method.Multiplication also generalizes quite well with the BIGNUM
data structure. When a 32-bit unsigned integer is used to represent
a single digit, we can use the classic elementary school algorithm.
For example:
2 4 3
* 4 1 5
-------
1 2 1 5 <== 243 * 5
2 4 3 <== 243 * 1
9 7 2 <== 243 * 4
-----------
1 0 0 8 4 5
If we allow for 232 total digits instead of the usual ten, we
can implement multiplication in terms of the addition function
discussed previously.
void mul(unsigned int *a, unsigned int *b,
unsigned int *c, int top) {
unsigned long long prod; /* 64-bit num in GNU C */
unsigned int i,j, hi32, lo32, carry;
unsigned int acc[top]; /* array of size top */
/* initialize c[] to all zeros */
for (j=0; j < top; j++) {
/* initialize acc[] to all zeros */
for (i=carry=0; i < top; i++) {
prod = a[i] * b[j];
lo32 = prod & 0x00000000FFFFFFFFULL;
hi32 = prod & 0xFFFFFFFFr0000000ULL;
acc[i] = lo32+carry;
carry = hi32;
}
add(acc,c+i,c+i,top);
}
}
Notice that we need 64 bits when multiplying two 32-bit
numbers. To handle a 64-bit quantity properly, we need to split it
into two 32-bit quantities. The number a is multiplied by a digit
in b, and the result is added to the total. To optimize
multiplication, OpenSSL uses a trick called Karatsuba's recursive
algorithm. It reduces the time complexity from O(n2) to O(n1.5) by
using a divide-and-conquer approach.Division is a beast similar to multiplication, but one that
requires a little extra thought. Whereas converting addition to
subtraction simply requires a sign change, division has to account
for the remainder, or modulo. Although OpenSSL implements division
in openssl/crypto/bn/bn_div.c, it is not needed directly in RSA
because RSA uses Montgomery multiplication instead (more to
come).OpenSSL implements exponentiation based on a
multiplication-modulo algorithm. Directly raising a 1,024-bit
number to a 1,024-bit number power requires about 10300 Gigabytes
of memory. So, how does the computer compute the RSA operation ME
mod N? The trick lies in distributing the mod operation within the
multiplications. For example:
436 mod 13 =
(432 mod 13) * (44 mod 13)
By distributing the mod operation, it is unnecessary to ever
store more than 1,024 extra bits on an intermediate
operation.Another subtlety to exponentiation makes it possible to
compute the operation in a short amount of time. Let us take a look
at openssl/crypto/bn/bn_exp.c:
for (i=1; i<bits; i++)
{
if (!BN_sqr(v,v,ctx)) goto err;
if (BN_is_bit_set(p,i))
{
if (!BN_mul(rr,rr,v,ctx)) goto err;
}
}
This function computes vp and then stores the result in rr.
Because a number is represented in binary, it is possible to
represent an exponentiation as a product of powers of two. Take a
look at this breakdown:
432 = 416 * 416 <==
416 = 48 * 48
48 = 44 * 44
44 = 42 * 42 <==
436 = 432 * 44
If we examine the excerpt from BN_exp() above, we see that v
is squared at every step. Then, if a bit is set in p, it is
multiplied into the total. The number 36 is 00100010 in binary, so
exponentiation is nothing more than a shift-test- multiply
operation.To exponentiate BIGNUM variables, each step requires a
multiplication operation and a modulo operation. Rather than go
through the basic algorithms, it is possible to combine the
multiplication and modulo into a single step called modular
multiplication (aka Montgomery multiplication). OpenSSL uses this
method to accelerate the modular exponentiation of a BIGNUM
variable. The file openssl/crypto/bn/bn_mont.c has an
implementation that requires some time to comprehend fully. If you
have further interest, reviews
this
paper.Converting a text string to a BIGNUM variable and back is the
last piece necessary to implement the RSA_public_encrypt() and
RSA_private_decrypt() functions. Both are defined in
openssl/crypto/rsa/rsa_lib.c. If you are interested to see how to
include RSA in your own programs, take a look at
openssl/crypto/rsa/rsa_test.c. This example illustrates how to do
everything with the OpenSSL RSA package you may ever need.Where to Go from HereThis is by no means a comprehensive explanation of how RSA
works, nor is it meant to be. Hopefully, it explained some of the
more obscure details. The security of RSA is based on the
difficulty of factoring large numbers, which is next to impossible
for 1,024-bit numbers today. This could change tomorrow, however,
as technology develops. The RSA factoring challenge from RSA Labs
has the latest public information on factoring (see
Resources).The OpenSSL library is used in several open-source packages.
Some prominent ones you might be familiar with include
Samba,
Apache-SSL and
OpenSSH. If you are
interested in learning more about how to implement encryption
algorithms or their security, some Resources are listed
below.ResourcesKernighan & Ritchie, The C Programming
LanguageKnuth, The Art of Computer Programming, Vol.
2Schneier, Applied CryptographyMenezes, Alfred J., Van Oorschot, Paul C. and Vanstone, Scott
A., Handbook of Applied CryptographyOpenSSL
LibraryRSA
Factoring ChallengeMontgomery
MultiplicationJames Tandon currently
consults for Computer Motion and likes dogs better than cats. His
home page is
www.antinomian.net.










This week 5 lucky Members will receive a copy of The Official Ubuntu Server Book by Benjamin Mako Hill and Linux Journal's very own Kyle Rankin. No entry necessary. Check back here early next week to find out who the lucky Online Members are.




Comments
Re: Exploring RSA Encryption in OpenSSL
A little confused at an equation like C = 63 mod 25 = 16. I tried every key with N=25 and M=6, and got it to work. But, when M= any other number, trying to decrypt it fails. Should N be changed?
Re: Exploring RSA Encryption in OpenSSL
M has to be the same as M in the first equation, here, check this out
C = M^E mod N
M = C^D mod N
in the second equation, M is equal to whatever M is in the first equation, so if it is 6, then it would look like this
C = 6^E mod N
6 = C^D mod N
Re: Exploring RSA Encryption in OpenSSL
Very good article. I am doing a course in Cryptography and I wanted to have a real-world example of what's been taught. I can very easily understand this and also the other comments.
thanks guys.
Re: A better reference
Practical Cryptography by Schneier and Ferguson is a better
book to read than Applied Cryptography for an accessible
explaination of RSA encryption. If you naively employ
RSA you probably aren't going to get it right. Practical
Cryptography explains about a lot of the things you need
to worry about when using RSA.
Re: Exploring RSA Encryption in OpenSSL
Okay, this was a pretty good explanation as to how RSA works and how to code for it, but it oversimplifies a couple of real-world things.
1. First off, asymmetric systems like RSA are rarely used to pass "user data" like a credit card number. Rather, RSA is used to exchange symmetric keys for algorithms such as DES or AES, since symmetric algorithms are significantly faster to compute.
2. There are several sins of omission in the example of exchanging public keys: "Because nobody else knows that D=17, it is impossible for anybody except the bookstore to decrypt messages. Hence, you can contact anybody on the Internet and feel safe that your sensitive info is secure from theft."
This example completely ignores the man-in-the-middle attack that exists if an adversary is able to substitute HIS public key for one or the other party (Alice or Bob). This is why SSH asks you to kindly verify the public key of the other side before accepting it (which most people do blindly, anyway).
Post new comment