Byte and Bit Order Dissection

Discussing the differences between big and little endianness, bit and byte order and what it all means.
Endianness of Bus

The bus we refer to here is the external bus we showed in the figure above. We use PCI as an example below. The bus, as we know, is an intermediary component that interconnects CPUs, devices and various other components on the system. The endianness of a bus is the byte/bit order standard that the bus protocol defines and with which the other components must comply.

Take the PCI bus, which is known as little endian, as an example. This implies the following: among the 32 address/data bus lines AD[31:0], it expects a 32-bit device to connect its most significant data line to AD31 and its least significant data line to AD0. A big endian bus protocol would be the opposite.

For a partial word device connected to the bus, for example an 8-bit device, a little endian bus like PCI specifies that the eight data lines of the device be connected to AD[7:0]. For a big endian bus protocol, they would be connected to AD[24:31].

In addition, for PCI bus the protocol requires each PCI device to implement a configuration space. This is a set of configuration registers that have the same byte order as the bus.

Just as all the devices need to follow the bus's rules regarding byte/bit endianness, so does the CPU. If a CPU operates in an endianness different from the bus, the bus controller/bridge is usually where the conversion is performed.

An alert reader may now ask, "So what happens if the endianness of the device is different from the endianness of the bus?" In this case, we need to do some extra work for communication to occur, which is covered in the next section.

Endianness of Devices

Kevin's Theory #1: When a multi-byte data unit travels across the boundary of two reverse endian systems, the conversion is made such that memory contiguousness to the unit is preserved.

We assume CPU and bus share the same endianness in the following discussion. If the endianness of a device is the same as that of CPU/bus, then no conversion is needed.

In the case of different endianness between the device and the CPU/bus, we offer two solutions here from a hardware wiring point of view. We assume CPU/bus is little endian and the device is big endian in the following discussion.

Word Consistent Approach

In this approach, we swap the entire 32-bit word of the device data lines. We represent the data lines of the device as D[0:31], where D(0) carries the most significant bit, and the bus lines as AD[31:0]. This approach suggests wiring D(i) to AD(31-i), where i = 0, ..., 31. Word Consistent means the semantics of the whole word are preserved.

To illustrate, the following code represents a 32-bit descriptor register in a big endian NIC card:

After applying the Word Consistent swap (wiring D[0:31] to AD[31:0]), the result in the CPU/bus is:

Notice that the result is automatically little endian for the CPU/bus. No software byte or bit swapping is needed.
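To make the wiring rule concrete, here is a minimal sketch that simulates it in software. The function names device_line and word_consistent_wire are ours, chosen only for illustration; the sketch assumes D(0) is the most significant device line and AD(k) carries the bit of weight 2^k, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Device side: D[0:31], where D(0) carries the most significant bit. */
int device_line(uint32_t dev_word, unsigned i)
{
    assert(i < 32);
    return (int)((dev_word >> (31u - i)) & 1u);
}

/* Bus side: AD[31:0], where AD(k) carries the bit of weight 2^k.
 * Word Consistent wiring connects D(i) to AD(31 - i). */
uint32_t word_consistent_wire(uint32_t dev_word)
{
    uint32_t bus_word = 0;
    for (unsigned i = 0; i < 32; i++) {
        unsigned k = 31u - i;   /* AD line driven by D(i) */
        bus_word |= (uint32_t)device_line(dev_word, i) << k;
    }
    return bus_word;
}
```

Because the most significant device line drives the most significant bus line, the 32-bit value seen by the CPU is the same value the device holds, which is why no software swap is needed in this case.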

The above example is for those simple cases where data does not cross a 32-bit memory boundary. Now, let's take a look at a case where it does. In the following code, vlan[0:23] has a value of 0xabcdef and crosses a 32-bit memory boundary.

After the Word Consistent swap, the result is:

Do you see what happened? The vlan field has been broken into two noncontiguous memory spaces: bytes[1:0] and byte(7). This violates Kevin's Theory #1, and we are not able to define a nice C structure to access the noncontiguous vlan field.

Therefore, the Word Consistent solution works only for data within word boundaries and does not work for data that may cross a word boundary. The second approach solves this problem for us.
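The address shuffle behind this result can be sketched in a few lines. The helper names below are hypothetical, and we assume the vlan bytes 0xab, 0xcd, 0xef sit at device byte addresses 2, 3 and 4, a layout inferred from the statement that they end up in bytes[1:0] and byte(7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word Consistent wiring keeps each 32-bit value intact, so the byte at
 * big endian device address a appears at little endian CPU address
 * (start of its word) + (3 - offset within the word). */
size_t word_consistent_addr(size_t dev_addr)
{
    return (dev_addr & ~(size_t)3) + (3u - (dev_addr & 3u));
}

/* Place n device bytes at their CPU-side addresses (n a multiple of 4). */
void word_consistent_copy(const uint8_t *dev, uint8_t *cpu, size_t n)
{
    assert(n % 4 == 0);
    for (size_t a = 0; a < n; a++)
        cpu[word_consistent_addr(a)] = dev[a];
}
```

With that assumed layout, device addresses 2, 3 and 4 map to CPU addresses 1, 0 and 7: the first two vlan bytes stay together in the first word, while the third lands at the far end of the second word.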

Byte Consistent Approach

In this approach, we do not swap bytes, but we do swap the bits within each byte lane in hardware wiring: the bit at device bit-offset i goes to bus bit-offset (7-i), where i = 0, ..., 7. Byte Consistent means the semantics of each byte are preserved.
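As a rough illustration, the per-lane wiring can be simulated for a single 32-bit word. The function names are ours, and the sketch assumes that lane j holds device lines D[8j:8j+7] on the big endian side and bus lines AD[8j+7:8j] on the little endian side.

```c
#include <assert.h>
#include <stdint.h>

/* Byte Consistent wiring: within each byte lane, the device bit at
 * offset i is wired to the bus bit at offset 7 - i. */
uint32_t byte_consistent_wire(uint32_t dev_word)
{
    uint32_t bus_word = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        for (unsigned i = 0; i < 8; i++) {
            unsigned d = 8u * lane + i;          /* device line D(d) */
            unsigned k = 8u * lane + (7u - i);   /* bus line AD(k)   */
            uint32_t bit = (dev_word >> (31u - d)) & 1u;
            bus_word |= bit << k;
        }
    }
    return bus_word;
}

/* Byte seen at little endian CPU address addr within the word. */
uint8_t bus_byte(uint32_t bus_word, unsigned addr)
{
    assert(addr < 4);
    return (uint8_t)(bus_word >> (8u * addr));
}
```

Each byte keeps its value and its address, so the device word 0x12345678 reads as 0x78563412 when the CPU treats it as a little endian integer. The byte contents are right, but multi-byte values still need a byte swap.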

After applying this method, the big endian NIC device value above results in this CPU/bus value:

Now, the three bytes of the vlan field are in contiguous memory space, and the content of each byte reads correctly. The byte order of the result still looks messy, but because the data now occupies a contiguous memory space, we can let the software do a byte swap for this 5-byte data structure. We get the following result:

We see that software byte swapping needs to be performed as the second procedure in this approach. Byte swapping is affordable in software, unlike bit swapping.
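The software step can be sketched as a plain in-place byte reversal. The helper names byte_swap and load_le are hypothetical; in the usage note, the byte values 0x12 and 0x34 are placeholders we made up for the two bytes that hold rx and tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse n bytes in place: the software byte swap step. */
void byte_swap(uint8_t *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t tmp = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = tmp;
    }
}

/* Assemble n bytes (n <= 8) as a little endian integer:
 * the byte at the lowest address is the least significant. */
uint64_t load_le(const uint8_t *buf, size_t n)
{
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)buf[i] << (8 * i);
    return v;
}
```

Starting from the bytes 0x12 0x34 0xab 0xcd 0xef as delivered by the Byte Consistent wiring, byte_swap over all five bytes followed by load_le yields 0x1234abcdef, whose low 24 bits are the vlan value 0xabcdef.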

Kevin's Theory #2: In a C structure that contains bit fields, if field A is defined in front of field B, then field A always occupies a lower bit address than field B.

Now that everything is sorted out nicely, we can define the C structure as the following to access the descriptor in the NIC:

struct nic_tag_reg {
        uint64_t vlan:24 __attribute__((packed));
        uint64_t rx  :6  __attribute__((packed));
        uint64_t tag :10 __attribute__((packed));
};
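Because the layout of C bit fields is left to the compiler, it can be useful to have a shift-and-mask decoder to compare against. The sketch below is not part of the NIC example; it assumes the 40-bit register has already been loaded as a little endian integer with vlan in bits 0 to 23, rx in bits 24 to 29 and tag in bits 30 to 39, following the field order above. The names nic_tag_fields and nic_tag_decode are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the 40-bit descriptor (hypothetical helper type). */
struct nic_tag_fields {
    uint32_t vlan;  /* 24 bits */
    uint32_t rx;    /*  6 bits */
    uint32_t tag;   /* 10 bits */
};

/* Extract the fields with shifts and masks instead of bit fields. */
struct nic_tag_fields nic_tag_decode(uint64_t reg)
{
    struct nic_tag_fields f;
    assert((reg >> 40) == 0);                  /* only 40 bits are defined */
    f.vlan = (uint32_t)(reg & 0xffffffu);      /* bits 0..23  */
    f.rx   = (uint32_t)((reg >> 24) & 0x3fu);  /* bits 24..29 */
    f.tag  = (uint32_t)((reg >> 30) & 0x3ffu); /* bits 30..39 */
    return f;
}
```

If the bit-field structure and this decoder disagree on a given compiler, the compiler is laying out the fields differently from what Kevin's Theory #2 assumes.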



Memory bit order of int

Anonymous

On a little endian system with int holding 32 bits, if one wants to bit twiddle and set bits by memory order, would this be correct?

* To set bit 1,2,8,29 of int my_int in memory, in c, one would set bit:

bit 1: my_int |= 1 << 7;
10000000 00000000 00000000 00000000 (0x80000000)
bit 2: my_int |= 1 << 6;
11000000 00000000 00000000 00000000 (0xc0000000)
bit 8: my_int |= 1 << 15;
11000000 10000000 00000000 00000000 (0xc0800000)
bit 29: my_int |= 1 << 26;
11000000 10000000 00000000 00000100 (0xc0800004)

If so, is there any trick to calculate the left-shift from a given value, i.e. "Set bit 29" - give 26.

As the left-shift order would be:

set  left
bit  shift
 0    7
 1    6
 2    5
 3    4
 4    3
 5    2
 6    1
 7    0
 8   15
 9   14
10   13
11   12
12   11
..   ..

have tried with various ((set_bit%BITS_PER_INT / CHAR_BIT) * CHAR_BIT) + ...; variants but always become a big huge pile of mods and so on.

Any good trick for this?

(I'm perhaps thinking wrong here, it's starting to get too late)
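One possible trick, going by the set bit/left shift table above (bit numbers starting at 0): the shift is the bit number with its low three bits flipped, which is an XOR with 7. A small sketch, with function names made up for illustration and a little endian 32-bit integer assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Map a memory-order bit number (bit 0 = leftmost bit of the byte at
 * the lowest address) to the left-shift count for a little endian
 * integer. The upper bits of n select the byte and stay unchanged;
 * XOR with 7 turns the in-byte position n % 8 into 7 - n % 8. */
unsigned mem_bit_to_shift(unsigned n)
{
    assert(n < 32);
    return n ^ 7u;
}

/* Set memory-order bit n in value. */
uint32_t set_mem_bit(uint32_t value, unsigned n)
{
    return value | ((uint32_t)1 << mem_bit_to_shift(n));
}
```

This gives 0 -> 7, 7 -> 0, 8 -> 15 and 29 -> 26, matching the table, without any mods or divisions.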

Bit order?

Anonymous

What is bit order? In which machines do bits have addresses?

Ethernet Address Endianness

Nick Collins

I think your description of Ethernet Addressing is mistaken. In your example with the MAC address 12:34:56:78:9a:bc, you say that "12" will appear on the line first. This is not correct. The "bc" will appear first. Refer to section 3.2 of the 802.3 spec. It explicitly states the byte ordering of the Length field and the CRC are high-order byte first. So, I'm led to believe that the SA and DA are low-order byte first.

This would make sense because we know that the first bit on the wire determines multicast or unicast and that this is the LSB of the entire field...which is the last byte (not of the 1st byte).


Nick Collins

Never mind my previous posting. It was a late night.

Errata: dot2ip() function

kingchurch

It's incomplete in this on-line version which should be:

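/* dot2ip - convert a dotted decimal string into an
 * IP address
 */
uint32_t dot2ip(char *pdot)
{
    uint32_t i, my_ip;

    my_ip = 0;
    for (i = 0; i < 4; ++i) {   /* assumed bound: one pass per IPv4 segment */
        my_ip = my_ip * 256 + atoi(pdot);
        if ((pdot = (char *) index(pdot, '.')) == NULL)
            break;
        ++pdot;                 /* step past the '.' */
    }
    return my_ip;
}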

Re: Errata: dot2ip() function

kingchurch

Already fixed by LJ, thanks!

Errata for ASCII Graphs

kingchurch

Most of the ASCII graphs inlined in this on-line version of
the article are not formatted properly.

I'm contacting LJ to correct the format. In the meantime
you can refer to my original article here if you get
confused by the ASCII graphs:

Re: Errata for ASCII Graphs

kingchurch

Already fixed by LJ. Thanks!

Re: Byte and Bit Order Dissection

feiyunw

1. A great article;
2. I suggest you create a HOWTO in the Linux Documentation Project so that more people can benefit from your article;
3. As far as I know, bit0 is the MSB in the Motorola PowerPC Manual; maybe you should clarify your bit numbering explicitly;

Re: Byte and Bit Order Dissection

kingchurch

Thank you!
I'll consider the HOWTO suggestion.
About the 3rd comment, have you seen the "Typo"
discussion thread other readers brought up ?
Hopefully my correction to the typo can address your
doubt too.

- kevin


Anonymous

"That is, in a big endian system the most significant bit is stored at the lowest bit address; in a little endian system, the least significant bit is stored at the lowest bit address."

Re: Typo?! -- -Yes, it's an error

kingchurch

In fact, it is an error. In the original article I submitted
to LJ, I wrote:

"That is, in a big endian system, the most significant
bit is stored at the lowest bit address and in a
little endian system, the least significant bit is
stored at the lowest bit address." ---- Correct
But somehow it was changed to the following
in the on-line version without notifying me.

"That is, in a big endian system, the most significant
bit is stored at the lowest bit address and in a
little endian system, the least significant bit is
stored at the highest bit address." --- Wrong
I'm contacting LJ to correct this error now,
in the meanwhile please reference my original



Big and Little Endians

Anonymous

Thank you for the pow wow concerning big endians and little endians. One thing is clear, although there are several kinds of endians, there are neither good endians nor bad endians. It would be nice to have but one type of endian, but uniting all endian tribes of thought under one teepee is not likely for the foreseeable future. Nevertheless, it would be nice to hold a big council, so let me know when and where, and I'll make a reservation to attend.

Re: Typo?!

kingchurch

The sentence you quoted follows "Bit order usually follows the same endianness as the byte order for a given computer system. ".

So I'm illustrating what the bit order will look like if it follows
the byte order on the same architecture. In other
words, in some systems where bit order doesn't follow
byte order, the quoted sentence is not applicable.



Re: Typo?!

Anonymous

No not really. What is meant is that in big-endian, bit 0 is the most significant bit and in little-endian, bit 7 is the most significant bit (for a single byte).

In most RISC architectures a 64-bit bus would be represented as 64bus

In an Intel system a 64-bit bus would be represented as 64bus

Re: Byte and Bit Order Dissection

Anonymous

You left out everyone's favorite forgotten case: Middle endian! And a mention of the origin of "endian" (we have the Lilliputians to thank for this).

Seriously, though -- good article.

Re: Byte and Bit Order Dissection

kingchurch

Thank you for the input. It would be more complete to include
"Middle Endian" in the discussion.
On the other hand, I have a word count limitation for the article
which forces me to include only the most typical cases :p


Re: Byte and Bit Order Dissection

Anonymous

No, I was kidding about Middle Endian. It's an obsolete format (or rather, _they're_ obsolete formats). But no byte order discussion is complete without a mention of "Gulliver's Travels". Right after "First introduced by Danny Cohen in 1980, it describes the method a computer system uses to represent multi-byte integers." should be something like, "This was a reference to the disagreement about which side of an egg was the proper side to crack first."