DAV's Endian FAQ

updated 1999-01-06.
The big-endian v. little-endian controversy, how to avoid stupid mistakes in hardware and software, ON HOLY WARS AND A PLEA FOR PEACE.

contents:

ON HOLY WARS AND A PLEA FOR PEACE by Danny Cohen
Dealing with endianness Should bridges automatically convert for you ? No !
PCI Is PCI inherently big-endian or little-endian ? How should PCI boards deal with the existence of both little endian hosts and big-endian hosts ?
details endianness of various architectures and protocols
bibliography
floating-point number format [FIXME: should this be in a different file ?]
XDR
misc unsorted cruft

ON HOLY WARS AND A PLEA FOR PEACE

by Danny Cohen

Resent-From: pci-sig-request@znyx.com
Resent-Date: Wed, 24 Jan 1996 23:50:25 GMT
Date: Wed, 24 Jan 1996 23:50:25 GMT
From: Tim@tile.demon.co.uk (Tim Eccles)
Reply-To: Tim@tile.demon.co.uk
Subject: Re: Big Endian question
Lines: 850
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients <pci-sig-request@znyx.com>

Alan Deikman writes:
> I wrote:
>
> >Is there an example of a 32-bit processor that stores bytes 0123 as 0132
> >for a 32-bit number?
>
> Oops, I hit the send key too soon.  I meant "0123" as "1032", where
> the 16-bit parts of a 32-bit number were swapped.  "Endian" discussions
> always make me bug-eyed.  But I remember reading that byte order some-
> where but forgot where.

Long ago, in UNIX times, this was the NUXI byte order.

Herewith Danny Cohen's original paper, as posted recently to
comp.arch, etc.  Still makes a good read on a wet afternoon.

>gnu> From: gnu@hoptoad.uucp (John Gilmore)
>gnu> Newsgroups: comp.sys.m68k,comp.arch,comp.sys.intel
>gnu> Subject: Byte Order: On Holy Wars and a Plea for Peace
>gnu> Date: 30 Nov 86 01:29:46 GMT

>gnu> [Not a single person objected to my posting this, so here it is.
>gnu> Mod.sources.doc seems to be dead, so I am posting this here.  Factual
>gnu> followups to comp.arch, please.  Send flames to yourself via email.
>gnu> Note that the date of the article is 1980, so there are a few things
>gnu> that have changed since then; nevertheless, the spirit of the article
>gnu> is still relevant. --gnu]

IEN 137                                              Danny Cohen
                                                     U S C/I S I
                                                    1 April 1980

           ON HOLY WARS AND A PLEA FOR PEACE

                      INTRODUCTION

This  is  an  attempt to stop a war.  I hope it is not too late and that
somehow, magically perhaps, peace will prevail again.

The latecomers into the arena believe that the issue is:  "What  is  the
proper byte order in messages?".

The root of the conflict lies much deeper than that.  It is the question
of  which  bit  should  travel first, the bit from the little end of the
word, or the bit from the big end of the  word?  The  followers  of  the
former  approach are called the Little-Endians, and the followers of the
latter are called the Big-Endians.  The details of the holy war  between
the  Little-Endians  and  the  Big-Endians  are  documented  in  [6] and
described, in brief, in the Appendix. I recommend that you  read  it  at
this point.  [I have inserted it -- gnu]

                            A P P E N D I X

Some notes on Swift's Gulliver's Travels:

Gulliver finds out that there is a law, proclaimed by the grandfather of
the  present  ruler,  requiring  all citizens of Lilliput to break their
eggs only at the little ends.  Of course, all those citizens  who  broke
their  eggs at the big ends were angered by the proclamation.  Civil war
broke out between the Little-Endians and the Big-Endians,  resulting  in
the  Big-Endians  taking  refuge  on  a  nearby  island,  the kingdom of
Blefuscu.

Using Gulliver's unquestioning point of view, Swift satirizes  religious
wars.    For  11,000  Lilliputian  rebels  to  die over a controversy as
trivial as at which end eggs have to be broken seems not only cruel  but
also  absurd,  since Gulliver is sufficiently gullible to believe in the
significance  of  the  egg  question.    The  controversy  is  important
ethically  and  politically  for the Lilliputians.  The reader may think
the issue is silly, but he should consider that Swift is making  fun  of
the actual causes of religious- or holy-wars.

In  political  terms,  Lilliput  represents England and Blefuscu France.
The religious  controversy  over  egg-breaking  parallels  the  struggle
between  the  Protestant  Church  of  England and the Catholic Church of
France, possibly referring to some differences about what the Sacraments
really mean.  More specifically,  the  quarrel  about  egg-breaking  may
allude  to  the  different  ways that the Anglican and Catholic Churches
distribute communion, bread and wine for the Anglican, but  bread  alone
for  the  Catholic.   The French and English struggled over more mundane
questions as well, but in this part of Gulliver's Travels, Swift  points
up  the  symbolic  difference  between  the  churches  to  ridicule  any
religious war.

    For ease of reference please note that Lilliput  and  Little-Endians
    both start with an "L", and that both Blefuscu and Big-Endians start
    with a "B".  This is handy while reading this note.

[End of appendix -- gnu]

The  above  question  arises  from  the  serialization  process which is
performed on messages in order to send them through communication media.
If the communication unit is a message - these problems have no meaning.
If the units are computer "words" then one may ask in which order  these
words  are sent, what is their size, but not in which order the elements
of these words are sent, since they are sent virtually  "at-once".    If
the unit of transmission is an 8-bit byte, similar questions about bytes
are  meaningful,  but  not  the  order of the elementary particles which
constitute these bytes.

If the units of communication  are  bits,  the  "atoms"  ("quarks"?)  of
computation,  then  the  only  meaningful question is the order in which
bits are sent.

Obviously, this is actually the case  for  serial  transmission.    Most
modern  communication  is  based  on  a  single  stream  of  information
("bit-stream").  Hence, bits, rather than bytes or words, are the  units
of  information  which  are  actually transmitted over the communication
channels such as wires and satellite connections.

Even though a great deal of effort, in both hardware  and  software,  is
dedicated  to  giving  the appearance of byte or word communication, the
basic fact remains:  bits are communicated.

Computer memory may be viewed as a linear sequence of bits, divided into
bytes, words, pages and so on.  Each unit  is  a  subunit  of  the  next
level.  This is, obviously, a hierarchical organization.

If  the  order  is  consistent, then such a sequence may be communicated
successfully while both parties maintain their freedom to treat the bits
as a set of groups of any arbitrary size.  One party may treat a message
as a "page", another as so many "words", or so many "bytes" or  so  many
bits.    If  a  consistent  bit order is used, the "chunk-size" is of no
consequence.

If an inconsistent bit order is used, the chunk size must be  understood
and  agreed  upon  by all parties.  We will demonstrate some popular but
inconsistent orders later.

In a consistent order, the bit-order, the  byte-order,  the  word-order,
the  page-order, and all the other higher level orders are all the same.
Hence, when considering a serial bit-stream, along a communication  line
for example, the "chunk" size which the originator of that stream has in
mind is not important.

There  are  two  possible  consistent  orders.  One is starting with the
narrow end of each  word  (aka  "LSB")  as  the  Little-Endians  do,  or
starting with the wide end (aka "MSB") as their rivals, the Big-Endians,
do.
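
As an illustration (not part of Cohen's text), the claim that a
consistent order makes the chunk size irrelevant can be checked with a
minimal Python sketch.  The helper names and the sample word 0x0A0B0C0D
are assumptions made for this example only.

```python
def bits_msb_first(value, width):
    """Bits of value, wide end first (the Big-Endians' order)."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def bits_lsb_first(value, width):
    """Bits of value, narrow end first (the Little-Endians' order)."""
    return [(value >> i) & 1 for i in range(width)]


word = 0x0A0B0C0D  # arbitrary 32-bit sample value

# Big-Endian consistent order: send the word as one 32-bit chunk,
# or as four bytes (most significant byte first, MSB first in each).
big_as_word = bits_msb_first(word, 32)
big_as_bytes = []
for byte in word.to_bytes(4, "big"):
    big_as_bytes += bits_msb_first(byte, 8)

# Little-Endian consistent order: the mirror image.
little_as_word = bits_lsb_first(word, 32)
little_as_bytes = []
for byte in word.to_bytes(4, "little"):
    little_as_bytes += bits_lsb_first(byte, 8)

# In both consistent orders the bit stream is the same whether the
# sender thinks in words or in bytes.
big_consistent = big_as_word == big_as_bytes
little_consistent = little_as_word == little_as_bytes
```

Both comparisons come out equal, which is the sense in which the
"chunk" size the originator has in mind is not important.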

In  this note we usually use the following sample numbers: a "word" is a
32-bit quantity and is designated by a "W", and a  "byte"  is  an  8-bit
quantity  which  is  designated  by  a  "C"  (for "Character", not to be
confused with "B" for "Bit").

                              MEMORY ORDER

The first  word  in  memory  is  designated  as  W0,  by  both  regimes.
Unfortunately, the harmony goes no further.

The Little-Endians assign B0 to the LSB of the words and B31 is the MSB.
The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB.

By  the  way,  if  mathematicians had their way, every sequence would be
numbered from ZERO up, not from ONE, as is traditionally done.   If  so,
the first item would be called the "zeroth"....

Since  most  computers  are not built by mathematicians, it is no wonder
that some computers designate  bits  from  B1  to  B32,  in  either  the
Little-Endians'  or the Big-Endians' order.  These people probably would
like to number their words from W1 up, just to be consistent.

Back to the main theme.  We would like to illustrate the  hierarchically
consistent  order  graphically,  but  first  we have to decide about the
order  in  which  computer  words are written on paper.  Do they go from
left to right, or from right to left?

The English language, like most modern languages, suggests that  we  lay
these computer words on paper from left to right, like this:

                 |---word0---|---word1---|---word2---|....

In  order  to  be  consistent,  B0 should be to the left of B31.  If the
bytes in a word are designated as C0 through C3 then C0 is also  to  the
left of C3.  Hence we get:

                 |---word0---|---word1---|---word2---|....
                 |C0,C1,C2,C3|C0,C1,C2,C3|C0,C1,C2,C3|.....
                 |B0......B31|B0......B31|B0......B31|......

If  we  also  use  the  traditional  convention,  as  introduced  by our
numbering system, the wide-end is on the left and the narrow-end  is  on
the right.

Hence, the above is a perfectly consistent view of the world as depicted
by  the  Big-Endians.    Significance [consistently] decreases as the item
number (address) increases.

Many computers share with the Big-Endians this view  about  order.    In
many  of  their  diagrams the registers are connected such that when the
word W(n) is shifted right, its LSB moves into the MSB of word W(n+1).

English text strings are stored  in  the  same  order,  with  the  first
character in C0 of W0, the next in C1 of W0, and so on.

This order is very consistent with itself and with the English language.

On  the  other  hand,  the  Little-Endians  have  their  view,  which is
different but also self-consistent.

They believe that one should start with the narrow end  of  every  word,
and  that  low  addresses  are  of  lower  order  than  high  addresses.
Therefore they put their words on paper  as  if  they  were  written  in
Hebrew, like this:

                   ...|---word2---|---word1---|---word0---|

When they add the bit order and the byte order they get:

                   ...|---word2---|---word1---|---word0---|
                  ....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
                 .....|B31......B0|B31......B0|B31......B0|

In  this regime, when word W(n) is shifted right, its LSB moves into the
MSB of word W(n-1).

English  text  strings  are  stored  in  the  same order, with the first
character in C0 of W0, the next in C1 of W0, and so on.

This order is very consistent with itself, with the Hebrew language, and
(more importantly) with mathematics, because significance increases with
increasing item numbers (address).
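
The two memory orders can be seen side by side in a short Python
sketch (an illustration added here, not Cohen's; the sample value is an
assumption).  It uses only the built-in int.to_bytes and
int.from_bytes, where index i of the result plays the role of byte
address Ci.

```python
word = 0x0A0B0C0D  # arbitrary 32-bit sample value

# Byte i of each result is the byte stored at address C(i).
big = word.to_bytes(4, "big")        # C0..C3 = 0A 0B 0C 0D
little = word.to_bytes(4, "little")  # C0..C3 = 0D 0C 0B 0A

# Little-Endian: the byte at address i carries weight 256**i, so
# significance increases with the address.
rebuilt_from_little = sum(b * 256 ** i for i, b in enumerate(little))

# Big-Endian: the byte at address i carries weight 256**(3 - i), so
# significance decreases with the address.
rebuilt_from_big = sum(b * 256 ** (3 - i) for i, b in enumerate(big))

# Character strings are stored the same way under both regimes:
# first character in C0, the next in C1, and so on.
text = "JOHN".encode("ascii")
```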

It has the disadvantage that English  character  streams  appear  to  be
written backwards; this is only an aesthetic problem but, admittedly, it
looks funny, especially to speakers of English.

In  order  to  avoid  receiving  strange  comments about this order, the
Little-Endians pretend that they are Chinese, and write the  bytes,  not
right-to-left but top-to-bottom, like:

                        C0: "J"
                        C1: "O"
                        C2: "H"
                        C3: "N"
                        ..etc..

Note that there is absolutely no specific significance whatsoever to the
notion  of  "left"  and  "right" in bit order in a computer memory.  One
could think about it as "up" and "down" for example,  or  mirror  it  by
systematically  interchanging  all  the  "left"s and "right"s.  However,
this notion  stems  from  the  concept  that  computer  words  represent
numbers,  and from the old mathematical tradition that the wide-end of a
number (aka the MSB) is called "left" and the narrow-end of a number  is
called "right".

This mathematical convention is the point of reference for the notion of
"left" and "right".

It  is  easy to determine whether any given computer system was designed
by Little-Endians or by Big-Endians.  This is done by watching  the  way
the  registers  are connected for the "COMBINED-SHIFT" operation and for
multiple-precision arithmetic like integer products;  also  by  watching
how  these  quantities  are  stored in memory; and obviously also by the
order in which bytes are stored within words.  Don't let  the  B0-to-B31
direction  fool  you!!  Most computers were designed by Big-Endians, who
under the threat of criminal prosecution pretended to be Little-Endians,
rather than seeking exile in  Blefuscu.    They  did  it  by  using  the
B0-to-B31   convention   of   the   Little-Endians,  while  keeping  the
Big-Endians' conventions for bytes and words.
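
On a present-day machine the last of these tests, the order of bytes
within a word, is the easy one to run.  The following Python sketch is
an added illustration, not part of the paper; the function name
host_byte_order is hypothetical.  It stores the number 1 in a 32-bit
word in the host's native order and looks at which end lands at the
lowest address.

```python
import struct
import sys


def host_byte_order():
    """Report the host's byte order by inspecting a stored word."""
    # "=I" packs an unsigned 32-bit integer in native byte order.
    lowest_address_byte = struct.pack("=I", 1)[0]
    # If the narrow end (the 1) sits at the lowest address, the host
    # stores its words the Little-Endians' way.
    return "little" if lowest_address_byte == 1 else "big"


detected = host_byte_order()
# Python also reports the same fact directly as sys.byteorder.
agrees_with_python = detected == sys.byteorder
```

Note that this says nothing about how the vendor numbers the bits in
its manuals, which is exactly the B0-to-B31 camouflage described above.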

The PDP10 and the 360, for example, were designed by Big-Endians:  their
bit  order, byte-order, word-order and page-order are the same. The same
order also  applies  to  long  (multi-word)  character  strings  and  to
multiple precision numbers.

Next,  let's consider the new M68000 microprocessor.  Its way of storing
a 32-bit number, xy, a 16-bit number, z, and the string  "JOHN"  in  its
16-bit words is shown below (S = sign bit, M = MSB, L = LSB):

        SMxxxxxxx yyyyyyyyL SMzzzzzzL  "J" "O"   "H" "N"
       |--word0--|--word1--|--word2--|--word3--|--word4--|....
       |-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|.....
       |B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|......

The  M68000  always  has on the left (i.e., LOWER byte- or word-address)
the wide-end of numbers in any of the various sizes which it may use:  4
(BCD), 8, 16 or 32 bits.

Hence,  the  M68000  is  a  consistent  Big-Endian,  except  for its bit
designation, which is used to camouflage its  true  identity.  Remember:
the Big-Endians were the outlaws.

Let's  look next at the PDP11 order, since this is the first computer to
claim to be a Little-Endian. Let's again look at the way data is  stored
in memory:

               "N" "H"   "O" "J"  SMzzzzzzL SMyyyyyyL SMxxxxxxL
         ....|--word4--|--word3--|--word2--|--word1--|--word0--|
        .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
       ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

The  PDP11  does  not  have  an instruction to move 32-bit numbers.  Its
multiplication products  are  32-bit  quantities  created  only  in  the
registers,  and  may  be  stored  in  memory in any way.  Therefore, the
32-bit quantity, xy, was not shown in the above diagram.

Hence, the above order is a Little-Endians' consistent order.  The PDP11
always stores on the  left  (i.e.,  HIGHER  bit-  or  byte-address)  the
wide-end of numbers of any of the sizes which it may use:  8 or 16 bits.

However,  due to some infiltration from the other camp, the registers of
this Little-Endian's marvel are  treated  in  the  Big-Endians'  way:  a
double  length  operand  (32-bit)  is  placed  with its MSB in the lower
address register and the LSB in the higher  address  register.    Hence,
when depicted on paper, the registers have to be put from left to right,
with  the  wide  end  of  numbers  in  the LOWER-address register.  This
affects the integer multiplication and division, the combined-shifts and
more. Admittedly, Blefuscu scores on this one.

Later, floating-point hardware was introduced for the PDP11/45.

Floating-point  numbers  are  represented  by  either  32-   or   64-bit
quantities,  which are 2 or 4 PDP11 words.  The wide end is the one with
the sign bit(s), the exponent and the MSB of the  fraction.  The  narrow
end is the one with the LSB of the fraction.  On paper these formats are
clearly shown with the wide end on the left and the narrow on the right,
according  to  the centuries old mathematical conventions.  On page 12-3
of  the  PDP11/45  processor  handbook,  [3],  there is a cute graphical
demonstration of this order, with the word "FRACTION" split over all the
2 or the 4 words which are used to store it.

However, due to some oversights in the security screening  process,  the
Blefuscuians  took  over,  again.  They assigned, as they always do, the
wide end to the LOWer addresses in memory, and the narrow to the  HIGHer
addresses.

Let   "xy"   and  "abcd"  be  32-  and  64-bit  floating-point  numbers,
respectively.  Let's look how these numbers are stored in memory:

          ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx
     ....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--|
    .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
   ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Well, Blefuscu scores many points for this. The above reference  in  [3]
does not even try to camouflage it by any Chinese notation.

Encouraged by this success, as minor as it is, the Blefuscuians tried to
pull  another fast one.  This time it was on the VAX, the sacred machine
which all the Little-Endians worship.

Let's look at the VAX order. Again, we look at the way  the  above  data
(with xy being a 32-bit integer) is stored in memory:

               "N" "H"   "O" "J"  SMzzzzzzL SMxxxxxxx yyyyyyyyL
          ...ng2-------|-------long1-------|-------long0-------|
         ....|--word4--|--word3--|--word2--|--word1--|--word0--|
        .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
       ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

What a beautifully consistent Little-Endians' order this is !!!

So,  what  about  the infiltrators? Did they completely fail in carrying
out their mission?  Since the integer  arithmetic  was  closely  guarded
they  attacked  the  floating  point  and the double-floating which were
already known to be easy prey.

Let's  look, again, at the way the above data is stored, except that now
the 32-bit quantity xy is a floating point  number:  now  this  data  is
organized in memory in the following Blefuscuian way:

               "N" "H"   "O" "J"  SMzzzzzzL yyyyyyyyL SMxxxxxxx
          ...ng2-------|-------long1-------|-------long0-------|
         ....|--word4--|--word3--|--word2--|--word1--|--word0--|
        .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
       ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Blefuscu  scores  again.    The  VAX  is  found guilty, however with the
explanation that it tries to be compatible with the PDP11.
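
The mixed layout in the diagram above (wide 16-bit word at the lower
address, but low byte first inside each word) can be reproduced with a
small Python sketch.  This is an added illustration under stated
assumptions: the function names are hypothetical, and the code lays out
an arbitrary 32-bit pattern rather than encoding a real floating-point
value.  The second helper shows the byte swap behind the "NUXI" order
mentioned in the covering email.

```python
import struct


def mixed_order_layout(value):
    """Bytes of a 32-bit pattern: high word first, low byte first."""
    high_word = (value >> 16) & 0xFFFF  # the part holding the wide end
    low_word = value & 0xFFFF           # the part holding the narrow end
    # "<HH" writes two 16-bit words in sequence, each low byte first.
    return struct.pack("<HH", high_word, low_word)


def swap_bytes_in_words(data):
    """Exchange the two bytes of every 16-bit word in data."""
    return b"".join(data[i:i + 2][::-1] for i in range(0, len(data), 2))


layout = mixed_order_layout(0x0A0B0C0D)  # bytes 0B 0A 0D 0C
nuxi = swap_bytes_in_words(b"UNIX")      # b"NUXI"
```

Neither int.to_bytes(4, "big") nor int.to_bytes(4, "little") produces
this byte sequence, which is why such layouts need their own handling.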

Having found themselves there, the  VAXians  found  a  way  around  this
unaesthetic   appearance:  the  VAX  literature  (e.g.,  p. 10  of  [4])
describes this order by using the Chinese top-to-bottom notation, rather
than an embarrassing left-to-right or right-to-left one.  This page is a
marvel.  One has to admire the skillful way in which some quantities are
shown in columns 8-bit wide, some in 16 and others in 32, all in order to
avoid the egg-on-the-face problem.....

By the way, some engineering-type people complain  about  the  "Chinese"
(vertical)  notation  because usually the top (aka "up") of the diagrams
corresponds to "low"-memory (low addresses).  However,  anyone  who  was
brought  up by computer scientists, rather than by botanists, knows that
trees grow downward, having their roots at the top of the page and their
leaves down below. Computer scientists seldom remember  which  way  "up"
really is (see 2.3 of [5], pp. 305-309).

Having   scored   so  easily  in  the  floating  point  department,  the
Blefuscuians moved to new territories: Packed-Decimal.  The VAX is  also
capable of using 4-bit-chunk decimal arithmetic, which is similar to the
well known BCD format.

The  Big-Endians struck again, and without any resistance got their way.
The decimal number 12345678 is stored in the VAX memory in this order:

                           7 8  5 6  3 4  1 2
                      ...|-------long0-------|
                     ....|--word1--|--word0--|
                    .....|-C1-|-C0-|-C1-|-C0-|
                   ......|B15....B0|B15....B0|

This ugliness cannot be hidden even by the standard Chinese trick.
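
The digit layout in the diagram can be sketched in Python as follows.
This is an added illustration: pack_decimal is a hypothetical helper,
and it deliberately ignores the sign handling of any real
packed-decimal format, showing only the order of the digit pairs.

```python
def pack_decimal(digits):
    """Pack a string of decimal digits two per byte, wide end first."""
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


packed = pack_decimal("12345678")
# Byte at the lowest address (C0 of word0) holds the digits "1 2",
# the most significant pair: the Big-Endians' order.
lowest_address_byte = packed[0]
```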

                 SUMMARY (of the Memory-Order section)

To  the best of my knowledge only the Big-Endians of Blefuscu have built
systems with a consistent order  which  works  across  chunk-boundaries,
registers,   instructions   and   memories.      I   failed  to  find  a
Little-Endians' system which is totally consistent.

                           TRANSMISSION ORDER

In either of the consistent orders the first bit (B0) of the first  byte
(C0)  of the first word (W0) is sent first, then the rest of the bits of
this byte, then (in the same order) the rest of the bytes of this  word,
and so on.

Such  a sequence of 8 32-bit words, for example, may be viewed as either
4 long-words, 8 words, 32 bytes or 256 bits.

For example, some people treat the ARPA-internet-datagrams as a sequence
of 16-bit words whereas others treat them as either 8-bit  byte  streams
or  sequences  of  32-bit  words.    This  has  never  been  a source of
confusion, because the Big-Endians' consistent order has been assumed.
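
As an added illustration (not from the paper), Python's struct module
can read the same 32 bytes as 8 32-bit words or as 4 64-bit long-words.
The test data is an assumption chosen so that each byte equals its own
address.

```python
import struct

# 32 bytes at increasing addresses: 00 01 02 ... 1F
data = bytes(range(32))

# Big-Endian reading (">"): 8 words, or 4 long-words.
words_be = struct.unpack(">8I", data)
longs_be = struct.unpack(">4Q", data)
# Each long-word is its first word followed by its second word.
joined_be = [(words_be[2 * i] << 32) | words_be[2 * i + 1]
             for i in range(4)]

# Little-Endian reading ("<") of the very same bytes.
words_le = struct.unpack("<8I", data)
longs_le = struct.unpack("<4Q", data)
# Here the word at the higher address is the more significant half.
joined_le = [(words_le[2 * i + 1] << 32) | words_le[2 * i]
             for i in range(4)]
```

Within either regime the word view and the long-word view agree, so
the choice of chunk size causes no confusion.  The two regimes disagree
with each other about every value, which is the byte-order problem.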

There are many ways to devise inconsistent orders.  The two most popular
ones are the following and its mirror image.  Under this order the first
bit to be sent is the LEAST significant bit (B0) of the MOST significant
byte (C0) of the first word, followed by the rest of the  bits  of  this
byte,  then  the  same  right-to-left bit order inside the left-to-right
byte order.

Figure 1 shows the transmission  order  for  the  4  orders  which  were
discussed above, the 2 consistent and the 2 inconsistent ones.

Those who use such an inconsistent order (or any other), and only those,
have  to  be  concerned with the famous byte-order problem.  If they can
pretend that their communication medium is really a  byte-oriented  link
then this inconsistency can be safely hidden under the rug.
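
A minimal Python sketch (an added illustration; the function names and
the sample word are assumptions) shows why the inconsistent order
forces both parties to agree on the chunk size.  The sender transmits
bytes wide-end first but the bits inside each byte narrow-end first; the
receiver rebuilds the number assuming some chunk width.

```python
def send_inconsistent(word, nbytes=4):
    """Most significant byte first, least significant bit first."""
    stream = []
    for byte in word.to_bytes(nbytes, "big"):
        stream += [(byte >> i) & 1 for i in range(8)]
    return stream


def receive(stream, chunk_bits):
    """Rebuild a number from chunks sent widest chunk first, with
    the bits of each chunk arriving narrow end first."""
    value = 0
    for start in range(0, len(stream), chunk_bits):
        chunk = stream[start:start + chunk_bits]
        chunk_value = sum(bit << i for i, bit in enumerate(chunk))
        value = (value << chunk_bits) | chunk_value
    return value


word = 0x0A0B0C0D
stream = send_inconsistent(word)

as_bytes = receive(stream, 8)    # receiver agrees on 8-bit chunks
as_words = receive(stream, 16)   # receiver assumes 16-bit chunks
```

With 8-bit chunks the receiver recovers 0x0A0B0C0D; with 16-bit chunks
it gets 0x0B0A0D0C, each pair of bytes swapped.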

A  few  years ago 8-bit microprocessors appeared and changed drastically
the way we do business.  A few years  later  a  wide  variety  of  8-bit
communication  hardware  (e.g., Z80-SIO and 2652) followed, all of which
operate in the Little-Endians' order.

Now   a  wave  of  16-bit  microprocessors  has  arrived.    It  is  not
inconceivable that 16-bit communication hardware will become  a  reality
relatively soon.

Since  the  16-bit communication gear will be provided by the same folks
who brought us the 8-bit communication gear, it is safe to expect  these
two modes to be compatible with each other.

The  only  way to achieve this is by using the consistent Little-Endians
order, since all the existing gear is already in Little-Endians order.

We have already observed that the Little-Endians do not have  consistent
memory orders for intra-computer organization.

IF  the 16-bit communication link could be made to operate in any order,
consistent or not, which would give it the appearance of being  a  byte-
oriented link, THEN the Big-Endians could push (ask? hope? pray?) for an
order  which transmits the bytes in left-to-right (i.e., wide-end first)
and use that as a basis for transmitting all quantities (except BCD)  in
the  more  convenient  Big-Endians  format,  with  the  most significant
portions  leading  the  least  significant,  maintaining   compatibility
between 16- and 32-bit communication, and more.

However, this is a big "IF".

Wouldn't  it  be nice if we could encapsulate the byte-communication and
forget all about the idiosyncrasies of the past, introduced by RS232 and
TELEX, of sending the narrow-end first?

I believe that it would be nice, but  nice  things  do  not  necessarily
occur, especially if there is so much silicon against them.

Hence,  our  choice now is between (1) Big-Endians' computer-convenience
and (2) future compatibility between  communication  gear  of  different
chunk size.

I believe that this is the question, and we should address it as such.

Short  term  convenience  considerations are in favor of the former, and
the long term ones are in favor of the latter.

Since  the  war  between  the  Little-Endians  and  the  Big-Endians  is
imminent, let's count who is in whose camp.

The founders of the Little-Endians party are RS232 and TELEX, who stated
that  the  narrow-end  is  sent  first.  So  do  the  HDLC  and the SDLC
protocols, the Z80-SIO, Signetics-2652,  Intel-8251,  Motorola-6850  and
all  the  rest  of  the  existing communication devices.  In addition to
these protocols and chips the PDP11s and the VAXes have already  pledged
their allegiance to this camp, and deserve to be on this roster.

The HDLC protocol is a full fledged member of this camp because it sends
all  of its fields with the narrow end first, as is specifically defined
in Table 1/X.25 (Frame formats) in section 2.2.1 of  Recommendation X.25
(see [2]).  A close examination of this table reveals that the bit order
of  transmission  is  always  1-to-8.  Always, except the FCS (checksum)
field, which is the only 16-bit quantity in the byte-oriented protocol.

The FCS is sent in the 16-to-1 order.  How did the  Blefuscuians  manage
to  pull  off  such a fiasco?!  The answer is beyond me.  Anyway, anyone
who designates bits as 1-to-8 (instead of 0-to-7) must  be  gullible  to
such tricks.

The Big-Endians have the PDP10's, 370's, ALTO's and Dorado's...

An  interesting  creature  is the ARPANet-IMP.  The documentation of its
standard host interface (aka "LH/DH") states that "The high order bit of
each word is  transmitted  first"  (p. 4-4  of  [1]),  hence,  it  is  a
Big-Endian.    This  is very convenient, and causes no confusion between
diagrams which are either 32- (e.g., on p. 3-25) or 16-bit wide  (e.g.,
on p. 5-14).

However, the IMP's Very Distant Host (VDH) interface is a Little-Endian.

The  same  document  ([1],  again,  p. F-18), states that the data "must
consist of an even number of 8-bit bytes. Further, considering each pair
of bytes as a 16-bit word, the less significant  (right)  byte  is  sent
first".

In  order  to make this even more clear, p. F-23 states "All bytes (data
bytes too) are transmitted least significant (rightmost) bit first".

Hence, both camps may claim to have this schizophrenic  double-agent  in
their camp.

Note  that  the  Lilliputians'  camp  includes  all the who's-who of the
communication world, unlike the Blefuscuians' camp which  is  very  much
oriented toward the computing world.

Both  camps  have  already  adopted  the  slogan "We'd rather fight than
switch!".

I believe they mean it.

              SUMMARY (of the Transmission-Order section)

There  are two camps each with its own language.  These languages are as
compatible with each other as any Semitic and Latin languages are.

All Big-Endians can talk to each other with relative ease.

So can all the Little-Endians, even though there  are  some  differences
among the dialects used by different tribes.

There is no middle ground. Only one end can go first.

                               CONCLUSION

Each  camp  tries  to convert the other.  Like all the religious wars of
the past, logic is not the decisive tool. Power is.  This  holy  war  is
not the first one, and probably will not be the last one either.

The  "Be reasonable, do it my way" approach does not work.  Neither does
the Esperanto approach of "let's all switch to yet a new language".

Our communication world may split according to the  language  used.    A
certain  book  (which  is  NOT  mentioned in the references list) has an
interesting story about a similar phenomenon, the Tower of Babel.

Little-Endians are Little-Endians and Big-Endians  are  Big-Endians  and
never the twain shall meet.

We  would like to see some Gulliver standing up between the two islands,
forcing a unified communication regime on all of us.  I do hope that  my
way  will  be chosen, but I believe that, after all, which way is chosen
does not make too much difference.  It is more important to  agree  upon
an order than which order is agreed upon.

How about tossing a coin ???

                          time          time
                            |           |
           \                |           |                /
            \               |           |               /
             \              |           |              /
              \             |           |             /
               \            |           |            /
                \           |           |           /
                 \          |           |          /
                  \         |           |         /
                   \        |           |        /
                    \       |           |       /
                     \      |           |      /
                      \     |           |     /
                       \    |           |    /
                        \   |           |   /
                         \  |           |  /
                          \ |           | /
       <-MSB---------------LSB-       -MSB---------------LSB->
               order (1)    |           |    order (2)

                          time         time
                            |           |
            /               |           |               \
           /                |           |                \
              /             |           |             \
             /              |           |              \
                /           |           |           \
               /            |           |            \
                  /         |           |         \
                 /          |           |          \
                    /       |           |       \
                   /        |           |        \
                      /     |           |     \
                     /      |           |      \
                        /   |           |   \
                       /    |           |    \
                          / |           | \
                         /  |           |  \
       <-MSB---------------LSB-       -MSB---------------LSB->
               order (3)    |           |    order (4)

 Figure 1: Possible orders, consistent: (1)+(2), inconsistent: (3)+(4).

                          R E F E R E N C E S

[1]   Bolt Beranek & Newman.
      Report No. 1822: Interface Message Processor.
      Technical Report, BB&N, May, 1978.

[2]   CCITT.
      Orange Book. Volume VIII.2:  Public Data Networks.
      International Telecommunication Union, Geneva, 1977.

[3]   DEC.
      PDP11 04/05/10/35/40/45 processor handbook.
      Digital Equipment Corp., 1975.

[4]   DEC.
      VAX11 - Architecture Handbook.
      Digital Equipment Corp., 1979.

[5]   Knuth, D. E.
      The Art of Computer Programming. Volume I:  Fundamental
         Algorithms.
      Addison-Wesley, 1968.

[6]   Swift, Jonathan.
      Gulliver's Travels.
      Unknown publisher, 1726.

               OTHER SLIGHTLY RELATED TOPICS (IF AT ALL)

               not necessarily for inclusion in this note

Who's on first?   Zero or One ??

People  start  counting  from  the  number  ONE.  The very word FIRST is
abbreviated into the symbol "1st" which indicates ONE,  but  this  is  a
very modern notation.  The older notions do not necessarily support this
relationship.

In  English  and  French - the word "first" is not derived from the word
"one" but from an  old  word  for  "prince"  (which  means  "foremost").
Similarly,  the  English  word  "second"  is not derived from the number
"two" but from an old word which means "to follow".  Obviously there  is
a  close  relation  between "third" and "three", "fourth" and "four" and
so on.

Similarly, in Hebrew, for example, the word "first" is derived from  the
word  "head",  meaning  "the foremost", but not specifically No. 1.  The
Hebrew word for "second" is specifically derived from  the  word  "two".
The same for three, four and all the other numbers.

However,  people have, for a very long time, counted from the number One,
not from Zero.  As a  matter  of  fact,  the  inclusion  of  Zero  as  a
full-fledged  member  of  the  set of all numbers is a relatively modern
concept.

Zero is one of the most important numbers mathematically.  It  has  many
important properties, such as being a multiple of any integer.

A  nice mathematical theorem states that for any basis, b, the first b^N
(b to the Nth power) positive integers  are  represented  by  exactly  N
digits  (leading zeros included).  This is true if and only if the count
starts with Zero (hence, 0 through b^N-1), not with One (for  1  through
b^N).

This theorem is the basis of computer memory addressing.  Typically, 2^N
cells  are  addressed by an N-bit addressing scheme.  Starting the count
from One, rather than Zero, would cause either the loss  of  one  memory
cell,  or  an  additional  address  line.    Since  either  price is too
expensive, computer engineers agree to use the mathematical notation  of
starting with Zero.  Good for them!

The  designers  of  the 1401 were probably ashamed to have address-0 and
hid it from the users, pretending that the memory started at address-1.

This  is  probably the reason that all memories start at address-0, even
those of systems which count bits from B1 up.

Communication engineers, like most "normal" people, start counting  from
the  number One.  They never suffer by having to lose a memory cell, for
example.  Therefore, they are happily counting 1-to-8, and not 0-to-7 as
computer people learn to do.

ORDER OF NUMBERS.

In English, we write numbers  in  Big-Endians'  left-to-right  order.  I
believe  that  this is because we SAY numbers in the Big-Endians' order,
and because we WRITE English in Left-to-right order.

Mathematically there is a lot to be said for the Little-Endians' order.

Serial comparators and dividers prefer the former.   Serial  adders  and
multipliers prefer the latter order.

When was the common Big-Endians order adopted by most modern languages?

In  the  Bible,  numbers  are  described  in words (like "seven") not by
digits (like "7") which were "invented" nearly a  thousand  years  after
the  Bible  was  written.  In  the  old  Hebrew  Bible  many numbers are
expressed in the  Little-Endians  order  (like  "Seven  and  Twenty  and
Hundred") but many are in the Big-Endians order as well.

Whenever  the  Bible is translated into English the contemporary English
order is used.  For example, the above number appears in that  order  in
the  Hebrew  source  of  The  Book  of  Esther (1:1).  In the King James
Version it is (in English) "Hundred and  Seven  and  Twenty".    In  the
modern  Revised  American  Standard  Version of the Bible this number is
simply "One Hundred and Twenty-Seven".

INTEGERS vs. FRACTIONS

Computer designers treat fixed-point multiplication in one of two ways, as
an integer-multiplication or as a fractional-multiplication.

The reason is that when two 16-bit numbers, for example, are multiplied,
the result is a 31-bit number in a 32-bit field.    Integers  are  right
justified;  fractions are left justified.  The entire difference is only
a single 1-bit shift.    As  small  as  it  is,  this  is  an  important
difference.

Hence,   computers   are   wired   differently   for   these   kinds  of
multiplications.  The addition/subtraction operation  is  the  same  for
either integer/fraction operation.

If  the  LSB  is  B0  then the value of a number is SIGMA<B(i)*[(2)^i]>,
for i=0,15, in the above example.  This is, obviously, an integer.

If the MSB is B0 then the value of a  number  is  SIGMA<B(i)*[(1/2)^i]>,
for i=0,15.  This is, obviously, a fraction.

Hence, after multiplication the Integerites would typically keep B0-B15,
the  LSH  (Least Significant Half), and discard the MSH, after verifying
that there is no overflow into it.  The  Fractionites  would  also  keep
B0-B15, which is the MSH, and discard the LSH.

One  could  expect Integerites to be Little-Endians, and Fractionites to
be Big-Endians.  I do not believe that the world is that consistent.
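
The two treatments of a 16-bit by 16-bit multiply can be sketched in C as follows. This is only an illustration: the Q15 scaling (value = raw / 32768) and the names mul_integer and mul_fraction are assumptions, not anything from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Integerite view: operands are whole numbers.  Keep the least
 * significant half of the product; report failure if the product
 * overflows into the most significant half. */
static int mul_integer(int16_t a, int16_t b, int16_t *lsh)
{
    int32_t p = (int32_t)a * (int32_t)b;
    assert(lsh != 0);
    if (p < INT16_MIN || p > INT16_MAX)
        return 0;               /* overflow: the MSH carries information */
    *lsh = (int16_t)p;
    return 1;
}

/* Fractionite view (assumed Q15: value = raw / 32768).  The product of
 * two Q15 numbers has 30 fraction bits; hardware shifts it left by one
 * bit and keeps the most significant half.  Dividing by 32768 avoids
 * shifting a negative number in C; it matches the hardware result for
 * non-negative products and may differ by one LSB (rounding direction)
 * for negative ones. */
static int16_t mul_fraction(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    int32_t msh = p / 32768;
    if (msh > INT16_MAX)        /* only (-1.0) * (-1.0) gets here */
        msh = INT16_MAX;
    return (int16_t)msh;
}
```

With these definitions, mul_integer(3, 5, &r) stores 15, while mul_fraction(16384, 16384) treats both operands as 0.5 and returns 8192, which is 0.25 in the same scaling.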

SWIFT's POINT

It may be interesting to notice that  the  point  which  Jonathan  Swift
tried  to  convey  in  Gulliver's Travels is exactly the opposite of the
point of this note.

Swift's point is that the difference between breaking  the  egg  at  the
little-end  and  breaking  it  at the big-end is trivial.  Therefore, he
suggests, that everyone does it in his own preferred way.

We agree that the difference between sending eggs with  the  little-  or
the  big-end first is trivial, but we insist that everyone must do it in
the same way, to avoid anarchy.  Since the difference is trivial we  may
choose either way, but a decision must be made.

*****

An edited version of this note appears in Computer Magazine (IEEE)
of October 1981.

*****

--
Regards
Tim Eccles

Other copies of this essay (variously known as "IEN 137" or "ON HOLY WARS AND A PLEA FOR PEACE") on the net:

http://www.op.net/docs/RFCs/ien-137
http://phobos.illtel.denver.co.us/cdrom/inet/ien/ien_137.txt (not accessible to Windows machines)
http://www.funet.fi/pub/netinfo/dialup-ip/FTP-sofware/nic/drafts/ien137.txt
http://www.cis.ohio-state.edu/htbin/ien/ien-137.html (has a link to all the Internet Engineering Notes (or IEN) documents)

(With so many copies, why did I feel I had to put this one online ? Because none of the others give it enough context.)

Resent-From: pci-sig-request@znyx.com
Resent-Date: Thu, 25 Jan 1996 14:51:14 -0800
From: John R Pierce 
Cc: "'pci-sig@znyx.com'" 
Subject: RE: Big Endian question
Date: Thu, 25 Jan 1996 14:51:14 -0800
Encoding: 41 TEXT
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

ROTFL!

Ya know that article misses one 'cute' tweak that DEC was doing with the
VAX to resolve the 'backwardness' of ascii printed in little-endian
words...  They did their HEX dumps with byte 0 on the right, but the
*-A-x-xxx-A-* stuff on the right was left to right....  They put the
address between the two.  This way words and dwords read l-r, and you
could read the ascii too...

 F E D C  B A 9 8  7 6 5 4  3 2 1 0    -ADDR-   0123456789ABCDEF
504F4E4D 4C4B4A49 48474645 44434241   00230040 *ABCDEFGHIJKLMNOP*
706F6E6D 6C6B6A69 68676665 64636261   00230050 *abcdefghijklmnop*

It took a little getting used to (about 5 minutes ;)

-jrp
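
A dump row in that style could be produced by something like the following C sketch. The function name vax_dump_row, the 16-byte row width, and the exact spacing are assumptions modelled on the two sample lines above, not on DEC's actual tools.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one 16-byte row: hex bytes with byte 0 on the RIGHT (so
 * little-endian longwords read naturally), then the address, then the
 * same bytes as ASCII with byte 0 on the LEFT.  `out` needs 66 bytes;
 * 72 leaves some slack. */
static void vax_dump_row(const uint8_t row[16], uint32_t addr, char out[72])
{
    char *w = out;
    assert(row != 0 && out != 0);
    for (int i = 15; i >= 0; i--) {
        w += sprintf(w, "%02X", (unsigned)row[i]);
        if (i % 4 == 0 && i != 0)
            *w++ = ' ';                 /* gap between longwords */
    }
    w += sprintf(w, "   %08lX *", (unsigned long)addr);
    memcpy(w, row, 16);
    for (int i = 0; i < 16; i++)
        if (!isprint((unsigned char)w[i]))
            w[i] = '.';                 /* mask unprintable bytes */
    w += 16;
    *w++ = '*';
    *w = '\0';
}
```

Feeding it the bytes "ABCDEFGHIJKLMNOP" at address 0x00230040 reproduces the first sample row above.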

----------

Dealing with endianness

Should bridges automatically convert for you ? No !

BLlib http://members.tripod.com/~szanella/bllibeng.htm is a C++ library that can be used to transparently exchange alphanumeric and binary information between machines that have different architectures (Big endian - Little endian), hence its name (Big-Little library).

Resent-From: pci-sig-request@znyx.com
Resent-Date: Tue, 14 Apr 1998 15:05:55 -0700
Date: Tue, 14 Apr 1998 14:57:49 -0700 (PDT)
From: Phil Ronzone 
Subject: Big and little endian issues
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

  > Philip Ronzone wrote:
  >
  > -- snip --
  >
  > > Having worked with more than one piece of hardware that tried to "help"
  > > by doing byte swapping, I can only say "DON'T DO THAT!".
  >
  > I disagree.
  >
  > At V3 Semiconductor we have a line of bridge chips that perform on-the-fly
  > endian conversion for data moving between the PCI bus and a local processor
  > bus, or vice versa (sorry for the plug; I just wanted to have substance for my
  > example).

OK. I'm sure that you do. Great. Etc.

Now, unless your bridge chips are transferring ONLY monotonic data, also
useless.

Just HOW does your bridge chip KNOW which fields in the data stream are
little-endian 16 and should NOT be swapped, big-endian 16 and DO need
to be swapped, and 8-bit ASCII text, should not be touched?

Oh, and how about 32-bit big-endian? That's NOT a case of just doing
adjacent 16-bit swaps. And 64-bit big endian integer?

Does your bridge handle these cases? I doubt it. If it does, PLEASE let
me know -- it would be a MARVEL!

No hardware can ever handle endian issues, UNLESS IT KNOWS the context of
the data -- what is big-endian, what is little endian, what is text data,
what is 16/32/64 bits, and so on.

Too often I've seen a hardware "designer" mumble something like "oh,
PCI is little endian, this is a big-endian processor, so I'll swap
those bytes".

Wrong.

There are some very nice chips out there that can be programmed to both
accept big/little endian addresses AND big/little descriptors in memory.

They work really nice.

Except when a hardware guy throws in a gratuitous swap.

  > More importantly, we have many customers that are happy with the endian-
  > conversion features of our bridges. They would surely disagree with your
  > advice.

OK. Good for you. Since I'm not buying your product, then you should ignore
me. Right?

However, are you talking to the hardware guys that selected the chip,
or did you talk to the SOFTWARE guys programming the chip?

  > > Byte swapping is inherently contextual. No hardware mechanism can ever
  > > know if those "next 4 bytes" are a text string and should NOT be swapped,
  > > or if those 4 bytes are "wrong endian" and should be swapped.
  > >
  > > The net effect (of hardware-provided swapping) is sometimes horrible.
  >
  > I agree, endian conversion is inherently contextual.
  >
  > However, in many applications, it is usually fairly simple for software to arrange
  > to transfer blocks of similar-kind data (bytes, words, etc), and then engage
  > our DMA engines (with endian-conversion hardware) to deliver it without any
  > loss in performance. Where it is not possible (or not practical) to ensure that a
  > significant block has a uniform data type, software may be required to massage
  > a *portion* of the data, and then let the bridge's swapping hardware do the rest.

Yes, altho simple is misleading. Time consuming is more appropriate. Since
EITHER WAY, software is going to have to swap something, somewhere, why
not just drop the hardware swap? It just adds to confusion.

  > For example, when a device chooses to DMA a block of string data, the
  > device can be programmed to perform swapping on a byte boundary. When
  > transferring a block of 16-bit shorts, the device can be programmed to swap
  > on a 16-bit boundary, etc. Furthermore, by having endian-conversion for
  > DMA data programmable through the DMA descriptor, the bridge does not
  > have to be re-programmed before transferring each block of data. This coupled
  > with DMA-chaining permits the transfer of several blocks of data with like
  > intra-block data types, but unlike inter-block data types to be converted and
  > delivered efficiently.

Well, monotonic data (like, all 32-bit x-endian values) is rare. But, just how
would a low level device driver KNOW what data is what?

A disk driver operates at the level of "read or write block N. Here's
the buffer address".

This *IS* the whole point. I perhaps should have expanded on it first.

I'm a software guy. HARDWARE PEOPLE, MOST OF THE TIME, DO NOT UNDERSTAND
WHAT IS GOOD FOR PROGRAMMING (and vice versa I suppose). HARDWARE BYTE
SWAPPING DOES MORE HARM THAN GOOD.

I personally think that homicide should be legal ( :-) ) if one more
hardware guy tells a software guy to "make it correct in software!"

Kaboom - like, just what is DMA for in the first place?

Sure, a program CAN do all that extra, data massaging stuff, and burn lots
of CPU cycles and delay things, but, I rather think hardware should do
it right.

(As an allied example, consider DMA hardware that requires I/O to start on
some power of two, and have a length a power of two -- IDE controllers
for example. Do you know how LONG it takes to align a buffer when a 120K
byte IDE drive read starts on an odd byte address? A lot!)

  >
  > > In a system that swaps to/from PCI, but not to/from memory, I found
  > > that bi-endian devices (i.e., the DEC2114x Ethernet chips) were made
  > > "slow". The DEC2114x can specify either endian for both data and
  > > descriptor lists, but since the hardware didn't consistently swap,
  > > either the data or the descriptors always had to be swapped in
  > > software.
  > >
  > > Ugh.
  > >
  > > Thus, I assert that hardware swapping is useless.
  >
  > I have to disagree with Phillip's assertion. It is unfortunate that he has had
  > such bad experiences with devices that perform hardware endian conversion.
  > I would agree that, for the contextual reasons mentioned above, endian
  > conversion hardware is not the holy-grail for endian conversion -- it is simply
  > an accelerator to perform simple endian conversion cases without losing any
  > performance to those cases.

It's not. If you're JUST transferring monotonic data over your chips, like
a series of 32-bit big-endian integers, then you're fine. But, from your
description, you'd be seeing all kinds of data. Data that has all kinds
of formats, some that can be swapped, and some that can't. So why not leave
it alone?

It just makes it confusing.

  > Certainly, not every application will be able to get away from doing some endian
  > conversion in software. Nevertheless, even if you could only use the endian-
  > conversion hardware on 50% of the data being moved across the bridge (you
  > should be able to do much better), then you only take the hit in performance
  > on the unaccelerated 50% (or less) of the data. Please, don't discount the
  > performance gains that are to be had simply because
  > the hardware cannot handle every possible case.
  >
  > Michael Tresidder
  > V3 Semiconductor
  > tresidd@vcubed.com

Just HOW is this "50%" to be determined?

The low-level driver has NO knowledge of the data structure, so it can't
do it.

The application can't always do it, and it shouldn't do it (for example,
you embed say, an Adobe postscript engine, which comes in object only
format), because the source code should be the same for a big and
little endian machine, so who would do it?

Also, if I read your post correctly, you seem to imply that your bridge
chip does only (??) 16-bit endian swapping. Is this right? 'Cause that
would screw up 32-bit big endian numbers, right?

SUMMARY
-------
Hardware, such as bridge chips per above, should NOT attempt to be smart
and do endian swaps. It only scrambles the data more.

Hardware is better when I/O can be done by descriptor chains.

Those descriptors are best when they allow any byte alignment and any
length. Please, end the alignment & modulo length restrictions.

REFOCUS
-------
Of course, perhaps I'm all wrong. So tell me, if one is reading a little-endian
data IDE disk drive, with a DOS/W95 FAT-16 partition, on a MIPS big-endian
system, just HOW would a bridge chip with endian swapping be useful?

  1 - 16-bit only swapping will screw up 32-bit values.
  2 - 32-bit only swapping will screw up 16-bit values.
  3 - Swapping at all screws up byte fields (strings etc.).

You haven't lived until you've programmed code on a big-endian machine
to "reassemble" byte strings that were 32-bit endian swapped BUT
read/written as 16-bit numbers. Confusion and complexity that would
have been avoided by NOT trying to be helpful.
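
The three numbered points above can be checked with a small C model. It is a sketch under assumptions: the 12-byte record layout in the note below is invented for illustration, and blanket_swap32 stands in for whatever a real bridge does when its 32-bit swap is switched on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a bridge with a blanket "32-bit swap" does: reverse every
 * aligned 4-byte group, with no idea what the bytes mean. */
static void blanket_swap32(uint8_t *buf, size_t len)
{
    assert(buf != 0 && len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint8_t t;
        t = buf[i];     buf[i]     = buf[i + 3]; buf[i + 3] = t;
        t = buf[i + 1]; buf[i + 1] = buf[i + 2]; buf[i + 2] = t;
    }
}

/* What a big-endian CPU's native 16- and 32-bit loads would return. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Take a hypothetical little-endian record holding the text "BOOT", the 16-bit values 0x1122 and 0x3344, and the 32-bit value 0x12345678. After blanket_swap32 the 32-bit field loads correctly with load_be32, but the text reads "TOOB" and the two 16-bit fields have traded places (load_be16 at offset 4 returns 0x3344). Software still has to repair two of the three field types.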

Resent-From: pci-sig-request@znyx.com
Resent-Date: Thu, 25 Jan 1996 23:29:08 -0800
Mime-Version: 1.0
Date: Thu, 25 Jan 1996 23:29:08 -0800
From: David B Gustavson 
Subject: Resolving the Big endian question
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

I don't want to repeat my earlier posting on this, but just to alert the
newcomers to the discussion:

After many years of discussing these issues in the IEEE Bus Standards, it
finally became clear (once one looks from a large-system viewpoint) that
the solution has to be:

The bridge between any two buses preserves the relative byte addresses of
all data flowing through. (This is sometimes called the Address Invariance
principle).

That's what it takes to preserve character strings as they flow through a
complex system.

The point of this axiom is not to imply that character strings are
particularly important--most users strongly want to preserve m-bit
addresses or n-bit integers or whatever happens to be important for their
application's efficiency.

But the bridge never has enough information to be able to do this correctly
in all situations. The best you can do is stop bridges from scrambling the
data at each crossing in vain attempts to fixup certain kinds of data at
the expense of others.

If bridges obey the Address Invariance principle, then all you have to know
is the meaning of the data where created, and the format such data needs on
the destination machine, for software to correctly interpret all the data.
Without that principle, you also have to know the state of all byte
swappers in all bridges the data moved through at the time that data passed
by, and also the path the data happened to take on that occasion. That is
impossible in realistic complex systems.

The missing piece was: how to tell the compiler on your destination machine
what the original meaning of the data was. That problem was solved by
ANSI/IEEE Std 1596.5-1993, which adds the syntax needed to handle the
Endian and some other similar problems (like atomicity) that can break
software drivers etc if not handled correctly.

One of the beauties of this solution is that it is quite efficient: if data
moves through an arbitrary world back into a processor like its originator,
there is zero overhead. If it moves to a processor with different
conventions, conversion costs are minimal, only affect the subset of the
data that are used, and allow the data (plus possible computational
enhancements) to flow onward to other destinations without further
confusion.

There's a sink hole under this well-plowed ground. Don't fall in. Don't pay
the price of rediscovering all this again--it costs a LOT. It's very hard
to have a broad enough perspective while designing the early chips in any
new technology to see the downstream implications.

We had to resolve these issues for SCI LAMP, because it is fundamentally
based on a heterogeneous world with links and buses of many widths,
processors of all types, growing like Topsy over time and technologies but
interoperating and evolving. Fortunately, the IEEE groups had wrestled with
the problems for many years, with S-100, Multibus, Multibus 2, VME,
FutureBus, NuBus, SerialBus, etc etc etc., and the solution finally came
together.

(This is a very slippery problem because it is so confusing--many things
that seem like Endian problems are really not, are merely relabeling of
buses and data. Here and there labeling gets clarified, wherever addresses
are multiplexed with data, but even that is not really fundamental. The
fundamental thing is really how the externally addressed bytes end up in
multibyte processor registers. The carry propagation direction in the
register defines significance unambiguously.)

--David B. Gustavson            phone 415/961-0305 fax 415/961-3530
SCI (ANSI/IEEE Std 1596 Scalable Coherent Interface) chairman
Exec. Director, SCIzzL: Assoc. SCI Local-Area MultiProcessor Users
1946 Fallen Leaf Lane, Los Altos, CA 94024-7206 dbg@sunrise.scu.edu
For more info on SCI etc., see the Web: http://sunrise.scu.edu

Resent-From: pci-sig-request@znyx.com
Resent-Date: Tue, 23 Jan 1996 18:24:40 GMT
From: ingvar.berg.swe3650@oasis.icl.co.uk
Date: Tue, 23 Jan 1996 18:24:40 GMT
Reply-To: ingvar.berg.swe3650@oasis.icl.co.uk
Subject: RE: Endian strategy (was Re: PCI and bi-endian hosts)
Priority: NORMAL
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

I saved the following explanation that appeared on this mail-list some
time ago. Seems to me to be the FAQ needed.
/Ingvar
----------
>
We wrestled with these problems in various IEEE bus standards for over
10 years before we understood that there is only one solution that
works in a general way, without being an ad hoc kluge that breaks when
the system configuration gets changed.

It becomes really obvious when you look beyond a single bridge and think
of more general interconnected systems that use serial buses or
cable/fiber buses of various widths to connect subsystems.

The bridge never has enough information to do the conversions properly.
If you cared enough, you could give it the information, by using fully
tagged data as is done in some network protocols, but that will be
slow and expensive, and requires a lot of software and processor
architecture support to deliver that information to the hardware.
That's not a practical solution.

The solution is to never swap bytes in intermediate hardware, always
delivering each byte in a multi-byte data type at its original relative
address. I.e., byte 0 of a 64-bit integer on a little endian machine
ends up in byte 0 on a big endian machine too.
This clearly works fine for character strings, and scrambles everything
else.

The key is, it scrambles everything in a simple predictable way, which
is not true if intermediate hardware reorders the bytes--then you have
to know the behavior of every byte-swapper in the data path at the
moment your data were transferred, a really impossible problem in
complex systems.

So, let the hardware preserve relative byte addresses.

Then augment the data description information you feed your compiler so
that it knows the original format of the data.
For example, "integer" is uselessly vague--you need to know how wide
the integer is, and whether it is big-endian or little-endian on the
source machine.

With that information, the compiler, knowing what kind of machine it
is running on, can generate the right kind of loads/stores and swaps
to make the computations work correctly.
Without that information, or with hardware that swaps bytes, the job
of untangling this always ends up in the hands of the programmer, and
in my experience is a messy ad hoc job that is very hard to get right
and keep right as the system evolves. It's a major time sink.

So, we created a standard for the extra descriptive info you need to
give the compiler. I think some experimental compilers have implemented
these features, but I don't know their availability. Macros might be
sufficient initially, even without compiler support.
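
Without compiler support, the idea can be approximated in plain C along the following lines. Everything here is a hypothetical sketch: the type names le16_t, le32_t and be32_t, the accessors, and the dma_desc layout are invented for illustration and are not the syntax of the standard named below.

```c
#include <assert.h>
#include <stdint.h>

/* Each wrapper type keeps the raw bytes at their original relative
 * addresses; the type NAME records the width and byte order the
 * source machine used. */
typedef struct { uint8_t b[2]; } le16_t;
typedef struct { uint8_t b[4]; } le32_t;
typedef struct { uint8_t b[4]; } be32_t;

static uint16_t get_le16(le16_t v)
{
    return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

static uint32_t get_le32(le32_t v)
{
    return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) |
           ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

static uint32_t get_be32(be32_t v)
{
    return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
           ((uint32_t)v.b[2] << 8)  |  (uint32_t)v.b[3];
}

/* A made-up descriptor mixing byte orders.  "integer" would be
 * uselessly vague here; these declarations are not. */
struct dma_desc {
    le32_t buf_addr;
    le16_t length;
    le16_t flags;
    be32_t checksum;
};

/* Example consumer: the same source works on either kind of host. */
static uint32_t desc_end_addr(const struct dma_desc *d)
{
    assert(d != 0);
    return get_le32(d->buf_addr) + get_le16(d->length);
}
```

Because the accessors assemble values from individual bytes, the code never needs to know the host's own byte order; it only needs the hardware to have left every byte at its original relative address.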

The standard is:
ANSI/IEEE 1596.5-1993
  IEEE Standard for Shared Data Formats Optimized for Scalable Coherent
    Interface (SCI) Processors

Don't let the title fool you--the only thing SCI-specific in this is
the name (limiting the scope made the project finite so it could actually
be completed in time to be useful).

The significance of SCI here is only that SCI supports distributed
multiprocessing with heterogeneous processors, mixtures of big and little
endian, with various bus widths, with arbitrary-width interconnects, with
bridges, switches, adapters, etc., all evolving over time, with any mix
of message-passing, shared memory, DMA, etc.

SCI also places a very high premium on efficiency and low latency,
every nanosecond counts in the memory path of a supercomputer. If
there were a hardware solution, we would have used it.

I see a lot of bridges that have swapping built in, and often swapping that
can be turned on and off. If those ever get incorporated into larger
systems that allow distributed I/O, data striping, etc they are going to
create an incredible mess. Job security for driver writers...
and added latency that will let cleaner-living competitors outrun you.

Dave Gustavson

p.s. Endianness isn't the only problem. In order to compute correctly, one
has to convert data formats in many (context dependent) ways. E.g.,
ASCII-EBCDIC, integer widths (& signed/unsigned), floating point (IBM,
DEC, IEEE, etc).

Also, software needs to understand which transfers are "atomic"--it can
break the OS if it writes an 8-byte pointer without realizing that the
hardware breaks that into two 4-byte stores that might be separated by
a read from a DMA controller or another processor.

1596.5 deals with all these issues too.

--David B. Gustavson            phone 415/961-0305 fax 415/961-3530
SCI (ANSI/IEEE Std 1596 Scalable Coherent Interface) chairman
Exec. Director, SCIzzL: Assoc. SCI Local-Area MultiProcessor Users
1946 Fallen Leaf Lane, Los Altos, CA 94024-7206 dbg@sunrise.scu.edu
For more info on SCI etc., see the Web: http://sunrise.scu.edu

PCI

Is PCI inherently big-endian or little-endian ? While PCI is inherently little-endian (the first byte of memory is multiplexed with the least-significant 8 bits of the address), there are standard techniques for connecting big-endian and bi-endian hosts to the PCI bus. ("Address invariance" is the best technique).

Resent-From: pci-sig-request@znyx.com
Resent-Date: Fri, 19 Jan 1996 18:22:00 -0700
From: "Monish Shah" 
Date: Fri, 19 Jan 1996 18:22:00 -0700
Subject: Re: Big Endian question
Mime-Version: 1.0
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

On Jan 19,  2:47pm, Jim Holeman wrote:
> Subject: Big Endian question
>
> How are various designers and systems handling PCI designs interfacing to
> systems with Big Endian processors (especially where the Big Endian
> processor is using a 64 bit bus)?

This is actually a very complicated problem with no easy answers.  In fact,
exactly what you have to do is very dependent on the specific function you
are trying to implement and how your driver level software works.

However, one trick that is likely to go a long way is the following:

Request twice as much address space as you would have.  Map in all your
registers (or memory or whatever) twice - once with little endian byte
ordering, another time with big endian byte ordering.  Software can then
choose which view it prefers.

Note that this is much better than doing the byte swapping with a mode bit.
You will inevitably run into cases where software will want some registers to
be byte swapped but not others.  If both versions are accessible
simultaneously, software can pick which copy it will talk to for each
register and choose the address accordingly.  That is much easier than
having to switch the mode bit.

The next question is, do you swap bytes only within a 4 byte field or do
you swap all 8 adjacent bytes as a single field, or what.  The general
answer is, swap to the natural size of the register.

Example of 4 byte swapping:  Bytes numbered 0123 4567 appear as 3210 7654.

Example of 8 byte swapping:  Bytes numbered 0123 4567 appear as 7654 3210.

If you have a pair of 32 bit registers, follow example #1.  If you have one
64 bit register, follow example #2.  Other sizes are left as an exercise
for the reader.
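
The "other sizes" case can be covered by one small C routine that reverses the bytes within each field of a given width. This is a sketch; the name swap_fields is an assumption and the routine models the byte renumbering only, not any particular card.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes inside each `width`-byte field of `buf`.
 * width = 4 gives example #1 above, width = 8 gives example #2. */
static void swap_fields(uint8_t *buf, size_t len, size_t width)
{
    assert(buf != 0 && width > 0 && len % width == 0);
    for (size_t base = 0; base < len; base += width) {
        for (size_t i = 0, j = width - 1; i < j; i++, j--) {
            uint8_t t = buf[base + i];
            buf[base + i] = buf[base + j];
            buf[base + j] = t;
        }
    }
}
```

Applied to the bytes 0 1 2 3 4 5 6 7, a width of 4 yields 3 2 1 0 7 6 5 4 and a width of 8 yields 7 6 5 4 3 2 1 0, matching the two examples.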

My answer above was aimed at solving the problem for CPU accesses to a
card.  DMA is a different story.  For DMA, the answer depends on the
function.  For a SCSI card, it is probably reasonable to not do any byte
swapping on DMA because the data that you are DMAing is probably already in
the correct byte order.  I.e., if the system is big endian, it probably
stored the data on the disk in big endian format to begin with, so it
matches.  Same goes for little endian.

>   holeman
>
>   ____________________________________________________________________________
>
>   Jim Holeman                                          Tandem Computers, Inc.
>   (512) 432-8755 (fax 8247)                            14231 Tandem Boulevard
>   holeman@isd.tandem.com                               Austin, Tx 78728-6699
>
>   "A gentle answer turns away wrath"

Monish Shah
Hewlett Packard

Resent-From: pci-sig-request@znyx.com
Resent-Date: Thu, 30 May 1996 10:59:18 +1000
Date: Thu, 30 May 1996 10:59:18 +1000
From: bull@highland.com.au (Geoff Bull)
Subject: Re: Byte Lanes
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

Steve Belvin wrote:
> I am developing a PCI expansion bridge and am dealing with byte ordering
> issues.  My expansion bus requires byte 0 to be on the lines 7:0 of the bus.
> My experience says this is an unusual requirement for a bus.  I am implementing
> configuration registers internally that use both big endian and little endian
> formats in the specifications which define them.
>
> I have also decided I need to implement byte swapping within my expansion
> bridge in order to make big endian and little endian processor DWord formats
> happy.  I expect, however, for this capability to be used by few.  The
> expansion bridge will likely maintain byte 0 in lines 7:0 on both buses.
>
> My question is:
>
>   Does PCI require byte 0 always be placed on lines 7:0 of the bus?
>
> I have looked the spec. over and do not see where this is expressed so directly
> as it is in my expansion bus standard.  An example where putting byte 0 on PCI
> lines 31:24 may be desired is configuring the big-endian format registers.  I
> would prefer to configure those directly in software and not get involved in
> byte swapping and byte lane issues - these can be one of the worst hazards to
> be involved in.
>
> Steve Belvin


Endian issues have been done to death here and you will find it all in the
archive. I dredged out the following:

>Resent-From: pci-sig-request@znyx.com
>Resent-Date: Thu, 25 Jan 96 08:55:00 PST
>
>Gerhard Petrowitsc wrote:
>>What I may have confused you with is perhaps that I said how the bytes are
>>swapped. (0-7, 1-6 etc.)
>>Sorry for that, I should have put it that way: If you connect a little and a
>>big endian bus and want
>>to keep the byte ordering ("first byte in a string is at address 0"), you
>>must connect the busses
>>as shown below (for 32bit busses)
>>
>>       Little Endian bus    Big Endian bus
>>
>> byte 0  Bit  7..0      <->   Bit 31..24   byte 0
>> byte 1  Bit 15..8      <->   Bit 23..16   byte 1
>> byte 2  Bit 23..16     <->   Bit 15..8    byte 2
>> byte 3  Bit 31..24     <->   Bit  7..0    byte 3
Wen-King Su wrote:
>To confuse things even further, if A & D are multiplexed, during the
>address phase the bus has to be reconnected like this:
>
>      Little Endian bus            Big Endian bus
>  byte 0 (LSB) Bit  7..0  <-> (LSB) Bit  7..0    byte 3
>  byte 1       Bit 15..8  <->       Bit 15..8    byte 2
>  byte 2       Bit 23..16 <->       Bit 23..16   byte 1
>  byte 3 (MSB) Bit 31..24 <-> (MSB) Bit 31..24   byte 0

This is implementing what is called address invariance.
Do anything else and your programmers will hate you forever!

It prevents the address map of the little ended pci bus from
being scrambled from your big-ended cpu's point of view.
The programmer must do byte swaps when doing programmed IO.
The above transformation takes care of dma.

If the cpu is bi-endian put any internal registers in the bridge
on the little ended side so that their address map doesn't change when
the cpu changes mode.

You could take a look at some of the PowerPC documents at:
ftp://ftp.austin.ibm.com/pub/technology/spec
It clearly specifies how to interface a bi-endian cpu to PCI.

==============================
Geoff Bull, Hardware Engineer

TPG Server Pty Ltd
Suite 1, 348 Argyle Street,
Moss Vale, NSW, 2577
AUSTRALIA

Phone: +61 48  69 2066
Fax: +61 48  69 2080
Timezone: GMT+10 (Australia/NSW)
Email: Geoff Bull
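
As a rough illustration of the address-invariant wiring described in the messages above, here is a small C sketch. It models one 32-bit bus word as a `uint32_t` whose bit 0 is bus line 0. The function names (`le_bus_byte`, `be_bus_byte`, `bridge_data_phase`) are made up for this example and do not come from the PCI spec or from any of the posts.

```c
#include <assert.h>
#include <stdint.h>

/* Byte at byte-address k (0..3) as seen on a little-endian bus:
 * address 0 travels on bits 7..0. */
uint8_t le_bus_byte(uint32_t word, unsigned k)
{
    assert(k < 4);
    return (uint8_t)(word >> (8 * k));
}

/* Byte at byte-address k (0..3) as seen on a big-endian bus:
 * address 0 travels on bits 31..24. */
uint8_t be_bus_byte(uint32_t word, unsigned k)
{
    assert(k < 4);
    return (uint8_t)(word >> (8 * (3 - k)));
}

/* Data-phase wiring of an address-invariant bridge: lane 7..0 is tied
 * to lane 31..24, lane 15..8 to lane 23..16, and so on.  On a 32-bit
 * word this amounts to a byte swap, and it is its own inverse. */
uint32_t bridge_data_phase(uint32_t word)
{
    return ((word & 0x000000FFu) << 24) |
           ((word & 0x0000FF00u) <<  8) |
           ((word & 0x00FF0000u) >>  8) |
           ((word & 0xFF000000u) >> 24);
}
```

In this model every byte keeps its address across the bridge, so strings and DMA buffers arrive intact. A 32-bit register holding 0x12345678 on the little-endian side, however, reads as 0x78563412 on the big-endian side, which is the swap the programmer has to do for programmed I/O.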


details

The 68HC16 and 68HC11 controllers have 2 completely different serial ports:

synchronous (clocked) SPI, which sends Most Significant Bit first (Big-endian)... and
asynchronous, which is fairly compatible with most RS-232 methods:
start bit, then Least Significant Bit .... then Most Significant Bit, then "stop bit" gap for an indefinite time until next char. (little-endian)

[-- according to David Cary's data sheets.]
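
To make the two bit orders concrete, here is a minimal C sketch that lists the data bits of one byte in the order they would go out on the wire. The function names are invented for this example; start and stop bits, clocking and the actual 68HC11 registers are all left out.

```c
#include <assert.h>
#include <stdint.h>

/* Write the 8 data bits of `value` into out[0..7] in wire order,
 * most significant bit first (the SPI convention described above). */
void bits_msb_first(uint8_t value, uint8_t out[8])
{
    assert(out != 0);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)((value >> (7 - i)) & 1u);
}

/* Same, but least significant bit first (the asynchronous,
 * RS-232-style convention).  Start and stop bits are not modelled. */
void bits_lsb_first(uint8_t value, uint8_t out[8])
{
    assert(out != 0);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)((value >> i) & 1u);
}
```

For the byte 0x53 the first function yields 0 1 0 1 0 0 1 1 and the second yields 1 1 0 0 1 0 1 0, the same bits in mirror order.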

The Intel x86 series of processors are little endian machines (therefore Linux, MS-DOS, MS-Windows, etc. that run on them are little-endian operating systems)

Many Unix machines (HP, Sun, SGI..) are big endian; Digital's VAX and DECstation machines are little endian.

Most Motorola CPUs are big-endian.

(from the Magic 6.3 source code, downloaded from one of the archives listed at http://www.research.digital.com/wrl/projects/magic/pc.html )

/* ---------------- Start of Machine Configuration Section ----------------- */

/* The great thing about standards is that there are so many to choose from! */
#ifdef	m68k
#define	mc68000		
#endif

/* Both Sun3 and SPARC machines turn on LITTLE_ENDIAN!!!  Buy a DECstation, you
 * bozos.
 */
#ifdef BIG_ENDIAN
#undef BIG_ENDIAN
#endif
#ifdef LITTLE_ENDIAN
#undef LITTLE_ENDIAN
#endif

    /* ------- Configuration:  Selection of Byte Ordering ------- */

/*	Big Endian:
 *		MSB....................LSB
 *		byte0  byte1  byte2  byte3
 *
 *	Little Endian:
 *		MSB....................LSB
 *		byte3  byte2  byte1  byte0
 *
 *	In big-endian, a pointer to a word points to the byte that
 *	contains the most-significant-bit of the word.  In little-endian,
 *	it points to the byte containing the least-significant-bit.
 *
 */

#ifdef	linux
#define	LITTLE_ENDIAN	/* Intel x86 processors running Linux >=.99p7. */
#define	sigvec		sigaction
#define	sv_handler	sa_handler
#endif

#ifdef	vax
#define	LITTLE_ENDIAN	/* The good 'ol VAX. */
#endif

#ifdef	MIPSEL
#define	LITTLE_ENDIAN	/* MIPS processors in little-endian mode. */
#endif

#ifdef	wrltitan
#define	LITTLE_ENDIAN 	/* A DEC-WRL titan research machine (only 20 exist). */
			/* NOT intended for the Ardent titan machine. */
#endif

#ifdef	MIPSEB
#define	BIG_ENDIAN	/* MIPS processors in big-endian mode. */
#endif

#ifdef	mc68000
#define	BIG_ENDIAN	/* All 68xxx machines, such as Sun2's and Sun3's. */
#endif

#ifdef	macII
#define	BIG_ENDIAN	/* Apple MacII (also a 68000, but being safe here.) */
#endif

#ifdef	sparc
#define	BIG_ENDIAN	/* All SPARC-based machines. */
#endif

#ifdef	ibm032
#define	BIG_ENDIAN 	/* IBM PC-RT and related machines. */
#endif

#ifdef	hp9000s300
#define	BIG_ENDIAN 	/* HP 9000 machine.  */
#endif

#ifdef	hp9000s800
#define	BIG_ENDIAN 	/* HP 9000 machine.  */
#endif

#ifdef	hp9000s820
#define	BIG_ENDIAN 	/* HP 9000 machine.  */
#endif
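
The Magic header above picks the byte order at compile time from platform macros. A program can also ask at run time. The following is only a sketch of that idea: `host_byte_order` and the `ORDER_*` names are hypothetical and are not part of Magic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Look at how the host lays out a known 32-bit value in memory.
 * Anything that is neither plain little-endian nor plain big-endian
 * (for example a NUXI-style mixed order) is reported as ORDER_OTHER. */
enum byte_order host_byte_order(void)
{
    const uint32_t probe = 0x01020304u;
    uint8_t bytes[4];

    assert(sizeof probe == sizeof bytes);
    memcpy(bytes, &probe, sizeof bytes);

    if (bytes[0] == 0x04 && bytes[1] == 0x03 &&
        bytes[2] == 0x02 && bytes[3] == 0x01)
        return ORDER_LITTLE;
    if (bytes[0] == 0x01 && bytes[1] == 0x02 &&
        bytes[2] == 0x03 && bytes[3] == 0x04)
        return ORDER_BIG;
    return ORDER_OTHER;
}
```

A run-time probe like this avoids depending on which macros a given compiler predefines, at the cost of a check that cannot be resolved by the preprocessor.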

bibliography

Much of this information came from the PCI SIG mailing list.

PCI SIG http://www.pcisig.com/ comp.arch

Resent-From: pci-sig-request@znyx.com
Resent-Date: Tue, 23 Jan 1996 11:22:23 +1100 (EST)
Date: Tue, 23 Jan 1996 11:22:23 +1100 (EST)
From: Andrew Cagney 
Subject: Re: Proposal: Endian FAQ
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

Hello,

If someone is going to build up an FAQ on the BE/LE issue.  I strongly
advise them to include references to documentation written by
IBM/Apple/Mot (such as CHRP) along with a number of other papers.

People wanting to learn more about the BE/LE issue are encouraged to
read those documents.

Key references on this would include:

        ftp://www.mot.com/pub/SPS/PowerPC/library/hw_spec/PPCPlatform.pdf

The Appendix on Bi-endian support.  Someone better net-connected could
probably look around and find the URL pointing directly at that appendix.

        ftp://ftp.austin.ibm.com/pub/technology/spec/README

That directory contains a number of papers some of which again discuss
the issue of BE/LE support.

Finally, if you're a hardware engineer, I strongly recommend that you
discuss this issue with a software developer (that has had experience
implementing network or machine independent IO code).  If the two of you
disagree as to how things should work, I suspect (based on experience)
it is the hardware and not the software person that is confused.  If the
two of you are the same person, you've got serious problems ... :-)

        enjoy,

                Andrew


Resent-From: pci-sig-request@znyx.com
Resent-Date: Tue, 27 Feb 96 10:15:34 -0800
Date: Tue, 27 Feb 96 10:15:34 -0800
From: Alan Deikman 
Subject: Re: endian conversion - what's up ?
Precedence: list
Resent-Sender: pci-sig-request@znyx.com
To: Mailing List Recipients 

At 08:27 AM 2/27/96 PST, dave parker wrote:

>     This is a question that I have been trying to get an answer to for a
>     while. Please indulge me with your thoughts !
>
>     The PCI spec states that PCI is a little-endian bus. This suggests to

I guess you  missed it, but there was a long thread on this forum on
this subject with several well-informed and insightful posts.  A quick
search of the most recent tar archive yields the following.  Don't
miss article #1967.

>1912:Subject: Big Endian question
>1915:Subject: Re: Big Endian question
>1915:> Subject: Big Endian question
>1919:Subject: Re: Big Endian question
>1920:Subject: Re: Big Endian question
>1923:Subject: Re: Big endian question
>1924:Subject: Endianess (es)
>1926:Subject: Re: Big endian question, don't swap!
>1932:Subject: Proposal:  Endian FAQ
>1937:Subject: Re: Proposal: Endian FAQ
>1941:Subject: Re[2]: Proposal: Endian FAQ
>1943:Subject: RE: Endian strategy (was Re: PCI and bi-endian hosts)
>1966:Subject: Re(3): Big endian question
>1967:Subject: Re: Big Endian question
>1967:>gnu> Subject: Byte Order: On Holy Wars and a Plea for Peace
>1970:Subject: Re:  Re(3): Big endian question
>1971:Subject: Re: Re(3): Big endian question
>1972:Subject: RE: Big Endian question
>1972:Subject:   Re: Big Endian question
>1977:Subject: Resolving the Big endian question
>1978:Subject:
>1979:Subject: Re(5): Big endian question

The number at the beginning of each line is the article number.  To
retrieve the archive, send e-mail to pci-sig-request@znyx.com.  In
the Subject: line, put the word "archive" without the quotes.

In the body of the message put

  get 9601tar

for the "tar" version, OR

  get 9601zip

for the "zip" version.  BOTH files are UUencoded, which most
mailers can deal with.  If not, ask help from a UNIX drone near
you.  They are easy to spot by their social habits if you know
what to look for.

Regards,

--------------------------------
  Alan Deikman, ZNYX Corporation
  alan@znyx.com

CAN

Controller Area Network (CAN) serialportdocs.html#can is Big Endian for this reason: by transmitting the most-significant bits first, the highest-priority messages will always gain control of the bus without "destructive" collisions. This relies on the fact that comparing 2 numbers gives the correct answer quickest by starting from the most-significant bit.
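
Here is a small simulation of that arbitration in C, as a sketch under simplifying assumptions: all nodes start at the same instant, identifiers are distinct, and the bus is modelled as a wired-AND where a 0 bit overrides a 1 bit. The function name `can_arbitrate` is made up for this example and is not from any CAN library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulate bitwise arbitration among `count` nodes that begin sending
 * their identifiers together, most significant bit first.  A 0
 * ("dominant") bit from any node overrides the 1 ("recessive") bits of
 * the others; a node that sends 1 but reads back 0 stops sending.
 * Returns the index of the node still transmitting after the last bit. */
size_t can_arbitrate(const uint32_t ids[], size_t count, unsigned id_bits)
{
    assert(count > 0 && count <= 32 && id_bits > 0 && id_bits <= 32);
    uint32_t active = (count == 32) ? 0xFFFFFFFFu : ((1u << count) - 1u);

    for (unsigned bit = id_bits; bit-- > 0; ) {
        uint32_t bus = 1;                      /* recessive unless pulled low */
        for (size_t n = 0; n < count; n++)
            if ((active >> n) & 1u)
                bus &= (ids[n] >> bit) & 1u;
        for (size_t n = 0; n < count; n++)
            if (((active >> n) & 1u) && ((ids[n] >> bit) & 1u) != bus)
                active &= ~(1u << n);          /* this node lost arbitration */
    }
    for (size_t n = 0; n < count; n++)
        if ((active >> n) & 1u)
            return n;
    return 0; /* not reached: at least one node always survives a round */
}
```

With 11-bit identifiers 0x65A, 0x123 and 0x124, the node sending 0x123 wins: 0x65A drops out at the first bit and 0x124 drops out at bit 2. The numerically lowest identifier wins because 0 is the dominant level, and sending the most significant bit first is what makes the comparison resolve at the earliest differing bit.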

floating-point format

(see above for some other floating-point comments).

http://sunburn.stanford.edu/~knuth/mmix.ps.gz

http://sunburn.stanford.edu/~knuth/mmix.html

Netlib http://www.netlib.org/ is a collection of mathematical software, papers, and databases. Simple things like decimal-string-to-IEEE-floating-point conversion, and more complex things.
IEEE floating point math can do exact integer calculations, a few comments on roundoff error http://www.mathworks.com/publications/newsletter/pdf/Fall96Cleve.pdf
"Some reflection on the IEEE floating-point format shows that an unsigned integer comparison is the same as a magnitude comparison of two floats with equal signs." -- Dave Gillespie Date: Wed, 27 Oct 1993 21:43:56 GMT Keywords: assembler, optimize, performance Newsgroups: comp.compilers

Date: Fri, 12 Mar 1999 07:06:04 +0100
From: Mathias Brossard
X-Accept-Language: en, fr
To: ogrimes@bellatlantic.net
CC: "f-cpu@egroups.com"
Subject: [f-cpu] Re: Instruction Set debate

> > BTW, which addressing mode should we support ?
>
> I have been wondering about this. I am really unsure... The F21 does it by
> saying "the following instruction is actually an immediate value to be added to
> the number on the top of the stack." That could be a fairly good way of doing
> it...

I thought of this too. So I asked a friend who has done some RISC
assembler, and he said RISC processors load constants in small parts.
His answer to my idea, was (vaguely translated from swedish) "That's how
it's done on Alphas and Mips, do you think they wouldn't have thought of
it?". So I doubted a little...

If you use this then it's not strictly a fixed length instruction set
anymore. This means only that the decoder will have to tag the immediate
value as "non-code".

> Bloat must be eliminated!

The eternal computer dilemma size versus speed. I always go for speed.

> > Does that fit into a 32 bit instruction ?
>
> Operation (7 bits), A (6 bits) b(6 bits) c (6 bits) Unless we add more registers
> to enhance the SIMD capabilities of the design...

Ok I'm currently thinking that we could do with 32 bits instructions,
things I was trying to do with 64 bits. For that we would have to do a
nice decoder that can decode and launch many instructions each cycle.
Although a little more complicated it's possible: the advantages of my
idea:
- simpler CPU
- puts the burden on the compiler. It's the compiler that finds the
  intrinsic parallelism and not the CPU that has to do it on-the-fly.

> > I'll admit these are grey areas in my knowledge... I don't know if the
> > whole OS can run in interrupted:
>
> Linux does but any *GOOD* OS doesn't.

If Linux does, so do we... Is that on all architecture ?

> > a program should be able to call the OS in running mode and then the OS
> > would disable the interrupt... just an idea...
>
> That is how intel processors handle it.

Is that "How It Should Be Done(tm)" ?

> > Do we have OS specialists ? We need some light...
>
> I have studied operating system design for years in hopes of developing my own.
> I am among the keenest specialists in operating system design that you will
> find. :)

I said I would check on IEEE754, so I did:

The IEEE 754 standard specifies:

* Storage formats:

  IEEE754        C Type        Bits   Exponent   Mantissa
  -----------------------------------------------------------------
  - Single       float         32     8          24
  - Double       double        64     11         53
  - Double ext.  long double   >=80   >=15       >=64
  -----------------------------------------------------------------

* Precise specifications of the results of operations
  Operations specified:
  - Addition
  - Subtraction
  - Multiplication
  - Division
  - Square root
  - Remainder (modulo)
  - Conversion to/from integer
  - Conversion to/from printed base-10

* Special values
  - + or - 0
  - Denormalized number
  - NaN (Not a Number)
  - + or - Infinity

* Specified runtime behavior on illegal operations
  - Overflow to infinity
  - Underflow to zero
  - Division by zero
  - Invalid operation
  - Inexact operation

There are many more details on
http://www.loria.fr/serveurs/CCH/documentation/IEEE754
If you intend to make some FPU modules, this is a Must.

Mathias
-----------
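
The single-precision row of the storage-format list in the message above can be checked with a few lines of C. This is a sketch that assumes `float` is a 32-bit IEEE 754 single (the C standard does not promise that); the struct and function names are invented for the example. The 24-bit mantissa in the list counts one implicit leading bit, so only 23 fraction bits are actually stored.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit IEEE 754 single. */
struct single_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 stored bits; the 24th mantissa bit is implicit */
};

/* Copy the bit pattern of a float into an unsigned integer. */
uint32_t single_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u && FLT_MANT_DIG == 24);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split a float into its sign, exponent and fraction fields. */
struct single_fields split_single(float f)
{
    uint32_t u = single_bits(f);
    struct single_fields out;
    out.sign = (unsigned)(u >> 31);
    out.exponent = (unsigned)((u >> 23) & 0xFFu);
    out.fraction = u & 0x007FFFFFu;
    return out;
}

/* Because the exponent sits above the fraction, two non-negative,
 * non-NaN floats order the same way as their bit patterns do when
 * compared as unsigned integers.  Returns 1 when a < b. */
int less_nonnegative(float a, float b)
{
    return single_bits(a) < single_bits(b);
}
```

For example, 1.0f splits into sign 0, exponent 127, fraction 0, and -2.5f into sign 1, exponent 128, fraction 0x200000. The last function is the integer-comparison trick from the Dave Gillespie quote earlier in this section, restricted here to non-negative inputs.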

misc

From: "Kevin D. Kissell" 
Newsgroups: comp.arch
Subject: Re: Little Endian vs. Big Endian
Date: 19 Oct 1995 13:57:24 GMT
Organization: The Institute for Impure Science
Lines: 44
Message-ID: <465lg4$m42@hercules.neu.sgi.com>
References: <1995Oct19.114942.21621@comserv.itri.org.tw>
NNTP-Posting-Host: jones.neu.sgi.com
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.1IS (X11; I; IRIX 5.3 IP22)
X-URL: news:1995Oct19.114942.21621@comserv.itri.org.tw

hcc@e0sun3 (Hsin-Chou Chi) wrote:
>Hi there.
>
>Excuse me if this issue has been raised before, but does
>anybody know what are the pros and cons of Little Endian
>vs. Big Endian?  I know what these two schemes mean.
>However, it is not clear to me why some machines choose
>one rather than the other.

Oh Nooooooooo!   The Holy Wars continue!  ;-)

The *bad* reason for doing little-endian machines is that one can
take the address of an operand, and if one "knows" that the range
of values is expressed within a small number of bits, one can reference
the low-order part of the operand at the address without knowing its
actual size. This kind of code really existed in UNIX 15 years ago,
but was always aggressively stupid, and has long since been cleaned up.

I once had a discussion with an AI type who insisted that little-endian
machines were better for implementing tagged data types in software,
but when pressed for why, he was never able to explain.  Maybe it's true,
but I doubt it.

The *lame* reason for doing big-endian machines is that, if one
takes a byte-wise hex dump of memory, one can read the values of
multi-byte data items without shuffling the bytes in your head.
This is not aggressively stupid, but it is less and less important
as fewer people have to look at hex dumps.

A slightly better reason for doing big-endian machines is that
the standard byte-ordering for internet packet headers is big-endian,
and little-endian machines have to burn a few extra cycles per packet
to put them in canonical order.

So, in the abstract, big endian is preferable, but for fairly silly
second-order reasons.

--
Opinions expressed may not be		Kevin D. Kissell
those of the author, let alone		Silicon Graphics Core Technology Group
those of Silicon Graphics.		Cortaillod, Switzerland

					kevink@neu.sgi.com
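
The property Kissell calls the *bad* reason can be shown without depending on the byte order of the machine running the code. The sketch below lays a value out in a byte buffer in each order and then reads it back at several widths from the same starting address. All names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first. */
void store_le32(uint8_t buf[4], uint32_t v)
{
    for (size_t i = 0; i < 4; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 32-bit value most significant byte first. */
void store_be32(uint8_t buf[4], uint32_t v)
{
    for (size_t i = 0; i < 4; i++)
        buf[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Read `width` bytes (1, 2 or 4) starting at buf[0],
 * least significant byte first. */
uint32_t load_le(const uint8_t *buf, size_t width)
{
    uint32_t v = 0;
    assert(width == 1 || width == 2 || width == 4);
    for (size_t i = 0; i < width; i++)
        v |= (uint32_t)buf[i] << (8 * i);
    return v;
}

/* Read `width` bytes (1, 2 or 4) starting at buf[0],
 * most significant byte first. */
uint32_t load_be(const uint8_t *buf, size_t width)
{
    uint32_t v = 0;
    assert(width == 1 || width == 2 || width == 4);
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | buf[i];
    return v;
}
```

After `store_le32(buf, 42)`, reading 1, 2 or 4 bytes with `load_le` gives 42 every time. After `store_be32(buf, 42)`, only the 4-byte `load_be` gives 42; the 1-byte and 2-byte reads at the same address give 0, because the low-order byte sits at the far end.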

"Two font files were necessary to allow for both big and little endian machines. DEC, Linux and DOS store binary files the same way (storing the low byte first followed by the high byte) whereas SUN and SGI store the high and low byte the other way around." -- Noel J. Rode May 1994 in file "genghis.txt".

middle-endian http://www.wins.uva.nl/~mes/jargon/m/middle-endian.html

Rather than choosing Big endian or little endian, there is a 3rd option. Some machines *only* access memory in word-aligned units of one full word. This makes the memory interface a lot simpler. (I hear the Alpha does this).

It seems likely to me that the Freedom CPU will have a multiplexed address/data bus (rather than a 64 bit address bus and an independent 64 bit data bus). Will the lsb address wire be the same as the lsb data wire or will the lsb address wire be the same as the MSb data wire ? Is this difference visible to software ?

Definitions of lsb and MSb: The least significant bit (lsb) of the data bus is the bit that always changes when you increment an integer. Toggling the least significant bit (lsb) of an address generally makes it point to a different register of the same destination device. Toggling the Most Significant bit (MSb) and bits close to it invariably makes an address point to a different "page" on a RAM, and usually a completely different device. Reading anywhere (forwards or backwards) from the same (typically 1 Kword ?) "page" of RAM is fast; there is a fixed overhead for switching to another page of RAM.

OK, here's my summary on the Big endian v. little endian debate so far:

Humans are confused by endian issues. Ekkehard Morgenstern correctly understands how x86 machines and 68K machines represent integers, but his definitions of the terms "big endian" and "small endian" are exactly the *opposite* of the way Danny Cohen defined them in his classic 1980 paper ( http://www.rdrop.com/~cary/html/endian_faq.html ).

Pros for MSb at lowest address:

For people used to reading English text, it's slightly easier to understand dump files. (Ekkehard Morgenstern pointed this out). For example, on a 68K sort of machine,
```
	address: 98 99 9a 9b 9c 9d 9e 9f
	data   : 74 65 73 74 00 00 00 05
	
```
is the string "test" followed by the integer 5. (The x86 sort of machine would interpret the last part as the integer 0x0500_0000).
Multi-word integer division is easier this way. (But then, how often are we really going to need integers so big they won't fit in a single 64 bit word ?)

Pros for lsb at lowest address:

For people used to reading Hebrew text, it's slightly easier to understand dump files.
We want this chip to interface with the PCI bus (Olaf Naumann reminded me of this), and the PCI bus is inherently lsB at lowest address (Is this true ?).
Multi-word integer addition is easier this way. (But then, how often are we really going to need integers so big they won't fit in a single 64 bit word ? We *never* need *addresses* so big they won't fit into a single 64 bit word.)
If you have instructions with address offsets that are too long to grab in a single read, you can continue to increment the "program counter" normally, read the LSB and ripple through its result while you read the MSB.
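
The multi-word addition item in the list above can be sketched in C as follows. The example uses 32-bit words so that a 64-bit temporary can hold the carry; `add_multiword` is a name invented here, not something from the F-CPU simulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two multi-word unsigned integers stored least significant word
 * first (index 0 = lowest address = least significant word).  The
 * carry travels in the same direction as the address increases, so a
 * single forward pass is enough.  Returns the carry out of the top
 * word. */
unsigned add_multiword(uint32_t sum[], const uint32_t a[],
                       const uint32_t b[], size_t words)
{
    unsigned carry = 0;
    assert(words > 0);
    for (size_t i = 0; i < words; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;
        carry = (unsigned)(t >> 32);
    }
    return carry;
}
```

With the most significant word at the lowest address the same loop would have to start at the highest address and walk backwards, which is the small inconvenience the item refers to.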

Pros for Neither Endian:

Reading and writing only whole, aligned words results in much simpler hardware and software. (Ekkehard Morgenstern pointed this out)

Items that make endian somewhat irrelevant:

RAM doesn't care which endian you make it.
Other devices are so slow that we have more than 10 CPU clock cycles after writing one word to it before it's ready to accept the next word. While we're waiting, we have plenty of clock cycles available to shuffle around the next word to whatever endian is needed by that device. ( Standard PCI now has top speed of 33 MHz; most I/O devices are much slower. Many processors have over 300 MHz clock speeds ).
When the native machine "word" is big enough to hold most integers, then they can be read from RAM all in one gulp, and multi-word arithmetic becomes rare.

Let's say I have the string "abcd" at memory location 0x1234_5678. If I probe the physical voltages on the address/data lines, this is what I expect to see.

With lsb-lsb, I see on my address/data lines

   MSb of address ...... LSb of address
address phase: 12 34 56 78
data phase:    64 63 62 61

With lsb address -- MSB data, I see on my address/data lines

   MSb of address ...... LSb of address
address phase: 12 34 56 78
data phase:    86 46 c6 26

Is this right ? for some reason I was expecting to see one or the other have
data phase:    61 62 63 64
.
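
One way to check the second pattern above: 86 46 c6 26 is 64 63 62 61 with all 32 data lines mirrored end to end, so the byte order flips and the bits inside each byte flip as well. A minimal C sketch, with `reverse_bits` being a name made up for this example and with the assumption that 'a' (0x61) travels on the 8 least significant data lines in the lsb-lsb case:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the low `width` bits of `value`:
 * bit 0 swaps with bit width-1, bit 1 with bit width-2, and so on. */
uint32_t reverse_bits(uint32_t value, unsigned width)
{
    uint32_t out = 0;
    assert(width >= 1 && width <= 32);
    for (unsigned i = 0; i < width; i++)
        if ((value >> i) & 1u)
            out |= (uint32_t)1 << (width - 1 - i);
    return out;
}
```

Here `reverse_bits(0x64636261, 32)` gives 0x8646C626, and `reverse_bits(0x61, 8)` gives 0x86. A pattern of 61 62 63 64 would instead correspond to swapping whole byte lanes while leaving the bits within each byte alone.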

Anyway, while things seem to be transparent to *strings*, it seems that if you store *addresses* in RAM and then read them back one byte at a time (or misaligned), ... the software might be able to tell the difference. I think. (?)

What does Dr. Philip Emeagwali mean by "left-handed algorithms" when he says http://inventors.miningco.com/library/weekly/aa111097.htm

Computers that are commercially available are symmetric or non-handed but it is possible that some existing software and algorithms are left- or right-handed. I have demonstrated that you can apply a righted-handed algorithm and software to a right-handed computer. ... a left-handed computer must have left-handed software and algorithms that also complements it. Therefore, efforts to implement a left-handed software on a right-handed computer may be as awkward as putting your left shoe on your right leg.

Date:  Thu, 12 Nov 1998 08:56:58 +0100
From:  Walter Scherer
Subject:  Re: uProcessor Busses, Current, Past and Future
Organization:  Walter Scherer
To:  d_cary@my-dejanews.com

Hello!

d_cary@my-dejanews.com wrote:
> I collect endian-related information at
>   http://www.rdrop.com/~cary/html/endian_faq.html

Nice FAQ.

> If you find other related information, please tell me about it.

I was building a PC clone mainboard with MC68040s once -
a strange beast called the Freak40. It was a hobby project.
A big endian MC68340 was handling the little endian ISA
bus. Turns out I was doing the right thing - swapped the
two byte lanes to get the addresses right. It was indeed
possible to read the copyright string out of the onboard
ROM of a VGA and it was readable without NUXI effects.

Tschau
------
Walter

According to the article "ASCII, BAUDOT AND THE RADIO AMATEUR" by George W. Henry, Jr. K9GWT http://www.halcomm.com/ascii.htm , The Baudot TTY Code (very similar to the ASCII serial asynchronous TTY code) transmits each character as a start pulse (a 0 bit, space, current off), 5 data bits, then a stop pulse (a 1 bit, mark, current on, the rest configuration in which the line remains until the next start pulse). These 5 bits are sent least-significant-bit first. "the five-unit "baudot" code was arranged by Murray so that the most frequently used letters are representated by the least number of mark holes punched in paper tape". The ASCII code is very similar (least-significant bit transmitted first, etc.) except with 7 or 8 data bits, and the letters and numbers arranged in collating order; the RS232-C standard specifies Space in positive voltages and Mark as negative voltages, zero voltage, or open circuit.
XDR handles not only endian issues, but also arbitrarily complex structures.
- http://www.medusa.uni-bremen.de/intern/orpc/node3.html "XDR is a protocol that defines how to exchange data in a platform-independent manner. "
- http://www.acc.umu.se/~balp/rpcunix/xdr_library_primitives.HTML
- XDR Technical Notes http://www.acc.umu.se/~balp/rpcunix/xdrtechnotes.HTML "This chapter contains technical notes on Sun's implementation of the External Data Representation (XDR) standard, a set of library routines that allow a C programmer to describe arbitrary data structures in a machine-independent fashion."
To properly-written software, it shouldn't matter if it is compiled and run on a big-endian machine or compiled and run on a little-endian machine. Here's a short bit of code where it *does* matter http://www.iro.umontreal.ca/~pigeon/pub/indien.c .

(posted to Newsgroups: gnu.gcc.help,gnu.g++.help,comp.unix.programmer,comp.unix.solaris,comp.lang.c )

From: "Paul Lutus" 
Subject: Re: HELP: struct data members alignment???
Date: 07 Jun 1999 00:00:00 GMT

About this subject, and in seemingly endless threads, we get two kinds of
posts:

1. Why can't I just write raw structure data to a file or network socket?
Writing it as text or packaging it in a portable, binary form is a lot of
work, and I probably won't be moving the program to a different platform
anyway.

2. How come my program on platform B cannot read the raw structure data from
platform A?

The answer to (1) is (2). The answer to (2) is (1).

--

Paul Lutus
www.arachnoid.com
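
To make the "portable, binary form" from the post above concrete, here is a short C sketch that writes a record field by field in a fixed byte order instead of dumping the raw struct. The struct `sample`, its fields and every function name are hypothetical, chosen only for this example. Big-endian order is used because it matches network byte order; any fixed order would serve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, used only for this example. */
struct sample {
    uint32_t id;
    int32_t  delta;
};

enum { SAMPLE_WIRE_SIZE = 8 };   /* two 4-byte fields, no padding */

/* Write a 32-bit value most significant byte first. */
void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read a 32-bit value stored most significant byte first. */
uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Two's complement bit pattern back to a signed value without
 * relying on implementation-defined conversions. */
int32_t u32_to_i32(uint32_t u)
{
    return (u <= 0x7FFFFFFFu) ? (int32_t)u
                              : (int32_t)(-(int64_t)(0xFFFFFFFFu - u) - 1);
}

void sample_encode(uint8_t out[SAMPLE_WIRE_SIZE], const struct sample *s)
{
    assert(out != NULL && s != NULL);
    put_be32(out, s->id);
    put_be32(out + 4, (uint32_t)s->delta);
}

void sample_decode(struct sample *s, const uint8_t in[SAMPLE_WIRE_SIZE])
{
    assert(s != NULL && in != NULL);
    s->id = get_be32(in);
    s->delta = u32_to_i32(get_be32(in + 4));
}
```

The bytes produced this way are the same on every host, whatever its byte order or struct padding, which is exactly what a raw `fwrite` of the struct cannot promise.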

In a PNG-formatted file, "All integers that require more than one byte must be in network byte order: the most significant byte comes first, then the less significant bytes in descending order of significance (MSB LSB for two-byte integers, B3 B2 B1 B0 for four-byte integers). The highest bit (value 128) of a byte is numbered bit 7; the lowest bit (value 1) is numbered bit 0. Values are unsigned unless otherwise noted. Values explicitly noted as signed are represented in two's complement notation." http://www.w3.org/TR/REC-png#DR.Integers-and-byte-order "It has been asked why PNG uses network byte order. We have selected one byte ordering and used it consistently. Which order in particular is of little relevance,..." http://www.w3.org/TR/REC-png#R.Byte-order This is "Big-Endian".
Big Endian and Little Endian http://www.efg2.com/Lab/OtherProjects/Endian.htm and how to convert between the 2 in the Delphi programming language.

[FIXME: summarize this]

Mailing-List: contact f-cpu-owner@egroups.com
Precedence: list
X-URL: http://www.egroups.com/list/f-cpu/
X-Mailing-List: f-cpu@egroups.com
Delivered-To: listsaver-egroups-f-cpu@egroups.com
X-Lotus-FromDomain: PRS GMBH
From: "Ekkehard Morgenstern" 
To: f-cpu@egroups.com
Date: Tue, 17 Nov 1998 14:29:02 +0100
Mime-Version: 1.0
Subject: [f-cpu] Re: Big endian v. little endian

Hi Dave,

you wrote:

> I collect Big endian vs. little endian information at
>  http://www.rdrop.com/~cary/html/endian_faq.html.

I have not fully read it yet, but my first (subjective) impression was
that the illustrations/figures on your page are quite confusing. :)

> So far as I can tell, neither one is significantly better than the other.

Both have advantages and disadvantages.

From the programmer's point of view, MSB..LSB is better, because
the data is readable in a hex dump.

Basically, the memory layout for both architectures is as follows:

     MSB = Most  Significant Byte (carrying the 'higher' value of a number)
     LSB = Least Significant Byte (carrying the 'lower'  value of a number)

     so, in a hexadecimal number 0x12345678, 0x12 would be the MSB,
     while 0x78 would be the LSB.

     There are two common models for byte-addressed processors,
     where in one model, numbers are stored in LSB..MSB order (Intel x86 etc),
     and another model, where numbers are stored from MSB..LSB (Motorola 68K, PPC etc.);

     So, in memory it looks like this (for our example 0x12345678):

     data a:        78       56       34       12 (Intel x86 etc.)
     data b:        12       34       56       78 (Motorola 68K, PPC etc.)
     address: 00003000 00003001 00003002 00003003

     Since, in model 'a', the number 'ends' on its most significant byte (=0x12),
     the model is called 'big endian'.
     Model 'b' is called 'little endian' because the number ends on its
     least significant byte (=0x78).

I don't like to use the terms 'little endian' and 'big endian' because
they are likely to be confused. I prefer the notation MSB..LSB and
LSB..MSB, resp.

The LSB..MSB order generally can make debugging very difficult, because
looking at the memory can't give you much hints about the use of the data.
MSB..LSB however, stores the data readable in memory, so you can easily
see where 32 bit data or 16 bit data is being used.

LSB..MSB order is an artefact from the 8 bit era, where 16 bit was
an exception. The MSB was then fetched during the next cycle.
For compatibility reasons, Intel continued with this model in its x86
series till today. When dealing with different operand sizes it can
be actually an advantage, because you don't have to read the whole
word of data. However, on most modern processors, it does not make
a difference, whether you read a byte or a whole 32 bit word, because
the memory load/store interfaces already read whole words, whether
you use the remaining bits or not.

> > Douglas Seay @capgemini.fr mentioned
> > Well, not always. I had the habit of doubling integers with
> > var <<= 1;
> >
> > which went south on a little endian machine.

This is wrong; it depends on the internal architecture of a
shift register in which direction the bits are shifted,
not on the 'endianness' of the processor.

Besides, the C programming language clearly defines
that a left shift by 1 is the same as a multiplication by 2.

In BCPL this was not so. On mainframes, there's generally
a greater variety of architectures; since in BCPL, the
actual behaviour of a shift was not defined, it was more
likely that code using shifts would break.

> I (and others)
> will need to have complete understanding of these low-level issues
> in order to make a functional CPU.

Sure. Processor design is always low-level, _very_ low level.
BTW, are there any hardware designers out there? I suspect we'll
never manage to build a working CPU if we have discussions over
basic principles of processors.

> Rather than choosing Big endian or little endian, there is a 3rd option.
> Some machines *only* access memory in word-aligned units of one full word.
> This makes the memory interface a lot simpler.

Yes, word-addressed machines are a lot simpler to implement,
both in hardware and software.

Greetings,
Ekkehard.
---
Ekkehard Morgenstern
e-mail office: mailto:emorgens@prs-gmbh.de
e-mail home: mailto:flnca@csi.com
homepage: http://flnca.homepage.nu

The statements made in this mail represent my own personal opinion,
not the one of my employer.
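
The point in the reply above, that a shift works on the value and not on its byte layout, can be checked with a few lines of C. This is only a sketch and the function names are made up. One caveat that goes slightly beyond the post: C guarantees the "times two" behaviour for unsigned types (modulo 2^32 here) and for signed values only when the operand is non-negative and the result fits, so the signed version checks that precondition.

```c
#include <assert.h>
#include <stdint.h>

/* Double an unsigned value with a left shift.  The result is the same
 * on big-endian and little-endian hosts, wrapping modulo 2^32. */
uint32_t double_by_shift(uint32_t v)
{
    return v << 1;
}

/* Double a signed value the same way.  Shifting a negative value, or
 * shifting a bit into or past the sign position, is undefined in C,
 * so the precondition is checked first. */
int32_t double_signed_by_shift(int32_t v)
{
    assert(v >= 0 && v <= INT32_MAX / 2);
    return (int32_t)((uint32_t)v << 1);
}
```

Neither function looks at memory, so byte order never enters into it. Where `var <<= 1` does go wrong across machines, the usual suspects are things like reading the variable through a pointer of a different width, which is a separate problem from the shift itself.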

------------------------------------------------------------------------
Free Web-based e-mail groups -- http://www.eGroups.com

To: f-cpu@egroups.com
From: David Cary 
Subject: Big endian v. little endian

I collect Big endian vs. little endian information at http://www.rdrop.com/~cary/html/endian_faq.html .

So far as I can tell, neither one is significantly better than the other. I've seen a couple of messages in the f-cpu archive where someone seemed to think one or the other was far superior. I would appreciate being informed about any real benefits one may have over the other.

Douglas Seay @capgemini.fr mentioned

>Well, not always. I had the habit of doubling integers with
>var <<= 1;
>
>which went south on a little endian machine.

I don't understand. I thought this will double the integer on either big-endian or little-endian machines. Please explain exactly what went wrong -- I (and others) will need to have complete understanding of these low-level issues in order to make a functional CPU.

Finally, a chance to choose an endianness based on technical rather than political reasons :-).

Rather than choosing Big endian or little endian, there is a 3rd option. Some machines *only* access memory in word-aligned units of one full word. This makes the memory interface a lot simpler.

Mailing-List: contact f-cpu-owner@egroups.com
Precedence: list
X-URL: http://www.egroups.com/list/f-cpu/
X-Mailing-List: f-cpu@egroups.com
Delivered-To: listsaver-egroups-f-cpu@egroups.com
X-Sender: cary@agora.rdrop.com
Mime-Version: 1.0
Date: Wed, 18 Nov 1998 03:39:46 -0500
To: f-cpu@egroups.com
From: David Cary 
Subject: [f-cpu] Re: Big endian v. little endian

It seems likely to me that the Freedom CPU will have a multiplexed
address/data bus (rather than a 64 bit address bus and an independent 64 bit
data bus). Will the lsb address wire be the same as the lsb data wire or
will the lsb address wire be the same as the MSb data wire ? Is this
difference visible to software ?

Definitions of lsb and MSb:
The least significant bit (lsb) of the data bus is the bit that always
changes when you increment an integer.
Toggling the least significant bit (lsb) of an address generally makes it
point to a different register of the same destination device. Toggling the
Most Significant bit (MSb) and bits close to it
invariably makes an address point to a different "page" on a RAM, and
usually a completely different device. Reading anywhere (forwards or
backwards) from the same (typically 1 Kword ?) "page" of RAM is fast; there
is a fixed overhead for switching to another page of RAM.

OK, here's my summary on the Big endian v. little endian debate so far:

Humans are confused by endian issues. Ekkehard Morgenstern correctly
understands how x86 machines and 68K machines represent integers, but his
definitions of the terms "big endian" and "small endian" are exactly the
*opposite* of the way Danny Cohen defined them in his classic 1980 paper (
  http://www.rdrop.com/~cary/html/endian_faq.html
).

Pros for MSb at lowest address:
  * For people used to reading English text, it's slightly easier to
understand dump files. (Ekkehard Morgenstern pointed this out). For
example, with a 68K sort of machine.
  address: 98 99 9a 9b 9c 9d 9e 9f
  data   : 74 65 73 74 00 00 00 05
is the string "test" followed by the integer 5.
(The x86 sort of machine would interpret the last part
as the integer 0x0500_0000).

Pros for lsb at lowest address:
  * For people used to reading Hebrew text, it's slightly easier to
understand dump files.
  * We want this chip to interface with the PCI bus (Olaf Naumann reminded
me of this), and the PCI bus is inherently lsB at lowest address (Is this
true ?).
  * The guy coding the Freedom CPU Simulator claims that multi-word integer
math is easier this way. (How long before a pre-alpha version of this code
is online ?) (But then, how often are we really going to need integers so
big they won't fit in a single 64 bit word ? We *never* need *addresses* so
big they won't fit into a single 64 bit word.)
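
One plausible reading of the multi-word claim: with the least significant
word at the lowest address, an addition can walk memory in increasing
address order and carry as it goes. The sketch below is my own guess at
what is meant, not code from the Freedom CPU Simulator; add_multiword and
the constants are hypothetical names.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1

def add_multiword(a, b):
    """Add two equal-length lists of 64 bit words, least significant word first.

    Returns (result_words, carry_out).
    """
    result = []
    carry = 0
    for wa, wb in zip(a, b):          # lowest index (address) first
        total = wa + wb + carry
        result.append(total & WORD_MASK)
        carry = total >> WORD_BITS    # 0 or 1, passed to the next word
    return result, carry

# (2**64 - 1) + 1: the low word wraps to 0 and the carry lands in word 1.
total, carry_out = add_multiword([WORD_MASK, 0], [1, 0])
print(total, carry_out)
```

With the most significant word first, the same loop would have to start at
the highest address and work downward.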

Pros for Neither Endian:
  * Reading and writing only whole, aligned words results in much simpler
hardware and software. (Ekkehard Morgenstern pointed this out)

Items that make endian somewhat irrelevant:
  * RAM doesn't care which endian you make it.
  * Other devices are so slow that we have more than 10 CPU clock cycles
after writing one word to a device before it's ready to accept the next word.
While we're waiting, we have plenty of clock cycles available to shuffle
around the next word to whatever endian is needed by that device. (Standard
PCI now has a top speed of 33 MHz; most I/O devices are much slower. Many
processors have over 300 MHz clock speeds.)
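
The "shuffle around the next word" step amounts to a byte swap. Here is a
minimal sketch for a 32 bit word, using only shifts and masks; swap32 is a
name invented for this example, and a real CPU would do this in a handful
of instructions or a single dedicated one.

```python
def swap32(word):
    """Reverse the byte order of a 32 bit word."""
    return (((word & 0x000000FF) << 24) |
            ((word & 0x0000FF00) << 8) |
            ((word & 0x00FF0000) >> 8) |
            ((word & 0xFF000000) >> 24))

# The integer 5 from the dump example, converted to the other byte order.
swapped = swap32(0x00000005)
print(hex(swapped))
```

Swapping twice gives back the original word, so the same routine serves for
both reading from and writing to a device of the opposite endian.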

--
+ David Cary "mailto:d.cary@ieee.org" "http://www.rdrop.com/~cary/"
| "icbmto:N36 08.830' W97 03.443'"
| Future Tech, Unknowns, machine vision, <*> O-

------------------------------------------------------------------------
Free Web-based e-mail groups -- http://www.eGroups.com

Mailing-List: contact f-cpu-owner@egroups.com
Precedence: list
X-URL: http://www.egroups.com/list/f-cpu/
X-Mailing-List: f-cpu@egroups.com
Delivered-To: listsaver-egroups-f-cpu@egroups.com
From: AlphaRISC@aol.com
Date: Wed, 18 Nov 1998 10:21:34 EST
To: f-cpu@egroups.com
Mime-Version: 1.0
Subject: [f-cpu] Re: Big endian v. little endian
Status: U

In a message dated 11/18/98 4:22:19 AM Central Standard Time, d.cary@ieee.org
writes:

> Items that make endian somewhat irrelevant:
>    * RAM doesn't care which endian you make it.
>    * Other devices are so slow that we have more than 10 CPU clock cycles
>  after writing one word to it before it's ready to accept the next word.
>  While we're waiting, we have plenty of clock cycles available to shuffle
>  around the next word to whatever endian is needed by that device. (
>  Standard PCI now has top speed of 33 MHz; most I/O devices are much slower.
>  Many processors have over 300 MHz clock speeds ).

And I'll be using those extra clocks to execute other threads, not waiting on
a stalled bus.

In a message dated 11/18/98 7:58:52 AM Central Standard Time,
Wolfgang.Jung@micromata.de writes:

about selectable endians:

> Cons:
>    - CPU design follows the laziness of the programmers.
>    - More transistors in the MMU interface.
>    - 1 bit more in the TLB (could also affect max. cacheable area)

It's no big deal. When I release the code the first thing I expect is that
somebody will hack this into it.

ALPHA
------------------------------------------------------------------------

Started 1998-04-21

Send comments, suggestions, bug reports to

David Cary
d.cary@ieee.org.


end http://www.rdrop.com/~cary/html/endian_faq.html