### "I think there is a world market for

# maybe five computers."

# Thomas Watson, chairman of IBM, 1943

EECS 314 Computer Architecture Instructors: Professor Chris Papachristou & Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow

### **Computer Architecture Trends: Post PC era**



• 100 million processors were sold for desktop computers





• 3 BILLION processors were sold for embedded systems

It's expected that the average car will be Internet ready and have over \$2000 worth of embedded computers CWRU EECS 314 2

# **Digital Pager Architecture**



### Smart Cards: Hardware/Software Co-Design





• There are currently 2.8 billion smart cards in use:

• 575 million phone, 36 million financial, 30 million ID cards, ...

• Smart cards differ from credit cards in using onboard memory chips and microprocessors or micro-controllers instead of magnetic strips.

Each chip can hold 100k times the information contained on a standard magnetic-stripe card.

### **Smart Cards: Computer Architecture**



#### Smart cards have

embedded within them a processor and often a crypto-graphically enhanced co-processor.

#### **Features:**

- Accelerated Software Cryptography
- Java Card
- Windows for Smart Cards
- Code Compression
- Secure Memory Spaces

Ξ∢

## Smart Cards: Hardware/Software Co-Design

#### An example of the software handshaking protocol is shown below



### **Design Abstractions**

• Coordination of many *levels of abstraction* 



**= •**(

7

### "The Megahertz Myth."



Percentage faster than a Pentium system:



Why the clock speed of a computer isn't an accurate way to compare system performance.

Overall system design and processor-architecture differences affect real-world application performance, otherwise you might be fooled by what Jon terms "The Megahertz Myth."

--Jon Rubinstein, Apple Senior VP of Hardwarecs 314 8



## **High Performance: Video Graphic Architectures**

DirectX is a set of components, developed to provide Windows-based programs with high-performance, real-time access to available hardware on current computer systems.





A very important abstraction: Instruction Set Architecture

- interface between hardware and low-level software
- standardizes instructions, machine language bit patterns, ...
- advantage: different implementations of the same architectur
- disadvantage: sometimes prevents using new innovations

#### Modern instruction set architectures: PowerPC, DEC Alpha, MIPS, SPARC, HP, 80x86/Pentium/K6\*

*True or False: Binary compatibility is important?* 

Yes (Microsoft/Intel alliance) No - (Unix, Linux, C++, Java)

Yes - Sales, Marketing No - Speed, Engineers, Programmers CWRU EECS 314 10

#### Design Abstractions: DVD



#### An early DVD version





CWKU LECS JIT 12



CWRU EECS 314 13

#### An early XBOX version



CWKU EECS 314 14

Xec

## Brief history

- ENIAC: 1940, 18000 Vacuum tubes, 0.1 Mhz, 150 Kwatts, <u>first general purpose computer</u>,10000 feet<sup>2</sup>, 90% down time, Plugboard programming (ROM)
- EDSAC: 1949, 3000 Vacuum tubes, 0.5 Mhz, 12 Kwatts, first stored-program computer to operate.
- EDVAC: 1956 operational, 1944 designed, 3600 tubes, 10k diodes, 1 Mhz, crashed every 8 hours.
- UNIVAC 1: 1951, <u>1st commerical computer</u>, 5400 tubes, 18k diodes, 2.25 Mhz, 352 feet<sup>2</sup>. Sales: 43. Cost \$750K.
- IBM 701: 1952, total 19 leased, \$15000 per month.

UNIVAC 1107:1960s, Case Institute of Technology, CWRU, http://www.fourmilab.ch/documents/univac/case1107.htm/RU EECS 314 15

#### **Design Abstractions**



- Declaration (announces the properties of variables)
- Definition (declaration that allocates memory): int r3; /\* declare r3 as a signed integer \*/
- Arithmetic operators: +, -, \*, /, % (mod), &, |, ^
- Assignment statements:

Variable = expression; celsius = 5 \* (fahr - 32) / 9;

• Operands:

Variables: a, b, c, r0, r1, celsius Constants: 0, 1000, -17, 15, 0xf3, 017

Note: we begin at chapter 3 of the text book

### Assembly Operators

C Semantics: register int rd, rs, rt;
 rd = rs <op> rt;

- Syntax: <op> \$rd, \$rs, \$rt
  - **<op>:** Opcode (or operator) by name
  - **\$rd:** Destination, operand getting result
  - \$rs: 1st source operand for operation
  - \$rt: 2nd source operand for operation
- Called an (assembly language) Instruction
- Example

add \$r1,\$r2,\$r3 # C Lanaguage: r1=r2+r3;

**Assembly Operators/Instructions** 

• MIPS Assembly Syntax is rigid: 1 operation, 3 variables Why? Keep Hardware simple via regularity Note: Unlike C each line of assembly contains at most 1 instruction

How do following C statement?
 a = b + c + d - e; /\* a = sum of b, c, d minus d \*/

Break into more primitive instructions add a, b, c # a = sum of b & c add a, a, d # a = sum of b,c,d sub a, a, e # a = b+c+d-e

• # is a comment terminated by end of the line CWRU EECS 314 19

### Compilation

- Example: compile by hand this C code:
  f = (g + h) (i + j);
- First sum of g and h. Where put result? add f,g,h # f contains g+h
- Now sum of i and j. Where put result?
  - -Cannot use f !
  - -Compiler creates temp variable to hold sum: t1

add t1,i,j # t1 contains i+j

Finally produce difference
 sub f,f,t1 # f=(g+h)-(i+j)

### **Compilation Summary**

- C statement (5 operands, 3 operators):
   f = (g + h) (i + j);
- Becomes 3 assembly instructions (6 unique operands, 3 operators):

add f,g,h# f contains g+hadd t1,i,j# t1 contains i+jsub f,f,t1# f=(g+h)-(i+j)

- Big Idea: compiler translates notation from 1 level of abstraction to lower level
- In general, each line of C produces many assembly instructions

-One reason why people program in C vs. Assembly; fewer lines of code

-Other reasons? Portability, Optimization RU EECS 314 21

### **Registers: Performance issue**

• Unlike C++, assembly instructions cannot use variables

Why not? Keep Hardware Simple

- Instruction operands are <u>registers</u>:
  - Limited to 32 registers in MIPS (\$r0 \$r31)
  - Also, each MIPS register is 32 bits wide
  - The width of the register is called the word size
  - C language "int" is the word size of the register

Why 32? Smaller is faster (based on technology)

### **Pentium I: registers example**



Ref: http://www.laynetworks.com/users/webs/cs12\_2.htm

## Compilation using Registers

• Compile by hand using registers:

```
f = (g + h) - (i + j);
# assign registers
# $r5="f", $r1="g", $r2="h"
# $r3="i" $r4="j"
```

• MIPS Instructions:

add \$r6,\$r1,\$r2 add \$r7,\$r3,\$r4 sub \$r5,\$r6,\$r7 # \$r6 = g+h
# \$r7 = i+j
# f=(g+h)-(i+j)

## Arithmetic operators

- /\* default \*/
- /\* explicit \*/
- add \$rd,\$rs,\$rt
- sub \$rd,\$rs,\$rt
- addi \$rt,\$rs,signed16
- subi \$rt,\$rs,signed16

register int r0,r1,...,r31;

register signed int r0,...;

- register unsigned int r0,r1,...,r31;
   addu \$rd,\$rs,\$rt
   subu \$rd,\$rs,\$rt
   addiu \$rt,\$rs,signed16
   rt = rs + 1847;
- What's missing from unsigned?
   subiu? Do we need it?
- subiu \$rt,\$rs,1931? addui \$rt,\$rs,-1931

### **Pseudo instructions: move**

 Suppose we only had add, sub, addi, subi instructions How could we do the following C language operation?

```
register signed int r0=0, r1, r2, r5, r6;
```

r2 = r6;

Note: that some processors (i.e. Mips, Sun Microsystems, sparc) hard code \$r0 to zero.

| addi \$r2,\$r6,0   | # also: subi \$r2,\$r6,0   |
|--------------------|----------------------------|
| # also             |                            |
| add \$r2,\$r6,\$r0 | # also: sub \$r2,\$r6,\$r0 |

Pseudo instructions extend the assembly language by substituting with other machine instructions:
 move \$rd,\$rs
 # addi \$r2,\$r6,0

### **Pseudo instruction: li, load immediate**

 Suppose we only had add, sub, addi, subi instructions How could we do the following C language operation?

```
register signed int r0=0, r1, r2, r5, r6;
```

```
r2 = 1847;
```

addi \$r2,\$r0,1847 # mips \$r0 is always zero

• Pseudo instructions extend the assembly language by substituting with other machine instructions:

li \$rd, signed 16

# addi \$r2,\$r0,signed16

## **bitwise C operators**

- /\* default \*/
- /\* explicit \*/
- and \$rd,\$rs,\$rt
- or \$rd,\$rs,\$rt
- xor \$rd,\$rs,\$rt
- not \$rd,\$rs
- andi \$rt,\$rs,signed16
- ori \$rt,\$rs,signed16
- xori \$rt,\$rs,signed16

register int r0,r1,...,r31;

register signed int r0,...;

```
rd = rs & rt;
```

```
rd = rs | rt;
```

rd = ~rs; /\* pseudo instruction \*/

why is there no "noti"?

- Is the follow correct?
- /\* explicit \*/ register unsigned int r0,r1,...,r31;
- andu \$rd,\$rs,\$rt
   rd = rs & rt;
- Use "and", Bitwise operators are not arithmetic operators! 28

### **Pseudo instructions: clear & not**

• review bitwise operators (looking at 1 bit at a time):

not: 0 = -1; 1 = -0;and: 1 = 1 & 1; 0 = 0 & x; 0 = x & 0; /\* conclusive \*/or: 0 = 0 | 0; 1 = 1 | x; 1 = x | 1; /\* inclusive \*/xor:  $0 = 0 \land 0;$   $0 = 1 \land 1;$   $1 = 0 \land 1;$   $1 = 1 \land 0;$ also called exclusive or, difference, mod 2 add

• How is the "clear \$rd" pseudo-instruction implemented?

clear \$rd # xori \$rd,\$rd,\$rd

• How is the "not rd, rs" pseudo-instruction implemented?

not \$rd,\$rs # xori \$rd,\$rs,0xffff

### **Pseudo instructions: Multiply and Divide**

- mul \$rd, \$rs, \$rt
  - mult \$rs, \$rt
  - mflo \$rd
- mulou \$rd, \$rs, \$rt
  - mult<mark>u</mark> \$rs, \$rt
  - mflo \$rd
- div \$rd, \$rs, \$rt
  - div \$rs, \$rt
  - mflo \$rd
- rem \$rd, \$rs, \$rt

- div \$rs, \$rt

– mfhi \$rd

**#** signed pseudo instruction

# (hi:lo)<sub>64</sub> = 
$$rs_{32} * rt_{32}$$

- # rd = lo;
- **# unsigned** pseudo instruction

# (hi:lo)<sub>64</sub> = 
$$rs_{32} * rt_{32}$$

- **# pseudo instruction**
- # (quotient=lo rem=hi)<sub>64</sub> =  $rs_{32}$  /  $rt_{32}$
- # rd = lo;
- **# pseudo instruction**
- # (quotient=lo rem=hi)<sub>64</sub> =  $rs_{32}$  /  $rt_{32}$
- # rd = hi;

#### Shift instructions: sll, srl, sra

| •          |                                      | register unsigned int r0,r1,,r31;                                                                |
|------------|--------------------------------------|--------------------------------------------------------------------------------------------------|
|            | \$rd,\$rt,const5<br>\$rd,\$rt,const5 | rd = rt << 11; /* rd = rt * 2 <sup>11</sup> */<br>rd = rt >> 21; /* rd = rt / 2 <sup>21</sup> */ |
| •<br>• sra | \$rd,\$rt,const5                     | register int r0,r1,,r31;<br>rd = rt >> 4;                                                        |

#### Do we need shift instruction since we have Mul & div?

No, shift left/right can be done with mul/div instructions example: sra \$r2,\$r2,4

**Performance:** shift instructions are faster than mul/div

C Language allows programmer access to shift via >> << ops CWRU EECS 314 - 31