#### **Review**

• Amdahl's Law:

$$\text{Speedup}(E) = \frac{\text{Execution time without enhancement } E}{\text{Execution time with enhancement } E} = \frac{1}{(1 - F) + F/S}$$

Here $F$ is the fraction of execution time that benefits from the enhancement and $S$ is the speedup of that fraction.
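As a quick numeric check of the formula above, here is a minimal Python sketch. The function name `amdahl_speedup` and the sample numbers (40% of the time sped up 10x) are assumptions chosen for illustration, not values from the lecture.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the run time is made 10x faster.
overall = amdahl_speedup(0.4, 10.0)
print(f"overall speedup: {overall:.4f}")
```

Even with a 10x enhancement, the untouched 60% of the run time keeps the overall speedup close to 1.56.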

• CPU Time & CPI:

CPU time = Instruction count × CPI × clock cycle time

CPU time = Instruction count × CPI / clock rate
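The two forms of the CPU time equation give the same result, since clock cycle time is the reciprocal of clock rate. A small Python sketch, with a hypothetical function name and made-up workload numbers:

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, CPI of 2, 1 GHz clock.
instructions = 1e9
cpi = 2.0
clock_rate_hz = 1e9
cycle_time_s = 1.0 / clock_rate_hz  # clock cycle time is 1 / clock rate

by_rate = cpu_time_seconds(instructions, cpi, clock_rate_hz)
by_cycle_time = instructions * cpi * cycle_time_s
print(by_rate, by_cycle_time)
```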

CSCE430/830 ISA

#### **Outline**

- Instruction Set Overview
  - Classifying Instruction Set Architectures (ISAs) ←
  - Memory Addressing
  - Types of Instructions
- MIPS Instruction Set (Topic of next lecture)

## **Instruction Set Architecture (ISA)**

- Serves as an interface between software and hardware.
- Provides a mechanism by which the software tells the hardware what should be done.




### **Interface Design**

#### A good interface:

- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels



### **Instruction Set Design Issues**

#### Instruction set design issues include:

- Where are operands stored?
  - » registers, memory, stack, accumulator
- How many explicit operands are there?
  - » 0, 1, 2, or 3
- How is the operand location specified?
  - » register, immediate, indirect, . . .
- What type & size of operands are supported?
  - » byte, int, float, double, string, vector. . .
- What operations are supported?
  - » add, sub, mul, move, compare . . .


#### **Evolution of Instruction Sets**

```
Single Accumulator (EDSAC 1950, Maurice Wilkes)
  -> Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
  -> Separation of Programming Model from Implementation
       - High-level Language Based (B5000 1963)
       - Concept of a Family (IBM 360 1964)
  -> General Purpose Register Machines
       - Complex Instruction Sets (Vax, Intel 432 1977-80)
           -> CISC: Intel x86, Pentium
       - Load/Store Architecture (CDC 6600, Cray 1 1963-76)
           -> (MIPS, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987)
```

## **Classifying ISAs**

| ISA class | Period | Operands | Instruction | Effect |
|-----------|--------|----------|-------------|--------|
| Accumulator | before 1960, e.g. 68HC11 | 1-address | add A | $acc \leftarrow acc + mem[A]$ |
| Stack | 1960s to 1970s | 0-address | add | $tos \leftarrow tos + next$ |
| Memory-Memory | 1970s to 1980s | 2-address | add A, B | $mem[A] \leftarrow mem[A] + mem[B]$ |
| | | 3-address | add A, B, C | $mem[A] \leftarrow mem[B] + mem[C]$ |
| Register-Memory | 1970s to present, e.g. 80x86 | 2-address | add R1, A | $R1 \leftarrow R1 + mem[A]$ |
| | | | load R1, A | $R1 \leftarrow mem[A]$ |
| Register-Register (Load/Store) | 1960s to present, e.g. MIPS | 3-address | add R1, R2, R3 | $R1 \leftarrow R2 + R3$ |
| | | | load R1, R2 | $R1 \leftarrow mem[R2]$ |
| | | | store R1, R2 | $mem[R1] \leftarrow R2$ |


## **Operand Locations in Four ISA Classes**



## Code Sequence C = A + B for Four Instruction Sets

| Register (load-store) | Register (register-memory) | Accumulator | Stack  |
|-----------------------|----------------------------|-------------|--------|
| Load R1, A            | Load R1, A                 | Load A      | Push A |
| Load R2, B            | Add R1, B                  | Add B       | Push B |
| Add R3, R1, R2        | Store C, R1                | Store C     | Add    |
| Store C, R3           |                            |             | Pop C  |







**More About General Purpose Registers** 

- Why do almost all new architectures use GPRs?
  - Registers are much faster than memory (even cache)
    - » Register values are available immediately
    - » When memory isn't ready, processor must wait ("stall")
  - Registers are convenient for variable storage
    - » Compiler assigns some variables just to registers
    - » More compact code since small fields specify registers (compared to memory addresses)







#### **Stack Architectures**

#### Instruction set:

```
add, sub, mult, div, . . .
push A, pop A
```




#### Example: A\*B - (A+C\*B)
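One way to evaluate this expression on a stack machine can be sketched with a tiny Python interpreter. Everything here is an assumption for illustration: the helper `run_stack`, the instruction sequence, the result location D, the sample values, and the convention that `sub` computes next minus top of stack.

```python
def run_stack(program, memory):
    """Run zero-address instructions against a dict-based memory; return the memory."""
    stack = []
    binary_ops = {
        "add": lambda nxt, top: nxt + top,
        "sub": lambda nxt, top: nxt - top,  # assumed order: next minus top
        "mul": lambda nxt, top: nxt * top,
    }
    for instruction in program:
        op, *operand = instruction.split()
        if op == "push":
            stack.append(memory[operand[0]])
        elif op == "pop":
            memory[operand[0]] = stack.pop()
        else:
            top = stack.pop()
            nxt = stack.pop()
            stack.append(binary_ops[op](nxt, top))
    return memory


# Assumed sequence for A*B - (A + C*B); the result is popped into D.
program = ["push A", "push B", "mul",
           "push A", "push C", "push B", "mul", "add",
           "sub", "pop D"]
memory = run_stack(program, {"A": 2, "B": 3, "C": 4})
print(memory["D"])  # 2*3 - (2 + 4*3) = -8
```

Note that A and B are each pushed twice: a value that is no longer on top of the stack has to be fetched again.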





**Stacks: Pros and Cons** 

#### Pros

- Good code density (implicit top of stack)
- Low hardware requirements
- Easy to write a simple compiler for stack architectures

#### Cons

- Stack becomes the bottleneck
- Little ability for parallelism or pipelining
- Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed
- Difficult to write an optimizing compiler for stack architectures

## **Accumulator Architectures**

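#### Instruction set:

add A, sub A, mult A, div A, . . .

load A, store A

#### Example: A\*B - (A+C\*B)

| Instruction | Accumulator contents |
|-------------|----------------------|
| load B      | B                    |
| mul C       | B\*C                 |
| add A       | A+B\*C               |
| store D     | A+B\*C               |
| load A      | A                    |
| mul B       | A\*B                 |
| sub D       | result               |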
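The accumulator sequence above can be traced with a minimal Python sketch. The helper `run_accumulator` and the sample values for A, B, and C are assumptions for illustration; the instruction sequence is the one from the slide.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions; every operation goes through the accumulator."""
    acc = 0
    for instruction in program:
        op, addr = instruction.split()
        if op == "load":
            acc = memory[addr]
        elif op == "store":
            memory[addr] = acc
        elif op == "add":
            acc += memory[addr]
        elif op == "sub":
            acc -= memory[addr]
        elif op == "mul":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return acc


program = ["load B", "mul C", "add A", "store D", "load A", "mul B", "sub D"]
result = run_accumulator(program, {"A": 2, "B": 3, "C": 4})
print(result)  # 2*3 - (2 + 3*4) = -8
```

Every one of the seven instructions touches memory, which is the "high memory traffic" drawback in practice.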


## **Accumulators: Pros and Cons**

#### Pros

- Very low hardware requirements
- Easy to design and understand

#### Cons

- Accumulator becomes the bottleneck
- Little ability for parallelism or pipelining
- High memory traffic

## **Memory-Memory Architectures**

#### Instruction set:

(3 operands) add A, B, C; sub A, B, C; mul A, B, C

(2 operands) add A, B; sub A, B; mul A, B

Example: A\*B - (A+C\*B)

- 3 operands
  - mul D, A, B
  - mul E, C, B
  - add E, A, E
  - sub E, D, E
- 2 operands (final steps, once D = A\*B and E = C\*B are in memory)
  - add E, A
  - sub D, E
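The 3-operand sequence can be checked, and its memory traffic counted, with a short Python sketch. The helper `run_memory_memory`, the sample values, and the accounting of three memory accesses per instruction (two reads, one write) are assumptions for illustration.

```python
def run_memory_memory(program, memory):
    """Execute 3-operand memory-memory instructions; return the number of data accesses."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    accesses = 0
    for instruction in program:
        op, dest, src1, src2 = instruction.replace(",", " ").split()
        memory[dest] = ops[op](memory[src1], memory[src2])
        accesses += 3  # two operand reads plus one result write
    return accesses


memory = {"A": 2, "B": 3, "C": 4}
program = ["mul D, A, B", "mul E, C, B", "add E, A, E", "sub E, D, E"]
accesses = run_memory_memory(program, memory)
print(memory["E"], accesses)  # result in E, and 12 data memory accesses
```

Only four instructions are needed, but each one goes to memory three times.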


## Memory-Memory: Pros and Cons

#### Pros

- Requires fewer instructions (especially if 3 operands)
- Easy to write compilers for (especially if 3 operands)

#### Cons

- Very high memory traffic (especially if 3 operands)
- Variable number of clocks per instruction
- With two operands, more data movements are required

## **Register-Memory Architectures**

#### Instruction set:

add R1, A; sub R1, A; mul R1, B

load R1, A; store R1, A



#### Example: A\*B - (A+C\*B)

Arithmetic instructions have the form R1 = R1 +,-,\*,/ mem[B]. Values computed, in order:

- A\*B
- C\*B
- A + C\*B
- A\*B - (A + C\*B)
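A register-memory sequence that computes these values in the same order can be sketched in Python. The helper `run_register_memory`, the specific instruction sequence, the temporary location D, and the sample values are all assumptions for illustration, not the code from the slide.

```python
def run_register_memory(program, memory):
    """Execute 2-operand register-memory instructions: reg = reg op mem[addr]."""
    regs = {}
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    for instruction in program:
        op, reg, addr = instruction.replace(",", " ").split()
        if op == "load":
            regs[reg] = memory[addr]
        elif op == "store":
            memory[addr] = regs[reg]
        else:
            regs[reg] = ops[op](regs[reg], memory[addr])
    return regs


# Assumed sequence; D is a temporary memory location.
program = [
    "load R1, A",
    "mul R1, B",    # R1 = A*B
    "load R2, C",
    "mul R2, B",    # R2 = C*B
    "add R2, A",    # R2 = A + C*B
    "store R2, D",
    "sub R1, D",    # R1 = A*B - (A + C*B)
]
regs = run_register_memory(program, {"A": 2, "B": 3, "C": 4})
print(regs["R1"])  # 2*3 - (2 + 4*3) = -8
```

The second source operand of a subtraction must be in memory, which is why the sketch spills R2 to D before the final `sub`.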


## **Register-Memory: Pros and Cons**

#### Pros

- Some data can be accessed without loading first
- Instruction format easy to encode
- Good code density

#### Cons

- Operands are not equivalent (poor orthogonality)
- Variable number of clocks per instruction
- May limit number of registers

## **Load-Store Architectures**

#### Instruction set:

add R1, R2, R3; sub R1, R2, R3; mul R1, R2, R3

load R1, &A; store R1, &A; move R1, R2



#### Example: A\*B - (A+C\*B)

```
load R1, &A
load R2, &B
load R3, &C
mul R7, R3, R2       /* C*B           */
add R8, R7, R1       /* A + C*B       */
mul R9, R1, R2       /* A*B           */
sub R10, R9, R8      /* A*B - (A+C*B) */
```
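The load-store sequence above can be traced with a small Python sketch. The helper `run_load_store` and the sample values are assumptions for illustration; the instructions are the ones from the slide, and the simulator also counts how many of them touch memory.

```python
def run_load_store(program, memory):
    """Execute load-store code; only load and store access memory."""
    regs = {}
    memory_accesses = 0
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    for instruction in program:
        op, first, *rest = instruction.replace(",", " ").split()
        if op == "load":
            regs[first] = memory[rest[0].lstrip("&")]
            memory_accesses += 1
        elif op == "store":
            memory[rest[0].lstrip("&")] = regs[first]
            memory_accesses += 1
        elif op == "move":
            regs[first] = regs[rest[0]]
        else:
            regs[first] = ops[op](regs[rest[0]], regs[rest[1]])
    return regs, memory_accesses


program = [
    "load R1, &A", "load R2, &B", "load R3, &C",
    "mul R7, R3, R2", "add R8, R7, R1", "mul R9, R1, R2", "sub R10, R9, R8",
]
regs, memory_accesses = run_load_store(program, {"A": 2, "B": 3, "C": 4})
print(regs["R10"], memory_accesses)  # result -8 with 3 data memory accesses
```

The instruction count is higher than in the memory-memory version, but each operand is fetched from memory only once.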


## **Load-Store: Pros and Cons**

#### Pros

- Simple, fixed length instruction encodings
- Instructions take similar number of cycles
- Relatively easy to pipeline and make superscalar

#### Cons

- Higher instruction count
- Not all instructions need three operands
- Dependent on good compiler

## **Registers: Advantages and Disadvantages**

#### Advantages

- Faster than cache or main memory (no addressing mode or tags)
- Deterministic (no misses)
- Can replicate (multiple read ports)
- Short identifier (typically 3 to 8 bits)
- Reduce memory traffic

#### Disadvantages

- Need to save and restore on procedure calls and context switch
- Can't take the address of a register (for pointers)
- Fixed size (can't store strings or structures efficiently)
- Compiler must manage
- Limited number

Virtually every new ISA designed after 1980 is a load-store architecture (i.e. RISC, to simplify CPU design).


**Word-Oriented Memory Organization** 

- Memory is byte addressed and provides access for bytes (8 bits), half words (16 bits), words (32 bits), and double words (64 bits).
- **Addresses Specify Byte Locations**
  - Address of first byte in word
  - Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

| Byte addresses | Containing 32-bit word | Containing 64-bit word |
|----------------|------------------------|------------------------|
| 0000 to 0003   | Addr = 0000            | Addr = 0000            |
| 0004 to 0007   | Addr = 0004            | Addr = 0000            |
| 0008 to 0011   | Addr = 0008            | Addr = 0008            |
| 0012 to 0015   | Addr = 0012            | Addr = 0008            |

## **Byte Ordering**

- How should bytes within a multi-byte word be ordered in memory?
- Conventions
  - Suns and Macs are "Big Endian" machines
    - » Least significant byte has highest address
  - Alphas and PCs are "Little Endian" machines
    - » Least significant byte has lowest address


## **Byte Ordering Example**

- Big Endian
  - Least significant byte has highest address
- Little Endian
  - Least significant byte has lowest address
- Example
  - Variable x has 4-byte representation 0x01234567
  - Address given by &x is 0x100

|               | 0x100 | 0x101 | 0x102 | 0x103 |
|---------------|-------|-------|-------|-------|
| Big Endian    | 01    | 23    | 45    | 67    |
| Little Endian | 67    | 45    | 23    | 01    |
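The two layouts can be reproduced with Python's built-in `int.to_bytes`. This is a minimal sketch; the helper `layout` and its `base` parameter are assumptions introduced only to print each byte next to its address.

```python
x = 0x01234567

big = x.to_bytes(4, byteorder="big")        # most significant byte first
little = x.to_bytes(4, byteorder="little")  # least significant byte first


def layout(data: bytes, base: int = 0x100) -> dict:
    """Map each memory address (as hex text) to the byte stored there."""
    return {hex(base + offset): f"{byte:02x}" for offset, byte in enumerate(data)}


print("big endian:   ", layout(big))
print("little endian:", layout(little))
```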

## **Reading Byte-Reversed Listings**

- Disassembly
  - Text representation of binary machine code
  - Generated by program that reads the machine code
- Example Fragment

| Address  | Instruction Code     | Assembly Rendition    |
|----------|----------------------|-----------------------|
| 8048365: | 5b                   | pop %ebx              |
| 8048366: | 81 c3 ab 12 00 00    | add \$0x12ab,%ebx     |
| 804836c: | 83 bb 28 00 00 00 00 | cmpl \$0x0,0x28(%ebx) |

#### Deciphering Numbers

- Value: 0x12ab
- Pad to 4 bytes: 0x000012ab
- Split into bytes: 00 00 12 ab

- Reverse: ab 12 00 00
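Going the other way, from the bytes in the listing back to the value, can be sketched with `int.from_bytes`. The variable names are assumptions; the bytes are the immediate field of the `add` instruction in the fragment above.

```python
# Immediate field of "81 c3 ab 12 00 00" as it appears in the listing.
field = bytes.fromhex("ab 12 00 00")

# x86 stores multi-byte values little endian, so interpret the bytes that way.
value = int.from_bytes(field, byteorder="little")
print(hex(value))

# Encoding the value again gives back the byte-reversed form.
encoded = (0x12AB).to_bytes(4, byteorder="little")
print(encoded.hex(" "))
```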


## **Types of Addressing Modes (VAX)**

| Addressing Mode      | Example             | Action                                 |
|----------------------|---------------------|----------------------------------------|
| 1. Register direct   | Add R4, R3          | R4 <- R4 + R3                          |
| 2. Immediate         | Add R4, #3          | R4 <- R4 + 3                           |
| 3. Displacement      | Add R4, 100(R1)     | R4 <- R4 + M[100 + R1]                 |
| 4. Register indirect | Add R4, (R1)        | R4 <- R4 + M[R1]                       |
| 5. Indexed           | Add R4, (R1 + R2)   | R4 <- R4 + M[R1 + R2]                  |
| 6. Direct            | Add R4, (1000)      | R4 <- R4 + M[1000]                     |
| 7. Memory indirect   | Add R4, @(R3)       | R4 <- R4 + M[M[R3]]                    |
| 8. Autoincrement     | Add R4, (R2)+       | R4 <- R4 + M[R2]<br>R2 <- R2 + d       |
| 9. Autodecrement     | Add R4, -(R2)       | R2 <- R2 - d<br>R4 <- R4 + M[R2]       |
| 10. Scaled           | Add R4, 100(R2)[R3] | R4 <- R4 + M[100 + R2 + R3*d]          |

• Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX.
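How the four most common modes locate their operand can be sketched in Python. The function `fetch_operand`, its mode names, and the register and memory contents are assumptions for illustration; the actions follow rows 1 to 4 of the table.

```python
def fetch_operand(mode, regs, mem, reg=None, imm=None, disp=0):
    """Return the second source operand of Add R4, <operand> for modes 1-4."""
    if mode == "register":            # Add R4, R3
        return regs[reg]
    if mode == "immediate":           # Add R4, #3
        return imm
    if mode == "displacement":        # Add R4, 100(R1)
        return mem[disp + regs[reg]]
    if mode == "register_indirect":   # Add R4, (R1)
        return mem[regs[reg]]
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 1000, "R3": 7, "R4": 10}
mem = {1000: 5, 1100: 9}

results = {
    "Add R4, R3": regs["R4"] + fetch_operand("register", regs, mem, reg="R3"),
    "Add R4, #3": regs["R4"] + fetch_operand("immediate", regs, mem, imm=3),
    "Add R4, 100(R1)": regs["R4"] + fetch_operand("displacement", regs, mem, reg="R1", disp=100),
    "Add R4, (R1)": regs["R4"] + fetch_operand("register_indirect", regs, mem, reg="R1"),
}
for text, value in results.items():
    print(f"{text:18s} -> R4 = {value}")
```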

## **Types of Operations**

- Arithmetic and Logic: AND, ADD
- Data Transfer: MOVE, LOAD, STORE
- Control: BRANCH, JUMP, CALL
- System: OS CALL, VM
- Floating Point: ADDF, MULF, DIVF
- Decimal: ADDD, CONVERT
- String: MOVE, COMPARE
- Graphics: (DE)COMPRESS


### **80x86 Instruction Frequency**

| Rank  | Instruction   | Frequency |
|-------|---------------|-----------|
| 1     | load          | 22%       |
| 2     | branch        | 20%       |
| 3     | compare       | 16%       |
| 4     | store         | 12%       |
| 5     | add           | 8%        |
| 6     | and           | 6%        |
| 7     | sub           | 5%        |
| 8     | register move | 4%        |
| 9     | call          | 1%        |
| 10    | return        | 1%        |
| Total |               | 96%       |

## Relative Frequency of Control Instructions

| Operation   | SPECint92 | SPECfp92 |
|-------------|-----------|----------|
| Call/Return | 13%       | 11%      |
| Jumps       | 6%        | 4%       |
| Branches    | 81%       | 87%      |

• Design hardware to handle branches quickly, since they are the most frequent control instructions.


## **Summary**

- Instruction Set Overview
  - Classifying Instruction Set Architectures (ISAs)
  - Memory Addressing
  - Types of Instructions
- MIPS Instruction Set (Topic of next class) ←
  - Overview
  - Registers and Memory
  - Instructions