Review of Processor Architecture

The Intel 8085 Processor (1976)

by: burt rosenberg
at: university of miami
update: 11 sept 2019
4 sept 2020
4 sept 2021

Overview

On the bicentennial year of America's founding, Intel introduced the first fully functional computer on a chip, the Intel 8085. The current, powerful, Intel processors are direct descendants of this 8-bit microprocessor.

It is called an 8-bit microprocessor because its normal data paths are one byte (8 bits) wide. The address lines, however, carry two bytes (16 bits), so 2^16 = 65536 bytes (64 KiB) of memory are supported.

The 8085 was made possible by the increasing density of components etched and deposited into a single wafer of silicon in a photographically based process called VLSI, Very Large Scale Integration. Ever increasing numbers of transistors could be printed onto a single wafer. The 8085 required 6,500 transistors. The Intel i5, the CPU in the computer I was using when I started writing this page, has two billion transistors.

Intel 8085 Microarchitecture

8085 Microarchitecture

Because of its simplicity, the 8085 is a good place to start a review of processor architecture. The major deficiencies of the 8085 are,

it supports only direct or indirect addressing,
it does not support virtual memory,
it does not support privileged operating modes, and
it has a simplified interrupt scheme.

Otherwise, however, it has all the most important elements still current in today's Intel chips, as shown in the diagram to the right.

The General Purpose Registers (B, C, D, E, H, L): Six 8 bit registers, the source or destination of data. They can be used in pairs BC, DE and HL as 16 bit registers.
The addressing modes supported are,
- Immediate: loading into a register the data in the instruction
- Direct: loading and storing a register at the address given in the instruction
- Register: moving data from register to register
- Register Indirect: loading or storing a register at the memory address in register HL
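
As a rough illustration of the four addressing modes, here is a minimal sketch in C. The `Machine` struct, the accumulator field `a`, and the function names are assumptions made for this example. The names echo the 8085 mnemonics MVI, LDA, STA and MOV, which this page does not otherwise discuss.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the processor state, for illustration only. */
typedef struct {
    uint8_t a, b, c, d, e, h, l;  /* accumulator and general purpose registers */
    uint8_t mem[65536];           /* the full 64 KiB address space */
} Machine;

/* Register pair HL read as one 16 bit address. */
uint16_t hl(const Machine *m) { return (uint16_t)((m->h << 8) | m->l); }

/* Immediate: the data is part of the instruction. */
void mvi_a(Machine *m, uint8_t data) { m->a = data; }

/* Direct: the 16 bit address is part of the instruction. */
void lda(Machine *m, uint16_t addr) { m->a = m->mem[addr]; }
void sta(Machine *m, uint16_t addr) { m->mem[addr] = m->a; }

/* Register: data moves from one register to another. */
void mov_b_a(Machine *m) { m->b = m->a; }

/* Register indirect: the address is taken from register pair HL. */
void mov_a_m(Machine *m) { m->a = m->mem[hl(m)]; }
void mov_m_a(Machine *m) { m->mem[hl(m)] = m->a; }

/* Store a byte with direct addressing, read it back through HL. */
void addressing_demo(Machine *m) {
    mvi_a(m, 0x42);
    sta(m, 0x2000);
    m->h = 0x20;
    m->l = 0x00;
    m->a = 0;
    mov_a_m(m);
    assert(m->a == 0x42);
    mov_b_a(m);
    assert(m->b == 0x42);
}
```
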
The Stack Pointer (SP): Holds the 16 bit address of the lowest byte in a region of memory arranged as a stack. A push first decrements the SP by two bytes, then stores the contents of register pair R at that address.
```
        *(--SP) = R 
```
A pop loads the contents at the address in the SP into the register pair R, then increments the SP by two bytes.
```
        R = *(SP++) 
```
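
The two pseudocode lines above treat SP as a pointer to a 16 bit value. A byte level sketch in C might look like the following. The names `mem`, `sp`, `push16` and `pop16` are assumptions for this example, and the low byte is placed at the lower address, matching the "PC lo" and "PC hi" ordering in the stack figure further down.

```c
#include <assert.h>
#include <stdint.h>

uint8_t  mem[65536];  /* 64 KiB of byte addressed memory */
uint16_t sp;          /* the stack pointer */

/* Push: decrement SP by two, then store the pair with the low byte at SP. */
void push16(uint16_t r) {
    sp = (uint16_t)(sp - 2);
    mem[sp] = (uint8_t)(r & 0xFF);
    mem[(uint16_t)(sp + 1)] = (uint8_t)(r >> 8);
}

/* Pop: load the pair from the address in SP, then increment SP by two. */
uint16_t pop16(void) {
    uint16_t r = (uint16_t)(mem[sp] | (mem[(uint16_t)(sp + 1)] << 8));
    sp = (uint16_t)(sp + 2);
    return r;
}

/* A push followed by a pop returns the value and restores SP. */
void stack_demo(void) {
    sp = 0x8000;
    push16(0x1234);
    assert(sp == 0x7FFE);
    assert(mem[0x7FFE] == 0x34 && mem[0x7FFF] == 0x12);
    assert(pop16() == 0x1234);
    assert(sp == 0x8000);
}
```
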
The Program Counter (PC): The more modern name is the Instruction pointer (IP). Always contains the 16 bit address in memory of the next instruction to be fetched and executed.
The Flag Register: a register in which each bit has a meaning. This communicates results between instructions, such as the zero flag, indicating the previously executed instruction yielded zero.
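
To make the idea of flag bits concrete, here is a small sketch in C of an 8 bit addition that records its outcome in a flags byte. The function names are assumptions for this example. Only the carry, zero and sign flags are modeled, at the bit positions the 8085 uses for them. The parity and auxiliary carry flags are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FLAG_CY = 0x01,  /* carry out of bit 7 */
    FLAG_Z  = 0x40,  /* result was zero */
    FLAG_S  = 0x80   /* bit 7 of the result is set */
};

/* Add two bytes and report the outcome through *flags. */
uint8_t add8(uint8_t x, uint8_t y, uint8_t *flags) {
    unsigned sum = (unsigned)x + y;
    uint8_t r = (uint8_t)sum;
    uint8_t f = 0;
    if (sum > 0xFF) f |= FLAG_CY;
    if (r == 0)     f |= FLAG_Z;
    if (r & 0x80)   f |= FLAG_S;
    *flags = f;
    return r;
}

/* A later instruction, such as a conditional jump, only looks at the flag. */
int jump_if_zero(uint8_t flags) { return (flags & FLAG_Z) != 0; }

void flags_demo(void) {
    uint8_t f;
    assert(add8(0xFF, 0x01, &f) == 0x00);
    assert(f == (FLAG_CY | FLAG_Z));
    assert(jump_if_zero(f));
}
```
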
The Interrupt Control: The 8085 supported vectored interrupts, however in a manner very simple compared to interrupt handling in today's microarchitectures.
In response to an electrical signal on the interrupt control lines, in collaboration with support chips, the next instruction at the PC is ignored and the instruction RST(n) is pushed into the data stream. The RST(n) is a call to memory location 8*n, for n equal to 0 through 7. As it is a call, the PC is pushed onto the stack,
```
        *(--SP) = PC
        PC = 0  // (or 8, 16, ... , 56 depending on n)
```
Typically the target of the RST is a jump instruction (JMP) to code handling the interrupt at level n. To later resume the interrupted instruction flow, the PC is popped off the stack, and any interrupt mask bits are cleared.
```
        PC = *(SP++)
```
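
The RST(n) call and the later return can be sketched in C as follows. This is a simplified model under stated assumptions: `mem`, `sp`, `pc` and the function names are invented for the example, and the interrupt mask handling mentioned above is not modeled.

```c
#include <assert.h>
#include <stdint.h>

uint8_t  mem[65536];
uint16_t sp, pc;

static void push16(uint16_t r) {
    sp = (uint16_t)(sp - 2);
    mem[sp] = (uint8_t)(r & 0xFF);
    mem[(uint16_t)(sp + 1)] = (uint8_t)(r >> 8);
}

static uint16_t pop16(void) {
    uint16_t r = (uint16_t)(mem[sp] | (mem[(uint16_t)(sp + 1)] << 8));
    sp = (uint16_t)(sp + 2);
    return r;
}

/* RST(n): save the interrupted PC on the stack, then continue at 8*n. */
void rst(unsigned n) {
    assert(n <= 7);
    push16(pc);
    pc = (uint16_t)(8 * n);
}

/* Return from the handler: the saved PC is popped back into the PC. */
void ret(void) { pc = pop16(); }
```

With `pc` at 0x1234, a call to `rst(5)` leaves `pc` at 40 and the old value on the stack, and `ret()` restores it.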

Subroutine calls on the 8085

        |        |
        +--------+
        |  b hi  |
        +--------+  assume sizeof(int)==2
        |  b lo  |
        +--------+
        |  a hi  |
        +--------+  assume sizeof(int)==2
        |  a lo  |
        +--------+  -+
        |        |   |
        |        |   |
        +-   s  -+    > sizeof(struct S) 
        |        |   |
        |        |   |
        +--------+  -+
        |  PC hi |
        +--------+  return address
 SP --> |  PC lo |
        +--------+
        |        |

The stack after the call to f().

Stacks are used extensively for subroutine calls. Let us see how the 8085 would handle a call to this function:

    struct S f(int a, int b) ;

    int main(int argc, char * argv[]) {
        struct S s ;
        int i, j ;
        s = f( i, j ) ;
        return 0 ;
    }

The convention by which subroutines pass arguments is called the Application Binary Interface (ABI). Aspects of the ABI are arbitrary, but all code must agree on the ABI to interoperate.

The ABI described here will use the stack to pass all arguments and the return value. The arguments are pushed rightmost argument first, then the stack pointer is decremented by the storage size of the return value, in this example, the number of bytes of struct S.

This ABI requires that the caller clean up the stack. When f returns, the old PC value is at the top of the stack, and is popped off into the PC register. The return value is conveniently on the top of stack, and is copied into s. The stack is then moved up by the total number of bytes of the parameters and return value,

    SP += sizeof(struct S) + 2 * sizeof(int)
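
As a worked check of this layout, the sketch below computes where each item sits relative to SP on entry to f, and how many bytes the caller removes afterwards. The `FrameLayout` type and `layout_for_f` are hypothetical names. The sizes follow the figure's assumption that sizeof(int)==2, and the size of struct S is left as a parameter because the text does not fix it.

```c
#include <assert.h>

enum { INT_SIZE = 2, PC_SIZE = 2 };  /* sizes assumed in the stack figure */

/* Byte offsets from SP on entry to f, plus the caller's cleanup amount. */
typedef struct {
    unsigned ret_addr;  /* saved PC */
    unsigned s;         /* region reserved for the return value */
    unsigned a;         /* first argument */
    unsigned b;         /* second argument, pushed first */
    unsigned cleanup;   /* bytes the caller adds to SP after the return */
} FrameLayout;

FrameLayout layout_for_f(unsigned sizeof_s) {
    FrameLayout fl;
    assert(sizeof_s > 0);
    fl.ret_addr = 0;
    fl.s = PC_SIZE;
    fl.a = fl.s + sizeof_s;
    fl.b = fl.a + INT_SIZE;
    fl.cleanup = sizeof_s + 2 * INT_SIZE;
    return fl;
}
```

For a 6 byte struct S this gives offsets 2, 8 and 10 for s, a and b, and a cleanup of 10 bytes.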

A note about ABIs.

The compiler knows the default ABI for its platform, and compiles code consistent with that ABI's requirements. However, the calling conventions used in the kernel are not the same as those used for user programs. When writing kernel code, we will need to use a different ABI. In C, a marker written in the storage class position of a function declaration tells the compiler which ABI to use, if it is not the default. In the case of the Linux kernel, the marker is asmlinkage.

Also note that returning a struct in a preallocated region on the stack, as if it were the first parameter, is non-standard. In cdecl, the caller would instead allocate a temporary struct, typically in its own stack frame, and pass a pointer to it as a hidden first parameter.
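
The hidden first parameter can be imitated by hand in C. In this sketch the members of struct S and the body of f are invented for illustration, and `f_hidden` is a hypothetical name for roughly what a compiler using a hidden pointer arranges behind the scenes. Here the caller's temporary is an ordinary local variable.

```c
#include <assert.h>

struct S { int x; int y; };  /* hypothetical members */

/* What the programmer writes: the struct is returned by value. */
struct S f(int a, int b) {
    struct S r = { a + b, a - b };
    return r;
}

/* The same function with the result location made an explicit parameter. */
void f_hidden(struct S *result, int a, int b) {
    result->x = a + b;
    result->y = a - b;
}

void struct_return_demo(void) {
    struct S s = f(5, 3);
    struct S tmp;               /* caller provided space for the result */
    f_hidden(&tmp, 5, 3);
    assert(s.x == tmp.x && s.y == tmp.y);
}
```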

Modern Microarchitectures

Modern microarchitectures support a privileged mode kernel, virtual memory, and a more expansive protocol for interrupts. To achieve their goals, each of these three features leans on the others in interesting ways.

The virtual memory that never was

The following is an explanation of a fictitious MMU the 8085 never had. It is inspired however by the Page Size Extension method of the Intel Pentium Pro (schematic above).

A special MMU register is loaded with a physical address called the page table base B. A 16 bit virtual address V is separated into the top 7 bits of page number P and the lower 9 bits of offset O, such that V = P * 2⁹ + O.

The MMU fetches the one byte page table entry E from the page table at physical address B+P. If the least significant bit of E is zero, the entry is not valid, and the MMU interrupts the processor. Else, the 7 top bits of E are copied to the 7 top bits of V, producing the mapped physical address.

When changing between processes or entering or leaving the kernel, reloading the page table base in the MMU installs the appropriate page table.
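
The translation performed by this fictitious MMU can be sketched in C as below. The array `phys_mem` and the function `translate` are names assumed for the example, and a return value of -1 stands in for the MMU interrupting the processor.

```c
#include <assert.h>
#include <stdint.h>

uint8_t phys_mem[65536];  /* physical memory, which also holds the page table */

/* Translate virtual address v using the page table at physical address base.
   Returns 0 and stores the physical address in *pa, or returns -1 when the
   page table entry is not valid. */
int translate(uint16_t base, uint16_t v, uint16_t *pa) {
    uint16_t p = (uint16_t)(v >> 9);     /* page number P: top 7 bits */
    uint16_t o = (uint16_t)(v & 0x1FF);  /* offset O: low 9 bits */
    uint8_t  e = phys_mem[(uint16_t)(base + p)];  /* entry E at B+P */
    if ((e & 1) == 0)
        return -1;                       /* least significant bit zero */
    *pa = (uint16_t)(((e >> 1) << 9) | o);  /* top 7 bits of E replace P */
    return 0;
}

void mmu_demo(void) {
    uint16_t pa = 0;
    /* Map virtual page 3 to physical frame 0x55 and mark the entry valid. */
    phys_mem[0x0200 + 3] = (uint8_t)((0x55 << 1) | 1);
    assert(translate(0x0200, 0x061A, &pa) == 0);
    assert(pa == 0xAA1A);
    assert(translate(0x0200, 0x0800, &pa) == -1);  /* page 4 is not mapped */
}
```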

Protection of the kernel is achieved through the processor preventing any direct hardware access unless in privileged mode, and only the kernel runs in privileged mode. The kernel can prepare any hardware access requested by the user program while it runs in privileged mode, and on return to the user program will drop privileged mode so the user program cannot circumvent the kernel's preparations. There are three forms of hardware resource to protect: those provided by the CPU, such as the manipulation of the privilege mode itself, those provided by physical memory, and hardware other than the CPU, such as storage devices (hard disk or SSD).

Privileged mode protects CPU resources by blocking certain instructions from executing unless the processor is in privileged mode. Memory access is protected by declaring the addresses used by software to be addresses in virtual memory, which must be mapped to physical memory in order to actually read or write physical memory. Processors now include a Memory Management Unit (MMU) that implements this mapping. The MMU is a CPU resource and so can only be manipulated in privileged mode. The virtual to physical mapping requires access to data structures which are themselves stored in physical memory. These are called the page tables. There are distinct page tables for each process running, and a unique, distinct page table for the kernel.

Software speaks only virtual addresses. Even the kernel, to access physical memory, must place the physical address in a page table, associated with a particular virtual address, and then do the fetch or store at that virtual address. On every memory access the MMU intervenes to translate virtual to physical, and places the physical address on the bus shared with the memory. By the combination of restricting MMU instructions to the kernel running in privileged mode, and the kernel limiting what is in the page table of a non-privileged user, the kernel can maintain complete control over what memory can be accessed by a non-privileged user.

Interrupts:

Interrupts and Exceptions

The interrupt mechanism was reused in the creation of the trap, in order to enter into privileged mode safely. A trap is a kind of exception that requests, as does an interrupt, that normal instruction execution be suspended, and kernel mode entered in order to handle something exceptional.

Interrupts come in asynchronously from the outside world.
Exceptions are a consequence of instruction execution.

Because exceptions are caused by instruction execution, they fall into three classes based on how to recover from the exception,

The fault. The reason for the instruction faulting can be remedied. The kernel should fix the problem and return to retry the faulting instruction.
The trap. The instruction caused the trap as a software interrupt. The kernel should handle the request and return to the instruction following the trapping instruction.
The abort. Bad things happened. There is no reasonable way for the thread to continue executing.
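
The difference between the three classes comes down to where execution resumes. A minimal sketch in C, using names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EXC_FAULT, EXC_TRAP, EXC_ABORT } ExcClass;

/* pc is the address of the instruction that raised the exception and len is
   its length in bytes. Returns 1 and stores the address to resume at, or
   returns 0 when the thread cannot continue. */
int resume_address(ExcClass c, uint16_t pc, uint16_t len, uint16_t *resume) {
    switch (c) {
    case EXC_FAULT: *resume = pc; return 1;                   /* retry it */
    case EXC_TRAP:  *resume = (uint16_t)(pc + len); return 1; /* next one */
    case EXC_ABORT: return 0;                                 /* no resume */
    }
    return 0;
}

void exception_demo(void) {
    uint16_t r = 0;
    assert(resume_address(EXC_FAULT, 0x1000, 3, &r) && r == 0x1000);
    assert(resume_address(EXC_TRAP, 0x1000, 1, &r) && r == 0x1001);
    assert(!resume_address(EXC_ABORT, 0x1000, 1, &r));
}
```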

While in privileged mode, an instruction is allowed that will drop the privilege. Entering privileged mode must be completely protected and under the entire control of the kernel.

Interrupts to service hardware must enter privileged mode, to carry out what is required by the hardware. If it is an interrupt from the disk, the service routine must be able to issue instructions to the disk to handle the interrupt. These service routines are the drivers, and are considered part of the kernel, so it is correct that they share the privileged mode with the kernel. Therefore, among the processor's responses to an interrupt, it will obtain privileged mode at the same time as it calls the service routine.

When a user program requires a privileged mode service, it mimics the interrupt by issuing a software-initiated interrupt, called a trap. As does the interrupt, the trap will force a call into the kernel, while at the same time obtaining privileged mode.

The response to a trap or an interrupt is completely under control of the kernel. Before enabling interrupts or starting a user mode process, the kernel places the addresses of interrupt and trap service routines in a table. This table is in memory inaccessible to any process but the kernel. The location of the table is installed into the interrupt handling mechanism using a privileged mode instruction. Once privilege mode is dropped, the response to an interrupt or trap cannot be modified. A user mode process is denied the hardware access necessary to manipulate the response.
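
The protocol in this paragraph can be modeled in software, as a sketch only. Everything here is an assumption for illustration: real hardware enforces the privilege check itself, whereas this model uses an ordinary variable `privileged`, and the table size of 8 simply echoes the 8085's RST levels.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Handler)(void);
enum { NVECTORS = 8 };

static Handler vector_table[NVECTORS];  /* stands in for kernel only memory */
static int privileged = 1;              /* the machine starts in privileged mode */
static int ticks;

/* Installing a handler is refused unless in privileged mode. */
int install_handler(unsigned n, Handler h) {
    if (!privileged || n >= NVECTORS)
        return -1;
    vector_table[n] = h;
    return 0;
}

void drop_privilege(void) { privileged = 0; }

/* Delivering an interrupt or trap obtains privileged mode together with the
   call to the handler, and restores the previous mode on return. */
void deliver(unsigned n) {
    int saved = privileged;
    assert(n < NVECTORS);
    privileged = 1;
    if (vector_table[n] != NULL)
        vector_table[n]();
    privileged = saved;
}

/* Example handler: counts a tick only if it really runs privileged. */
void clock_handler(void) { if (privileged) ticks++; }
```

Once `drop_privilege()` has been called, `install_handler` fails, yet `deliver` still runs the handler chosen earlier by the kernel.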

Kernel stacks:

Linux 32-bit virtual memory

The swap of page tables is quickly done by loading the base (physical) address of the appropriate page table into an MMU register. However, the 32-bit Linux operating system has an additional mechanism when entering and leaving the kernel's protected address space. The 4G virtual memory range of 32-bit operating systems is partitioned into the lower 3G for user, and the upper 1G for the kernel.

Intel provides virtual memory space by segments that describe a range and a permission on the range. When in kernel mode the segment spans the entire 4G range. On return to user mode, the segment is restricted to just the lower 3G. While different page tables map the lower 3G differently, they all map the upper 1G the same way: that virtual memory region is that of the kernel.

Therefore, when running in the kernel the operating system can access the current user memory space as well as the kernel space in a single range. The kernel can swap between user spaces without affecting its access to the kernel space. It does however limit the user to only 3G of virtual memory.
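
The effect of the split can be stated as a one line check. This is a sketch, not kernel code: `may_access` is a hypothetical name, and the 3G boundary is written as 0xC0000000, which is 3 × 2^30.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u  /* start of the upper 1G */

/* In kernel mode the whole 4G range is reachable. In user mode only
   addresses below the 3G boundary are. */
int may_access(uint32_t addr, int kernel_mode) {
    return kernel_mode || addr < KERNEL_BASE;
}

void split_demo(void) {
    assert(may_access(0xBFFFFFFFu, 0));   /* last user byte */
    assert(!may_access(0xC0000000u, 0));  /* first kernel byte, user mode */
    assert(may_access(0xC0000000u, 1));   /* same byte, kernel mode */
}
```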

The transition to or from privileged mode is accompanied by additional swaps of context. To provide complete protection of the kernel from the user process, interrupts and traps do not push onto the user's stack, nor use the user's page tables. Entry to privilege mode swaps in the kernel's stack and kernel's page tables just as exit from privilege mode restores the user's stack and page tables.

In the case of Intel, there is one interrupt stack per core that is activated when handling an interrupt. Each thread has its own kernel stack that is activated on exception by that thread.

Context switch:

The kernel resident stack, whether the interrupt or thread stack, can be filled out to contain all registers of the CPU. In the case of the 8085, this means that the handler of the interrupt or trap will push the SP, BC, DE, HL and flags registers. Note that the SP register must be interpreted using the page table of the interrupted or trapped process.

Also pushed is the reference to the process's page tables, that is, the value installed into the MMU to activate the virtual-physical address mapping for this process.

The contents of the stack can then be copied off into a Thread Context Block (TCB) that is stored among other such TCBs in a linked list inside the memory space of the kernel. If a different TCB is selected and its saved registers are copied into the interrupt or thread stack, the physical thread of the core will return into the process context and specific code execution of that thread, picking up where that computation left off.

This is called a context switch, and pre-emptive multiprocessing can be achieved by attaching such a context switch to the handling of a clock interrupt, say every tenth of a second, for a scheduling time quantum of one tenth of a second.
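
A minimal sketch of such a context switch in C follows. The `Context` and `TCB` types, the circular list, and the round robin choice of the next thread are assumptions made for illustration. The saved fields follow the registers named above, plus the page table reference.

```c
#include <assert.h>
#include <stdint.h>

/* The registers saved on the kernel resident stack. */
typedef struct {
    uint16_t pc, sp, bc, de, hl;
    uint8_t  flags;
    uint16_t page_table_base;  /* the value to load into the MMU */
} Context;

typedef struct TCB {
    Context     ctx;
    struct TCB *next;          /* circular list of runnable threads */
} TCB;

/* Called from the clock interrupt handler. stack_frame holds the registers
   of the interrupted thread. On return it holds those of the thread that
   will resume. Returns the TCB of that thread. */
TCB *context_switch(TCB *current, Context *stack_frame) {
    TCB *next = current->next;     /* round robin choice */
    current->ctx = *stack_frame;   /* copy the saved registers off the stack */
    *stack_frame = next->ctx;      /* copy the chosen thread's registers in */
    return next;
}

void context_switch_demo(void) {
    TCB a = {0}, b = {0};
    Context frame = {0};
    a.next = &b;
    b.next = &a;
    b.ctx.pc = 0x2000;
    b.ctx.page_table_base = 0x0080;
    frame.pc = 0x1000;             /* thread a was interrupted at 0x1000 */
    assert(context_switch(&a, &frame) == &b);
    assert(a.ctx.pc == 0x1000);
    assert(frame.pc == 0x2000 && frame.page_table_base == 0x0080);
}
```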

Exercise

For the virtual memory that never was, show the mapping between physical page frames and the virtual memory space. Keep in mind the following:

The 64K memory space consists of 128 512-byte pages.
Consider the Linux virtual memory partitioning scheme:
1. User processes map virtual addresses only between byte 0 and 48K.
2. Kernel process maps its own code and data to 48K to 64K.
3. The kernel can map its lower virtual addresses equivalent to the current user mapping.
4. The lowest physical page (bytes 0 through 511) must be set aside for the interrupt vector.
Give 16K of physical memory for each user process. Map it to two regions of the process' virtual space.
1. Map 12K of the lowest virtual addresses. This is for code and data.
2. Map 4K for the highest virtual addresses (bytes 44K to 48K). This is for the stack.
Let us limit to 4 user processes, so there are at most 4 page tables.
1. Each page table is 128 bytes (one entry per page). The entire set of page tables fits in one page.
2. The pagetable page will be in the kernel's virtual memory.
3. Assume that hardware protection is provided to protect the kernel's data and code when the processor is in user mode.
4. What simple mechanism can you devise to provide this protection?

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Author: Burton Rosenberg
Created: 10 September 2019
Last Update: 4 September 2020