System Calls and Traps


Overview

System calls are made by loading parameters into registers and then executing int 2e to trap into the kernel. Exceptions and interrupts, whether they arise from external events, from internal faults, or from software using the int instruction, are vectored through the Interrupt Descriptor Table, the IDT. This table is located by the contents of the processor's IDT register, the IDTR, and contains 256 entries. IDT entries are interrupt gates, trap gates or task gates.

The Intel architecture defines two forms of forced flow-control branching: the interrupt and the exception. Exceptions are classified as traps, faults or aborts. Traps are restarted at the address following the instruction that caused the trap, faults are restarted at the address of the faulting instruction, and aborts give no reliable restart address.
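
The restart rule for the three exception classes can be written down as a small function. This is only an illustrative sketch: the function name, its parameters and the sample address are assumptions made for the example, not anything defined by the processor or by NT.

```python
def restart_address(kind, address, length):
    """Return where execution resumes, or None if there is no reliable address.

    kind    -- "trap", "fault" or "abort"
    address -- address of the instruction that raised the exception
    length  -- size in bytes of that instruction
    """
    if kind == "trap":
        return address + length   # resume after the trapping instruction
    if kind == "fault":
        return address            # re-execute the faulting instruction
    if kind == "abort":
        return None               # no reliable restart address
    raise ValueError("unknown exception class: %r" % kind)


# int 2e encodes as two bytes (CD 2E); the address is a made-up sample.
print(hex(restart_address("trap", 0x77F01000, 2)))   # 0x77f01002
```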

The usual distinction between interrupts and exceptions is that interrupts are asynchronous whereas exceptions are synchronous. External interrupts are clearly asynchronous. They are maskable through the Interrupt Flag (IF) in the EFLAGS register. Exceptions are not masked by this flag. Therefore, handling one exception might cause another exception; when the second arises while the processor is still delivering the first, the result is called a double fault. Although an int 2e is often called a software interrupt, it seems to me to be a trap: it is immune to the setting of the IF flag and restarts at the address following the instruction.

Whatever the source, all interrupts and exceptions vector through the same IDT, selected by index. If the interrupt or exception passes through an interrupt gate, the IF flag is automatically cleared. Passing through a trap gate does not change this flag. As the dump below shows, NT's IDT uses only interrupt and task gates.
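
The effect of the gate type on the IF flag can be modeled in a few lines. This is a sketch under stated assumptions: the function name and the sample EFLAGS value are invented for illustration, and only the IF bit (bit 9 of EFLAGS) is modeled.

```python
IF_MASK = 1 << 9   # the Interrupt Flag is bit 9 of EFLAGS


def eflags_in_handler(eflags, gate_kind):
    """Return the EFLAGS value seen by the handler on entry.

    Only the IF bit is modeled. Task gates are not covered here.
    """
    if gate_kind == "interrupt":
        return eflags & ~IF_MASK   # interrupt gate: IF is cleared
    if gate_kind == "trap":
        return eflags              # trap gate: IF is left unchanged
    raise ValueError("unsupported gate kind: %r" % gate_kind)


sample = 0x202   # hypothetical EFLAGS value with IF set
print(hex(eflags_in_handler(sample, "interrupt")))   # 0x2
print(hex(eflags_in_handler(sample, "trap")))        # 0x202
```

The EFLAGS image saved on the stack is the interrupted one, so IRET restores the original IF setting in either case.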

An interrupt or trap gate references a new Code Segment Selector and an offset into the segment as a target address. (A task gate references a TSS selector instead.) If the protection ring of the new code segment is different from that of the current code segment, a stack switch occurs; otherwise it does not. The Task-State Segment (TSS) of the Intel architecture assigns to each of the three more privileged protection rings (0, 1 and 2) a stack Segment Selector and a stack pointer within that segment. If a stack switch needs to occur to a higher privilege level, the new stack information is taken from the TSS.

An int 2e will switch from ring 3, user mode, to ring 0, kernel mode. Hence there will be a stack switch. When an interrupt or trap gate causes a stack switch, the old stack selector and stack pointer are pushed on the new stack before the EFLAGS register, the old code segment selector and the old instruction pointer. To return from an interrupt or exception, the IRET instruction is issued. The processor can detect a change in privilege ring on return from the privilege level of the popped code segment selector. If the level changes, the processor continues popping the stack beyond EFLAGS to recover the interrupted thread's stack segment selector and stack pointer.

Certain exceptions also push an Error Code below the instruction pointer. Whether this happens, and its meaning, depends on the exception. NT pads the exceptions that lack this automatic push so that all exceptions have a common stack layout.
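
The push order described above can be sketched as a small model. It is illustrative only: the function names, the list-as-stack representation and the sample register values are assumptions made for the example and are not taken from NT.

```python
def build_trap_frame(old_ss, old_esp, eflags, old_cs, old_eip,
                     privilege_change, error_code=None):
    """Return (name, value) pairs in the order the processor pushes them.

    The first pair is pushed first (highest address); the last pair
    ends up on top of the handler's stack.
    """
    frame = []
    if privilege_change:
        # Only on a ring change: the interrupted stack is saved first.
        frame.append(("SS", old_ss))
        frame.append(("ESP", old_esp))
    frame.append(("EFLAGS", eflags))
    frame.append(("CS", old_cs))
    frame.append(("EIP", old_eip))
    if error_code is not None:
        # Pushed only by certain exceptions, after the instruction pointer.
        frame.append(("ERROR", error_code))
    return frame


def iret_restores_stack(popped_cs, current_ring):
    """True if IRET must also pop ESP and SS (return to a different ring).

    The low two bits of a selector hold its privilege level.
    """
    return (popped_cs & 3) != current_ring


# Hypothetical user-mode values for an int 2e taken in ring 3.
frame = build_trap_frame(0x23, 0x0012F000, 0x202, 0x1B, 0x77F01002,
                         privilege_change=True)
print([name for name, _ in frame])
```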

Interrupt, Trap and Task Gates

Interrupt, Trap and Task Gates are members of Intel's family of Segment Descriptors, 64-bit structures which share a common format:

[ available:32 ] [ available:16 | STUFF:8 | available:8 ]

where STUFF is:

[ P:1 | DPL:2 | S:1 | TYPE:4 ]

Here we have shown the little-endian layout as it appears in Intel NT. The Present Flag, P, must be 1 for a valid Descriptor. There are system descriptors (S==0), including Interrupt, Trap and Task Gates, and code or data descriptors (S==1). System Descriptors are further discriminated by the value of the TYPE field. DPL is the Descriptor Privilege Level, regulating access to the descriptor.

The format of the system descriptors is a refinement of the generic format:

[ seg. sel:16 | offset low:16 ] [ offset high:16 | STUFF:8 | 0x00:8 ]

where the segment selector and offset combine to give the target address for the call. The segment selector is an index into the GDT or LDT where the segment descriptor (type S==1) can be found for the new code segment. The offset is the address within this segment of the interrupt or exception handler.

Of the various system segment descriptors, only interrupt, trap and task gates are allowed in the IDT. The TYPE values for these are 0101 for a task gate, D110 for an interrupt gate and D111 for a trap gate, where D is 1 for a 32-bit gate and 0 for a 16-bit gate.
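
To make the bit layout concrete, here is a minimal decoding sketch. The function name and the returned dictionary are assumptions made for illustration; the two sample words are the first IDT entry from the kd dump in this document, taken in the order that dd prints them.

```python
def decode_gate(low, high):
    """Decode one 64-bit IDT gate given as two 32-bit words.

    low  -- first dword  [ seg. sel:16 | offset low:16 ]
    high -- second dword [ offset high:16 | STUFF:8 | 0x00:8 ]
    """
    stuff = (high >> 8) & 0xFF          # [ P:1 | DPL:2 | S:1 | TYPE:4 ]
    gate_type = stuff & 0xF
    if gate_type == 0b0101:
        kind, bits = "task", None       # task gates have no D bit
    else:
        # Low three TYPE bits pick the gate; the top bit is D.
        kind = {0b110: "interrupt", 0b111: "trap"}.get(gate_type & 0b111)
        bits = 32 if gate_type & 0b1000 else 16
    return {
        "selector": (low >> 16) & 0xFFFF,
        "offset": (high & 0xFFFF0000) | (low & 0xFFFF),
        "present": (stuff >> 7) & 1,
        "dpl": (stuff >> 5) & 3,
        "s": (stuff >> 4) & 1,
        "kind": kind,                   # None if not a gate allowed in the IDT
        "bits": bits,
    }


gate = decode_gate(0x00085034, 0x80148E00)
print(gate["kind"], gate["bits"], hex(gate["selector"]), hex(gate["offset"]))
# interrupt 32 0x8 0x80145034
```

For a task gate the offset field is not used as a target address; only the selector matters.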

NT traps are interrupt gates except for a few task gates. The task gates are used for machine check and other catastrophes, perhaps to preserve the machine state as accurately as possible for the post mortem.

An Example

In kd, either !pcr or r idtr can be used to find the IDT's location. Here is a dump of my NT's IDT:

kd> !pcr
PCR Processor 0 @ffdff000
        NtTib.ExceptionList: 8014f09c
            NtTib.StackBase: 8014f380
           NtTib.StackLimit: 8014c3f0
         NtTib.SubSystemTib: 00000000
              NtTib.Version: 00000000
          NtTib.UserPointer: 00000000
              NtTib.SelfTib: 00000000

                    SelfPcr: ffdff000
                       Prcb: ffdff120
                       Irql: 0000001c
                        IRR: 00000000
                        IDR: ffff22e8
              InterruptMode: 00000000
                        IDT: 80036400
                        GDT: 80036000
                        TSS: 8001d000

              CurrentThread: 8014c1b0
                 NextThread: 00000000
                 IdleThread: 8014c1b0

*** Bad IOCTL request from an extension [1]

kd> dd idtr
80036400  00085034 80148e00 0008517c 80148e00
80036410  005812de 00008500 00085444 8014ee00
80036420  00085598 8014ee00 000856d4 80148e00
80036430  0008582c 80148e00 00085d48 80148e00
80036440  00501338 00008500 00086088 80148e00
80036450  00086188 80148e00 000862ac 80148e00
80036460  0008659c 80148e00 0008679c 80148e00
80036470  00087194 80148e00 00087528 80148e00

kd> dd
80036480  00087628 80148e00 0008773c 80148e00
80036490  00a07528 80148500 00087528 80148e00
800364a0  00087528 80148e00 00087528 80148e00
800364b0  00087528 80148e00 00087528 80148e00
800364c0  00087528 80148e00 00087528 80148e00
800364d0  00087528 80148e00 00087528 80148e00
800364e0  00087528 80148e00 00087528 80148e00
800364f0  00087528 80148e00 00087528 80148e00

kd> dd
80036500  00080000 00000000 00080000 00000000
80036510  00080000 00000000 00080000 00000000
80036520  00080000 00000000 00080000 00000000
80036530  00080000 00000000 00080000 00000000
80036540  00080000 00000000 00080000 00000000
80036550  00084586 8014ee00 00084670 8014ee00
80036560  00084780 8014ee00 0008533c 8014ee00
80036570  00084100 8014ee00 00087528 80148e00

kd> dd
80036580  00084ffc 80018e00 00087b44 80678e00
80036590  00083844 80148e00 0008384e 80148e00
800365a0  00089dc4 805f8e00 00083862 80148e00
800365b0  0008386c 80148e00 00083876 80148e00
800365c0  00080a18 80018e00 0008388a 80148e00
800365d0  00080ba4 80628e00 00080a84 806f8e00
800365e0  00087dc4 80678e00 000838b2 80148e00
800365f0  00081dc4 806e8e00 00081404 806f8e00
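
Since each gate is 8 bytes, the entry for a given vector sits at the IDT base plus eight times the index. The sketch below applies that arithmetic to the dump above to pick out the entry for vector 0x2e. The helper names are assumptions made for illustration; the base address and the dd line are copied from the dump.

```python
IDT_BASE = 0x80036400   # IDT address reported by !pcr above
GATE_SIZE = 8           # each IDT entry is 64 bits


def gate_address(vector, base=IDT_BASE):
    """Address of the IDT entry for the given vector."""
    return base + vector * GATE_SIZE


def parse_dd_line(line):
    """Split one line of kd 'dd' output into (address, list of dwords)."""
    fields = line.split()
    return int(fields[0], 16), [int(f, 16) for f in fields[1:]]


def find_gate(dd_lines, vector, base=IDT_BASE):
    """Return the (low, high) dwords of a vector's gate from dd output."""
    target = gate_address(vector, base)
    for line in dd_lines:
        address, dwords = parse_dd_line(line)
        if address <= target < address + 4 * len(dwords):
            i = (target - address) // 4
            return dwords[i], dwords[i + 1]
    raise KeyError("vector 0x%x is not in the supplied dump lines" % vector)


lines = ["80036570  00084100 8014ee00 00087528 80148e00"]
low, high = find_gate(lines, 0x2E)
handler = (high & 0xFFFF0000) | (low & 0xFFFF)
dpl = (high >> 13) & 3
print(hex(gate_address(0x2E)), hex(handler), dpl)
# 0x80036570 0x80144100 3
```

The DPL of 3 on this entry is what lets ring 3 code issue int 2e.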

Int 2e

NT system services are accessed through the trap at index 0x2e. The trap handler dispatches the system service request according to an index in the eax register. The services are laid out in the service vector automatically during the NT build. Their indices are actually in alphabetical order by service name!
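
Index-based dispatch of this kind can be sketched as follows. This is a toy model under loud assumptions: the service names and handlers are invented, the table is an ordinary Python list, and none of the argument copying or validation that a real kernel performs is modeled.

```python
def build_service_table(services):
    """Lay out handlers in alphabetical order by service name.

    services -- dict mapping a service name to a callable
    Returns (table, index_of): the handler list and a name-to-index map.
    """
    names = sorted(services)
    table = [services[name] for name in names]
    index_of = {name: i for i, name in enumerate(names)}
    return table, index_of


def dispatch(table, service_index, *args):
    """Call the handler selected by the index supplied with the trap."""
    if not 0 <= service_index < len(table):
        raise ValueError("invalid system service index: %d" % service_index)
    return table[service_index](*args)


# Hypothetical services, named only for this example.
table, index_of = build_service_table({
    "ExampleOpen": lambda path: "open " + path,
    "ExampleClose": lambda handle: "close %d" % handle,
})
print(index_of["ExampleOpen"], dispatch(table, index_of["ExampleOpen"], "x"))
# 1 open x
```

One consequence of assigning indices alphabetically is that adding a service can renumber the ones that sort after it, so callers have to be built against the same table.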

Perhaps... A pointer to a table of system service tables is available in each process's EPROCESS structure. Two tables are needed for GUI threads: one for the base NT services and a second to access graphics routines which previously resided outside the kernel, but are now part of the in-kernel GDI.