Linux Scheduler

The PCB on linus is combined with the kernel stack, into a contiguous block of 8K (two 4K pages), with the kernel stack from top growing downwards, and the PCB a fixed structed aligned along the low memory boundary.

The Linux schedulers recognizes three scheduling classes: real-time round-robin and fifo classes, and a time-share other class. The policy field of the the task_struct contains the integer defined in include/linux/sched.h line 108 (linux 2.4.22).

Real-time run first based on the rt_priority value in the PCB. Else the algorithm for selection is based on the result of the goodness function in sys/kernel/sched.c (line 144). the task_struct.counter is added to the nice value, with bonus points for CPU and memory affinity. The counter is the number of ticks left in the qunatum. Once everyone has eaten their quantum, an epoch completes and new quantum are distributed.

The scheduler was major overhauled from 2.4 to 2.5. The file to see is kernel/sched.c. At first glance, the changes seem larger than they were. It seems to me now that just some fairly standard data structure techniques were used, and in hindsight it is odd that the original scheduler didn't use these techniques. But there are some other changes whose impact might be more subtle. Google O(1) Scheduler Linux for more information.

Exhibits, from linux 2.4.22

The loop on line 606 of schedule is the heart of scheduling the SCHED_OTHER task. On line 578 is the scheduling of SCHED_RR tasks. Note position at end of run queue and the reloading of the count. On line 622 wis the end of epoch calculation. Note that nice is added twice. It makes the quota longer (count) and count is a part of goodness, but also goodness adds the nice value again.

Is that odd?

Goals of Scheduling

Scheduling can be seen as either:

The OS objective of efficient use of resource. In particular, the CPU Burst/IO Burst model of process execution means that a process will be waiting much of the time. Interleaving several processes will keep the CPU busy. This is a gain in efficiency for free.
The OS objective of support process abstractions. First, the process abstraction hides the CPU resource allocation. If a process expects to execute, it is the scheduler that must provide CPU cycles to fulfill this expectation.
Second, the process abstraction might include a seemingly simultaneous advancement of processes. This is the essence of time-sharing, or of interactive processes, depending on how things are defined. The scheduler must allocate CPU is such a way the the interleaving of cycle allocations among processes is transparent to the user.
Or the OS objective of isolation and protection. A process should not be able to hog the CPU.

Preemption

Preemption is the removal of the CPU resource from a process except at certain natural pauses in the process' progress. In order to provide transparent support for the simultaneous progression model of processes, preemption is necessary. Let's say that the natural pauses of a process is an I/O wait, or a deliberate wait of some sort (death of child, wait for a signal), or process exit. The the additional preemption points used by most kernels are:

Return from a system call.
Return from a hardware interrupt (unrelated to the process).
A dedicated preemption interrupt.

The dedicated preemption interrupt is a clock which forces a preemption decision on a regular basis regardless of the process' behavior. From another viewpoint, this third case can be merged with the second, since the preemption interrupt will be a hardware interrupt driven by a clock device, but we have kept it separate because it seems to represent the Big Step in the decision to implement a fully robust preemption.

Preemption can further be considered as to whether it can occur in when the process is in kernel mode, or only in user mode. Traditional Unix kernels did not preempt the process when running un kernel mode. The kernel shared data structures across processes, hence kernel preemption would introduce issues of data structure consistency and synchronization. Instead, in Unix, the preemption event occuring in kernel mode was handled on return from the system call which placed to process in kernel mode.

Scheduling Qualities

I am going to go out on a limb and say some original things. There are there big aspects to scheduling:

CPU Utilization. Scheduling should keep the CPU busy.
Overall utilization. CPU Utilization is trivial, but the propoer overlap so that jobs are not kept waiting is important. In the boot they categorize this as total wait time.
Latency. Scheduling might want to shorten the time between an event and the time the process handling the event runs.
Progress. Scheduling might want to make sure that every process makes some amount of progress.

There are some other measures that can be employed. For Batch processes, one can measure the number of jobs finished over a time period (through-put); or he time from start to finish of a job (turn-around). These have real-life analogs. The express lane in a supermarket has higher through-put. Why is that good? It should also have better turn-around. The first class line for airplanes does not have utilization but it has the best turn-around. Why is that good?

Processes have various needs and the scheduler should match those needs. Interactive processes need low latency. Hopefully they are easily satisfied in terms of progress (quick but small repsonses to events).

Finally, satisfaction of those needs can be measured either by average, best or worst case results, or by a consistency value.