CanesOS Semaphores: from spin locks to wait/notify

by: burt rosenberg
at: university of miami
last-update: sep 2020 -notation and fuller explanation
oct 2019 -original

Overview

The operating system supplies system calls used by threads to solve problems of concurrency.

If there is a necessary time ordering between statement, statement blocks, functions, or processes, then those element are synchronized. Else if there is no imposed order, they are concurrent. The word concurrent tends to connotate a necessary concurrency — that two things run at the same time. This is not the case in our definition.

Test and set

To achieve synchronization between threads also necessarily requires hardware primitives upon which to build software primitives. The simplest to understand is the test-and-set primitive, which is the 8088 was the instruction LOCK XCHG using memory, register addressing (page 2-157 of the Intel iAPX 88 Book, 1981).

We represent this in code somewhat schematically in the below code. Since any such instruction is likely to be machine architecture dependent, it is likely that any well written operating system would enclose it in a call or macro, so a change in one place will make the resource available to all dependent code under a consistent API.

What LOCK XCHG register, memory does is swap the values in the named register and memory location, in a synchronous manner. Any seemingly concurrent LOCK XCHG instructions on different threads will resolve to a consistent time ordering of which was fully accomplished before the other.

The quickly presage how such an instructions helps build correct concurrent code, the memory location can be the lock variable, normally loaded with a 0 to indicate unlocked. Contenders for the lock LOCK XCHG a 1 into the lock variable, and if the register after the exchange is 0, then that thread has locked the lock. Else if will be a 1, and that thread had failed to lock the lock, as it was already locked.

The thread holding the lock will set the lock variable to 0 to unlock the lock.

Spin locking

We conclude by placing the lock with test and set (LOCK XCHG) in context.

If the thread gets the lock, register is 0, it proceeds to in its code until it unlocks the thread by setting the lock to 0.

If the thread does not get the lock the register is 0, it tries again in a tight loop.

Spin Locking

Thread A: 
		...
		
		// spinlock on lock variable
		r = 1 ;
		LOCK XCHG(&r,&lock) ;
		while (r==1) LOCK XCHG(&r,&lock) ;
			// holding lock
			...
			...
			// release lock
			lock = 0 ;
		// outside locked section of code
		
		...

Thread B:
		...
		
		// spinlock on lock variable
		r = 1 ;
		LOCK XCHG(&r,&lock) ;
		while (r==1) LOCK XCHG(&r,&lock) ;
			// holding lock
			...
			...
			// release lock
			lock = 0 ;
		// outside locked section of code
		
		...

PV-Semaphores

However inefficiency of this require that the spin times should be very short. Therefore the spinlock is the most basic synchronization primitive and better primitives are built upon it. For instance the semaphore.

The semaphore includes a wait mechanism. If a thread does not get the lock, it is placed in a wait queue attached to the lock, and does not consume anymore cycles until awakened. It is awakened by the thread that unlocks the lock that the waiting thread did not get.

The rules for a PV-semaphore are,

A sample application is to redo the test and set example of simple mutual exclusion.
PV-Semaphore Mutex

Thread A:

		semaphore_set(1) ; // set s to 1
		
		...
		
		// get lock
		semaphore_P() ;
			// holding lock
			...
			...
			semaphore_V() ;
		// outside locked section of code

Thread B:
		...
		
		// get lock
		semaphore_P() ;
			// holding lock
			...
			...
			semaphore_V() ;
		// outside locked section of code
		
		...

Toy implementation in CAOS

#include<stdlib.h>
#include<stdio.h>
#include<assert.h>
#include<unistd.h>


/*
 * semaphore 
 * for the hypothetical cane OS 85 op sys, a.k.a. caos85
 * author: burt rosenberg
 * created: october 11, 2019
 * last update: 11 oct 2019 
 *
 */


typedef enum { 
	THREAD_INIT, THREAD_READY, THREAD_RUNNING, THREAD_WAITING, THREAD_ZOMBIE  } 
	ThreadState ;


struct TCB {
	struct TCB * wait_next ;
	int tid ;
	ThreadState  thread_state ;
	int priority ;
	int cpu ;
	int sleeps ;
	int wakeups ;
} ;

struct PVSema {
	int lock ;
	int count ;
	struct TCB * head_queue ;	
} ;

#define N_SEMAPHORES 8

struct PVSema sema_table[N_SEMAPHORES] ;
struct TCB * current ;


int sys_sema_queue_empty( struct PVSema * s) {
	return ( s->head_queue==NULL ) ;
}

struct TCB * sys_sema_queue_remove_head( struct PVSema * s) {
	struct TCB * t ;
	t = s->head_queue ;
	if (t==NULL) return NULL ;
	s->head_queue = t->wait_next ;
	return t ;
}

int sys_sema_queue_priority_insert( struct PVSema * s, struct TCB * tcb ) {
	struct TCB * t = s->head_queue ;
	if (t==NULL) {
		s->head_queue = tcp ;
		return ;
	}
	while (t->wait_next) {
		if (t->wait_next->priority>tcb->priority) {
			tcb->wait_next = t->wait_next ;
			t->wait_next = tcb ;
			return 0 ;
		}
		t = t->wait_next ;
	}
	t->wait_next = tcb ;
	return 0 ;
}

#define ATOMIC /* atomic section */

int testandset(int * m, int val) {
	ATOMIC { 
		int old_val = *m ;
		*m = val ;
		return old_val ; 	
	}
}

void sys_sema_spinlock(struct PVSema * pvs) {
	int l = testandset(&(pvs->lock),1) ;
	while (l){
		l = testandset(&(pvs->lock),1) ;
	}
	return ;
}

void sys_sema_unlock(struct PVSema * pvs) {
	testandset(&(pvs->lock),0) ;
	return ;
}

int sys_sema_P(int sid) {
	struct PVSema * pvs ;
	if (sid<0 || sid>=N_SEMAPHORES) return -1;
	pvs = &(sema_table[sid]) ;
	
	sys_sema_spinlock(pvs) ;
	pvs->count-- ;
	if (pvs->count<0) {
		sys_sema_queue_priority_insert(pvs,current) ;
		current->thread_state = THREAD_WAITING ;
	}
	sys_sema_unlock(pvs) ;
	return (pvs->count<0)?0; pvs->count ;
	 // return will take us through the scheduler
}

int sys_sema_V(int sid) {
	struct PVSema * pvs ;
	if (sid<0 || sid>=N_SEMAPHORES) return -1 ;
	pvs = &(sema_table[sid]) ;
	sys_sema_spinlock(pvs) ;
	pvs->count++ ;
	if (!sys_sema_queue_empty(pvs)) {
		struct TCB * t ;
		t = sys_sema_queue_remove_head(pvs) ;
		t->thread_state = THREAD_READY ;
	}
	sys_sema_unlock(pvs) ;
	return (pvs->count>0)?0; -pvs->count ;
	// return will take us through the scheduler
}

Yikes! Mathematics

The following are uses of semaphores to solve other problems in concurrency. The problems and solutions are from Allen Downey's Little Book on Semaphores.

Notation: A good way to reason about concurrency is to mark non-concurrent events, A and B, with their mandatory time ordering. For instance, A < B means A will come before B (if they both do occur). Concurrency is then an application of a non-excluded middle: neither A < B nor B < A are required. Solving a problem in concurrency is using (for instance) semaphores to achieve the ordering relationship required.

Note that the notation is consistent with conventions about inequality:

One must not think of statements A < B as statements of neutral fact, since barring the infinitesimal possibility that A and B occur at exactly the same moment, Special Relativity says that simultaneity is relative anyway, it is a natural tautology that A < B ∨ B < A (special relativity: in a frame of reference).

The meaning of A < B is that there is a mechanism in place that insures A < B. If there is no mechanism in place that insure A < B and none that insures B < A, the A and B are concurrent. Since neither is true we can write A ⊥ B.

We can notate non-concurrency by defining A ∐ B ≝ A < B ∨ B < A, and have the concurrency poset diagram,


         A < B     A > B
            |\      /|
            | \    / |
            |  \  /  |
            |   \/   |
            |        |
            | A ∐ B  |
            |        |
            \   |   /
             \  |  /
              \ | /
               \|/
              A ⊥ B

Solutions to some scenarios

Three basic concurrency scenarios are,
### Signal/waits:

# Enforce A1 < B1

s = Semaphore(0)

Thread A:
	statement A1
	Sema_V(s)
	
	
Thread B:
	Sema_P(s)
	statement B1

*Proof:

   given:  A1 < Sema_V, Sema_P < B1 and the
           semaphore promise Sema_V ∐ Sema_P
   
   if Sema_V < Sema_P, then
       A1 < Sema_V < Sema_P < B1, 
   and we have the result
   
   if Sema_P < Sema_V, Sema_P occurs when the count is 0, 
   and therefore thread B does not go on until Sema_V, so 
       Sema_V < B1, 
   and we have the result from A1 < Sema_V < B1
    
   note that in this second case we have established no relationship
   between A1 and Sema_P: A1 ⊥ Sema_P.



### Rendez-vous:

# Enforce A1 < B2 and B1 < A2

sA = Semaphore(0)
sB = Semaphore(0)

Thread A:

	statement A1
	Sema_V(sB)
	Sema_P(sA)
	statement A2
	
Thread B:

	statement B1
	Sema_V(sA)
	Sema_P(sB)
	statement B2

*Proof:

   given: the plain sequence relations, Sema_V(sB) < Sema_P(sA) (from thread A), 
          and the semaphore promises Sema_V(sA) ∐ Sema_P(sA) and  Sema_V(sB) ∐ Sema_P(sB)
   
   Establishing A1 < B2
   
   If Sema_V(sB) < Sema_P(sB), 
       then A1 < Sema_V(sB) < Sema_P(sB) < B2
   Else if Sema_P(sB) < Sema_V(sB) 
       then B2 does not begin until Sema_V(sB), Sema_V(sB) < B2,
       and A1 < Sema_V(sB) < B2
      
   The argument is easily changed to establish B1 < A2.
   (the reader should attempt this)



### Mutex:

# Enforce for all Threads _A_, _B_, either _A_n < _B_1 or _B_n < _A_1

M = Semaphore(1)

Thread _X_:

	Sema_P(M)
	statement _X_1
	...
	statement _X_n
	Sema_V(M)

	
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 13 oct 2019
update: 27 sep 2020