Simple, Simon and Sage. By Gemini 5 Sep, 2025.

Chat about Process Abstraction

by: burt rosenberg
at: university of miami
date: september 2025

Overview

This is the result of a conversation I had with ChatGPT, in response to the class presentation about the Process Abstraction, September 5, 2025. We pick up with Simple, Simon and Sage in a co-written dialogue (trialogue?).

Cast of characters

Simon
The knowledgeable programmer. Confident and clear, Simon explains concepts in an abstract manner.
Sage
The thoughtful and curious observer. Skeptical at times and not afraid to challenge Simon, Sage voices questions and frustrations that many learners feel.
Simple
The beginner student. Curious, eager to learn, and often reacts to explanations, making the dialogue accessible for readers who are new to computer science.

Process as Container

Simple: I have heard developers talk of a process. When I ask what a process is, I get so many answers! Some say it is a running program; some say it is a container of resources; others say it is a collection of threads. I am really lost in all this.
Sage: Well Simple, no need to complicate things. A process is a running program. A program is some text and a process is when that program runs.
Simple: If that's the case, who precedes the running program to set it running? To provide memory of various sorts, RAM and files? To listen to and answer the input/output?
Sage: Well Simple, some things just are. Simon?
Simon: Well Sage I think Simple has a point. It is tempting to look at things simply, and call a process a running program, but that viewpoint hits these dead-ends.
Sage: I suppose there is some truth to Simple's objections. I think of a running program, and if in my mind's eye I halt it briefly, I don't feel that the process disappears and then reappears when it continues.
Simple: You can halt and continue a program?
Sage: Yes Simple, it is called multitasking, or time sharing, and it is done hundreds of times every second right before your eyes, as you gaze at your computer screen. Halting a program, that is, halting a thread in the process and storing all its data, to be brought forth later so the thread's computation can continue, is how a computer can do many things at a time.
Simon: Thank you Sage for bringing that up. Perhaps not only the notion of process is worthy of our philosophizing but the idea of to run. Which brings me to another objection, Sage. Which is — a program can have multiple running streams of activity. Remember how someone told Simple that a process is a collection of threads? If one focuses only on the running of a program, would not some programs run more than others, as threads are created and destroyed?
Sage: Indeed. Something remains before all threads to contain them. And provides for them as they multiply and are dismissed. And remains after, for the operating system to reclaim the resources. And is that then the notion of a process?
Simon: Process as a container. Think of the resources needed for threads to run: memory, access to files, privileges, and a claim to hardware on which to run the thread. The process is also a communication endpoint for signals and messages from other processes, and even from the rest of the world through networking.

A process is a container for those things, and the threads that are needed to carry out the program code run inside this container and use the container's resources, found or requested, to accomplish their task.

Process memory space

Simple: How is memory put inside the container?
Simon: The memory you are thinking of is physical memory, and no, physical memory is not going anywhere, Simple. A process sees virtual memory, not physical memory. The operating system provides the process an address space that appears contiguous and private, and it instructs the hardware how to translate reads and writes of virtual addresses in virtual memory into reads and writes of physical addresses in physical memory.
Sage: Simon, you seem to answer every question with a riddle. First a process is not really a running program, it is a container. Now the memory in this container is not physical, only virtual. Why can't we do as Simple says and just write to memory, instead of, as Simon says, writing to memory that is only virtual?
Simon: If processes accessed physical memory directly, they could interfere with one another.
  1. Virtual memory allows each process to have a private, isolated address space. The operating system maps virtual addresses to physical locations.
  2. Not only are memory spaces isolated, memory can also be overcommitted. Processes are braggarts and will make exaggerated claims on memory; the virtual space is the operating system's forward promise to provide physical memory when it is actually demanded.
  3. The operating system can also take advantage of the memory hierarchy. When large amounts of memory are demanded, the least frequently used data can be moved to larger, cheaper storage, such as SSD drives.
Simple: So the operating system does this: it sets up the virtual-to-physical correspondence. When you say a process has memory, what it has is this correspondence. Each process, each container, has the correspondence attached to it.
Simon: Yes, called its virtual memory space.
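
A minimal C sketch of this point, assuming a POSIX system: after fork(), parent and child see the same virtual address for the variable x, yet they hold different values, because each process's private virtual address space maps to different physical memory.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int x = 1;
        pid_t pid = fork();            /* two processes, two address spaces */
        if (pid < 0) { perror("fork"); exit(1); }
        if (pid == 0) {                /* child */
            x = 99;                    /* changes only the child's copy */
            printf("child:  &x = %p, x = %d\n", (void *) &x, x);
            exit(0);
        }
        wait(NULL);                    /* let the child print first */
        printf("parent: &x = %p, x = %d\n", (void *) &x, x);
        return 0;
    }

Typically both lines print the same virtual address, while x is 99 in the child and still 1 in the parent.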
Sage: I must admit this is coming together nicely. Simple, you asked some very good questions. Sometimes you have the best ideas.
Simple: Now if I can only get you to like pineapple on pizza.
Sage & Simon: Never!

Process file descriptors and process owner

Simple: Okay, so a process contains resources like memory mappings. What else is in the container?
Simon: The process works with the operating system to create and maintain open file descriptors. Every thread in the process can refer to those.
Simple: What is a file descriptor? What needs describing?
Simon: A descriptor is really just a data structure in the operating system's memory. Descriptors are arranged into an array, with indices 0, 1 and 2 reserved for stdin, stdout and stderr. The library function printf writes to file descriptor 1, stdout, by default.
Simple: How is using printf writing to a file? It comes out on my terminal.
Simon: In Unix, so much is a file. If it can receive or produce bytes, it is a file. So as far as Unix is concerned, the terminal is just another file, and writing to it is just another file write.
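
A small sketch, again assuming a POSIX system, of what Simon means: printf and a raw write() to descriptor 1 both land on the same open file, which here happens to be the terminal. Redirect stdout to a file and both lines follow the redirection.

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        const char msg[] = "via write(2), straight to file descriptor 1\n";
        printf("via printf, buffered, bound for stdout (fd 1)\n");
        fflush(stdout);                        /* flush the stdio buffer before the raw write */
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        return 0;
    }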
Sage: A file in the file system has protections. The file has an owner and that owner says who can read and write that file. How does a process get access to a file?
Simon: That brings up another process property, a very important property. A process has a user ID (UID) attached to it. What the process does, it does with the permissions and rights that that user has. On a file access, the process UID is matched against the file's ownership tags to see if access is permitted.

All this information is put in a kernel data structure called the Process Control Block (PCB). The PCBs are linked together so that the operating system can find them. On forking a new process, some PCB information, such as the owner of the process, is copied into the new PCB that was created for the new process.
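
As a sketch of that PCB copying, assuming POSIX: after fork() the child's control block is initialized from the parent's, so both processes report the same owner.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }
        if (pid == 0) {
            /* child: a new PCB, but the UID was copied from the parent's PCB */
            printf("child  pid %d, uid %d\n", (int) getpid(), (int) getuid());
            exit(0);
        }
        wait(NULL);
        printf("parent pid %d, uid %d\n", (int) getpid(), (int) getuid());
        return 0;
    }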

Threads and Signals

Sage: I can see now the container aspect of a process. But how does running code fit in with this?
Simon: The running of code is given a new abstraction called a thread. A thread is also a very physical thing, called a hardware thread, which consists of a CPU executing a program. A computer may have many CPUs, and each CPU can have many cores. Each core can run a program independently. That's the physical image of a thread.

A software thread is the same idea, but we assume there are limitless software threads that can run simultaneously through time-slicing of the hardware threads. The scheduling of a process's software threads onto hardware threads is also focused on the process. The process makes claims for CPU access on behalf of the one or several threads that run in the process.

Simple: This reminds me of one of those chess exhibitions, where a grandmaster plays many opponents by devoting a little time to each chess table, making a move, and moving on to the next table.
Simon: It really is like that. The grandmaster has the chessboard in front of them; that's all they need to know. Associated to each software thread is a Thread Control Block (TCB) data structure that serves the same function. When the software thread is detached from a hardware thread, the TCB holds the values of all the CPU registers, ready for when the software thread becomes active again.
Sage: I am confused about time slicing. Are hardware threads dedicated to software threads?
Simon: A thread has a state: waiting, ready, or running. This is a flag in its TCB. When the scheduler has available a time slice of a hardware thread, it picks a TCB that is in the ready state and assigns its software thread to that hardware thread. The state is changed to running. A thread can go back to ready from running if it is preempted, that is, if its hardware time slice has expired.

A thread can go from the running state to the waiting state for several reasons. They might have requested I/O that is not completed. They may have used pause to wait for a signal. When the waited-upon event occurs, the thread is moved to the ready state.
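
A sketch of software threads sharing one process container, using POSIX threads (compile with -pthread): the two threads live in the same address space, so both can update the same counter, guarded by a mutex.

    #include <pthread.h>
    #include <stdio.h>

    static int counter = 0;                          /* shared: it lives in the process */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void) arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);     /* each creates a new software thread */
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);                      /* wait for both threads to finish */
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter);           /* 200000 */
        return 0;
    }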

Simple: Oh please! Talk about signals. I have a project on signals in my operating systems class! I am to write a program that forks a child. Then the child signals the parent, and the parent signals back, until they have done this back and forth 10 times. Then both processes exit.
Simon: That project sounds familiar. I think I know the human that is behind this.

About signals: A signal is a software interrupt delivered to a process. It is the kernel's way of saying: "Hey process, something happened; stop what you are doing and handle this." Examples are SIGINT, sent when you type Ctrl-C at the terminal; SIGSEGV, sent on a bad memory reference; SIGCHLD, sent to a parent when a child exits; and the SIGHUP and SIGKILL we will meet shortly.

The kernel maintains, per process, a set of pending signals (a bitmap or queue). A signal is just a bit. It does not record the PID of the signalling process, nor whether the signal was sent multiple times before being handled.

A signal is delivered when a (software) thread returns from a wait state. The signal itself can cause the exit from the wait state. For instance, the pause() syscall places the thread in a wait state to become ready-to-run on any signal. To deliver the signal, the awakening software thread is redirected to run the handler code rather than resuming its own code flow. To the programmer it looks like the handler code was called as a subroutine in the middle of the normal code flow.
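
A sketch of that wait-and-deliver cycle, assuming POSIX signals: pause() parks the thread in the wait state, and when SIGUSR1 arrives the kernel runs the handler before pause() returns. Try sending the signal from another terminal with kill -USR1 <pid>.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_signal = 0;

    static void handler(int signo) {
        (void) signo;
        got_signal = 1;                  /* do only the minimum inside a handler */
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);   /* install the handler for SIGUSR1 */

        printf("pid %d waiting in pause() for SIGUSR1\n", (int) getpid());
        while (!got_signal)
            pause();                     /* wait state until a signal is delivered */

        printf("handler ran; back in the normal code flow\n");
        return 0;
    }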

Simple: So this signal handling is really an example of scheduling, time-slicing, and the wait-ready-run states of a software thread?
Simon: It is. It is also another example of process as container. Signals originate from a thread, but they are seen as coming from a process. The signal is (generally) posted to the receiving process, and some thread running in that process will be selected to handle the signal. For a thread to successfully signal, that thread's containing process must have permission to signal the receiving process. That permission is based on properties of the two processes: it could be their UIDs or their process groups.

Simple: I think I need to do a SIGHUP. Have you heard of that signal?
Sage: Ah, the SIGHUP! I know thee well! It means "hangup". In early days it was sent to dependent processes by the software monitoring the terminal lines. As an integer its value is 1. It is a popular one to be "caught", that is, to have your own custom signal handler installed. It is often used by daemon processes to reread their configuration files.
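
And a sketch of the daemon idiom Sage mentions, catching SIGHUP to reread configuration; the file name example.conf and the function reload_config() are placeholders for illustration only.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t reload_requested = 0;

    static void on_sighup(int signo) {
        (void) signo;
        reload_requested = 1;                 /* defer the real work to the main loop */
    }

    static void reload_config(void) {
        printf("rereading example.conf\n");   /* a real daemon would reparse its file here */
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sighup;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGHUP, &sa, NULL);         /* catch SIGHUP instead of the default action */

        printf("send SIGHUP with: kill -HUP %d\n", (int) getpid());
        for (;;) {
            pause();                          /* wait for any signal */
            if (reload_requested) {
                reload_requested = 0;
                reload_config();
            }
        }
    }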
Simon: Well it's late. So at this point I am going to kill -9 this convo. If you know what I mean.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 5 sep 2025
update: 5 sep 2025