Thoughts on Virtual Memory


Burton Rosenberg
Created: October 2005
Last revision: October 2006

Physical versus Logical Memory: Virtual memory is the idea that there is an arbitrary mapping between logical addresses, the addresses which the programmer, compiler, and loader/linker see, and physical addresses. It makes physical memory more of a general resource to be applied where needed, and breaks the connection to the incidentals of physical memory layout. It is true, though, and this weakens the more abstract explanation of virtual memory, that physical memory as an array of bytes is exactly the model of logical memory.

Linear versus Logical Memory: Memory is mapped in convenient blocks, called pages when referencing linear memory and page frames when referencing physical memory. Note that the terms linear and logical memory are easily confused. Logical memory might be segmented, that is, memory bytes might be addressed as a pair (T,A), where T is a tag and A an address. However, the distinction rings a bit false in current usage. Each process has its own virtual space, so what is termed linear is really logical, where the tag is the process identifier or, more generally, whatever abstraction encloses a consistent memory space. A true linear space would place all process memory into one ginormous stretch.
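
For concreteness, here is a minimal sketch in C of how a linear address splits into a page number and an offset, assuming 4 KiB pages (a common but not universal size); the constants and names are illustrative, not taken from any particular system.

    /* Split a linear address into (page, offset), assuming 4 KiB pages. */
    #include <stdio.h>

    #define PAGE_SHIFT 12                    /* 4096 = 2^12 bytes per page */
    #define PAGE_MASK  ((1ul << PAGE_SHIFT) - 1)

    int main(void)
    {
        unsigned long linear = 0x00403a17ul;          /* arbitrary address   */
        unsigned long page   = linear >> PAGE_SHIFT;  /* page table index    */
        unsigned long offset = linear &  PAGE_MASK;   /* byte within page    */

        /* If the page table maps this page to frame f, the physical
         * address is (f << PAGE_SHIFT) | offset.                     */
        printf("linear 0x%08lx -> page 0x%lx, offset 0x%03lx\n",
               linear, page, offset);
        return 0;
    }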

Demand Paging: Virtual memory refers to the idea of this mapping. Paging, and in particular demand paging, refers to the maintenance of the mapping. Demand paging says that a page will be mapped through to a page frame only if and when that page is accessed. There are exceptions, as always. Often a fault on a code page will read ahead successive pages, in the hope that a wholesale purchase of page frames and a bulk read of disk will pay off on the prediction that the fetched and mapped pages will soon be used.
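
A toy model may make the mechanism concrete. The following C sketch maps a page to a frame only on first touch; frame allocation and the disk read are stubbed out, and all names are hypothetical.

    /* A toy model of demand paging: map on first touch only. */
    #include <stdio.h>

    #define NPAGES 8
    static int page_table[NPAGES];   /* frame number, or -1 if unmapped */
    static int next_frame = 0;

    static int alloc_frame_and_read(int page)
    {
        /* Stand-in for: grab a free frame, read the page's disk block. */
        printf("page fault: mapping page %d to frame %d\n", page, next_frame);
        return next_frame++;
    }

    static int touch(int page)
    {
        if (page_table[page] < 0)            /* fault: not yet mapped    */
            page_table[page] = alloc_frame_and_read(page);
        return page_table[page];             /* frame backing the page   */
    }

    int main(void)
    {
        for (int i = 0; i < NPAGES; i++) page_table[i] = -1;
        touch(3); touch(3); touch(5);        /* second touch of 3: no fault */
        return 0;
    }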

Memory Hierarchy: Although demand paging will get you up and running faster, and might allow you to run a program which otherwise would seem to require too much memory (the programmer wrote code which is never executed, maybe it deals with impossible error conditions), the strongest motivation for demand paging is to conserve page frames. Memory is considered to be a hierarchy of ever slower, ever larger technologies: from fast on-chip cache, through main memory, to the slow backing store of disk. From this point of view, virtual memory is something different: it is a manner of allocating varying qualities of resource to achieve the best result. The actual bit of code that is running is moved from the disk, where it spends its quiet moments unmolested, into main memory, where it can be quickly accessed during execution. Even this is not enough, since chunks of main memory get shoveled into on-chip cache, and the actual burning tip of the instruction pointer is always fetching from this faster memory.

Page Eviction: This being the case, the question of un-mapping pages becomes important. The simplest practicality is that if pages are never unmapped, we will run out of physical memory. More subtly, if we unmap the right pages at the right time, we will have a better computer, as the resource of fast main memory (versus slower disk) is more wisely allocated. In considering this unmapping, we distinguish two scenarios: clean pages and dirty pages.

Swap space (backing store): Clean pages have never been written to, at least not since they were copied from disk at the time of mapping. Dirty pages have been written to. Dirty pages are more difficult to evict: their contents must be written back out to disk (a.k.a. backing store) before the frame can be reused. Code pages, the program, are not necessarily clean. Code is often written so that a bit of linking occurs at run time, in which case the disk image of the code isn't complete and cannot be run as-is. There is also self-modifying code. So code will be paged in (the term for creating a mapping and copying the content into the frame) and paged out to a backing store which contains the modified code (e.g., with resolved symbol references). Once written out, it will probably never again need a page out.
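
The clean/dirty branch at eviction time can be sketched as follows. The frame record and write_back() are hypothetical stand-ins; a real system reads the dirty bit out of the hardware page table entry.

    /* Sketch of eviction's clean/dirty branch, with hypothetical types. */
    #include <stdbool.h>
    #include <stdio.h>

    struct frame {
        int  page;     /* logical page currently held, -1 if free */
        bool dirty;    /* written since page-in?                  */
    };

    static void write_back(struct frame *f)
    {
        printf("writing page %d to backing store\n", f->page);
    }

    static void evict(struct frame *f)
    {
        if (f->dirty)          /* dirty: contents exist nowhere else      */
            write_back(f);     /* must reach disk before the frame reuse  */
        /* clean: the disk copy is still valid, just drop the mapping     */
        f->page  = -1;
        f->dirty = false;
    }

    int main(void)
    {
        struct frame f = { .page = 7, .dirty = true };
        evict(&f);
        return 0;
    }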

Data pages might or might not require page out. Their initial page in will depend on whether or not the data is pre-initialized in a complicated manner. A pre-initialized page will be read from the program file, just as if it were code. The page might otherwise be zero-filled or left uninitialized. For security reasons, uninitialized is not favored, since the frame's previous contents might have come from some security-sensitive process.
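
From user space, two of these page-in flavors can be seen with POSIX mmap, assuming a Unix-like system: anonymous pages are zero-filled by the kernel (precisely for the security reason above), while file-backed pages are read from the file on demand. The file path below is just an example; any readable, nonempty file will do.

    /* Zero-filled anonymous pages versus file-backed pages (POSIX). */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;

        /* Demand-paged anonymous memory: zero-filled by the kernel. */
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;
        printf("first byte of anonymous page: %d\n", p[0]);  /* always 0 */

        /* Pre-initialized data: paged in from a file, like code. */
        int fd = open("/etc/hostname", O_RDONLY);   /* example path */
        if (fd >= 0) {
            char *q = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (q != MAP_FAILED) {
                printf("first byte of file-backed page: %c\n", q[0]);
                munmap(q, len);
            }
            close(fd);
        }
        munmap(p, len);
        return 0;
    }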

LRU Algorithm: It is a fact, a mathematical theorem, so they say, that the optimal paging strategy is to evict the page which will be reused the furthest in the future. (But oops, this doesn't take into consideration the difference between clean and dirty pages.) We don't know which page will be reused the furthest in the future, not knowing the future. So we generally take a proxy for this by looking for the page whose last use is the furthest in the past. This is called the LRU algorithm, for Least Recently Used. Yet even this we can't do: it is expensive, and perhaps over-precise, to measure exactly the time since last use. Approximations are made, e.g., an access bit is associated with each page frame, which is periodically cleared and is set on every use of the frame. At any moment in time, frames with set access bits have been used more recently than frames with cleared access bits. Hardware can do this bit manipulation as part of a memory fetch.
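
The classic realization of this approximation is the clock (second chance) algorithm. The sketch below assumes a per-frame access bit set by hardware on every use: the hand sweeps the frames, clearing the bit of each recently used frame, and the first frame found with its bit already clear is the victim.

    /* The clock (second chance) approximation to LRU. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NFRAMES 4

    static bool access_bit[NFRAMES];   /* set by hardware on use (simulated) */
    static int  hand = 0;

    static int choose_victim(void)
    {
        for (;;) {
            if (!access_bit[hand]) {          /* not used since last sweep   */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            access_bit[hand] = false;         /* second chance: clear, move on */
            hand = (hand + 1) % NFRAMES;
        }
    }

    int main(void)
    {
        access_bit[0] = access_bit[1] = true;     /* frames 0,1 recently used */
        printf("victim frame: %d\n", choose_victim());   /* prints 2 */
        return 0;
    }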

Summary: So now you know the general setup of virtual memory as it is practiced today. Pages, page frames, demand paging for page in and approximate LRU for page out. There are two crucial data structures to make this work: the page tables, which map forward from logical to physical addresses; and the page frame database, which maps backwards from page frames to logical addresses (this is needed for page out) and keeps track of each page frame's status.
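
In miniature, and with hypothetical field names, the two structures might look like this: a per-process page table mapping pages forward to frames, and one machine-wide page frame database mapping frames back to (process, page) plus status.

    /* The two crucial structures, in miniature. */
    #include <stdbool.h>

    #define NPAGES  64
    #define NFRAMES 16

    struct pte {               /* one page table entry (forward map)   */
        unsigned frame : 12;   /* physical frame number                */
        unsigned present : 1;  /* mapped through to a frame?           */
        unsigned accessed : 1; /* set by hardware on use (for LRU)     */
        unsigned dirty : 1;    /* set by hardware on write             */
    };

    struct frame_entry {       /* page frame database (reverse map)    */
        int  owner_pid;        /* whose address space, -1 if free      */
        int  page;             /* which logical page of that process   */
        bool pinned;           /* ineligible for eviction?             */
    };

    struct pte         page_table[NPAGES];   /* one such table per process */
    struct frame_entry frame_db[NFRAMES];    /* one database per machine   */

    int main(void) { return 0; }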

Imagine if physical memory limitations, either in amount or in the speed hierarchy, weren't an issue. Does anything remain of virtual memory?

  1. Memory layout problems are lessened. Processes needn't lay out memory with respect to each other. But layout within a process still remains an issue, so perhaps this argument is weak.
  2. Memory protection using VM is very strong, by the principle that illegal memory cannot even be addressed, as opposed to addressed memory being prevented from access by segment protection bits, or what have you.
  3. Shared code and data segments have additional management options. Without VM, all sharers must agree on a free logical address range in which to map the shared memory. A process later wanting to join the share must coincidentally have this logical address range open, or force it open, in order to establish the share. With VM, each sharer can map the shared frames at whatever logical addresses suit it (see the sketch after this list).
  4. Virtual address space is constant per process, rather than divided by the number of processes.
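
As a small illustration of point 3, the POSIX shared memory calls (assuming a Unix-like system; older Linux systems may need -lrt at link time) let the same physical pages appear at two different logical addresses. For brevity both mappings live in one process here, but across processes the idea is the same.

    /* Same shared frames, two different logical addresses (POSIX). */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = shm_open("/vm_demo", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, 4096) < 0) return 1;

        char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a == MAP_FAILED || b == MAP_FAILED) return 1;

        strcpy(a, "hello");        /* write through one mapping...        */
        printf("via the other mapping: %s (addresses %p vs %p)\n",
               b, (void *)a, (void *)b);

        close(fd);
        shm_unlink("/vm_demo");
        return 0;
    }
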
So the usefulness of virtual memory is not entirely based on the idea of memory hierarchy. However, this idea of backing store for levels of faster or slower memories is perhaps the fastest explanation of why VM is good. It also suits those who consider that all engineering is a matter of economics.