Overview of osv
The overarching goal of osv
(the experimental kernel) is to help you understand operating system implementation at an advanced level. The baseline code for osv
provides a complete, albeit limited, working operating system, capable of running a single program (loaded from disk) that can receive and print characters to a serial port (keyboard and display). It also contains basic synchronization, process time slicing, memory management, and a file system.
The lab assignments will task you with adding basic UNIX-like functionality to this baseline in a series of projects:
- Implement file system calls for managing file descriptors (e.g.,
open
,read
,close
, etc.). - Implement file system calls to support a user-level shell by implementing UNIX
fork
,wait
, andexit
. - Implement
pipe
to supplement the shell. - Add
sbrk
and dynamic stack growth.
There is an additional extension, which is a possibility for your final project, in adding copy-on-write semantics. Except for the first lab, these lab assignments require your code to carefully consider data structure synchronization.
Baseline code walkthrough
We next walk through the code and main data structures.
Directory structure
-
arch
: architecture-depending code for kernel and user. You shouldn’t have to make any changes here, but if you want to understand how the OS interacts with hardware, this is the place. -
include
: header files for all architecture-independent code. Provides detailed documentation on OS subsystem interfaces (virtual memory, file system)—they are contracts that each OS component provides. -
kernel
: the operating system kernel code that implements the above contracts. -
user
: user applications that run onosv
. This may be updated throughout the term with additional test cases for each lab. -
lib
: library code shared by the kernel and user; some are for user programs only. -
tools
: utility programs for buildingosv
.
Booting
The first “program” that gets executed as an x86 PC boots is the BIOS. The BIOS, after performing hardware initialization, loads the bootloader from a fixed location on disk into a fixed location in memory and jumps to it. The bootloader in turn loads the operating system (osv
) kernel from a predefined location on disk into a fixed location in memory and jumps to it.
When compiled with the x86_64
flag, osv
uses the arch/x86_64
folder for architecture-dependent code. As x86
is backwardly compatible with earlier operating systems (e.g., 16-bit or 32-bit), when the machine boots it starts out in a more primitive mode. Part of the task of the bootloader is to shift the processer to 64-bit mode. As gdb
is not set up to be able to debug both 32- and 64-bit modes simultaneously, we start debugging at the beginning of osv
rather than the beginning of the bootloader.
For reference, the bootloader code is primarily in arch/x86_64/boot/bootasm.S
, arch/x86_64/boot/bootmain.c
, and include/kernel/multiboot.h
. Unless you’re really curious, you can safely ignore these files.
Initialization
For initialization code, we’ll look at kernel/main.c
.
In the function main()
, the kernel needs to do several things as it starts up. Note that the ordering of the function calls in main()
very much matters—for example, we need the ability to allocate memory before we can build the kernel page table. etc. We will discuss the functionality in the order they are referenced by main()
.
vm_init()
More info to come later. This is mostly focused on physical vs. virtual memory and page tables.
arch_init()
This function sets up multiprocessor execution; osv
runs on 2 CPUs.
It also sets up I/O interrupts, initializes the per-CPU segment table, and sets up handlers for interrupts, exceptions, and system calls.
Note that again ordering really matters. We need to set up the interrupt table before we can enable interrupts!
thread_sys_init()
Sets up the scheduler and its associated spinlock, along with the kernel threading system and its associated spinlock. Note that in osv
threads are the unit of execution.
synch_init()
Enables synchronization primitives (locks and condition variables).
proc_sys_init()
Sets up structures needed to manage process’ life cycles. Each process has its own address space and metadata.
trap_sys_init()
Sets up the machine-independent part of trap handling, where trap handlers are registered and invoked.
console_init()
Sets up the console UART driver. The specific initialization code depends on the device (e.g., the specific model number of the serial port). Understanding this device code therefore requires reading device manuals, something you should skip for now.
However, note that kprintf()
assumes that the console and uart are initialized! So, don’t try to print anything before this point. (You can of course use gdb
to set breakpoints to examine the contents of memory.)
pmem_info()
Displays information about the physical memory map.
sched_start()
Sets up the current thread as this CPU’s idle thread, turns on interrupts for this CPU, and then starts scheduling. The current thread (idle thread initially) schedules the next thread to run.
As discussed below, the actual context-switch code lives in arch/x86_64/kernel/swtch.S
.
A second thread
At the end of main()
, a new thread is created to run kernel_init()
, which initializes the storage system and starts up the second CPU. This involves a handful of steps, discussed next.
We don’t immediately jump to the user program, however. Rather, we mark it as “scheduable” by calling thread_start_context()
. The user process will get scheduled later on a timer interrupt.
bdev_init()
Initializes a machine-independent block device interface. This also involves device-specific code for the (IDE) disk device.
fs_init()
Sets up the virtual file system (VFS) interface that defines operations a file system needs to support.
The file it lives in (kernel/fs/fs.c
) also defines general file-system operations that other parts of the kernel can use.
Note that the actual VFS layer is implemented by code in kernel/fs/sfs/sfs.c
; this is the default file system in osv
.
proc_spawn()
This call occurs after all initialization is done. It sets up the first process by copying the application code/data from a file on disk into memory. It also sets up the page table so that the hardware will translate user addresses to their corresponding physical addresses.
Traps
More info will be filled in later; this is related to what we’ve seen in class walking us through how our system calls actually get called.
Processes, threads, and synchronization
If what you’re reading below doesn’t quite make sense, look ahead to Chapters 26-28 in OSTEP.
Process/thread creation
Each process has its own address space (a page table and a set of memory regions), a list of threads (for this course, a process will have only one thread), a process ID, a reference to its parent process (if any), and a reference to its current working directory. A thread can run inside a process. It has a stac to run in the kernel, a state (e.g., ready, sleeping), the current trapframe, and space to save its scheduling context.
The file kernel/proc.c
has several interesting procedures. One is proc_init
. This function allocates a process struct and initializes it. Each process needs at least one thread to run its code. The function call to kernel/thread.c:thread_create
from kernel/proc.c:proc_spawn
creates a thread, allocates a kernel stack, and attaches the thread to the process.
Thread execution
Now that a process has set up its address space and has a thread to run, it can start running by calling thread_start_context
.
All runnable threads are on the ready queue (kernel/sched.c
). The queue is only accessible through the scheduler interface. Each running thread selects another thread to run when its time slice is up.
Synchronization
In osv
, there are two kinds of locks to enforce mutual exclusion. The difference concerns what they do when the lock is busy.
-
kernel/sync.c:spinlock_*
: A spinlock spins in a loop until the lock is free. -
kernel/sync.c:sleeplock_*
: A sleeplock yields the processor until the lock is free.
A given data structure will generally be protected by either a spinlock or a sleeplock. There is no guarantee of mutual exclusion if they are mixed.
The low-level kernel routines typically use spinlocks; for example, the scheduler needs to use a spinlock as if the scheduler is busy, you won’t be able to put the current process to sleep and pick another process to run until the scheduler is free.
Similarly, it is generally a bad idea to try to acquire a sleeplock in an interrupt handler. Most interrupt handlers are structured to avoid accessing shared data as much as possible, but where it is needed, they should use a spinlock. Care must be taken that interrupts are disabled whenever a spinlock is held, or else the interrupt handler could get into an infinite loop (waiting for a spinlock that is held by the interrupted process).
Another synchronization primitive that osv
offers is condition variables (kernel/sync.c:condvar_*
). Read OSTEP Chapter 30 for more details on condition variables.
A final bit of complexity in the process abstraction concerns the context-switch code (arch/x86_64/kernel/swtch.S
). To really understand it, you will need to look at the code, trace the execution into and out of a context switch, and so forth. You can ignore it for Labs 1 and 2.
Memory management
Although the memory management code is complex (reflecting the intricacies of x86), the abstractions are fairly straightforward.
Page tables
The key idea is that we have two sets of page tables for every process:
-
One is machine-dependent, with a structure defined by the particular architecture. For example, in x86-64, there are four levels of page tables with a particular format for each page table entry, etc.
-
In addition,
osv
has a machine-independent page table, represented as a set ofmemregions
—contiguous regions of the virtual address space used for a specific purpose (such as to hold code or data). The reason for this split is:-
to make
osv
for portable (not tied to a specific page table format), and -
to allow extra information to be stored per page as necessary (the reason for this should become obvious in later assignments).
-
We have routines to generate the machine-dependent page table from the machine-independent one on demand—essentially, every time you make a change to the machine-independent one, you’ll need to invalidate (kernel/mm/vm.c:memregion_invalidate
) the old machine-dependent page table, to get the system to generate a new (machine dependent) one.
Virtual address space
A process has information about its virtual address space, its file descriptors, and its threads.
include/kernel/vm.h:addrspace
describes the virtual address space of a process. it consists of a list of virtual memory regions (include/kernel/vm.h:memregion
), and also a pointer to the x86 page table (vpmap
).
include/kernel/vm.h:memregion
is a 4KB page-aligned region of virtual memory of indefinite page-aligned length. A set of memregion
s defines the virtual addresses accessible to a process. Addresses outside of a memregion
are considered invalid.
Operations on memregion
include: extending the region, mapping a region to an address space, copying a memregion
, etc.
A memregion
tracks its address space, its range, permission of the region, whether it’s shared by multiple address spaces, and its backing memstore
, if any (used for memory-mapped files; you don’t need to worry about it).
File system
The file system consists of a VFS layer (include/kernel/fs.h
) which specifies the file system interface, and a concete implementation (include/kernel/sfs.h
).
The on-disk data structure representing a file is called an inode. A copy of the inode can be stored in memory, e.g., if the file is open. If so, there is a bit more information kept, e.g., the reference count. The function open
loads an inode if it doesn’t already exist in memory.
The file system locates the on-disk inode by doing a directory lookup. A directory contains multiple (filename, inode number) pairs, where the inode number is an index into a disk region that stores all the inodes. Note that inodes don’t actually store file data; they simply record where on disk to locate file data for a given file.