CS 332: Operating Systems

Linux miscellany

For each of the tasks and questions described below, submit on paper:

- A discussion of your results. This may range from a sentence or two to a couple of pages, depending on the complexity of the question. The discussion may also include data you have collected, displayed as appropriate to the situation.
- Code fragments showing the changes you made to the code to complete the task. Don't give me full printouts of .c or .h files; give me just your modifications, with a couple of lines before and after your changes for context. Please label each code fragment with the path of the file being modified.

Where is the code that handles the dispatching of system calls, and what does it do? (In the _syscall0, _syscall1, etc. macros, we see the "int $0x80" trap jump to some sort of handler, which in turn causes the desired system call to happen, etc. This question is asking you to find this handler, and to describe in detail what it does.)
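
To see the user-space side of this question, it can help to issue a system call by number rather than through its usual library wrapper. The sketch below assumes a Linux system with glibc and uses the generic syscall() wrapper with SYS_getpid. The function name raw_getpid is made up for illustration. Note that the wrapper may enter the kernel by a mechanism other than "int $0x80" on current hardware, so treat this as a way to observe the dispatch-by-number idea, not as a description of the handler you are asked to find.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our pid by system call number,
 * bypassing the getpid() library wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() and getpid() from the same process should give the same value, since both end up in the same kernel routine.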
Write or borrow some CPU-bound C program that takes at least 10 seconds to run (you could modify your prime-number generator, for example). Then modify your code to collect scheduling data for the program. How many times does the program get scheduled? What are its maximum, minimum, and average quantum lengths? How do these numbers differ, if at all, if the computer is under heavy load (running lots of processes) or light load?
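
If you don't have a CPU-bound program handy, something like the following sketch would do. It counts primes by trial division, which is deliberately slow and does no I/O inside the loop. The function name count_primes is illustrative, and the limit needed to reach 10 seconds depends on your machine, so you would have to tune it by experiment.

```c
/* Count the primes strictly below `limit` using trial division.
 * Intentionally naive: the goal is to burn CPU time, not to be fast. */
unsigned long count_primes(unsigned long limit)
{
    unsigned long count = 0;
    for (unsigned long n = 2; n < limit; n++) {
        int is_prime = 1;
        for (unsigned long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                is_prime = 0;
                break;
            }
        }
        count += is_prime;
    }
    return count;
}
```

Call count_primes() from main() with a large limit and print the result at the end, so the compiler cannot discard the computation.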

How to go about this? Here are some pieces you might want to include in your system:
- Add a few global variables to the kernel to enable the scheduler to collect the data you need collected. You'll want to provide declarations of these variables (e.g. "extern int jondich_nQuanta;") in some common .h file, and a definition (e.g. "int jondich_nQuanta;") in a single .c file--probably the .c file where you define your system calls.
- Add a system call to activate the data collection process for a particular pid, another system call to stop the data collection, and a third to retrieve the collected data. (You could probably merge the first two system calls into one by passing a pid parameter that's > 0 to start data collection and == 0 to stop it.)
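
The bookkeeping suggested above can be prototyped in ordinary user-space C before any of it goes into the kernel. The sketch below is only a model: the struct and function names (quantum_stats, stats_set_target, stats_record, stats_average) are hypothetical, quantum lengths are assumed to be measured in timer ticks, and where the scheduler would call stats_record is left for you to work out. It follows the merged start/stop convention, where a pid > 0 starts collection and a pid of 0 stops it.

```c
#include <limits.h>

/* Hypothetical bookkeeping for one traced process. */
struct quantum_stats {
    int  target_pid;   /* > 0 while collecting, 0 when stopped */
    long n_quanta;     /* number of times the target was scheduled */
    long min_ticks;    /* shortest quantum seen */
    long max_ticks;    /* longest quantum seen */
    long total_ticks;  /* sum of all quanta, for the average */
};

/* pid > 0: start collecting for that pid and reset the counters.
 * pid == 0: stop collecting but keep the data for later retrieval. */
void stats_set_target(struct quantum_stats *s, int pid)
{
    s->target_pid = pid;
    if (pid > 0) {
        s->n_quanta = 0;
        s->min_ticks = LONG_MAX;
        s->max_ticks = 0;
        s->total_ticks = 0;
    }
}

/* Record one finished quantum; quanta of other processes are ignored. */
void stats_record(struct quantum_stats *s, int pid, long ticks)
{
    if (s->target_pid <= 0 || pid != s->target_pid)
        return;
    s->n_quanta++;
    s->total_ticks += ticks;
    if (ticks < s->min_ticks)
        s->min_ticks = ticks;
    if (ticks > s->max_ticks)
        s->max_ticks = ticks;
}

/* Average quantum length, or 0.0 if nothing has been recorded. */
double stats_average(const struct quantum_stats *s)
{
    return s->n_quanta ? (double)s->total_ticks / s->n_quanta : 0.0;
}
```

In the kernel, the struct fields would correspond to your global variables, and the three functions would correspond roughly to the start/stop system call, the hook in the scheduler, and the retrieval system call.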
Pick one of the following. These are intended as in-depth projects. A thorough study of any of these topics could easily go on for weeks or months, so you'll need to stop before learning everything there is to know. For this assignment, I have in mind something on the order of a 5-10 page discussion. If you have questions about whether you are going deeply enough into your chosen topic, talk to me about it.

Feel free to use (with citation) information from on-line or print sources. But don't just take other people's word for how a particular system works. Instead, you should dig into the Linux code and really try to figure out what data structures are relevant, how they are initialized and maintained, and how control flows through the Linux code to cause your particular system to work.

I hope that one of these topics sparks your curiosity. Some of the things that operating systems do can seem like magic, or at least like the hidden knowledge of a secret society. I hope that some of the mystery will fall away for you once you dig into one of these topics. Have fun!
- The start-up system. Investigate in detail what happens at system startup. How does the hardware first transfer control to the Linux code? What does that code do? What data structures get initialized, when, and how? How soon are interrupts allowed? When is the first process created, and what is it? When is virtual memory activated? At what point does the "choose your OS" screen appear? etc.
- The virtual memory system. Investigate in detail the structure of the virtual memory system. What kind of page table system does Linux use? How does the page table get initialized on a fork/exec? What assumptions does the paging system make about hardware support for paging? What happens on a page fault? How does the page replacement system work? What properties do pages have? Can pages be shared between processes, and how? How does the OS switch between page tables at a context switch? etc.
- fork, exec, and wait. Describe in detail what happens when a process forks, the child executes one of the versions of exec(), and the parent executes one of the versions of wait(). What resources get shared, as opposed to resources that are copied? Does some of the exec'd program's file get loaded into memory, and if so, which portion? What happens to open files? What happens to open network connections? Where does the child process's page table come from? Where is the exit status of the child stored, and how is it retrieved to prevent the survival of a "zombie process"? etc.
- The buffer cache. File system implementations typically maintain a cache of blocks read from a hard drive. Here, your job is to investigate the Linux buffer cache system, and describe how it works. If there's a general mechanism that is file system-independent, go ahead and describe that. If the caching is file system-dependent, then pick a common file system and describe its caching mechanism. As when investigating any caching system, you should find the answers to the four questions asked by Patterson and Hennessy in the "Memory Hierarchies" chapter of their textbook "Computer Organization & Design." Also, look into how the contents of the buffer cache are shared (if at all) between distinct processes that might have identical files open for reading or writing.
- Device drivers. Pick one relatively simple device type (I suggest mouse, keyboard, or hard drive) and investigate how the OS communicates with the device in question. How, for example, do keystrokes get routed from the physical device to the input buffer of the right process? What happens when a process performs a write operation on a file stored on a hard drive? How do motions of the mouse turn into a cursor that moves around the screen?
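
For the fork, exec, and wait topic, a small user-space test program is a useful way to drive the kernel paths you want to trace. The sketch below assumes a POSIX system. The helper name spawn_and_wait is made up for illustration, and the exit code 127 for a failed exec is a shell convention, not something the kernel requires. The parent's waitpid() call is what collects the child's exit status and lets the kernel discard the zombie.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, exec argv[0] in it, and return the child's exit status.
 * Returns -1 if fork/waitpid fails or the child did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);     /* returns only if the exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector {"sh", "-c", "exit 3", NULL} should make spawn_and_wait() return 3.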