CS 332: Operating Systems

Final project: do something with Linux

Proposal due via e-mail by 11:10AM Wednesday, June 2. Code and related documents due by 5:00PM Monday, June 7, via the Courses system. You may work with a partner on this assignment.

Your assignment for the final project is to either:

make an interesting modification to the Linux source code, collect data on the effects of your modifications, and discuss your results, or
do an in-depth study of some sub-system of Linux and provide a report.

You will turn in the following:

(Via e-mail by 11:10AM June 2). A project proposal outlining what question(s) you want to answer, how you intend to modify Linux to address your question(s), and how you plan to evaluate your modifications. I will provide feedback as quickly as I can to help you get started well. If you get me a proposal sooner rather than at the deadline, I'm likely to respond more quickly.
If your project involves modification of the Linux code, turn in code fragments showing the changes you made to complete your task. Don't give me full printouts of .c or .h files. Just show me your modifications with a couple lines before and after your changes to give me some context. Please label each code fragment with the path of the file being modified.
A discussion of your results. This will typically be 2-5 pages, depending on the goals of your project if you did a modify-Linux project. (For an in-depth-study style of project, your discussion should be on the order of 8-10 pages.) Your discussion may also include data you have collected, displayed as appropriate to the situation.

Here are a few project ideas. Feel free to use or adapt one of these, or come up with your own idea. Keep in mind that a thorough study of any of these topics could easily go on for weeks or months, so you'll need to stop before learning everything there is to know. If you have questions about whether you are going deeply enough into your chosen topic, talk to me about it.

Modify the scheduler. The existing Linux scheduler uses some sort of hybrid scheme that is multi-core aware, attentive to priorities, etc. There are many ways to modify a scheduling algorithm, including the ones discussed in Tanenbaum. If you wanted to understand the scheduler better, you could devise and test two or three modifications to it. For example, can you figure out how to enforce a strict round-robin scheduling? If so, how does it change performance under light load? Heavy load? CPU-bound programs? I/O-intensive programs? etc. (And what do you measure for performance--average quantum length? turn-around time? something else?)
Modify page-replacement. Page replacement occurs when Linux decides that physical memory is loaded enough to warrant discarding or swapping to disk some lightly used pages. There are lots of page replacement strategies discussed in Tanenbaum. You could find the page replacement code in Linux, try a couple different replacement strategies, and measure the effects on performance.
Create memory system analysis tools. To help you study the behavior of the memory system, you could write system calls to collect data about virtual memory as the system runs. You could, for example, write system calls named startmemtracker() and stopmemtracker(struct MemTrackerData *mtd), where stopmemtracker(&memTrackerData); would return whatever data you collected via the memTrackerData parameter. Using this pair of system calls and a suitably constructed MemTrackerData struct, you might examine the frequency of page faults, the percentage of memory accesses that are hits rather than misses, the number of second-level page tables in use, etc.
Create file system analysis tools. Similar to the previous idea, but now you would want to collect information about the number of i-nodes, free blocks, free i-nodes, wasted space in data blocks, etc.
Study fork, exec, and wait. Describe in detail what happens when a process forks, the child executes one of the versions of exec(), and the parent executes one of the versions of wait(). What resources get shared, as opposed to resources that are copied? Does some of the exec'd program's file get loaded into memory, and if so, which portion? What happens to open files? What happens to open network connections? Where does the child process's page table come from? Where is the exit status of the child stored, and how is it retrieved to prevent the survival of a "zombie process"? etc.
Study or modify the buffer cache. File system implementations typically maintain a cache of blocks read from a hard drive. Here, your job is to investigate the Linux buffer cache system, and describe how it works (or modify it and collect data on the resulting changes in performance). If there's a general mechanism that is file system-independent, go ahead and describe that. If the caching is system-dependent, then pick a common file system and describe its caching mechanism. As when investigating any caching system, you should find the answers to the four questions asked by Patterson and Hennessy in the "Memory Hierarchies" chapter of their textbook "Computer Organization & Design." Also, look into how the contents of the buffer cache are shared (if at all) between distinct processes that might have identical files open for reading or writing.
Device drivers. Pick one relatively simple device type (e.g. mouse, keyboard, or hard drive) and investigate how the OS communicates with the device in question. How, for example, do keystrokes get routed from the physical device to the input buffer of the right process? What happens when a process performs a write operation on a file stored on a hard drive? How do motions of the mouse turn into a cursor that moves around the screen?

Alternatively, try modifying a driver. You might, for example, try to modify the mouse driver in some way to give you control over cursor acceleration, or reversing horizontal and vertical motion of the cursor, etc.
Add biometric measurements to the keyboard driver. One kind of biometric measures patterns in the timing of an individual typist's keystrokes. Studies (I really ought to have a reference here) suggest that it is very hard to mimic somebody else's keystroke timing. If this idea turns out to be reliable, it could enable us to discard passwords and instead require users to type some plain text to gain entry to their accounts.

For this project, you could modify the Linux keyboard driver to collect simple keystroke timing information, such as the mean and variance of the delays between keystrokes.

Many of my project ideas involve collecting statistics of some kind as the kernel runs. How should you go about this? Suppose, for example, that you want to collect statistics about quantum lengths before and after a change to the scheduler. Here are some pieces you might want to include in your system:

Add a few global variables to the kernel to enable the scheduler to collect the data you need collected. You'll want to provide declarations of these variables (e.g. "extern int jondich_nQuanta;") in some common .h file, and a definition (e.g. "int jondich_nQuanta;") in a single .c file--probably the .c file where you define your system calls.
Add a system call to activate the data collection process, possibly restricted to a particular pid, and another system call to stop the data collection and retrieve the collected data.

Have fun.