CS 332: Operating Systems
Final project: do something with Linux
Proposal due via e-mail by 11:10AM Wednesday, June 2. Code and related
documents due by 5:00PM Monday, June 7, via the Courses system.
You may work with a partner on this assignment.
Your assignment for the final project is to either:
- make an interesting modification to the Linux source code,
collect data on the effects of your modifications, and discuss
your results, or
- do an in-depth study of some sub-system of Linux and provide a report.
You will turn in the following:
- (Via e-mail by 11:10AM June 2). A project proposal outlining
what question(s) you want to answer, how you intend to modify Linux
to address your question(s), and how you plan to evaluate your modifications.
I will provide feedback as quickly as I can to help you get started
well. If you get me a proposal sooner rather than at the deadline,
I'm likely to respond more quickly.
- If your project involves modification of the Linux code, turn in
code fragments showing the changes you made to complete
your task. Don't give me full printouts of .c or .h files. Just
show me your modifications
with a couple lines before and after your changes to give me some context.
Please label each code fragment with the path of the file being modified.
- A discussion of your results. This will typically be 2-5 pages,
depending on the goals of your project if you did a modify-Linux project.
(For an in-depth-study style of project, your discussion should be
on the order of 8-10 pages.) Your discussion
may also include data you have collected, displayed as appropriate to the situation.
Here are a few project ideas. Feel free to use or adapt one of these,
or come up with your own idea. Keep in mind that a thorough
study of any of these topics could easily go on for weeks or months, so you'll need
to stop before learning everything there is to know. If you have questions
about whether you are going deeply enough into your chosen topic, talk to me about it.
- Modify the scheduler. The existing Linux scheduler uses some
sort of hybrid scheme that is multi-core aware, attentive to priorities, etc.
There are many ways to modify a scheduling algorithm, including the ones discussed
in Tanenbaum. If you wanted to understand the scheduler better, you could devise
and test two or three modifications to it. For example, can you figure out how
to enforce a strict round-robin scheduling? If so, how does it change performance
under light load? Heavy load? CPU-bound programs? I/O-intensive programs? etc.
(And what do you measure for performance--average quantum length? turn-around time?
something else?)
- Modify page-replacement. Page replacement occurs when Linux
decides that physical memory is loaded enough to warrant discarding
or swapping to disk some lightly used pages. There are lots of page replacement
strategies discussed in Tanenbaum. You could find the page replacement code in
Linux, try a couple different replacement strategies, and measure the effects on
performance.
- Create memory system analysis tools. To help you study the
behavior of the memory system, you could write system calls to collect data about
virtual memory as the system runs. You could, for example, write system calls
named
startmemtracker()
and stopmemtracker(struct MemTrackerData *mtd)
, where
stopmemtracker(&memTrackerData);
would return whatever data you
collected via the memTrackerData parameter. Using this pair of system calls and a
suitably constructed MemTrackerData
struct, you might examine the
frequency of page faults, the percentage of memory accesses that are hits rather
than misses, the number of second-level page tables in use, etc.
- Create file system analysis tools. Similar to the previous
idea, but now you would want to collect information about the number of i-nodes,
free blocks, free i-nodes, wasted space in data blocks, etc.
- Study fork, exec, and wait. Describe in detail what
happens when a process forks, the child executes one of the versions
of exec(), and the parent executes one of the versions of wait(). What
resources get shared, as opposed to resources that are copied? Does some of the
exec'd program's file get loaded into memory, and if so, which portion? What happens
to open files? What happens to open network connections? Where does the child
process's page table come from? Where is the exit status of the child stored,
and how is it retrieved to prevent the survival of a "zombie process"? etc.
- Study or modify the buffer cache. File system implementations
typically maintain a
cache of blocks read from a hard drive. Here, your job is to investigate the Linux
buffer cache system, and describe how it works (or modify it and collect data on
the resulting changes in performance). If there's a general mechanism that is
file system-independent, go ahead and describe that. If the caching is
system-dependent, then pick a common file system and describe its caching mechanism.
As when investigating any caching system, you should find the answers to the
four questions asked by Patterson and Hennessy in the "Memory Hierarchies" chapter
of their textbook "Computer Organization & Design."
Also, look into how the contents of the buffer cache are shared (if at all) between
distinct processes that might have identical files open for reading or writing.
Device drivers. Pick one relatively simple device type
(e.g. mouse, keyboard, or hard drive) and investigate how the OS communicates
with the device in question. How, for example, do keystrokes get routed from
the physical device to the input buffer of the right process? What happens
when a process performs a write operation on a file stored on a hard drive?
How do motions of the mouse turn into a cursor that moves around the screen?
Alternatively, try modifying a driver. You might, for example, try to modify
the mouse driver in some way to give you control over cursor acceleration, or
reversing horizontal and vertical motion of the cursor, etc.
Add biometric measurements to the keyboard driver.
One kind of biometric measures patterns in the timing
of an individual typist's keystrokes. Studies (I really ought to have
a reference here) suggest that it is very hard
to mimic somebody else's keystroke timing. If this idea turns out to be
reliable, it could enable us to discard passwords and instead require
users to type some plain text to gain entry to their accounts.
For this project, you could modify the Linux keyboard driver
to collect simple keystroke timing information, such as
the mean and variance of the delays between keystrokes.
Many of my project ideas involve collecting statistics of some
kind as the kernel runs. How should you go about this? Suppose, for example,
that you want to collect statistics about quantum lengths before and
after a change to the scheduler. Here are some pieces you might want to include
in your system:
- Add a few global variables to the kernel to enable the scheduler to
collect the data you need collected. You'll want to provide declarations
of these variables (e.g. "extern int jondich_nQuanta;") in some
common .h file, and a definition (e.g. "int jondich_nQuanta;") in a single
.c file--probably the .c file where you define your system calls.
- Add a system call to activate the data collection process, possibly
restricted to a particular pid, and another system call to stop the data
collection and retrieve the collected data.
Have fun.