A Multithreaded Web Server
This is a pair programming assignment. If you are on a team, this means that you and your partner should do the entirety of this assignment side-by-side, on a single computer, where one person is "driving" and the other is "navigating." Switch roles every so often; you should each spend approximately 50% of the time driving.
Introduction
For this project, you will implement a multithreaded web server. This project is designed to give you some practice writing client-server socket programs and writing multithreaded programs, as well as familiarizing you with the HTTP protocol.
Part 1: Sockets
Sockets allow computers to communicate over a network connection. A socket is similar to a file handle, except instead of sending data to and from disk, it sends it over the network.
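To make the file-handle analogy concrete, here is a minimal sketch (not part of the assignment) in which a client and a server exchange one line over a loopback connection. The class SocketRoundTrip and its method echoOnce are hypothetical names. The point is that the streams from Socket.getInputStream() and getOutputStream() get wrapped in the same reader and writer classes you would use for a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class (not part of the assignment): one request/reply
// exchange over a loopback connection, using socket streams like file streams.
class SocketRoundTrip {
    static String echoOnce(String message) throws IOException, InterruptedException {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread serverThread = new Thread(() -> {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine()); // read one line, write one line back
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            try (Socket socket = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }
}
```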
Setup
If you are running on your own computer, download this zip file and unzip it somewhere. It is a scrape of an old version of the department website, downloaded a couple of years ago. This is the website that your web server will ultimately serve. These files are also available on the lab machines in /Accounts/courses/cs348w16/website, so you don't need to unzip them anywhere if you're working in the lab.
Initial socket code
Go through the Java tutorial on sockets, especially the knock-knock joke example. Get that code running, and ask any questions you have about it.
Your task
Modify the knock-knock code above so that when it tells a knock-knock joke, the punchline is the HTML code for one of the pages in the department website. (I admit, HTML is not a very funny punchline, but this will help you get your web server going.) The location of the website should be specified on the command line when running the server, so we should be able to test your code in two different terminal windows as follows:
javac KnockKnockServer.java
java KnockKnockServer rootDir

javac KnockKnockClient.java
java KnockKnockClient localhost:8888
The command line argument rootDir to the server should indicate the directory where the website is stored, and the command line argument to the client should indicate the server's host and port. If you are running both on the same computer, localhost will work as the host. Your server should listen for connections on port 8888.
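Below is a minimal sketch of two pieces Part 1 needs; the helper names are hypothetical. The first splits the client's localhost:8888 argument into a host and a port. The second reads a page from rootDir to use as the punchline. An HTML page spans many lines, so one approach is for the client to keep printing lines until readLine() returns null (the server closed the connection), rather than stopping after a single line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helpers for Part 1; the names are illustrative only.
class KnockKnockHelpers {
    // Returns the host part of an argument such as "localhost:8888".
    static String host(String hostAndPort) {
        return hostAndPort.substring(0, colonIndex(hostAndPort));
    }

    // Returns the numeric port of an argument such as "localhost:8888".
    static int port(String hostAndPort) {
        return Integer.parseInt(hostAndPort.substring(colonIndex(hostAndPort) + 1));
    }

    private static int colonIndex(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + hostAndPort);
        }
        return colon;
    }

    // Reads one page of the website, relative to rootDir, to use as the punchline.
    static String punchline(String rootDir, String page) throws IOException {
        return new String(Files.readAllBytes(Paths.get(rootDir, page)), StandardCharsets.UTF_8);
    }
}
```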
Note that compiling KnockKnockServer.java will compile any dependent files as well, so you can use as many files as you need.
Make sure that you pay attention to the software licenses on the sample code you download, and observe them accordingly.
Part 2: Simplistic Single-Threaded Web Server
For this portion of the assignment, you should transform your knock-knock server into an HTTP server. It should be able to serve up pages that are viewable in a web browser; this is a fully-functioning (though limited in capability) web server.
The two aspects of this assignment that you do not need to implement yet are:
- Multithreading. That will happen later.
- More complicated HTTP requests. For this portion of the assignment, you only need to make simple GET requests work for a specific filename.
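For simple GET requests, one way to start is to read the request line, skip the header lines up to the blank line that ends them, and pull out the path. This is a sketch with hypothetical names (RequestLine, parse, read). BufferedReader.readLine() already strips the trailing \r\n, which keeps the parsing simple.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical holder for the three parts of a request line,
// e.g. "GET /index.html HTTP/1.1".
class RequestLine {
    final String method;
    final String path;
    final String version;

    private RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    // Splits a request line on runs of spaces; rejects anything without exactly three parts.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" +");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    // Reads one request: the request line, then header lines up to the blank line.
    // Returns null if the client closed the connection before sending anything.
    static RequestLine read(BufferedReader in) throws IOException {
        String first = in.readLine(); // readLine() strips the trailing \r\n
        if (first == null) {
            return null;
        }
        String header;
        while ((header = in.readLine()) != null && !header.isEmpty()) {
            // Headers are ignored in this simplistic version.
        }
        return parse(first);
    }
}
```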
When your work is complete, we should be able to test it by running:
javac WebServer.java
java WebServer rootDir
We should then be able to start a browser on the same computer and visit localhost:8888/index.html. Your browser should then display at least a portion of the Carleton Computer Science Department home page. Only the front page was downloaded, so the links probably won't work! Here's a screenshot showing roughly what you should expect to see.
HTTP
Read HTTP Made Really Easy by Jim Marshall. At a minimum, read it carefully through the section titled "Sample HTTP Exchange."
One important detail involves whitespace characters. It is very important that you interpret the format of a client request correctly and that you send correctly formatted responses to clients. Many parts of a correctly formatted message involve the carriage return and newline sequence (i.e., \r\n). These sequences signify the end of all or part of a "message".
Here is the general format of an HTTP message (a request or a response):
initial line
Header1: value1
Header2: value2
Header3: value3

(optional message body goes here)
For example, a GET response for a very simple page may look like:
HTTP/1.1 200 OK
Date: Sun, 10 Feb 2013 18:17:43 GMT
Content-Type: text/html
Content-Length: 54

<html><body>
<h1>CS 348 Test Page</h1>
</body></html>
It is very important that each header line ends with a \r\n and that there is a blank line (another \r\n) between the headers and the message body. The message body, however, is sent without a trailing \r\n. Instead, the Content-Length header tells the client the size of the message body in bytes.
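The helper below is a sketch of these rules; the class ResponseWriter and its methods are hypothetical. It ends every header line with \r\n, adds the blank line, and sends the body with no trailing \r\n. Content-Length must count bytes, not characters, so it is computed from the encoded byte[] body. For example, "é".getBytes(StandardCharsets.UTF_8) is 2 bytes long.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes a status line, two headers, the blank line, and the body.
class ResponseWriter {
    static byte[] head(String statusLine, String contentType, int bodyLength) {
        String head = statusLine + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + bodyLength + "\r\n" // length in bytes, not chars
                + "\r\n";                                   // blank line ends the headers
        return head.getBytes(StandardCharsets.US_ASCII);
    }

    static void send(OutputStream out, String statusLine, String contentType, byte[] body)
            throws IOException {
        out.write(head(statusLine, contentType, body.length));
        out.write(body); // no trailing \r\n; Content-Length marks where the body ends
        out.flush();
    }
}
```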
Web clients
There are many ways of testing your web server, and you may find some of them useful:
- telnet: run telnet server port_num, then type in a GET command (make sure to enter a blank line after the GET command). For example:
  $ telnet localhost 8888
  GET /index.html HTTP/1.0
  telnet will exit when it detects that your web server has closed its end of the socket (or you can kill it with ctrl-C or, if that doesn't work, with kill or killall: killall telnet).
- Firefox/Chrome: enter the URL of the desired page, specifying your web server by its IP:port_num (e.g., http://137.22.4.77:8888/index.php). You can also just use localhost or the host name on our system: localhost:8888/index.php
- wget: wget -v localhost:8888/index.html
  wget copies the HTML file returned by your web server into a file with a matching name (index.html) in the directory from which you call wget.
- Your client program from part 1, or some modification of it. This might be useful if you want to inspect the data received over the socket more closely, or test your server's response to broken requests.
Transmitting a CSS file
Some browsers (e.g., Chrome, Safari, possibly others) won't properly render a website with CSS files unless the HTTP responses for those files contain text/css as the Content-Type header. The only reliable way I could find to do this was to check the extension of the file being requested: if it ends in .css, then I set the Content-Type field accordingly.
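Here is a minimal sketch of that extension check. The class name and the extension-to-type table are assumptions that cover a few common types. java.nio.file.Files.probeContentType is an alternative, but its result depends on the platform and it may return null.

```java
import java.util.Locale;

// Hypothetical lookup from file extension to Content-Type; extend the table as needed.
class ContentTypes {
    static String forFileName(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT); // treat ".CSS" like ".css"
        if (name.endsWith(".css")) return "text/css";
        if (name.endsWith(".html") || name.endsWith(".htm")) return "text/html";
        if (name.endsWith(".js")) return "text/javascript";
        if (name.endsWith(".jpg") || name.endsWith(".jpeg")) return "image/jpeg";
        if (name.endsWith(".png")) return "image/png";
        if (name.endsWith(".gif")) return "image/gif";
        return "application/octet-stream"; // generic fallback for unknown types
    }
}
```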
Reading a file (HTML, JPG, whatever) and transmitting it via socket
A common task is reading a file that has been requested and transmitting it back to the requesting client as part of your HTTP response. Reading the file as straight text, line-by-line, may work for HTML, but won't work for images. Furthermore, if you do a line-by-line read, you may be changing the newlines in the file you transmit. Admittedly, that may not change how the browser renders the page, but your server is still inappropriately changing the structure of the file it has been asked to transmit.
Here is (part of) the approach I used for reading a file in pure binary and transferring it via a socket. I was inspired by this StackOverflow posting. This code is intentionally incomplete, and might even be incorrect. (I copied lines out of my solution without verifying that they run on their own.)
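import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Inside a method that throws IOException; socket is the connected client Socket
// and path is the java.nio.file.Path of the requested file.
OutputStream os = socket.getOutputStream();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int totRead = 0;
try (InputStream input = Files.newInputStream(path)) { // closes the file when done
    byte[] data = new byte[1024];
    int numRead;
    while ((numRead = input.read(data, 0, data.length)) != -1) {
        totRead += numRead;
        buffer.write(data, 0, numRead);
    }
}
// Not even close to complete: the status line and other headers are still missing.
String response = "Content-Length: " + totRead + "\r\n\r\n";
os.write(response.getBytes(StandardCharsets.US_ASCII));
buffer.writeTo(os);
os.flush();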
Part 3: Simplistic Multithreaded Web Server
For this part, you should extend part 2 to be multithreaded. Multiple web browsers (or browser windows/tabs) connecting to the server at the same time should launch multiple threads in your server. The knock-knock joke example provided above has a section at the end called "Supporting Multiple Clients," which provides more sample code on going multithreaded. You are welcome to use that as a starting point as well; again, observe the software license that is provided.
When your work is complete, we should be able to test it by running:
javac WebServer.java
java WebServer maxConnections rootDir
maxConnections is an integer greater than or equal to 1 representing the maximum number of client connections (i.e., the number of threads serving web pages) allowed at any given time.
Web server structure
The basic design of your web server should be the following:
- Create a server socket on port 8888.
- Enter an infinite loop:
  - Accept the next connection.
  - If there are already maxConnections connections, kill the oldest thread by closing its client socket. This will cause the worker thread to receive a SocketException the next time it tries to read from or write to the socket. The worker thread should then exit.
  - Create a new thread to handle the new client's connection, passing it the client socket returned by accept.
- The main server thread should exit only if it encounters an IOException.
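One way to implement the "close the oldest connection" step is a small shared registry of open sockets, sketched below with a hypothetical ConnectionTracker class. Because the main thread and the worker threads all use it, every method is synchronized. In the main loop, you would call add(client) right after accept() and before starting the worker. A worker would call remove(socket) when it exits.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical shared registry of open client sockets, oldest first.
// All methods are synchronized because several threads use the same instance.
class ConnectionTracker {
    private final int maxConnections;
    private final Deque<Socket> open = new ArrayDeque<>();

    ConnectionTracker(int maxConnections) {
        if (maxConnections < 1) {
            throw new IllegalArgumentException("maxConnections must be >= 1");
        }
        this.maxConnections = maxConnections;
    }

    // Main thread, after accept(): close the oldest sockets until there is room,
    // then record the new one.
    synchronized void add(Socket socket) {
        while (open.size() >= maxConnections) {
            Socket oldest = open.removeFirst();
            try {
                oldest.close(); // its worker gets a SocketException on the next read/write
            } catch (IOException e) {
                // The socket is already broken; nothing more to do.
            }
        }
        open.addLast(socket);
    }

    // Worker thread, on exit (no matter who closed the socket).
    synchronized void remove(Socket socket) {
        open.remove(socket);
    }

    synchronized int size() {
        return open.size();
    }
}
```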
The worker thread's run() method should be an infinite loop that only exits if it encounters an IOException or if the socket is closed by the main server thread or by the client. Otherwise, the worker threads continue to handle HTTP requests from the client.
Remember that connections can also be closed on the client-side. In this case the associated worker thread on the server should detect that the socket was closed, clean up any shared state, and exit.
If your solution requires any use of shared state among threads, make sure to use synchronization to coordinate the accesses to this shared state.
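The loop below sketches a worker's run() method under a few assumptions:

- The class ConnectionWorker is hypothetical.
- It takes the socket's streams plus a cleanup callback, so it can be exercised without a network.
- It answers every request with a placeholder page instead of a real file.

It loops until readLine() returns null, meaning the client closed the connection. It also stops if an IOException is thrown, including the SocketException caused by the main thread closing the socket. In either case the finally block runs the cleanup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical worker: serves requests on one connection until the client closes it
// (readLine() returns null) or an IOException occurs. Cleanup runs on every exit path.
class ConnectionWorker implements Runnable {
    private final InputStream in;
    private final OutputStream out;
    private final Runnable cleanup; // e.g. remove the socket from shared state and close it

    ConnectionWorker(InputStream in, OutputStream out, Runnable cleanup) {
        this.in = in;
        this.out = out;
        this.cleanup = cleanup;
    }

    @Override
    public void run() {
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
            String requestLine;
            while ((requestLine = reader.readLine()) != null) {
                String header;
                while ((header = reader.readLine()) != null && !header.isEmpty()) {
                    // Headers are skipped in this sketch.
                }
                // Placeholder body; a real server would look up the requested file.
                byte[] body = "<html><body>OK</body></html>".getBytes(StandardCharsets.US_ASCII);
                String head = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
                        + "Content-Length: " + body.length + "\r\n\r\n";
                out.write(head.getBytes(StandardCharsets.US_ASCII));
                out.write(body);
                out.flush();
            }
        } catch (IOException e) {
            // The connection is gone; fall through to cleanup and let the thread exit.
        } finally {
            cleanup.run();
        }
    }
}
```

In the server, you might start it with new Thread(new ConnectionWorker(client.getInputStream(), client.getOutputStream(), cleanup)).start(). Here cleanup is a Runnable that removes client from your shared list of connections (under synchronization) and closes it.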
Part 4: Better Multi-Threaded Web Server that Handles More HTTP
This part adds to the capability of your web server, adding more parts of the HTTP 1.1 protocol. Specifically:
- Your server must handle GET and HEAD client requests. It does not need to handle POST or any other request methods.
- It should return appropriate status codes, including 200, 400, 403, and 404. If the server returns an error code to a client, it should also return headers and a message body with a simple error page. For example:
<html><body>Not Found</body></html>
- It should support the headers Content-Length, Content-Type, and Date.
- It does not need to handle any PHP or JavaScript parsing.
- It should handle paths that start with /. It does not need to handle paths that start with a username, such as /~username/.
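Here is a sketch of the status-code, error-page and Date pieces of Part 4; the class HttpResponses and its methods are hypothetical. A response to a HEAD request carries the same headers a GET would, including Content-Length, but no body. The date pattern produces the fixed-width format used in the sample response above (e.g., Sun, 10 Feb 2013 18:17:43 GMT), always in GMT.

```java
import java.nio.charset.StandardCharsets;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers for status lines, simple error pages, and the Date header.
class HttpResponses {
    // Two-digit day and English names, e.g. "Sun, 10 Feb 2013 18:17:43 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    static String reasonPhrase(int status) {
        switch (status) {
            case 200: return "OK";
            case 400: return "Bad Request";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            default: throw new IllegalArgumentException("unsupported status " + status);
        }
    }

    // Converts to UTC first so the literal "GMT" suffix is accurate.
    static String httpDate(ZonedDateTime time) {
        return HTTP_DATE.format(time.withZoneSameInstant(ZoneOffset.UTC));
    }

    // Complete error response; for HEAD the headers are identical but the body is omitted.
    static byte[] errorResponse(int status, boolean isHead, ZonedDateTime now) {
        String reason = reasonPhrase(status);
        byte[] body = ("<html><body>" + reason + "</body></html>")
                .getBytes(StandardCharsets.US_ASCII);
        String head = "HTTP/1.1 " + status + " " + reason + "\r\n"
                + "Date: " + httpDate(now) + "\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        if (isHead) {
            return headBytes;
        }
        byte[] all = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, all, 0, headBytes.length);
        System.arraycopy(body, 0, all, headBytes.length, body.length);
        return all;
    }
}
```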
GET requests and mapping URLs to files
Directory names in URLs correspond to files named either index.html or index.htm in the named directory. Your web server should first look for a file named index.html, and if that doesn't exist, look for index.htm.
Here are some example GET requests that you need to handle, and their corresponding filename(s):
- GET / HTTP/1.1 → /rootDir/index.html or /rootDir/index.htm
- GET /index.html HTTP/1.1 → /rootDir/index.html
- GET /index.htm HTTP/1.1 → /rootDir/index.htm
- GET /search.html HTTP/1.1 → /rootDir/search.html
- GET /cat.jpg HTTP/1.1 → /rootDir/cat.jpg
- GET /courses/ HTTP/1.1 → /rootDir/courses/index.html or /rootDir/courses/index.htm
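Here is a sketch of this mapping, using a hypothetical UrlMapper class that returns null when no file matches (a 404). Two further choices are assumptions, not requirements from this page:

- A path that uses .. to escape rootDir is rejected with IllegalArgumentException, treated here as a 400.
- Query strings and percent-encoded characters are not decoded.

You might call Files.isReadable on the result to decide when to send a 403.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical mapper from a URL path to a file under rootDir.
// Returns null when nothing matches (404); throws IllegalArgumentException for paths
// that are malformed or escape rootDir via ".." (treated as 400 in this sketch).
class UrlMapper {
    private static final String[] INDEX_FILES = {"index.html", "index.htm"};

    static Path resolve(Path rootDir, String urlPath) {
        if (!urlPath.startsWith("/")) {
            throw new IllegalArgumentException("path must start with /: " + urlPath);
        }
        Path root = rootDir.toAbsolutePath().normalize();
        Path target = root.resolve(urlPath.substring(1)).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("path escapes rootDir: " + urlPath);
        }
        if (urlPath.endsWith("/")) {
            // Directory: try index.html first, then index.htm.
            for (String index : INDEX_FILES) {
                Path candidate = target.resolve(index);
                if (Files.isRegularFile(candidate)) {
                    return candidate;
                }
            }
            return null;
        }
        return Files.isRegularFile(target) ? target : null;
    }
}
```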
You do not need to correctly handle GET requests of the following format (i.e. GET requests with no trailing '/' when the last name corresponds to a directory):
GET /courses HTTP/1.1
We won't test this case.
Useful Resources
- HTTP Made Really Easy
- HTTP 1.0 Specification
- HTTP 1.1 Specification
- Java Tutorial on Socket Programming
- Useful classes in the Java standard library:
Submission
For each part, you should submit two files to Moodle, the first one of which should be anonymized:
- A zip file containing your Java code.
- Citations in a text file, credits.txt.
There is no writeup required for this assignment.