CS 395 Assignment #1: Getting Started


For this assignment, you will explore a dataset to see what you can uncover.

I have placed a copy of the dataset, titled "census-income", in the file /Accounts/courses/cs395/dmusican/census-income.data. A data dictionary for this dataset is in the file census-income.names, in the same directory. Your goal is to learn everything you can about the dataset. Use the following questions as a starting point, but dig further: what more can you discover?

Starting questions:

How many records are there?
How many features are there?
How many features are continuous, and how many are nominal?
For the continuous features, what are the average, median, maximum, and minimum values? What is the variance?
For the continuous features, use Mathematica or some other plotting tool to make 2-dimensional scatter plots of two features at a time. What relationships can you find?
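
If you would rather script the first few questions than count by hand, here is a rough sketch in Python (standard library only). It rests on several assumptions that you should check against census-income.names: that the data file is comma-separated with one record per line and no header row, that "?" marks a missing value, and that a column is continuous whenever all of its values parse as numbers. The SAMPLE text and the function names are made up for illustration and are not taken from the real dataset.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a few lines of a comma-separated data file.
# The real census-income.data may use a different layout; check
# census-income.names before relying on any of this.
SAMPLE = """\
39, Private, 13, 40
50, Self-employed, 13, 13
38, Private, 9, 40
53, ?, 7, 45
"""

def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def summarize(lines, missing="?"):
    """Count records and features, and compute statistics for numeric columns."""
    rows = [[field.strip() for field in row] for row in csv.reader(lines) if row]
    columns = list(zip(*rows))  # transpose: one tuple per feature
    summary = {"records": len(rows), "features": len(columns), "continuous": {}}
    for index, column in enumerate(columns):
        present = [v for v in column if v != missing]
        # Heuristic only: integer-coded nominal features will look numeric.
        if present and all(is_number(v) for v in present):
            values = [float(v) for v in present]
            summary["continuous"][index] = {
                "mean": statistics.mean(values),
                "median": statistics.median(values),
                "max": max(values),
                "min": min(values),
                # Sample variance (divides by n - 1); needs at least 2 values.
                "variance": statistics.variance(values) if len(values) > 1 else 0.0,
            }
    summary["nominal"] = summary["features"] - len(summary["continuous"])
    return summary

result = summarize(io.StringIO(SAMPLE))
print(result["records"], "records,", result["features"], "features")
for index, stats in result["continuous"].items():
    print("column", index, stats)
```

To try this on a real file, pass an open file object to summarize() in place of the io.StringIO wrapper. Treat the continuous/nominal split it reports as a first guess: the data dictionary is the authority on which features are which.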

You should submit a paper document describing what you discover.

The following is an example of a session with Octave to make a
scatterplot.

Here is my data file, called "example.data":

# name: example
# type: matrix
# rows: 5
# columns: 2
1 1
2 3
3 4
5 6
2 9
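
If you pull continuous columns out of a larger dataset, you will need to get them into this layout yourself. Here is a minimal sketch in Python that produces the same four header lines followed by one row per line. The function name to_octave_text is made up for illustration, and I have only matched the layout of the example above; whether every Octave version accepts the result is an assumption you should test.

```python
def to_octave_text(name, rows):
    """Format a list of equal-length numeric rows as an Octave text matrix."""
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # Header lines mirror the ones shown in example.data.
    lines = [
        f"# name: {name}",
        "# type: matrix",
        f"# rows: {len(rows)}",
        f"# columns: {width}",
    ]
    # One row per line, values separated by single spaces.
    lines += [" ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"

text = to_octave_text("example", [[1, 1], [2, 3], [3, 4], [5, 6], [2, 9]])
print(text, end="")
```

Writing the returned string to a file named "example.data" should give you a file like the one above.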


Here is my Octave session:

prism> octave
GNU Octave, version 2.1.35 (i386-redhat-linux-gnu).
Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001 John W. Eaton.
This is free software with ABSOLUTELY NO WARRANTY.
For details, type `warranty'.

*** This is a development version of Octave.  Development releases
*** are provided for people who want to help test, debug, and improve
*** Octave.
***
*** If you want a stable, well-tested version of Octave, you should be
*** using one of the stable releases (when this development release
*** was made, the latest stable version was 2.0.16).

octave:1> load -force -ascii "example.data"
octave:2> plot(example(:,1),example(:,2),'k@')
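
A scatterplot shows a relationship visually. If you also want a number to go with it, one option is the Pearson correlation coefficient. The sketch below is in Python rather than Octave and uses only the standard library; the function name pearson is made up for illustration, and the two lists are just the two columns of example.data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# The two columns of example.data.
first = [1, 2, 3, 5, 2]
second = [1, 3, 4, 6, 9]
r = pearson(first, second)
print(round(r, 3))
```

Values near +1 or -1 suggest a strong linear relationship. A value near 0 only rules out a linear one, which is why the plots are still worth making.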

To print your plot:

octave:3> gset term postscript
octave:4> gset output "output.ps"
octave:5> replot

This redraws your plot, but sends it to a PostScript file called
"output.ps" instead of the screen. You can then go to a Linux prompt
and type "lpr output.ps" to send it to the printer.

To send Octave's plots to the screen again:

octave:6> gset term X11
octave:7> replot


Alternatively, if your PostScript files are coming out too large, you can instead write your plot to a PNG file:

octave:8> gset term png
octave:9> gset output "output.png"
octave:10> replot

You can then open output.png in Mozilla, and print it out from there.