Assignment #1: Getting Started

For this assignment, you will explore a dataset to see what you can uncover.

I have placed a copy of the "census-income" dataset in the file /Accounts/courses/cs377/census-income/census-income.data. A data dictionary describing its fields is in the file census-income.names, in the same directory. Your goal is to learn everything that you can about the dataset. Answer the following questions as a starting point, but you should dig further. What more can you discover? What is the most interesting and surprising thing that you can dig up?

Starting questions:

You should submit a paper document describing what you discover.
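A useful first step is to check how many records the file has, whether every record has the same number of fields, and how the values of one field are distributed. The sketch below does this in Octave. The function name `tally_last_field` is hypothetical, and the code assumes that the file is comma-separated with one record per line and that the class label is the last field; confirm both against census-income.names. It also assumes a recent Octave release, since functions such as `fileread` may not exist in the 2.1.35 version shown in the session below.

```octave
1;  % marks this file as a script so the function below can be defined in it

% Hypothetical helper: assumes comma-separated records, one per line,
% with the field of interest (e.g. the class label) last on each line.
function [labels, counts, nrows, nfields] = tally_last_field (path)
  % Read the whole file as text and split it into lines.
  txt = fileread (path);
  lines = strsplit (txt, "\n");
  % Drop blank lines (e.g. the empty piece after a trailing newline).
  lines = lines(! cellfun (@isempty, strtrim (lines)));
  nrows = numel (lines);
  % Number of comma-separated fields on each line.
  nfields = cellfun (@(s) numel (strfind (s, ",")) + 1, lines);
  % Keep only the text after the last comma, then trim surrounding spaces.
  lastfield = strtrim (regexprep (lines, '^.*,', ''));
  % Count how many times each distinct value occurs.
  [labels, ~, idx] = unique (lastfield);
  counts = accumarray (idx(:), 1);
endfunction
```

Running `[labels, counts, nrows, nfields] = tally_last_field ("/Accounts/courses/cs377/census-income/census-income.data")` and displaying `labels` next to `counts` shows how the records divide among the classes. If `min (nfields)` and `max (nfields)` differ, some records are malformed or contain extra commas.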

Important Warnings

Sample Octave session

The following example Octave session makes a scatterplot from a small data file.


Here is my data file, called "example.mat":

# name: example
# type: matrix
# rows: 5
# columns: 2
1 1
2 3
3 4
5 6
2 9
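A file in this format does not have to be typed by hand. The minimal sketch below, assuming a reasonably recent Octave, writes the same five points with `save -text` and reads them back with a plain `load`, which detects the text format automatically. Newer versions also add a "# Created by Octave ..." comment line at the top of the file.

```octave
% The same five points as in example.mat above, one (x, y) pair per row.
example = [1 1; 2 3; 3 4; 5 6; 2 9];

% Write the variable in Octave's text format ("# name:", "# type:", ... header).
save ("-text", "example.mat", "example");

% Remove it from the workspace, then read it back from the file.
clear example
load ("example.mat");
disp (size (example))   % 5 rows, 2 columns
```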


Here is my Octave session:

prism> octave
GNU Octave, version 2.1.35 (i386-redhat-linux-gnu).

octave:1> load -force -ascii "example.mat"
octave:2> plot(example(:,1),example(:,2),'k@')

To print your plot:

octave:3> gset term postscript
octave:4> gset output "output.ps"
octave:5> replot

This redraws your plot, but writes it to a PostScript file called "output.ps" instead of the screen. You can then exit to a Linux prompt and type "lpr output.ps" to send it to the printer.

To send plots to the screen again:

octave:6> gset term X11
octave:7> replot      


If your PostScript files are coming out too large, you can write your plot to a PNG file instead:

octave:8> gset term png
octave:9> gset output "output.png"
octave:10> replot

You can then open output.png in Mozilla and print it from there.
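census-income.data cannot be read with `load` the way example.mat is, because many of its fields are text rather than numbers. One way to pull selected numeric columns into a matrix that `plot` can use is sketched below. The function name `numeric_columns` is hypothetical; it assumes a comma-separated file with one record per line and a recent Octave release. Use census-income.names to find which column numbers hold numeric attributes. Fields that do not parse as numbers become NaN.

```octave
1;  % marks this file as a script so the function below can be defined in it

% Hypothetical helper: returns an (nrecords x numel(cols)) matrix holding the
% requested columns of a comma-separated file; non-numeric fields become NaN.
function m = numeric_columns (path, cols)
  txt = fileread (path);
  lines = strsplit (txt, "\n");
  % Drop blank lines (e.g. the empty piece after a trailing newline).
  lines = lines(! cellfun (@isempty, strtrim (lines)));
  m = NaN (numel (lines), numel (cols));
  for r = 1:numel (lines)
    % Keep empty fields so column positions stay aligned.
    fields = strsplit (lines{r}, ",", "collapsedelimiters", false);
    for c = 1:numel (cols)
      if (cols(c) <= numel (fields))
        % str2double ignores surrounding spaces and returns NaN for text.
        m(r, c) = str2double (fields{cols(c)});
      endif
    endfor
  endfor
endfunction
```

For example, `m = numeric_columns ("/Accounts/courses/cs377/census-income/census-income.data", [1 3]);` (the column numbers are placeholders) gives a two-column matrix that can be passed to `plot` exactly as `example` is above. Because it loops over every line, it may take a while on a file of this size.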