Course Overview  
 
 

Syllabus

About Perl:
Perl is a powerful interpreted language particularly well suited to text processing. It is used around the world for CGI applications and system administration tasks. It incorporates features from many languages, including dynamic typing, objects, and first-class functions (closures). This versatility extends to the syntax as well, allowing programmers to write code in whatever style feels most natural to them.

Students will learn the basics of programming in Perl through weekly programming assignments and readings about the style and structure of the language. Much of the focus will be on processing text using regular expressions. The final project will involve writing a web crawler to traverse a local repository of web pages.

Week 1:
Week one will introduce students to Perl and prepare them for many of its idiosyncrasies. Topics include the values Perl considers false (there is no boolean data type), the lack of a distinct integer type, dynamic arrays, namespaces, scalar vs. list context, and default variables.
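
A few of these quirks in a short sketch (the values and variable names are illustrative, not taken from the course materials):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Perl has no boolean type: undef, 0, "0", and "" are all false.
    print "empty string is false\n" unless "";

    # Scalars hold numbers without a distinct integer type.
    my $n = 10 / 3;          # $n is 3.333..., not truncated

    # The same expression behaves differently in scalar vs. list context.
    my @words = ('foo', 'bar', 'baz');
    my $count = @words;      # scalar context: 3, the array length

    # Many built-ins default to $_ when given no argument.
    for (@words) {
        print;               # prints $_, the current element
        print "\n";
    }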

Week 2:
Week two will focus on subroutines (functions) and file handles. Students will also learn different ways of reading input from a file and the advantages of each method.
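
For instance, a sketch of two reading styles (the filename input.txt is a placeholder):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A subroutine; arguments arrive in the default array @_.
    sub line_count {
        my ($path) = @_;
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $count = 0;
        $count++ while <$fh>;    # line by line: only one line in memory
        close $fh;
        return $count;
    }

    print line_count('input.txt'), "\n";

    # Alternatively, slurp every line at once (simpler, but uses more memory):
    open my $fh, '<', 'input.txt' or die "Cannot open input.txt: $!";
    my @lines = <$fh>;           # list context reads the whole file
    close $fh;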

Week 3:
Week three will cover hashes (associative arrays) and give a brief introduction to regular expressions (more on these next week). Students will learn about character classes and simple quantifiers before getting a chance to write some simple regular expressions of their own.
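
A small sketch combining the two topics, counting words with a hash and a simple pattern (the pattern and input source are illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A hash (associative array) mapping words to counts.
    my %count;
    while (my $line = <STDIN>) {
        # \w matches a word character; + is the "one or more" quantifier.
        while ($line =~ /(\w+)/g) {
            $count{$1}++;
        }
    }

    # [aeiou] is a character class: it matches any single vowel.
    for my $word (keys %count) {
        print "$word: $count{$word}\n" if $word =~ /^[aeiou]/;
    }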

Week 4:
Week four goes into much more depth on regular expressions and the engine that supports them. Students will learn about option modifiers, text anchors, match variables, more quantifiers, precedence within a regular expression, and greediness.
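
A sketch of greedy vs. non-greedy matching alongside a few of these features (the sample strings are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $text = '<b>bold</b> and <i>italic</i>';

    # Greedy: .* grabs as much as possible, running to the last '>'.
    my ($greedy) = $text =~ /<(.*)>/;   # captures 'b>bold</b> and <i>italic</i'
    print "$greedy\n";

    # Non-greedy: .*? stops at the first '>'.
    my ($lazy) = $text =~ /<(.*?)>/;    # captures 'b'
    print "$lazy\n";

    # Anchors, option modifiers, and match variables: ^ anchors at the
    # start, the /i modifier ignores case, and $1 holds the first capture.
    if ("Hello World" =~ /^(hello)\b/i) {
        print "matched: $1\n";          # prints 'matched: Hello'
    }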

Week 5:
Week five will cover additional control structures, expression modifiers, loop controls, and advanced sorting techniques, which let students specify a custom ordering without writing their own sorting algorithm.
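
For example, sort accepts a comparison block, so a custom ordering takes one line (a minimal sketch; the data is illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @nums = (42, 7, 1000, 13);

    # Expression modifier: the condition follows the statement.
    print "found 7\n" if grep { $_ == 7 } @nums;

    # Loop controls: next skips an iteration, last exits the loop.
    for my $n (@nums) {
        next if $n < 10;
        last if $n > 999;
        print "$n\n";                  # prints only 42
    }

    # sort takes a comparison block, so no hand-written sort algorithm
    # is needed; <=> compares numerically (cmp compares as strings).
    my @ascending  = sort { $a <=> $b } @nums;
    my @descending = sort { $b <=> $a } @nums;
    print "@ascending\n";              # 7 13 42 1000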

Week 6:
Week six will give students a cursory overview of modules, file tests, and directory operations.
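
A brief sketch putting the three together (File::Basename, a core module, stands in for "a module" here; the directory is a placeholder):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Basename;            # importing gives us basename()

    # File tests: -e (exists), -f (plain file), -d (directory), -r (readable).
    my $dir = '.';
    die "$dir is not a directory\n" unless -d $dir;

    # Directory operations: opendir/readdir/closedir.
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    while (my $entry = readdir $dh) {
        next if $entry =~ /^\./;   # skip hidden entries and . / ..
        print basename("$dir/$entry"), "\n" if -f "$dir/$entry";
    }
    closedir $dh;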

Week 7:
Week seven will cover some advanced topics, including references. Students will learn how to create complex data structures as well as how Perl manages memory internally using reference counting.
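
A short sketch of references in action (the data is illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A hash of array references: a two-level data structure.
    my %courses = (
        perl => ['regexes', 'references'],
        unix => ['pipes', 'signals'],
    );

    # push through a reference; @{ ... } dereferences an array ref.
    push @{ $courses{perl} }, 'crawling';

    # Subscripts chain into nested structures.
    print $courses{perl}[2], "\n";       # prints 'crawling'

    # Each reference bumps a reference count; when the last reference
    # to a value goes away, Perl reclaims the memory automatically.
    my $ref = $courses{unix};            # count on the array goes up
    undef $ref;                          # and back down again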

Week 8:
Week eight will involve reading several papers about strategies for crawling the web, as well as the importance of downloading high-quality pages early in the process (and what constitutes a 'high-quality' page). Students will also learn about ethical crawling practices.

Final:
The final project will involve writing a web crawler that traverses a local repository of web pages. Crawling locally avoids potential network issues while the code is being developed and allows for a more controlled test environment. Students will not need to know any network programming, but much of the practice of writing a real-world web crawler will be the same.
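
A rough sketch of the core loop such a crawler might use (the repository path repo/ and the start page are placeholders, and the link extraction is deliberately naive):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch of a crawler's core loop over a local repository.
    my @queue = ('repo/index.html');     # placeholder start page
    my %seen;

    while (my $page = shift @queue) {
        next if $seen{$page}++;          # skip pages already visited
        open my $fh, '<', $page or next; # unreadable pages are skipped
        my $html = do { local $/; <$fh> };   # slurp the whole file
        close $fh;

        # Naive link extraction; a real submission needs sturdier parsing.
        while ($html =~ /href="([^"]+)"/g) {
            push @queue, "repo/$1";
        }
    }

    print "crawled ", scalar keys %seen, " pages\n";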


Files to Be Downloaded