Social networks, the electrical grid, route planning on the road or in the air, patterns of transmission of disease, matching of Carleton classes to classrooms: a multitude of real-world phenomena are fruitfully modeled using graphs. In the last couple of years, Julia Strand (Carleton Psychology) and I have begun a research collaboration on spoken-word recognition. When you're hanging out with a friend in Sayles during 3A on Wednesday in the middle of term, the fact that you're able to communicate at all is pretty remarkable. It's noisy, and there are a lot of different words in English; that you can figure out whether your friend described Sean Spicer as "bald" or "appalled" is a real feat. And word recognition is another real-world phenomenon that's proven amenable to graph-theoretic modeling.
Here's the graph G that's usually used in this context: there's one node for every word in the lexicon (i.e., in the English language), and there's an edge between any two words whose pronunciations are within a single "edit" (insertion, deletion, or substitution) of each other. Of course, the graph G is necessarily dependent on a particular lexicon---that is, some particular set of words. But you might know more words than I do (is "fleek" a word? or "nerts"?), and we might pronounce the words we do know differently (I understand that "pajamas" rhymes with "Oxford commas," but you might be confused about that).
In this project, we will consider a wide range of graph-theoretic measures, such as a collection of different ways of measuring the "centrality" of each node in a graph, including many properties that have been used in the word-recognition context. We will explore a fundamental question of robustness for each of these measures: to what extent are these measures robust to changes in lexicon size and composition?
This project will be both empirical and theoretical. Empirically, we will assess the stability of various network-based predictions for particular words based on networks from several different lexicons. Theoretically, we will explore the stability of various node-level graph-theoretic quantities with respect to certain types of perturbations of the graph.
In this project, you can expect to:
This is very much a research-style comps project. The good news: if all goes well, you'll be exploring and answering some questions that have never been answered before, by anyone; if things go very well, there's a possibility that this project might lead to a publishable paper. The bad news: research can be frustrating, filled with false starts, demanding of your time at incredibly inopportune moments, and may fail to lead to interesting results at all.
This will be an algorithmically intensive project; you must have already completed CS 252 (Algorithms) if you wish to be part of this project. Experience with basic linguistics (especially phonetics and phonology), such as knowing the International Phonetic Alphabet, is helpful but not required.