Carleton Comps Project: Face-Tracking

Final Results

Calm Prehensile Flex Your Sighs

Advisor: David Liben-Nowell

Background

Social networks, the electrical grid, route planning on the road or in the air, patterns of transmission of disease, matching of Carleton classes to classrooms: a multitude of real-world phenomena are fruitfully modeled using graphs. In the last couple of years, Julia Strand (Carleton Psychology) and I have begun a research collaboration on spoken-word recognition. When you're hanging out with a friend in Sayles during 3A on Wednesday in the middle of term, the fact that you're able to communicate at all is pretty remarkable. It's noisy, and there are a lot of different words in English; that you can figure out whether your friend described Sean Spicer as "bald" or "appalled" is a real feat. And word recognition is another real-world phenomenon that's proven amenable to graph-theoretic modeling.

Here's the graph G that's usually used in this context: there's one node for every word in the lexicon (i.e., in the English language), and there's an edge between any two words whose pronunciations are within a single "edit" (insertion, deletion, or substitution) of each other. Of course, the graph G is necessarily dependent on a particular lexicon---that is, some particular set of words. But you might know more words than I do (is "fleek" a word? or "nerts"?), and we might pronounce the words we do know differently (I understand that "pajamas" rhymes with "Oxford commas," but you might be confused about that).

The Project

In this project, we will consider a wide range of graph-theoretic measures, such as a collection of different ways of measuring the "centrality" of each node in a graph, including many properties that have been used in the word-recognition context. We will explore a fundamental question of robustness for each of these measures: to what extent are these measures robust to changes in lexicon size and composition?

This project will be both empirical and theoretical. Empirically, we will assess the stability of various network-based predictions for particular words based on networks from several different lexicons. Theoretically, we will explore the stability of various node-level graph-theoretic quantities with respect to certain types of perturbations of the graph.

In this project, you can expect to:

read a large number of recently published papers, both in the network science literature (e.g., Tsugawa & Ohsaki, 2015; Wei et al, 2015) and the literature on spoken-word recognition (e.g., Chan & Vitevitch, 2015; Siew, 2017).
assess the robustness of particular graph-theoretic measures, both theoretically (prove a theorem bounding the amount of change? develop an algorithm to maximally impact that measure with a small change to G?) and empirically (implement algorithms to compute these measures, and test how they change as G varies).
write a (substantial) paper describing what you've found.

Good News, Bad News, and Recommended Background

This is very much a research-style comps project. The good news: if all goes well, you'll be exploring and answering some questions that have never been answered before, by anyone; if things go very well, there's a possibility that this project might lead to a publishable paper. The bad news: research can be frustrating, filled with false starts, demanding of your time at incredibly inopportune moments, and may fail to lead to interesting results at all.

This will be an algorithmically intensive project; you must have already completed CS 252 (Algorithms) if you wish to be part of this project. Experience with basic linguistics (especially phonetics and phonology), such as knowing the International Phonetic Alphabet, is helpful but not required.

References (just for the flavor!)

Chan, K. Y. & Vitevitch, M.S. The influence of neighborhood density on the recognition of Spanish-accented words. Journal of Experimental Psychology: Human Perception and Performance, 41, 69–85, 2015.
Siew, C. S. Q. The influence of 2-hop network density on spoken word recognition. Psychonomic Bulletin & Review, Volume 24, Issue 2, pp. 496–502.
Tsugawa S., Ohsaki H.Analysis of the Robustness of Degree Centrality against Random Errors in Graphs. In: Mangioni G., Simini F., Uzzo S., Wang D. (eds) Complex Networks VI. Studies in Computational Intelligence, vol 597. Springer, 2015.
Wei, W., Joseph, K., Liu, H., & Carley, K. M. The fragility of Twitter social networks against suspended users. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015 (pp. 9-16).