ALL OUR DATA

About 📌

“All Our Data” is a project exploring data, data privacy, and the data brokerage industry. The industry's uses of personal data include large-scale advertising, political persuasion, and manipulation and discrimination for profit. On the other hand, data sharing also has benefits, such as scientific advancement and contact tracing to limit viral spread. While you sometimes have the option to view the data being collected on you, that data is often obfuscated, incomplete, or otherwise unclear, which leads to a serious lack of transparency. Furthermore, the mechanisms of the data brokering process (who is doing it, what they are collecting, how they are collecting it, and how the collected data is used) are largely unknown to the everyday user, and the industry remains intentionally opaque. Recently, laws have begun to restore some transparency about the data collected (e.g., the California Consumer Privacy Act and Vermont's 2018 data broker law), but are these sufficient? Our goal is to learn about the data brokerage industry: what data is collected, who ultimately gets our data and how, and how our data is used (or misused). What are the benefits and downsides of data brokering, who is targeted, and how can the data being collected on us be made more accessible, transparent, or easier to understand?

View Our Full Presentation Here

View Our 3 Minute Trailer ("Flash Talk") Here

Experimentation 🧪

Goals 📋

Our primary questions of interest are (1) who is collecting our data, (2) what they are collecting, (3) how they collect it, (4) how our collected data is being used, (5) how we can increase transparency about the data collected from us, and (6) how we can simulate targeted advertising. We conducted our research by constructing digital personas that emulate real personalities with social media accounts (Google, Quora, YouTube, etc.). We constructed these personas with control variables and simulated realistic user behavior. Finally, we collected the advertisements served to each persona and analyzed the differences between the experimental and control groups. To learn more about where our data goes, we also investigated trackers on popular websites and apps.
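To make the comparison between experimental and control personas concrete, here is a minimal sketch of one way such a difference could be tested: a two-sided permutation test on per-persona ad counts. This is an illustration under stated assumptions, not the project's actual analysis. The function name, the choice of test statistic (difference in group means), and the sample counts are all hypothetical.

```python
import random


def permutation_test(control, experiment, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_control = len(control)

    def mean_gap(values):
        first, second = values[:n_control], values[n_control:]
        return abs(sum(second) / len(second) - sum(first) / len(first))

    pooled = list(control) + list(experiment)
    observed = mean_gap(pooled)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign personas to groups
        if mean_gap(pooled) >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: ads in one category served to each persona.
control_counts = [2, 1, 3, 2, 1, 2]
experiment_counts = [6, 7, 5, 8, 6, 7]
p_value = permutation_test(control_counts, experiment_counts)
```

A small p-value suggests that the gap between the two groups would be unlikely if persona group had no effect on the ads served. With real data, the number of personas per group and the choice of statistic would matter much more than in this toy example.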

Tools 🔨

To perform these experiments, we used a variety of software and programming tools (e.g., R and Python 3). We also used AdFisher, an open-source web automation and data collection tool designed by researchers at Carnegie Mellon, which we adapted for our experiments. Finally, we used mitmproxy, an open-source proxy for intercepting network traffic, and Blacklight, a real-time web privacy inspector and tracker finder.
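As a rough illustration of the kind of tracker analysis these tools support, the sketch below tallies third-party domains from a list of request URLs, such as one exported from an intercepting proxy. It is a simplified example and not the project's code. The function names, the sample URLs, and the "last two labels" rule for guessing a site's domain are all assumptions; a real analysis would more likely rely on the Public Suffix List, since the two-label rule misclassifies domains such as those ending in co.uk.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(host):
    """Naive guess at a host's site: its last two dot-separated labels."""
    return ".".join(host.lower().split(".")[-2:])


def third_party_hosts(page_url, request_urls):
    """Count requests whose site differs from the visited page's site."""
    first_party = site_of(urlsplit(page_url).hostname or "")
    counts = Counter()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host and site_of(host) != first_party:
            counts[site_of(host)] += 1
    return counts


# Hypothetical requests observed while loading one page.
requests_seen = [
    "https://www.example.com/index.html",
    "https://static.example.com/app.js",
    "https://ads.tracker.example.net/pixel?id=1",
    "https://cdn.tracker.example.net/lib.js",
    "https://metrics.example.org/collect",
]
counts = third_party_hosts("https://www.example.com/", requests_seen)
```

Here `counts` maps each third-party site to the number of requests sent to it, which gives a first picture of where data leaves the page being visited.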

Research 📊

We used our experimental framework to answer questions about targeted advertising relating to mental health, political affiliation, and data leakage.

See how advertisers target your:

Team Members 👥

Aishee Mukherji, Cole Hanson, Eric Odoom, Jeffrey Boitnott, Yasmeen Awad

Computer Science, Class of 2021