Machine learning to determine the political stance of online news articles
We used the Python scikit-learn package to create a machine learning model that classifies the political stance of news articles as either Left, Right, or Center-leaning.
We used the `TfidfTransformer` to extract TF-IDF features from our article text data, and trained several of sklearn's built-in models, including `LinearSVC`, `MLPClassifier`, and `RandomForestClassifier`. We ended up using the `LinearSVC` model because it gave us the best performance, with the following evaluation metrics.
Label | Recall | Precision |
---|---|---|
L | 0.83 | 0.90 |
C | 0.87 | 0.85 |
R | 0.92 | 0.82 |
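As a rough illustration of the approach described above, the sketch below chains a `CountVectorizer`, a `TfidfTransformer`, and a `LinearSVC` in a scikit-learn `Pipeline`, then computes per-label recall and precision. It is not our actual training code: the example sentences, the helper names `build_pipeline` and `per_label_metrics`, the use of `CountVectorizer` to produce the counts that `TfidfTransformer` expects, and the default hyperparameters are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder sentences invented for this sketch; they are NOT taken
# from the scraped dataset. Labels follow the table: L, C, R.
texts = [
    "progressive lawmakers propose expanding public healthcare",
    "union organizers rally for a higher minimum wage",
    "bipartisan committee releases annual budget report",
    "city council schedules hearing on road repairs",
    "conservative group backs broad tax cuts",
    "senator calls for stricter border enforcement",
]
labels = ["L", "L", "C", "C", "R", "R"]


def build_pipeline():
    """Hypothetical helper: token counts -> TF-IDF weights -> linear SVM."""
    return Pipeline([
        ("counts", CountVectorizer()),   # assumed step; TfidfTransformer needs counts
        ("tfidf", TfidfTransformer()),
        ("clf", LinearSVC()),            # default hyperparameters, an assumption
    ])


def per_label_metrics(y_true, y_pred, label_order=("L", "C", "R")):
    """Hypothetical helper: recall and precision for each stance label."""
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(label_order), zero_division=0
    )
    return {
        label: {"recall": float(r), "precision": float(p)}
        for label, p, r in zip(label_order, precision, recall)
    }


model = build_pipeline()
model.fit(texts, labels)

# Scoring on the training sentences only demonstrates the API;
# a meaningful evaluation needs a held-out test split.
predictions = model.predict(texts)
metrics = per_label_metrics(labels, predictions)

for label, scores in metrics.items():
    print(label, scores)
```

Swapping `LinearSVC()` for `MLPClassifier()` or `RandomForestClassifier()` in the last pipeline step is one way to compare the other models on the same features.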
We gathered the dataset used to train our models from a website called allsides.com. Allsides provides readers with news articles from across the political spectrum, so we were able to scrape the text of almost 40,000 articles, along with their political stance labels, to train our supervised machine learning models.
We also experimented with a few more advanced deep learning models, including Google's BERT model. However, we decided not to use them because they achieved similar performance to the sklearn classifiers but were far more computationally expensive.