News Classification

Machine learning to determine the political stance of online news articles

scikit-learn

We used the Python scikit-learn package to build a machine learning model that classifies the political stance of a news article as either Left-, Right-, or Center-leaning.

We used the TfidfTransformer to extract TF-IDF features from our article text data, and trained several of sklearn's built-in models, including LinearSVC, MLPClassifier, and RandomForestClassifier. We ended up using the LinearSVC model because it gave us the best performance, with the following evaluation metrics:

Label    Recall  Precision
Left     0.83    0.90
Center   0.87    0.85
Right    0.92    0.82
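
As a rough illustration of this kind of setup, here is a minimal sketch of a TF-IDF plus LinearSVC pipeline with per-label recall and precision. It is not our actual training code: the toy texts are nonsense placeholders, the helper name build_model is made up for this example, and pairing CountVectorizer with TfidfTransformer is an assumption (TfidfTransformer works on token counts, so something has to produce them first).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

LABELS = ["L", "C", "R"]

# Placeholder training data (hypothetical). These are nonsense strings,
# not real articles; the real corpus is the scraped article text.
train_texts = [
    "apple banana apple", "banana apple cherry",
    "desk chair lamp", "lamp desk table",
    "river mountain lake", "lake river forest",
]
train_labels = ["L", "L", "C", "C", "R", "R"]

# Placeholder held-out data (hypothetical), kept separate from training.
test_texts = ["cherry apple", "table chair", "forest mountain"]
test_labels = ["L", "C", "R"]


def build_model():
    """Token counts -> TF-IDF weights -> linear SVM classifier."""
    return Pipeline([
        ("counts", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", LinearSVC()),
    ])


model = build_model()
model.fit(train_texts, train_labels)
predicted = model.predict(test_texts)

# Per-label precision and recall, in the order given by LABELS.
precision, recall, _, _ = precision_recall_fscore_support(
    test_labels, predicted, labels=LABELS, zero_division=0
)
for label, r, p in zip(LABELS, recall, precision):
    print(f"{label} recall={r:.2f} precision={p:.2f}")
```

Swapping the "clf" step for MLPClassifier or RandomForestClassifier is enough to compare models on the same features.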

Allsides.com

The dataset we used to train our models was gathered from a website called allsides.com. Allsides provides readers with news articles from across the political spectrum, each tagged with a political stance. We scraped the text of almost 40,000 articles along with their stance labels and used them to train our supervised machine learning models.
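
To give a feel for the text-extraction step, here is a small sketch that pulls paragraph text out of an HTML page using only the Python standard library. Everything specific in it is an assumption: the sample markup is invented and does not reflect the real page structure of any site, and the names ParagraphExtractor and extract_article_text are hypothetical. It is not the scraper we used.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # > 0 while inside a <p> element
        self._current = []     # text fragments of the open paragraph
        self.paragraphs = []   # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._current).split())
                if text:
                    self.paragraphs.append(text)
                self._current = []

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)


def extract_article_text(html):
    """Return the paragraph text of an HTML document, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


# Invented sample page (hypothetical markup, not from a real site).
SAMPLE_HTML = """
<html><body>
<h1>Example headline</h1>
<p>First paragraph of the   <b>article</b> body.</p>
<div class="ad">Not part of the article.</div>
<p>Second paragraph.</p>
</body></html>
"""

# One training record: article text paired with its stance label.
record = {"text": extract_article_text(SAMPLE_HTML), "label": "C"}
print(record["text"])
```

Keeping only paragraph text is a simple way to drop headlines, navigation, and ads before the text reaches the feature extractor, though real pages usually need site-specific handling.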

Deep Learning

We also experimented with a few more advanced deep learning models, including Google's BERT model. However, we decided not to use them: they achieved similar performance to the sklearn classifiers but were far more computationally expensive.