Building a Conversational Bot Using a Novel Neural Network Architecture

Soren DeHaan, Elliot Heyman, Silas Rhyneer, Carl Tankersley

Introduction

Our journey started with a single paper. The idea that intrigued us was the concept of feeding metadata into a recurrent neural network. Recurrent neural networks are relatively easy to create, but what kind of task would actually benefit from having extra information fed into one?

App Reviews with the Basic Architecture

Our first goal after selecting a paper was to get a basic neural network up and running, which we did relatively quickly with a tutorial we found. With slight modifications, we had a proof-of-concept network training on our dataset of app reviews and developer responses. However, while the network worked fine in a technical sense, we weren't getting the quality or variety of results we wanted. The vast majority of the developer responses we were training on were copy-pasted templates, so the network only learned to reproduce those templates. We realized we needed a dataset with more variety.
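For context, a bare-bones version of the kind of sequence-to-sequence model we started from might look like the sketch below. We're assuming PyTorch here, and all class and variable names are our own illustration rather than the tutorial's exact code.

    import torch.nn as nn

    class Encoder(nn.Module):
        """Reads an input sequence (e.g. an app review) into a hidden state."""
        def __init__(self, vocab_size, hidden_size):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, hidden_size)
            self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

        def forward(self, input_ids):
            # input_ids: (batch, seq_len) token indices
            embedded = self.embedding(input_ids)
            return self.gru(embedded)  # outputs, final hidden state

    class Decoder(nn.Module):
        """Generates a response one token at a time from the encoder's hidden state."""
        def __init__(self, vocab_size, hidden_size):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, hidden_size)
            self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def forward(self, token, hidden):
            # token: (batch, 1) previously generated token
            output, hidden = self.gru(self.embedding(token), hidden)
            return self.out(output.squeeze(1)), hidden  # logits over the vocabulary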

Scaling to Reddit

Since our app review data lacked the variety we needed, we switched to a much bigger dataset: all of Reddit's comment history from 2015. With bigger data came new issues. We had to compress files, write new functions to normalize the data, and clean our data in new ways. With all of this new complexity, we realized we needed a better pipeline, one where we could point the network at a new dataset and have it run. We put together a template for config files and made our code robust enough that changing the network's behavior only required altering a few values in a JSON file, as in the sketch below.
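As an illustration, loading such a config might look like the following; the key names and values here are hypothetical, not our exact schema.

    import json

    # A hypothetical config of the kind we used: swapping datasets or
    # hyperparameters means editing only this JSON, not the training code.
    config = json.loads("""
    {
        "dataset_path": "data/reddit_2015",
        "max_sentence_length": 20,
        "hidden_size": 512,
        "learning_rate": 1e-4,
        "teacher_forcing_ratio": 0.9
    }
    """)

    hidden_size = config["hidden_size"]
    dataset_path = config["dataset_path"]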

Including Metadata

With a larger dataset and a working pipeline, we started including metadata. There were several ways we could have fed this data into the network, but we settled on concatenating it to the hidden layer at every step as the decoder iterated forward. That way the metadata was always accessible and never faded. Although the exact set changed throughout the project, we eventually included only sentiment scores and upvote counts from our training data.
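One way to realize this idea is to append the metadata vector to the decoder's input at every time step, so it reaches the hidden layer on every iteration. The sketch below assumes PyTorch; the names, and the exact concatenation point, are illustrative rather than our exact implementation.

    import torch
    import torch.nn as nn

    class MetadataDecoder(nn.Module):
        """Decoder that sees the metadata vector at every decoding step."""
        def __init__(self, vocab_size, hidden_size, metadata_size):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, hidden_size)
            # The GRU input is the token embedding plus the metadata vector
            # (e.g. a sentiment score and an upvote count).
            self.gru = nn.GRU(hidden_size + metadata_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def forward(self, token, hidden, metadata):
            # token: (batch, 1); metadata: (batch, metadata_size)
            embedded = self.embedding(token)  # (batch, 1, hidden_size)
            # Re-concatenate the metadata at every step so it never fades.
            step_input = torch.cat([embedded, metadata.unsqueeze(1)], dim=2)
            output, hidden = self.gru(step_input, hidden)
            return self.out(output.squeeze(1)), hidden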

Improving Performance

Now that the neural network was technically functional, the next step was to make it perform better, which mainly consisted of hyperparameter tuning. The hyperparameter we found to have the biggest impact was the teacher forcing ratio. Our Reddit training data had multiple 'correct' responses for each input, which by default led the network to construct an amalgamation of these responses that never read well. Increasing the teacher forcing ratio helped the network cope with an input having multiple 'correct' responses.
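Concretely, the teacher forcing ratio is the probability that, at each decoding step, the decoder is fed the ground-truth token rather than its own previous prediction. A sketch of the mechanism, again assuming PyTorch with hypothetical names:

    import random
    import torch

    def decode_with_teacher_forcing(decoder, hidden, target, sos_token,
                                    teacher_forcing_ratio=0.9):
        # target: (batch, seq_len) ground-truth response tokens
        batch_size, seq_len = target.shape
        token = torch.full((batch_size, 1), sos_token, dtype=torch.long)
        logits_per_step = []
        for t in range(seq_len):
            logits, hidden = decoder(token, hidden)
            logits_per_step.append(logits)
            if random.random() < teacher_forcing_ratio:
                token = target[:, t].unsqueeze(1)           # feed the ground truth
            else:
                token = logits.argmax(dim=1, keepdim=True)  # feed the model's own guess
        return torch.stack(logits_per_step, dim=1), hidden

With a high ratio the decoder is usually conditioned on the actual response being learned, so different 'correct' responses to the same input push the weights toward distinct continuations instead of being blended into one averaged reply.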

Results

Below is a demo of our bot's responses to a sample of about 9,000 Reddit comments. Comments labeled as “input” are randomly selected Reddit comments, and comments labeled as “response” are our bot's reply to the input above them. The input comments have no relation to each other. Disclaimer: these sample responses are not representative of our beliefs; they are just what the bot learned from Reddit.

GitHub Repo

Link to repository