2023–24 Projects:
Advisor: Anna Rafferty
Times: Winter 4a
Imagine you want to make a robot that can place dishes in a dishwasher. It might learn how to do this by trying lots of different ways of manipulating the dishes. Some things it might try could lead to the dishes not getting washed very well - for instance, if all the plates are directly on top of each other. Other things that it might try could lead to dishes actually breaking - say if it tries to put plates into the dishwasher by dropping them from several feet above their desired location. The latter situation is in some sense worse: breaking dishes is monetarily costly, and the dish fragments could harm nearby people or even the robot.
Learning by trying out different possible actions and observing how successful they are is a type of learning known as reinforcement learning. A subfield of reinforcement learning, safe reinforcement learning, focuses on ensuring that while the agent is learning, it doesn’t take actions that can have catastrophic consequences. Doing this is hard - in some sense, it requires avoiding things that are dangerous without necessarily knowing in advance which things are dangerous!
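To make this concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a made-up five-state chain where the agent earns a reward only by reaching the rightmost state. Everything here - the environment, the learning rate, the discount, the exploration rate - is an illustrative choice, not part of any particular project setup:

```python
import random

# A tiny, made-up example: tabular Q-learning on a 5-state chain.
# The agent starts at state 0; action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward +1 and ends the episode. The parameter
# values below are illustrative choices, not from any particular paper.

N_STATES = 5
ACTIONS = (0, 1)                       # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Deterministic chain dynamics; reward only at the rightmost state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# The learned greedy policy should move right from every non-terminal state.
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

Notice that the epsilon-greedy rule is exactly where the safety question bites: some fraction of the time, the agent tries an arbitrary action just to see what happens - the kind of unguarded exploration that safe reinforcement learning aims to constrain.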
A number of different approaches to safe reinforcement learning have been developed. One, for instance, suggests that the agent should "imagine into the near future" so that it can anticipate dangerous outcomes and steer away from them (Thomas, Luo, & Ma 2021). An image from that paper illustrates the idea: in the top scenario, it's too late for the car to brake and avoid hitting the (emoji) pedestrian, but in the bottom scenario, the car still has enough time to act - if it can recognize that it needs to do so by "imagining" what might happen over the next several seconds.
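As a rough illustration of this lookahead idea - emphatically a toy sketch, not the actual algorithm from Thomas, Luo, & Ma - here is an agent that simulates a few steps into the future with a simple model before committing to an action, and vetoes any action whose imagined rollout collides with a hazard. The point-mass dynamics, horizon, and positions are all invented for illustration:

```python
# Toy "imagine into the near future" safety check: before acting, roll the
# dynamics model forward a few steps and reject actions whose best-case
# follow-up (full braking) still reaches an unsafe state. All numbers and
# dynamics here are made up for illustration.

HORIZON = 5           # how many future steps to "imagine"
PEDESTRIAN_AT = 10.0  # position of the hazard
DT = 1.0

def model(pos, vel, action):
    """Simple point-mass model: the action is an acceleration (-1 = brake)."""
    vel = max(0.0, vel + action * DT)   # the car can't move backward
    pos = pos + vel * DT
    return pos, vel

def unsafe(pos):
    return pos >= PEDESTRIAN_AT

def rollout_is_safe(pos, vel, action, horizon=HORIZON):
    """Imagine taking `action` now, then braking as hard as possible after.
    If even this best-case recovery leads to a collision, the action is unsafe."""
    pos, vel = model(pos, vel, action)
    for _ in range(horizon):
        if unsafe(pos):
            return False
        pos, vel = model(pos, vel, -1.0)  # best-case recovery: full braking
    return not unsafe(pos)

def safe_actions(pos, vel, candidates=(-1.0, 0.0, 1.0)):
    return [a for a in candidates if rollout_is_safe(pos, vel, a)]

# Far from the pedestrian, every action passes the check; close and fast,
# only braking survives the imagined rollout.
print(safe_actions(pos=0.0, vel=1.0))  # [-1.0, 0.0, 1.0]
print(safe_actions(pos=6.0, vel=3.0))  # [-1.0]
```

This captures the distinction in the paper's two scenarios: the check flags danger while there is still time to act, rather than after the collision is unavoidable.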
Safe reinforcement learning is valuable in a whole range of contexts, from tuning HVAC control to make buildings greener to deploying robots in inhospitable environments like the deep sea.
In this project, you’ll be learning about reinforcement learning in general as well as safe reinforcement learning. Your goal will be to use these algorithms in a simulator to better understand how they work and examine the consequences of different approaches for ensuring safety. You’ll also think about where these methods might and might not be appropriate for real-world deployment outside of a simulator.
The progression of the project will look something like the following, where different parts of this progression might be split among different team members:
Depending on the group’s interests and skill sets, you might spend more time using existing libraries that implement some of these safe reinforcement learning methods and understanding the consequences of different approaches, or you might spend more time on implementing particular algorithms.
Having taken CS321: Making Decisions with AI or CS320: Machine Learning may be helpful background, but neither is necessary. Willingness to engage with mathematics will also be helpful.
I’ll provide additional references when we start the project, but here are some relevant papers: