Author: Samuel Aboderin
Status: Complete
In this research project, I explore the problem of Steering in multi-agent systems. The fundamental question I address is:
How can we guide an autonomous agent to exhibit a specific behavior (distribution) without explicitly programming its path, but rather by designing the incentives (rewards) it responds to?
This is a classic problem in Inverse Reinforcement Learning (IRL) and Mechanism Design. By treating the reward function as a control variable, I implement an optimization loop that "learns" the necessary incentives to align the agent's self-interest with a global target objective.
Standard Reinforcement Learning asks: "Given a reward, what is the optimal behavior?"
My project asks the reverse: "Given a desired behavior, what is the optimal reward?"
This is critical for AI Alignment and Safe RL. We often know what we want the system to look like (e.g., "traffic should flow smoothly," "the robot should stay in the safe zone"), but we don't know the exact reward values to assign to each state to make that happen.
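One way to state the contrast formally (the notation here is mine, not taken from the repo): let $d^{\pi}$ be the state-occupancy distribution induced by a policy $\pi$ and $d^{\star}$ the desired target distribution.

$$
\text{Standard RL: } \pi^{\star}(r) = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t}\gamma^{t}\, r(s_t)\right],
\qquad
\text{Steering: } r^{\star} = \arg\min_{r}\; D\!\left(d^{\pi^{\star}(r)},\, d^{\star}\right),
$$

where $D$ is some mismatch measure, e.g. squared error or a KL divergence.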
I implemented a Steering Optimizer that performs the following loop (a minimal code sketch follows the list):
- Observe: Check where the agent currently spends its time (Current Occupancy).
- Compare: Calculate the difference (Error) between the Current Occupancy and the Target Distribution.
- Adjust: Update the reward map. If the agent is visiting a state less than desired, increase the reward there. If more, decrease it.
- Repeat: The agent re-optimizes its policy based on the new rewards.
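As a rough sketch of this loop (not the exact code in `src/steering.py`; `solve_soft_policy` and `occupancy` are placeholder names for the agent's policy solver and occupancy computation):

```python
import numpy as np

def steer(env, target, n_iters=200, lr=0.5):
    """Hypothetical observe-compare-adjust loop over the reward map.

    `env` is assumed to expose `n_states`, a soft policy solver, and an
    occupancy computation; `target` is the desired distribution over states.
    """
    rewards = np.zeros(env.n_states)              # start from a flat reward map
    for _ in range(n_iters):
        policy = env.solve_soft_policy(rewards)   # Repeat: agent re-optimizes
        occ = env.occupancy(policy)               # Observe: current occupancy
        error = target - occ                      # Compare: target minus current
        rewards += lr * error                     # Adjust: raise reward where under-visited
    return rewards
```

With state-indicator features, this update mirrors the feature-matching gradient used in maximum-entropy IRL: the reward gradient is the gap between the target and the current occupancy.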
I tested the algorithm on a GridWorld environment. The goal was to steer the agent to concentrate in a specific Gaussian region (top-right corner).
- Left (Target Distribution): This is the goal. The yellow region represents where I want the agent to be.
- Middle (Achieved Occupancy): This is the actual behavior of the agent after my optimization algorithm finished. As you can see, it matches the target almost perfectly.
- Right (Learned Rewards): This is the "solution" found by the algorithm. The Red areas represent high rewards (incentives) that attract the agent, while Blue areas represent low rewards (disincentives).
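For reference, a Gaussian target like the one in the left panel can be built in a few lines; the grid size, center, and width below are illustrative, not the exact values used in the notebook:

```python
import numpy as np

def gaussian_target(size=10, center=(8, 8), sigma=1.5):
    """Illustrative target occupancy: a Gaussian bump near the top-right corner."""
    ys, xs = np.mgrid[0:size, 0:size]
    bump = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))
    return (bump / bump.sum()).ravel()  # normalize and flatten to a distribution over states
```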
The codebase is structured to be modular and readable for fellow researchers.
- `src/env.py`: Implements the Markov Decision Process (MDP) dynamics.
- `src/agent.py`: Implements a Soft (Maximum Entropy) Agent. I chose a "soft" agent because it provides a differentiable mapping from rewards to occupancy, which is essential for the gradient-based optimization (see the sketch after this list).
- `src/steering.py`: Contains the `SteeringOptimizer` class, which performs the gradient descent on the reward function.
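To make the "differentiable mapping from rewards to occupancy" concrete, here is a minimal, self-contained sketch of soft value iteration and the induced occupancy. It is my own simplified version, not the code in `src/agent.py`:

```python
import numpy as np

def soft_value_iteration(P, rewards, gamma=0.95, temp=1.0, n_iters=200):
    """Soft (MaxEnt) Bellman backups; returns a Boltzmann policy of shape (S, A).

    P is a transition tensor of shape (A, S, S); rewards has shape (S,).
    Because the backup uses log-sum-exp instead of a hard max, the policy
    (and hence the occupancy below) varies smoothly with the rewards.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = rewards[None, :] + gamma * (P @ V)            # Q[a, s]
        V = temp * np.logaddexp.reduce(Q / temp, axis=0)  # stable soft max over actions
    policy = np.exp((Q - V[None, :]) / temp)              # Boltzmann weights over actions
    return (policy / policy.sum(axis=0)).T

def occupancy(P, policy, gamma=0.95, mu0=None):
    """Discounted state-occupancy distribution induced by a stochastic policy."""
    A, S, _ = P.shape
    mu0 = np.ones(S) / S if mu0 is None else mu0
    P_pi = np.einsum('sa,ast->st', policy, P)             # state-to-state transitions under pi
    return (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)
```

In the loop sketched earlier, these two functions would play the role of the assumed `solve_soft_policy` and `occupancy` helpers.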
To reproduce my results:
- Install Dependencies: `pip install -r requirements.txt`
- Run the Interactive Notebook: Open `notebooks/demo_steering.ipynb`. This notebook contains the full experiment pipeline and generates the visualizations shown above.
- Run the Headless Script: For a quick verification of the convergence metrics, run `python run_demo.py`.
This framework lays the groundwork for more complex steering tasks, such as:
- Multi-Agent Steering: Guiding multiple interacting agents to avoid congestion.
- Constraint Satisfaction: Steering agents while ensuring they avoid unsafe regions (obstacles).
