Author: Samuel Aboderin
Status: Complete
In this research project, I explore the problem of Steering in multi-agent systems. The fundamental question I address is:
How can we guide an autonomous agent to exhibit a specific behavior (distribution) without explicitly programming its path, but rather by designing the incentives (rewards) it responds to?
This is a classic problem in Inverse Reinforcement Learning (IRL) and Mechanism Design. By treating the reward function as a control variable, I implement an optimization loop that "learns" the necessary incentives to align the agent's self-interest with a global target objective.
Standard Reinforcement Learning asks: "Given a reward, what is the optimal behavior?"
My project asks the reverse: "Given a desired behavior, what is the optimal reward?"
This is critical for AI Alignment and Safe RL. We often know what we want the system to look like (e.g., "traffic should flow smoothly," "the robot should stay in the safe zone"), but we don't know the exact reward values to assign to each state to make that happen.
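One way to state the contrast formally (the notation here is mine, not taken from the repo): let $d^{\pi}$ be the state-occupancy distribution induced by a policy $\pi$ and $d^{\star}$ the desired target distribution.

$$
\text{Standard RL: } \pi^{\star}(r) = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t}\gamma^{t}\, r(s_t)\right],
\qquad
\text{Steering: } r^{\star} = \arg\min_{r}\; D\!\left(d^{\pi^{\star}(r)},\, d^{\star}\right),
$$

where $D$ is some mismatch measure, e.g. squared error or a KL divergence.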
I implemented a Steering Optimizer that performs the following loop (a minimal code sketch follows the list):
- Observe: Check where the agent currently spends its time (Current Occupancy).
- Compare: Calculate the difference (Error) between the Current Occupancy and the Target Distribution.
- Adjust: Update the reward map. If the agent is visiting a state less than desired, increase the reward there. If more, decrease it.
- Repeat: The agent re-optimizes its policy based on the new rewards.
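As a rough sketch of this loop (not the exact code in `src/steering.py`; `solve_soft_policy` and `occupancy` are placeholder names for the agent's policy solver and occupancy computation):

```python
import numpy as np

def steer(env, target, n_iters=200, lr=0.5):
    """Hypothetical observe-compare-adjust loop over the reward map.

    `env` is assumed to expose `n_states`, a soft policy solver, and an
    occupancy computation; `target` is the desired distribution over states.
    """
    rewards = np.zeros(env.n_states)              # start from a flat reward map
    for _ in range(n_iters):
        policy = env.solve_soft_policy(rewards)   # Repeat: agent re-optimizes
        occ = env.occupancy(policy)               # Observe: current occupancy
        error = target - occ                      # Compare: target minus current
        rewards += lr * error                     # Adjust: raise reward where under-visited
    return rewards
```

With state-indicator features, this update mirrors the feature-matching gradient used in maximum-entropy IRL: the reward gradient is the gap between the target and the current occupancy.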
I tested the algorithm on a GridWorld environment. The goal was to steer the agent to concentrate in a specific Gaussian region (top-right corner).
- Left (Target Distribution): This is the goal. The yellow region represents where I want the agent to be.
- Middle (Achieved Occupancy): This is the actual behavior of the agent after my optimization algorithm finished. As you can see, it matches the target almost perfectly.
- Right (Learned Rewards): This is the "solution" found by the algorithm. The Red areas represent high rewards (incentives) that attract the agent, while Blue areas represent low rewards (disincentives).
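For reference, a Gaussian target like the one in the left panel can be built in a few lines; the grid size, center, and width below are illustrative, not the exact values used in the notebook:

```python
import numpy as np

def gaussian_target(size=10, center=(8, 8), sigma=1.5):
    """Illustrative target occupancy: a Gaussian bump near the top-right corner."""
    ys, xs = np.mgrid[0:size, 0:size]
    bump = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))
    return (bump / bump.sum()).ravel()  # normalize and flatten to a distribution over states
```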
The codebase is structured to be modular and readable for fellow researchers.
- `src/env.py`: Implements the Markov Decision Process (MDP) dynamics.
- `src/agent.py`: Implements a Soft (Maximum Entropy) Agent. I chose a "soft" agent because it provides a differentiable mapping from rewards to occupancy, which is essential for the gradient-based optimization (see the sketch after this list).
- `src/steering.py`: Contains the `SteeringOptimizer` class, which performs the gradient descent on the reward function.
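To make the "differentiable mapping from rewards to occupancy" concrete, here is a minimal, self-contained sketch of soft value iteration and the induced occupancy. It is my own simplified version, not the code in `src/agent.py`:

```python
import numpy as np

def soft_value_iteration(P, rewards, gamma=0.95, temp=1.0, n_iters=200):
    """Soft (MaxEnt) Bellman backups; returns a Boltzmann policy of shape (S, A).

    P is a transition tensor of shape (A, S, S); rewards has shape (S,).
    Because the backup uses log-sum-exp instead of a hard max, the policy
    (and hence the occupancy below) varies smoothly with the rewards.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = rewards[None, :] + gamma * (P @ V)            # Q[a, s]
        V = temp * np.logaddexp.reduce(Q / temp, axis=0)  # stable soft max over actions
    policy = np.exp((Q - V[None, :]) / temp)              # Boltzmann weights over actions
    return (policy / policy.sum(axis=0)).T

def occupancy(P, policy, gamma=0.95, mu0=None):
    """Discounted state-occupancy distribution induced by a stochastic policy."""
    A, S, _ = P.shape
    mu0 = np.ones(S) / S if mu0 is None else mu0
    P_pi = np.einsum('sa,ast->st', policy, P)             # state-to-state transitions under pi
    return (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)
```

In the loop sketched earlier, these two functions would play the role of the assumed `solve_soft_policy` and `occupancy` helpers.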
To reproduce my results:
- Install Dependencies: `pip install -r requirements.txt`
- Run the Interactive Notebook: Open `notebooks/demo_steering.ipynb`. This notebook contains the full experiment pipeline and generates the visualizations shown above.
- Run the Headless Script: For a quick verification of the convergence metrics, run `python run_demo.py`.
This framework lays the groundwork for more complex steering tasks, such as:
- Multi-Agent Steering: Guiding multiple interacting agents to avoid congestion.
- Constraint Satisfaction: Steering agents while ensuring they avoid unsafe regions (obstacles).
