AlphaApollo: A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning.

It supports a wide range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. AlphaApollo incorporates multiple post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines.

AlphaApollo also offers flexible and extensible agentic environments and tool-set configurations, allowing users to easily customize, extend, and scale agentic reasoning workflows.

News

[2026.01] We are excited to release AlphaApollo, an agentic LLM system for advanced reasoning.
[2025.10] Our technical report is released; see arXiv:2510.06261 for details.

Installation

conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo

git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo

bash installation.sh

Supported features

Agentic reasoning

Tool-integrated reasoning rollout with seamless environment interaction
Dynamic memory updates for multi-turn reasoning
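
To make the two items above concrete, here is a minimal sketch of a tool-integrated rollout in which each model turn and each tool observation is appended to a running memory. It is an illustration under stated assumptions, not AlphaApollo's implementation: the `<python>` tag format, `run_python`, `run_rollout`, and `fake_policy` are all hypothetical names, and the toy tool runs code in-process without any sandboxing.

```python
import contextlib
import io
import re
from typing import Callable, List

# Hypothetical tag format for tool calls; AlphaApollo's actual prompt format may differ.
TOOL_CALL = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_python(code: str) -> str:
    """Toy tool: execute code in-process and return captured stdout (no sandboxing)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def run_rollout(policy: Callable[[List[str]], str], question: str, max_turns: int = 4) -> List[str]:
    """Alternate between model turns and tool observations, appending both to memory."""
    memory = [f"Question: {question}"]
    for _ in range(max_turns):
        reply = policy(memory)
        memory.append(reply)
        match = TOOL_CALL.search(reply)
        if match is None:  # no tool call: treat the reply as the final answer
            break
        memory.append(f"Observation: {run_python(match.group(1))}")
    return memory


def fake_policy(memory: List[str]) -> str:
    """Stand-in for an LLM: call the tool once, then answer from the observation."""
    if memory[-1].startswith("Observation:"):
        return "Answer: " + memory[-1].split(": ", 1)[1]
    return "<python>print(sum(range(1, 11)))</python>"


trace = run_rollout(fake_policy, "What is 1 + 2 + ... + 10?")
for turn in trace:
    print(turn)
```

Passing the whole memory list back to the policy on every turn is the simplest way to let later turns condition on earlier tool results; a real system would likely truncate or summarize it.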

Agentic learning

Multi-turn supervised fine-tuning (SFT)
Reinforcement learning algorithms: GRPO, PPO, DAPO, and more.
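
As a reminder of what distinguishes GRPO from PPO, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. This is a generic illustration, not the code AlphaApollo or verl uses; the function name is hypothetical, and the choice of population standard deviation and `eps` is an assumption (implementations differ on both).

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct responses receive positive advantages and incorrect ones negative advantages; if every response in a group gets the same reward, all advantages are zero and the group contributes no gradient signal.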

Agentic self-evolution

Multi-round, multi-model solution refinement with shared state
Iterative improvement via feedback and executable checks
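
The loop described above can be sketched as follows: in each round every model proposes a candidate, an executable check scores it, and the outcome is written to a state that all models can read in later rounds. Everything here is hypothetical scaffolding for illustration (`evolve`, `guesser`, `is_root`, and the record layout are not taken from the AlphaApollo codebase), and the "models" are plain functions rather than LLM endpoints.

```python
from typing import Callable, Dict, List, Optional

# A "model" is any function mapping (problem, shared state) to a candidate answer.
Model = Callable[[str, List[Dict]], str]


def evolve(
    problem: str,
    models: Dict[str, Model],
    check: Callable[[str], bool],
    rounds: int = 3,
) -> Optional[str]:
    """Let every model propose a candidate each round; return the first that passes the check."""
    shared_state: List[Dict] = []  # visible to all models in later turns
    for round_idx in range(rounds):
        for name, model in models.items():
            candidate = model(problem, shared_state)
            passed = check(candidate)
            shared_state.append(
                {"round": round_idx, "model": name, "candidate": candidate, "passed": passed}
            )
            if passed:
                return candidate
    return None


def guesser(problem: str, state: List[Dict]) -> str:
    """Toy model: propose the smallest non-negative integer nobody has tried yet."""
    tried = {entry["candidate"] for entry in state}
    return next(str(n) for n in range(100) if str(n) not in tried)


def is_root(candidate: str) -> bool:
    """Executable check: does the candidate satisfy x**2 - 5*x + 6 == 0?"""
    x = int(candidate)
    return x * x - 5 * x + 6 == 0


result = evolve(
    "Find an integer root of x^2 - 5x + 6.",
    {"model_a": guesser, "model_b": guesser},
    is_root,
)
print(result)
```

Because both toy models read the same state, the second model never repeats a candidate the first one already failed on, which is the point of sharing state across models.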

Built-in tools

Python interpreter
Retrieval-Augmented Generation (RAG)
Web search
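
For readers wondering what a Python interpreter tool typically has to handle, the following sketch runs model-written code in a separate interpreter process with a timeout and returns stdout, stderr, and a success flag. It is not the implementation in `./tools/python_code.py`; `run_python_tool` and its return format are assumptions made for illustration, and a subprocess alone is not a security sandbox.

```python
import subprocess
import sys


def run_python_tool(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a fresh interpreter process and report stdout, stderr, and status."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
    }


good = run_python_tool("print(2 ** 10)")
bad = run_python_tool("1 / 0")
print(good)
print(bad["ok"], bad["stderr"].splitlines()[-1])
```

Returning the error text instead of raising lets the agent see the traceback as an observation and attempt a fix on its next turn.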

Quick-start recipes

Agentic reasoning

bash examples/generation/run_generation_informal_math_no_tool.sh # no-tool reasoning

bash examples/generation/run_generation_informal_math_tool.sh # tool-integrated reasoning

Agentic learning

bash examples/sft/run_sft_informal_math_no_tool.sh # vanilla SFT

bash examples/sft/run_sft_informal_math_tool.sh # multi-turn SFT

bash examples/grpo/run_grpo_informal_math_no_tool.sh # vanilla GRPO

bash examples/grpo/run_grpo_informal_math_tool.sh # multi-turn GRPO

Agentic self-evolution

Before running the self-evolution scripts, serve as many models as the script expects: one for single-model evolution, and one per participating model for multi-model evolution.

python utils/ray_serve_llm.py --model_path <model_path> --gpus <gpus> --port <port> --model_id <model_id>
# Example: python utils/ray_serve_llm.py --model_path Qwen/Qwen3-4B-Instruct-2507 --gpus "4,5" --port 9876 --model_id "qwen3_4b_inst"

bash examples/evolving/run_vllm_informalmath_evolving.sh # single-model evolution

bash examples/evolving/run_vllm_informalmath_evolving_multi_models.sh # multi-model evolution
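
For the multi-model case, the serving step above has to be repeated once per model. The helper below is a small convenience sketch that prints one `ray_serve_llm.py` command per model, using only the four flags shown above. Giving each model its own port (counting up from a base port) and its own GPU set is an assumption, as are the function name and the second model entry, which is a placeholder rather than a real checkpoint.

```python
import shlex
from typing import Dict, List


def build_serve_commands(models: List[Dict[str, str]], base_port: int = 9876) -> List[str]:
    """Return one ray_serve_llm.py command line per model, each on its own port."""
    commands = []
    for offset, spec in enumerate(models):
        argv = [
            "python", "utils/ray_serve_llm.py",
            "--model_path", spec["model_path"],
            "--gpus", spec["gpus"],
            "--port", str(base_port + offset),
            "--model_id", spec["model_id"],
        ]
        commands.append(shlex.join(argv))  # quotes arguments only where the shell needs it
    return commands


commands = build_serve_commands([
    {"model_path": "Qwen/Qwen3-4B-Instruct-2507", "gpus": "4,5", "model_id": "qwen3_4b_inst"},
    # Placeholder entry: replace with a real model path, GPU set, and ID.
    {"model_path": "<second_model_path>", "gpus": "6,7", "model_id": "second_model"},
])
for command in commands:
    print(command)
```

Run each printed command in its own terminal or background job before launching the evolution script, and make sure the ports and model IDs match what the script is configured to call.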

Code Structure

Informal Math Environment (Training):

Environment package in ./agent_system/environments/informal_math_training
Prompts in ./agent_system/environments/prompts/informal_math_training.py

Informal Math Environment (Evolving):

Environment package in ./agent_system/environments/informal_math_evolving
Prompts in ./agent_system/environments/prompts/informal_math_evolving.py

Tools (for reference)

Python Code implementation: ./tools/python_code.py
Local RAG implementation: ./tools/rag

Note: Before using the local RAG module, please follow the instructions in tools/rag/README.md to set up the required environment.

Acknowledgement

AlphaApollo is built upon the open-source projects verl, verl-agent, vLLM, and SGLang. We sincerely thank the contributors of these projects for their valuable work and support.

Cite

If you find AlphaApollo useful in your research, please consider citing our work:

@article{zhou2025alphaapollo,
  title={AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning},
  author={Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Xu, Linrui and Cheng, Tian and others},
  journal={arXiv preprint arXiv:2510.06261},
  year={2025}
}
