AlphaApollo is an agentic reasoning framework that integrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning.
It supports a wide range of agentic reasoning paradigms, including tool-integrated reasoning, agentic post-training (multi-turn SFT and reinforcement learning), and agentic self-evolution. AlphaApollo incorporates multiple post-training algorithms such as PPO, GRPO, and DAPO, and provides dataset-backed agentic evaluation pipelines.
AlphaApollo also offers flexible and extensible agentic environments and tool-set configurations, allowing users to easily customize, extend, and scale agentic reasoning workflows.
- [2026.01] We are excited to release AlphaApollo, an agentic LLM reasoning system for advanced reasoning.
- [2025.10] Our technical report is released; see here for details.
```shell
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh
```

- Tool-integrated reasoning rollout with seamless environment interaction
- Dynamic memory updates for multi-turn reasoning
- Multi-turn supervised fine-tuning (SFT)
- Reinforcement learning algorithms: GRPO, PPO, DAPO, and more
- Multi-round, multi-model solution refinement with shared state
- Iterative improvement via feedback and executable checks
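The multi-turn rollout with dynamic memory updates can be sketched as below. All names here (`Memory`, `rollout`, `env.step`) are hypothetical stand-ins for illustration, not AlphaApollo's actual interfaces:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a tool-integrated, multi-turn rollout loop.
# These names are hypothetical and do not come from the AlphaApollo codebase.

@dataclass
class Memory:
    """Rolling context shared across turns."""
    turns: list = field(default_factory=list)

    def update(self, action: str, observation: str) -> None:
        self.turns.append({"action": action, "observation": observation})

def rollout(policy, env, max_turns: int = 8) -> Memory:
    """Alternate between model actions and environment/tool feedback."""
    memory = Memory()
    for _ in range(max_turns):
        action = policy(memory.turns)         # model proposes the next step
        observation, done = env.step(action)  # tool/environment executes it
        memory.update(action, observation)    # dynamic memory update
        if done:
            break
    return memory
```

The same loop structure supports both training-time rollouts (SFT/RL) and inference-time refinement, since the memory carries feedback across turns.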
- Python interpreter
- Retrieval-Augmented Generation (RAG)
- Web search
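A Python-interpreter tool of this kind is typically implemented by running model-emitted code in a subprocess with a timeout. The sketch below is illustrative only; AlphaApollo's actual implementation lives in `./tools/python_code.py`:

```python
import subprocess
import sys

def run_python(code: str, timeout: float = 10.0) -> str:
    """Execute model-generated Python in a subprocess and return its output.

    The timeout guards against infinite loops; stderr is returned on
    failure so the model can see the traceback and self-correct.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    return result.stdout if result.returncode == 0 else result.stderr
```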
```shell
bash examples/generation/run_generation_informal_math_no_tool.sh  # no-tool reasoning
bash examples/generation/run_generation_informal_math_tool.sh     # tool-integrated reasoning
bash examples/sft/run_sft_informal_math_no_tool.sh                # vanilla SFT
bash examples/sft/run_sft_informal_math_tool.sh                   # multi-turn SFT
bash examples/grpo/run_grpo_informal_math_no_tool.sh              # vanilla GRPO
bash examples/grpo/run_grpo_informal_math_tool.sh                 # multi-turn GRPO
```

Before running the self-evolution scripts, make sure to serve the corresponding number of models.
```shell
python utils/ray_serve_llm.py --model_path <model_path> --gpus <gpus> --port <port> --model_id <model_id>
# Example:
# python utils/ray_serve_llm.py --model_path Qwen/Qwen3-4B-Instruct-2507 --gpus "4,5" --port 9876 --model_id "qwen3_4b_inst"
```

```shell
bash examples/evolving/run_vllm_informalmath_evolving.sh               # single-model evolution
bash examples/evolving/run_vllm_informalmath_evolving_multi_models.sh  # multi-model evolution
```

- Environment package in `./agent_system/environments/informal_math_training`
- Prompts in `./agent_system/environments/prompts/informal_math_training.py`
- Environment package in `./agent_system/environments/informal_math_evolving`
- Prompts in `./agent_system/environments/prompts/informal_math_evolving.py`
- Python code implementation: `./tools/python_code.py`
- Local RAG implementation: `./tools/rag`
Note: Before using the local RAG module, please follow the instructions in `tools/rag/README.md` to set up the required environment.
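The retrieval step at the heart of such a RAG tool can be illustrated with a toy bag-of-words retriever. This is a deliberately simplified sketch; the actual implementation in `./tools/rag` would use proper embeddings and an index:

```python
from collections import Counter
import math

# Toy bag-of-words retriever illustrating the retrieval step of a local
# RAG tool. For illustration only; not AlphaApollo's implementation.

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = _vec(query)
    ranked = sorted(corpus, key=lambda d: _cosine(q, _vec(d)), reverse=True)
    return ranked[:k]
```

Retrieved passages are then inserted into the model's context before the next reasoning turn.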
AlphaApollo is built upon the open-source projects verl, verl-agent, vllm, and sglang. We sincerely thank the contributors of these projects for their valuable work and support.
If you find AlphaApollo useful in your research, please consider citing our work:
```bibtex
@article{zhou2025alphaapollo,
  title={AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning},
  author={Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Xu, Linrui and Cheng, Tian and others},
  journal={arXiv preprint arXiv:2510.06261},
  year={2025}
}
```