Description
While reviewing CellForge’s code generation scripts under scAgents/cellforge/Code_Generation/, I found:
- The prompt builder in auto_start_openhands.py hardcodes results/research_plan.json and expects keys like task_description, dataset, and perturbations, but in this project the authoritative metadata appears in Task_Analysis/results/task_analysis_*.json with different key names (e.g., dataset_info). This causes missing/incorrect prompt content and path mismatches.
- The current code generation pipeline appears to only start OpenHands, request code, and write result.py; it does not implement the "auto compile/run/debug verification" mentioned in the original description (no py_compile, smoke test, unit test, etc.), so generated code is not automatically validated.
Evidence / Code Pointers
A) Prompt reads results/research_plan.json (hardcoded)
In auto_start_openhands.py, prerequisites and prompt construction rely on results/research_plan.json:
auto_start_openhands.py:
```python
research_plan_path = Path("results/research_plan.json")
if research_plan_path.exists():
    logger.info(f"Found research plan: {research_plan_path}")
else:
    logger.warning(f"Research plan not found: {research_plan_path}")
    logger.info("OpenHands will start without research plan")
```
Prompt construction pulls keys from that JSON:
auto_start_openhands.py:
```python
research_plan_path = Path("results/research_plan.json")
with open(research_plan_path, 'r', encoding='utf-8') as f:
    research_plan = json.load(f)
task_description = research_plan.get("task_description", "Single-cell perturbation prediction")
dataset_info = research_plan.get("dataset", {})
perturbations = research_plan.get("perturbations", [])
```
B) Actual metadata exists in Task_Analysis outputs (different schema)
Example task analysis output contains task_description and dataset_info:
task_analysis_2026_.json
{ "timestamp": "20260119_105842", "task_description": "...", "dataset_info": { "dataset_path": "cellforge/data/datasets/", "dataset_name": "norman_2019_k562", "data_type": "scRNA-seq", "cell_line": "K562", "perturbation_type": "CRISPRi" }, ...}
This means the prompt builder is likely reading the wrong file and/or wrong keys (dataset vs dataset_info).
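For illustration, a minimal sketch of a schema-tolerant loader, assuming the Task_Analysis output location and key names shown above (the helper name, default path, and fallback logic are hypothetical, not existing code):

```python
import json
from pathlib import Path

def load_task_metadata(results_dir="scAgents/cellforge/Task_Analysis/results"):
    """Hypothetical helper: prefer the newest Task_Analysis output over the
    hardcoded results/research_plan.json, and accept both key spellings."""
    candidates = sorted(Path(results_dir).glob("task_analysis_*.json"))
    if not candidates:
        return None  # caller could fall back to results/research_plan.json
    with open(candidates[-1], "r", encoding="utf-8") as f:
        plan = json.load(f)
    return {
        "task_description": plan.get("task_description", "Single-cell perturbation prediction"),
        # Task_Analysis writes "dataset_info"; the current prompt builder expects "dataset".
        "dataset_info": plan.get("dataset_info") or plan.get("dataset", {}),
        "perturbations": plan.get("perturbations", []),
    }
```

Sorting by filename works here because the embedded timestamps are zero-padded; sorting by mtime would be equally reasonable.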
C) Code generation writes result.py but does not run/validate it
OpenHandsCodeGenerator.generate_code() starts OpenHands, calls the chat API, extracts code, and writes result.py. No post-generation compile/run/test step is implemented:
__init__.py:
```python
def generate_code(...):
    ...
    if not self.start_openhands_docker():
        return None
    if not self.wait_for_openhands_ready():
        return None
    full_prompt = f"{self.code_generation_prompt}\n\nRESEARCH PLAN:\n{research_plan_json}"
    code_file_path = self._send_to_openhands(full_prompt, output_dir)
    if code_file_path:
        logger.info(f"Code generated successfully: {code_file_path}")
        return code_file_path
    ...
```
Steps to Reproduce
1. Ensure scAgents/cellforge/Task_Analysis/results/task_analysis_*.json exists (generated by Task Analysis).
2. Run: `python scAgents/cellforge/Code_Generation/auto_start_openhands.py`
3. Inspect the generated prompt (e.g., ~/.openhands-workspace/initial_prompt.md) and/or the logs.
4. Observe that dataset/task fields may be missing or "Unknown" due to the schema/path mismatch (see the quick check below).
5. Run the code generation flow and note that it outputs result.py but does not attempt to compile/run/smoke-test it.
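To make step 4 concrete, a quick check of which top-level keys the newest task analysis file actually exposes (path is an assumption, relative to the repo root):

```python
import glob
import json

# Assumed path; adjust if running from a different working directory.
latest = sorted(glob.glob("scAgents/cellforge/Task_Analysis/results/task_analysis_*.json"))[-1]
with open(latest, "r", encoding="utf-8") as f:
    print(latest, sorted(json.load(f)))  # expect "dataset_info" here, not "dataset"
```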
Expected Behavior
- Prompt creation should use the authoritative task analysis output by default (e.g., the latest Task_Analysis/results/task_analysis_*.json), or allow explicit configuration of the source file via a CLI flag/env var.
- Schema/key compatibility should be handled (dataset_info vs dataset, etc.), with clear warnings when fields are missing.
- After generation, the pipeline should provide an optional but default-safe validation loop (see the sketch after this list), such as:
  - compile check (e.g., `python -m py_compile result.py`)
  - optional smoke test (e.g., `python result.py --help` or a minimal run)
  - optional tests/linting if applicable (`pytest -q`, etc.)
  - persist logs to output_dir and optionally re-prompt OpenHands to fix failures automatically.
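A minimal sketch of what such a validation step could look like, assuming only the standard library and the result.py/output_dir names used above (the function, its smoke-test arguments, and the log file name are placeholders, not existing code):

```python
import subprocess
import sys
from pathlib import Path

def validate_generated_code(code_file_path, output_dir, smoke_args=("--help",), timeout=120):
    """Hypothetical validation loop: byte-compile check, optional smoke run,
    and log persistence. Returns True only if every step succeeds."""
    log_path = Path(output_dir) / "validation.log"
    steps = [
        ("compile", [sys.executable, "-m", "py_compile", str(code_file_path)]),
        ("smoke", [sys.executable, str(code_file_path), *smoke_args]),
    ]
    ok = True
    with open(log_path, "a", encoding="utf-8") as log:
        for name, cmd in steps:
            try:
                proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
                out, code = proc.stdout + proc.stderr, proc.returncode
            except subprocess.TimeoutExpired as exc:
                out, code = f"timed out after {timeout}s: {exc}", 1
            log.write(f"== {name}: {' '.join(cmd)} (exit {code})\n{out}\n")
            if code != 0:
                ok = False
                break
    return ok
```

On failure, the output captured in validation.log is exactly what a follow-up OpenHands prompt would need in order to attempt an automatic fix.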
Actual Behavior
- Prompt creation is hardcoded to results/research_plan.json and expects keys that don’t match the project’s task analysis schema, resulting in incomplete/incorrect prompt info.
- Generated code is saved but not automatically validated by compilation/execution/testing.