
Code_Generation: Prompt JSON source mismatch (research_plan.json vs task_analysis_*.json) + missing post-generation run/debug verification step #7


Description


While reviewing CellForge’s code generation scripts under scAgents/cellforge/Code_Generation/, I found:

  1. The prompt builder in auto_start_openhands.py hardcodes results/research_plan.json and expects keys like task_description, dataset, and perturbations, but in this project the authoritative metadata appears in Task_Analysis/results/task_analysis_*.json with different key names (e.g., dataset_info). This causes missing/incorrect prompt content and path mismatches.
  2. The current code generation pipeline appears only to start OpenHands, request code from it, and write result.py; it does not implement the “auto compile/run/debug verification” mentioned in the original description (no py_compile check, smoke test, unit tests, etc.), so generated code is not automatically validated.

Evidence / Code Pointers

A) Prompt reads results/research_plan.json (hardcoded)
In auto_start_openhands.py, prerequisites and prompt construction rely on results/research_plan.json:
auto_start_openhands.py:
    research_plan_path = Path("results/research_plan.json")
    if research_plan_path.exists():
        logger.info(f"Found research plan: {research_plan_path}")
    else:
        logger.warning(f"Research plan not found: {research_plan_path}")
        logger.info("OpenHands will start without research plan")
Prompt construction pulls keys from that JSON:
auto_start_openhands.py
    research_plan_path = Path("results/research_plan.json")
    with open(research_plan_path, 'r', encoding='utf-8') as f:
        research_plan = json.load(f)
    task_description = research_plan.get("task_description", "Single-cell perturbation prediction")
    dataset_info = research_plan.get("dataset", {})
    perturbations = research_plan.get("perturbations", [])
B) Actual metadata exists in Task_Analysis outputs (different schema)
Example task analysis output contains task_description and dataset_info:
task_analysis_2026_.json
{ "timestamp": "20260119_105842", "task_description": "...", "dataset_info": { "dataset_path": "cellforge/data/datasets/", "dataset_name": "norman_2019_k562", "data_type": "scRNA-seq", "cell_line": "K562", "perturbation_type": "CRISPRi" }, ...}
This means the prompt builder is likely reading the wrong file and/or wrong keys (dataset vs dataset_info).
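
A minimal sketch of what a schema-tolerant loader could look like, preferring the newest Task_Analysis output and accepting both key layouts; the helper name load_task_metadata, the default paths, and the fallback values are illustrative assumptions, not existing project code:

    # Sketch only: prefer the newest task_analysis_*.json, fall back to the
    # legacy research_plan.json, and accept both "dataset_info" and "dataset".
    import json
    from pathlib import Path

    def load_task_metadata(
        analysis_dir: Path = Path("scAgents/cellforge/Task_Analysis/results"),
        legacy_path: Path = Path("results/research_plan.json"),
    ) -> dict:
        candidates = sorted(analysis_dir.glob("task_analysis_*.json"))
        source = candidates[-1] if candidates else legacy_path  # newest analysis wins
        with open(source, "r", encoding="utf-8") as f:
            data = json.load(f)
        return {
            "source": str(source),
            "task_description": data.get("task_description", "Single-cell perturbation prediction"),
            "dataset_info": data.get("dataset_info") or data.get("dataset", {}),
            "perturbations": data.get("perturbations", []),
        }

Sorting the glob results is enough here because the filenames embed a YYYYMMDD_HHMMSS timestamp, so lexicographic order matches chronological order.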
C) Code generation writes result.py but does not run/validate it
OpenHandsCodeGenerator.generate_code() starts OpenHands, calls the chat API, extracts code, and writes result.py. No post-generation compile/run/test step is implemented:
__init__.py
    def generate_code(...):
        ...
        if not self.start_openhands_docker():
            return None
        if not self.wait_for_openhands_ready():
            return None
        full_prompt = f"{self.code_generation_prompt}\n\nRESEARCH PLAN:\n{research_plan_json}"
        code_file_path = self._send_to_openhands(full_prompt, output_dir)
        if code_file_path:
            logger.info(f"Code generated successfully: {code_file_path}")
            return code_file_path
        ...

Steps to Reproduce

  1. Ensure scAgents/cellforge/Task_Analysis/results/task_analysis_*.json exists (generated by Task Analysis).
  2. Run:
     python scAgents/cellforge/Code_Generation/auto_start_openhands.py
  3. Inspect the generated prompt (e.g., ~/.openhands-workspace/initial_prompt.md) and/or the logs.
  4. Observe that dataset/task fields may be missing or “Unknown” due to the schema/path mismatch.
  5. Run the code generation flow and note that it outputs result.py but does not attempt to compile, run, or smoke-test it.

Expected Behavior

Prompt creation should use the authoritative task analysis output by default (e.g., the latest Task_Analysis/results/task_analysis_*.json), or allow explicit configuration of the source file via a CLI flag or environment variable.
Schema/key compatibility should be handled (dataset_info vs dataset, etc.), with clear warnings when fields are missing.
After generation, the pipeline should provide an optional but default-safe validation loop (see the sketch after this list), such as:
  1. a compile check of result.py (e.g., python -m py_compile result.py)
  2. an optional smoke test (e.g., python result.py --help or a minimal run)
  3. optional tests/linting where applicable (pytest -q, etc.)
  4. persisting logs to output_dir and optionally re-prompting OpenHands to fix failures automatically.
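
A minimal sketch of such a validation step, assuming the generated file is result.py inside output_dir; the function name validate_generated_code, the 60-second timeout, and the validation.log filename are assumptions for illustration, not existing code:

    # Sketch only: compile-check result.py, then optionally smoke-test it and
    # persist the combined output so a failed run can be fed back to OpenHands.
    import py_compile
    import subprocess
    import sys
    from pathlib import Path

    def validate_generated_code(output_dir: Path, smoke_args=("--help",), timeout=60) -> bool:
        code_file = output_dir / "result.py"
        log_file = output_dir / "validation.log"
        try:
            # 1) Compile check: catches syntax errors without executing the code.
            py_compile.compile(str(code_file), doraise=True)
        except py_compile.PyCompileError as exc:
            log_file.write_text(f"py_compile failed:\n{exc}\n")
            return False
        # 2) Optional smoke test: run the script briefly and capture its output.
        try:
            proc = subprocess.run(
                [sys.executable, str(code_file), *smoke_args],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            log_file.write_text("smoke test timed out\n")
            return False
        log_file.write_text(proc.stdout + proc.stderr)
        return proc.returncode == 0

On failure, the contents of the log could be appended to a follow-up prompt so OpenHands can attempt an automatic fix.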

Actual Behavior

Prompt creation is hardcoded to results/research_plan.json and expects keys that don’t match the project’s task analysis schema, resulting in incomplete/incorrect prompt info.
Generated code is saved but not automatically validated by compilation/execution/testing.
