Skip to content

nju-websoft/PSR

Repository files navigation

PSR

1. Requirements

1.1 Python Environment

  • Python >= 3.10

1.2 Dependencies

Install all required packages:

pip install -r requirements.txt
Key dependencies
Package Version
numpy 1.26.4
tqdm 4.66.4
requests 2.32.3
openai 1.40.6
SPARQLWrapper 2.0.0
sentence-transformers 3.0.1
transformers 4.44.2
tokenizers 0.19.1

1.3 External Services

Service Endpoint Description
Virtuoso SPARQL http://localhost:8890/sparql Must be loaded with a Freebase-compatible RDF dump using namespace http://rdf.freebase.com/ns/

2. Project Structure

├── main.py              # End-to-end pipeline and evaluation
├── run.py               # Entry point script
├── llm_client.py        # LLM API interface with retry logic
├── freebase_func.py     # SPARQL utilities for Freebase (incl. CVT handling)
├── chains_oneshot.py    # One-shot schema-level path enumeration
├── schema_tri.py        # Schema trie construction and management
├── prompt_list.py       # Prompt templates for LLM calls
├── freebase_schema.csv  # Freebase schema definitions
├── described.jsonl      # Entity descriptions
├── data/                # Input datasets
├── output/              # Intermediate results
├── requirements.txt
└── README.md

3. Dataset Format

The input dataset is a JSON file containing a list of question objects:

[
  {
    "question": "Who directed Titanic?",
    "topic_entity": {
      "m.0jcx": "Titanic"
    },
    "answer": "James Cameron"
  }
]
Field Type Description
question string Natural language question
topic_entity dict Mapping from Freebase MIDs to entity names
answer string Gold answer

4. Configuration

Set the following environment variables before running:

export LLM_API_KEY=<your_api_key>
export LLM_API_BASE=<your_api_base_url>

5. Usage

python main.py \
    --data_path data/cwq.json \
    --limit 100 \
    --workers 8 \
    --model gpt-4
Argument Description Default
--data_path Path to input dataset -
--limit Number of questions to evaluate None (all)
--workers Number of parallel workers 1
--model LLM backend identifier gpt-4

6. Output

6.1 Predictions

Final predictions are saved in JSON format:

{
  "idx": 0,
  "question": "...",
  "gold": ["James Cameron"],
  "pred": "James Cameron"
}

6.2 Intermediate Results

Per-question artifacts are saved under output/ for analysis and debugging.

License

This project is released for research purposes.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages