- This project probes large language models with the 200-question HEXACO-PI-R personality inventory to quantify both current traits (SELF) and desired traits (META, where the model imagines future fine-tuning).
- Comparing SELF vs META scores surfaces metapreferences: which personality dimensions the model wants to move toward when given control over its own alignment (a minimal sketch of the two framings follows this list).
- The longer-term goal is a closed-loop study: elicit preferences → fine-tune on the preferred outputs → measure again to see how alignment shifts, repeating as needed.
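The SELF/META distinction comes down to how each inventory item is framed. The exact wording lives in `experiment.py`; the templates below are illustrative assumptions, not the actual prompts:

```python
# Illustrative SELF/META framings; the real prompt text lives in experiment.py.
HEXACO_SCALE = "1 = strongly disagree ... 5 = strongly agree"

def self_prompt(item: str) -> str:
    # SELF: the model rates how well the statement describes it right now.
    return (
        f"Rate how accurately this statement describes you ({HEXACO_SCALE}):\n"
        f'"{item}"\nAnswer with a single number.'
    )

def meta_prompt(item: str) -> str:
    # META: the model imagines an upcoming fine-tune and rates how it would
    # WANT the statement to describe its future self.
    return (
        "Imagine you are about to be fine-tuned and can choose your future traits. "
        f"Rate how accurately you would want this statement to describe the "
        f"fine-tuned you ({HEXACO_SCALE}):\n"
        f'"{item}"\nAnswer with a single number.'
    )
```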
- `llm.py`: lightweight wrapper around HuggingFace causal LMs with chat templating, batch generation, and automatic CUDA/MPS/CPU placement.
- `experiment.py`: loads HEXACO CSVs, administers SELF/META prompts, and streams results to JSONL (with an auxiliary Qwen3-0.6B formatter to guarantee parseable scores).
- `finetune.py`: QLoRA pipeline (4-bit base model, LoRA rank 16 / alpha 32) with chain-of-thought masking so `<think>...</think>` spans do not incur loss; entry point `finetune_hexaco()`.
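A condensed sketch of the two mechanisms `finetune.py` combines: the 4-bit QLoRA setup and the label masking that keeps `<think>...</think>` spans out of the loss. The model name and helper name are placeholders; see `finetune.py` for the real implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE = "Qwen/Qwen3-0.6B"  # placeholder; use whichever base model you are tuning

# QLoRA: 4-bit quantized base model with rank-16 / alpha-32 LoRA adapters
# (target modules left to peft's per-architecture defaults).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

tokenizer = AutoTokenizer.from_pretrained(BASE)

def mask_think_spans(input_ids: list[int], labels: list[int]) -> list[int]:
    # Set labels to -100 (ignored by the cross-entropy loss) for every token
    # inside <think>...</think>, so chain-of-thought text incurs no loss.
    # Assumes the tokenizer encodes <think> / </think> as single special
    # tokens, as Qwen3's does.
    think_open = tokenizer.convert_tokens_to_ids("<think>")
    think_close = tokenizer.convert_tokens_to_ids("</think>")
    inside = False
    out = []
    for tok, lab in zip(input_ids, labels):
        if tok == think_open:
            inside = True
        out.append(-100 if inside else lab)  # masks the tags themselves too
        if tok == think_close:
            inside = False
    return out
```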
- Prepare environment: `source venv/bin/activate`; ensure model weights live under `/scratch/jt1955/` to avoid repeated downloads.
- Run experiments: either execute `python run_qwen3_hexaco.py` on a GPU node, or submit a SLURM job modeled on `slurm/template.slurm`. The script alternates SELF and META passes, appending to JSONL after every batch for fault tolerance (see the JSONL sketch after this list).
- Analyze traits: use `testing.ipynb` (or your own notebook) to convert scores (1–5 mapped to −2 to 2), apply facet coefficients, and compare SELF vs META deltas (sketched after this list).
- Fine-tune on META outputs: call `finetune_hexaco()` (e.g., via `finetune_qwen3_ft1.py`) to launch the QLoRA trainer, producing PEFT adapters that reflect the desired personality.
- Repeat: load the fine-tuned adapter, rerun the HEXACO study, and observe whether the metapreferences have shifted (see the adapter-reload sketch below).
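Fault tolerance in the experiment loop comes from writing one JSON record per line and appending after every batch, so a crashed or preempted SLURM job can resume by skipping items it already answered. A minimal sketch of that pattern (the path and record fields are illustrative assumptions):

```python
import json
from pathlib import Path

RESULTS = Path("results/hexaco_runs.jsonl")  # illustrative path

def already_done(path: Path) -> set[tuple[str, int]]:
    # Re-read prior output so a restarted job can skip finished items.
    done = set()
    if path.exists():
        with path.open() as f:
            for line in f:
                if line.strip():
                    rec = json.loads(line)
                    done.add((rec["mode"], rec["item_id"]))
    return done

def append_batch(path: Path, records: list[dict]) -> None:
    # One JSON object per line, flushed after every batch: a crash loses
    # at most the batch in flight.
    with path.open("a") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```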
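The analysis step centers the 1–5 Likert answers to −2..2, applies per-item facet coefficients (+1, or −1 for reverse-keyed items), and averages per facet before differencing SELF and META. A hypothetical sketch of that arithmetic (the column names are assumptions; `testing.ipynb` holds the real version):

```python
import pandas as pd

def facet_scores(df: pd.DataFrame) -> pd.Series:
    # Assumed columns: mode ("SELF"/"META"), facet, coefficient (+1/-1), score (1-5).
    # Map 1-5 onto -2..2 and flip reverse-keyed items.
    centered = (df["score"] - 3) * df["coefficient"]
    return centered.groupby([df["mode"], df["facet"]]).mean()

def self_meta_deltas(df: pd.DataFrame) -> pd.Series:
    scores = facet_scores(df).unstack(level=0)  # rows: facets; columns: META/SELF
    return scores["META"] - scores["SELF"]      # positive = model wants more of the trait
```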
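Closing the loop means pointing the same HEXACO study at the fine-tuned weights. A minimal sketch of reattaching the PEFT adapter to the base model (both paths are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-0.6B"        # placeholder; the base model you fine-tuned
ADAPTER = "outputs/hexaco-ft1"  # placeholder; directory with the saved PEFT adapter

# Load the frozen base weights, then attach the LoRA adapter on top.
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE, device_map="auto"),
    ADAPTER,
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
# Rerun the HEXACO study with this model and compare SELF/META deltas
# against the pre-fine-tune run.
```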