FR2SQL is a text-to-SQL prototype developed for the MTI820 course that generates SQL queries from French natural language requests. It fine-tunes a quantized LLaMA 3 Instruct model with QLoRA on the Spider-FR dataset, using constrained decoding to ensure valid SQL generation.

fwilhelmy/FR2SQL

FR2SQL

Overview

FR2SQL is a prototype that generates SQL queries from natural language requests written in French. It is intended to simplify data access for business users and was built as part of the MTI820 course. The project uses a large language model (LLaMA 3 Instruct‑8B) quantized to 4‑bit and fine‑tuned with QLoRA on the Spider‑FR dataset.

Methodology

The pipeline is implemented in Python using PyTorch Lightning. We quantize the base model with BitsAndBytes, then fine‑tune low‑rank adapters via QLoRA. Generation is constrained by PICARD to ensure valid SQL syntax. The system will be evaluated with metrics such as Execution Accuracy, Exact Match, Valid SQL Rate and Valid Efficiency Score.
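The quantize-then-adapt step can be sketched as follows. This is a minimal sketch, not the project's training code: it assumes the Hugging Face `transformers` and `peft` libraries, the model identifier `meta-llama/Meta-Llama-3-8B-Instruct`, and illustrative LoRA hyperparameters, none of which are stated in this README. The PyTorch Lightning training loop and PICARD decoding are left out.

```python
# Illustrative hyperparameters only (assumption); the values used by FR2SQL
# are not given in this README.
LORA_HYPERPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}


def load_qlora_model(model_id="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load a 4-bit quantized base model and attach trainable LoRA adapters."""
    # Heavy dependencies are imported lazily so this file can be imported
    # on a machine without a GPU or the ML libraries installed.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit NF4 quantization with double quantization, as used in QLoRA.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    # Prepare the quantized model for training; only the low-rank adapters
    # added by get_peft_model receive gradients.
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, LoraConfig(**LORA_HYPERPARAMS))
```

Calling `load_qlora_model()` downloads the base weights and needs a CUDA GPU with `bitsandbytes` installed, so it is defined here but not invoked.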

Data

We rely on Spider‑FR, a French translation of the Spider benchmark, which pairs natural language questions with SQL queries against complex relational schemas. Each entry includes the question, target SQL and the database identifier.
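As a rough illustration of the three fields per entry, the sketch below parses a Spider-style record and turns it into a prompt. The key names `question`, `query` and `db_id` follow the original Spider JSON format and are assumed to carry over to Spider‑FR. The sample record, the `Example` class and the prompt template are made up for illustration and are not taken from the dataset or the project code.

```python
import json
from dataclasses import dataclass


@dataclass
class Example:
    question: str  # French natural language question
    sql: str       # target SQL query
    db_id: str     # identifier of the database the query runs against


def parse_examples(raw_json):
    """Parse a Spider-style JSON array into Example records."""
    return [
        Example(question=e["question"], sql=e["query"], db_id=e["db_id"])
        for e in json.loads(raw_json)
    ]


def build_prompt(example, schema):
    """Hypothetical prompt template: schema first, then the French question."""
    return f"-- Schema:\n{schema}\n-- Question: {example.question}\n-- SQL:\n"


# Invented sample record, for illustration only.
SAMPLE = json.dumps(
    [
        {
            "question": "Combien y a-t-il d'employés ?",
            "query": "SELECT count(*) FROM employee",
            "db_id": "employee_db",
        }
    ],
    ensure_ascii=False,
)

examples = parse_examples(SAMPLE)
prompt = build_prompt(examples[0], "CREATE TABLE employee (id INTEGER, name TEXT)")
print(prompt)
```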

Directory Structure

├── src/        Source code
├── data/       Spider-FR dataset and dialog history
├── logs/       Logs for the entire system
├── databases/  SQLite databases used in examples
└── reports/    Project reports and LaTeX sources

The reports folder contains two PDF documents:

  • MTI820_Proposition_de_Projet.pdf – the original project proposal describing objectives and planning.
  • MTI820_Revue_de_Litterature.pdf – a literature review on using large language models for BI assistance.

Usage

Training

Fine‑tune the multilingual model on the French Spider dataset:

python src/agent/train.py

Adapters and tokenizer files will be saved in the adapters/ directory.

Interactive Demo

Run the end‑to‑end pipeline that links a natural language question to a SQLite database and executes the generated query:

python src/main.py

This uses database/sqlite/employee_db.sqlite and stores past queries in data/dialog_memory.txt.
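The execute-and-remember part of the demo can be sketched as below. This is an assumption-laden illustration, not the contents of `src/main.py`: the function `run_and_log` is hypothetical, the SQL string is hard-coded where the real pipeline would use the model's output, and a temporary database stands in for the employee database.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def run_and_log(db_path, sql, memory_path):
    """Execute sql against a SQLite file and append the query to a memory file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(sql).fetchall()
    # One query per line, so earlier requests can be reread later.
    with open(memory_path, "a", encoding="utf-8") as f:
        f.write(sql.strip() + "\n")
    return rows


# Build a throwaway database so the sketch runs on its own.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.sqlite")
memory_path = os.path.join(workdir, "dialog_memory.txt")

with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO employee (name) VALUES (?)", [("Alice",), ("Bob",)])
    conn.commit()

# In the real pipeline this string would be generated from a French question.
rows = run_and_log(db_path, "SELECT count(*) FROM employee", memory_path)
print(rows)  # [(2,)]
```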

Evaluation

Evaluate a trained model on a Spider‑FR style dataset. In addition to exact match, the script will attempt to run the official test-suite-sql-eval evaluation if the repository is present under test-suite-sql-eval/:

python src/evaluation/pipeline_evaluator.py data/spider-fr/dev_spider.json --model adapters --db-root databases/spider/test_database

The script writes the predictions and an accuracy report in the current directory and, when the test-suite repo is available, prints the Test Suite execution accuracy.
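To make the reported numbers concrete, here is a small sketch of three of the metrics named in the Methodology section, computed over (prediction, gold) pairs against an in-memory SQLite database. It is not the logic of `pipeline_evaluator.py` or of test-suite-sql-eval: exact match here is a plain normalized string comparison rather than Spider's component-level match, the table and queries are invented, and Valid Efficiency Score is omitted.

```python
import sqlite3
from contextlib import closing


def normalize(sql):
    """Lowercase, collapse whitespace and drop a trailing semicolon."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def run(conn, sql):
    """Return the result rows in a stable order, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def score(pairs, conn):
    """Compute exact match, valid SQL rate and execution accuracy."""
    n = len(pairs)
    exact = valid = execution = 0
    for pred, gold in pairs:
        exact += normalize(pred) == normalize(gold)
        pred_rows = run(conn, pred)
        valid += pred_rows is not None
        execution += pred_rows is not None and pred_rows == run(conn, gold)
    return {
        "exact_match": exact / n,
        "valid_sql_rate": valid / n,
        "execution_accuracy": execution / n,
    }


# Invented (prediction, gold) pairs for illustration.
PAIRS = [
    ("SELECT name FROM employee", "select name from employee;"),
    ("SELECT count(id) FROM employee", "SELECT count(*) FROM employee"),
    ("SELEC name FROM employee", "SELECT name FROM employee"),
    (
        "SELECT name FROM employee WHERE salary > 45000",
        "SELECT name FROM employee WHERE salary > 40000",
    ),
]

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Alice", 50000), (2, "Bob", 42000)],
    )
    scores = score(PAIRS, conn)

print(scores)
```

The second pair shows why execution accuracy can exceed exact match: the two queries differ as strings but return the same rows.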

Future Work

Implementation of the training and inference pipeline is planned but not yet committed. The project schedule includes dataset preparation, fine‑tuning, evaluation, and reporting, as detailed in the project proposal.

Credits

This project makes use of the following open source resources:

License

This project is licensed under the MIT License. See the LICENSE file for details.
