I-SAGE — iBLESS Analysis Pipeline

I-SAGE is a reproducible, Nextflow-based pipeline for analyzing iBLESS sequencing data and performing genome-wide differential DNA double-strand break (DSB) analysis, particularly under replication stress conditions.

The pipeline integrates alignment, break calling, normalization, visualization, differential statistics, validation, and sensitivity analyses into a single configurable workflow.

Key Features

Per-base iBLESS break calling
Strand-aware break aggregation
Configurable binning for visualization and statistics
Replicate-aware differential break analysis
Genome-wide or region-restricted testing (BED)
Automated validation via downsampling and spike-in
Bin-size sensitivity analysis
Optional EBV contig annotation and enrichment
Publication-ready and genome-browser-ready outputs
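To illustrate what per-base, strand-aware break calling involves, here is a minimal sketch. It assumes each deduplicated read has been reduced to a hypothetical `Read(chrom, start, end, is_reverse)` record with 0-based, half-open coordinates. In pysam these fields would come from `AlignedSegment.reference_name`, `reference_start`, `reference_end` and `is_reverse`. It also assumes the break lies at the read's 5' end, the base next to the ligated adapter. Both the record type and the 5'-end convention are assumptions for illustration, not a description of I-SAGE's break-calling module.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    is_reverse: bool


def call_breaks(reads: Iterable[Read]) -> Counter:
    """Count breaks per (chrom, position, strand).

    Assumption: the break is the read's 5' end, i.e. `start` for forward
    reads and `end - 1` for reverse reads.
    """
    counts: Counter = Counter()
    for r in reads:
        if r.is_reverse:
            counts[(r.chrom, r.end - 1, "-")] += 1
        else:
            counts[(r.chrom, r.start, "+")] += 1
    return counts


def collapse_strands(stranded: Counter) -> Counter:
    """Sum + and - strand counts into unstranded per-base counts."""
    total: Counter = Counter()
    for (chrom, pos, _strand), n in stranded.items():
        total[(chrom, pos)] += n
    return total


reads = [
    Read("chr1", 100, 150, False),
    Read("chr1", 100, 160, False),
    Read("chr1", 50, 101, True),  # reverse read whose 5' end is position 100
]
stranded = call_breaks(reads)
print(stranded[("chr1", 100, "+")], stranded[("chr1", 100, "-")])  # 2 1
print(collapse_strands(stranded)[("chr1", 100)])  # 3
```

Keeping strand in the key lets strand-specific tracks be written first and merged later without re-reading the BAM.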

Documentation

The full documentation for I-SAGE is available as a hosted website:

👉 https://sfglab.github.io/I-SAGE/

The documentation includes:

Getting started guide
Full configuration reference
Pipeline module descriptions
Statistical methods and assumptions
Output interpretation
Developer and contribution guidelines

Pipeline Overview

FASTQ
  ↓
Alignment & Deduplication
  ↓
Break Calling (per-base)
  ↓
Visualization Tracks (binned bedGraph)
  ↓
Normalization
  ↓
Differential Break Statistics
  ↓
Validation & Sensitivity Analyses
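The binning, normalization and bedGraph steps of the overview can be sketched as follows. The sketch assumes per-base counts are held in a `{(chrom, pos): count}` mapping and uses counts-per-million (CPM) scaling as a stand-in normalization. The pipeline's actual normalization may differ, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, int]


def bin_counts(per_base: Dict[Key, int], bin_size: int) -> Dict[Key, int]:
    """Aggregate per-base counts into fixed-width bins keyed by (chrom, bin_start)."""
    binned: Dict[Key, int] = defaultdict(int)
    for (chrom, pos), n in per_base.items():
        binned[(chrom, (pos // bin_size) * bin_size)] += n
    return dict(binned)


def cpm(binned: Dict[Key, int]) -> Dict[Key, float]:
    """Scale bin counts to counts per million of the library total."""
    total = sum(binned.values())
    return {k: v * 1e6 / total for k, v in binned.items()} if total else {}


def to_bedgraph(values: Dict[Key, float], bin_size: int) -> List[str]:
    """Render bins as bedGraph lines (chrom, start, end, value).

    Lines are sorted by chromosome name, then start. The last bin of a
    chromosome is not clipped to the chromosome length in this sketch.
    """
    return [
        f"{chrom}\t{start}\t{start + bin_size}\t{val:.2f}"
        for (chrom, start), val in sorted(values.items())
    ]


per_base = {("chr1", 5): 2, ("chr1", 12): 1, ("chr1", 18): 1}
binned = bin_counts(per_base, bin_size=10)
for line in to_bedgraph(cpm(binned), bin_size=10):
    print(line)
```

Only non-empty bins are emitted, which keeps bedGraph files small for sparse break data.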

Requirements

I-SAGE is designed to run on HPC systems and relies on a combination of workflow, bioinformatics, and Python-based tools.

Workflow & Runtime

Nextflow ≥ 22
Java ≥ 11
Bash / core UNIX utilities

Bioinformatics Tools

The following tools must be available in the execution environment (typically via modules or Conda):

samtools: BAM processing, sorting, indexing
pysam: Python bindings for BAM/CRAM access
bwa (or an equivalent aligner): read alignment
UCSC utilities (e.g. bedGraphToBigWig): bedGraph and bigWig handling

Exact tool versions are managed via the active Nextflow profile (e.g. eden_local) and the HPC environment.

Python

Python ≥ 3.9

Python Packages

numpy
pandas
scipy
matplotlib

Note:
I-SAGE assumes these tools are provided by the execution environment (HPC modules, Conda, or a container).
The pipeline does not install system-level dependencies automatically.

Quick Start

nextflow run workflows/iblesse_month2/main.nf \
  -profile eden_local \
  -params-file configs/iblesse.yaml

For detailed setup instructions, see the Configuration section below.

Configuration

Pipeline behavior is controlled via a YAML configuration file (e.g., configs/iblesse.yaml).

Main Configuration Sections

1. Break Calling (`break_calling`)

Parameters for per-base or binned break calling

2. Visualization (`viz`)

Bin size for visualization tracks, or settings for a bin-size sweep

3. Statistics (`stats`)

Contrast definitions
Replicate handling
FDR thresholds
EBV annotation options

4. Validation (`validation`)

Downsampling parameters
Spike-in validation settings

A fully working example is provided in configs/iblesse.yaml.
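As a rough picture of what downsampling validation and a bin-size sweep measure, the sketch below downsamples a binned track by binomial thinning, keeping each break independently with a fixed probability. It then reports the Pearson correlation between the full and downsampled tracks at several bin sizes. Both choices are assumptions: the README does not specify I-SAGE's downsampling scheme or concordance metric, and the function names and demo track are hypothetical.

```python
import math
import random
from typing import List, Sequence


def thin(counts: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Binomial thinning: keep each break independently with probability `fraction`."""
    rng = random.Random(seed)
    return [sum(rng.random() < fraction for _ in range(n)) for n in counts]


def rebin(counts: Sequence[int], factor: int) -> List[int]:
    """Merge each run of `factor` adjacent bins into one coarser bin."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


# Hypothetical fine-resolution track: low background plus a few hotspots.
rng = random.Random(1)
full = [rng.randint(0, 3) for _ in range(512)]
for hotspot in (40, 200, 333):
    full[hotspot] += 50

sub = thin(full, fraction=0.25, seed=7)
for factor in (1, 8, 64):
    r = pearson(rebin(full, factor), rebin(sub, factor))
    print(f"bin factor {factor:>2}: r = {r:.3f}")
```

Concordance typically rises with bin size because coarser bins average out sampling noise. The trade-off is a loss of positional resolution, which is the tension a bin-size sweep is meant to expose.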

Differential Statistics

The pipeline tests each genomic bin for differential break counts between conditions.

Method

Per-bin Fisher's exact test on break counts between conditions
Benjamini–Hochberg FDR correction for multiple testing
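A minimal, standard-library sketch of the two steps above follows. It assumes each bin is tested with a 2x2 table whose rows are breaks inside the bin and breaks in the rest of the library, with one column per condition; that table layout is an assumption. The two-sided p-value uses the usual "sum of tables no more likely than the observed one" definition, which is also the one `scipy.stats.fisher_exact` uses. Function names and input values are hypothetical.

```python
import math
from typing import List, Sequence


def _log_comb(n: int, k: int) -> float:
    """log(n choose k) via lgamma, stable for large library sizes."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, `a` is hypergeometric; the p-value sums the
    probabilities of every table no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    log_denom = _log_comb(n, row1)

    def pmf(x: int) -> float:
        return math.exp(_log_comb(col1, x) + _log_comb(n - col1, row1 - x) - log_denom)

    p_obs = pmf(a)
    # Small relative tolerance so ties are not lost to floating-point error.
    p = sum(px for px in map(pmf, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical input: per-bin counts (condition A, condition B) and library totals.
total_a, total_b = 1000, 1200
bins = [(30, 10), (12, 14), (5, 6)]
pvals = [fisher_two_sided(a, b, total_a - a, total_b - b) for a, b in bins]
qvals = benjamini_hochberg(pvals)
```

Working in log space keeps the hypergeometric terms finite even when library totals run into the millions. The loop only spans the bin's own counts, so per-bin cost stays small.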

Advanced Options

Replicate-aware analysis, combining per-replicate p-values with Fisher's method (meta-analysis)
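The README does not say exactly which p-values are combined. The sketch below assumes that each replicate pair yields its own per-bin p-value and that these are merged with Fisher's combined probability test, which treats replicates as independent. The statistic X = -2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom. For even degrees of freedom its survival function has a closed form, so no SciPy call is needed; `scipy.stats.combine_pvalues(..., method="fisher")` computes the same quantity. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(pvals: Sequence[float]) -> float:
    """Fisher's method: combine k independent p-values into one.

    X = -2 * sum(ln p_i) ~ chi-square(2k) under the null. For even df,
    P(chi2 > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvals)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) for p-values that underflowed to zero.
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvals)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Hypothetical per-replicate p-values for one bin.
print(fisher_combine([0.04, 0.10, 0.03]))
```

Fisher's method rewards consistent moderate evidence across replicates. It is also sensitive to a single very small p-value, which is worth keeping in mind when one replicate is much deeper than the others.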

Output Files

All tested bins (full results)
Significant bins (filtered by FDR threshold)
Upregulated / downregulated bins (directional results)
Volcano and MA plots (PNG + PDF formats)
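The directional split and the plot coordinates can be sketched as follows. The sketch assumes CPM-scaled counts with a 0.5 pseudocount and treats "up" as more breaks in condition B than in condition A. The scaling, the pseudocount, the direction convention, the strict `q < fdr` cut-off and the `BinResult` row type are all hypothetical choices, not the pipeline's documented behavior.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class BinResult(NamedTuple):
    """Hypothetical per-bin row from the full results table."""
    bin_id: str
    count_a: int   # breaks in the bin, condition A
    count_b: int   # breaks in the bin, condition B
    qvalue: float  # BH-adjusted p-value


def ma_coordinates(count_a: int, count_b: int, total_a: int, total_b: int,
                   pseudo: float = 0.5) -> Tuple[float, float]:
    """Return (M, A) from CPM-scaled counts with a pseudocount.

    M = log2(cpm_b / cpm_a), positive when B has more breaks than A.
    A = mean of log2(cpm_a) and log2(cpm_b).
    """
    cpm_a = (count_a + pseudo) * 1e6 / total_a
    cpm_b = (count_b + pseudo) * 1e6 / total_b
    return math.log2(cpm_b / cpm_a), 0.5 * (math.log2(cpm_a) + math.log2(cpm_b))


def volcano_point(row: BinResult, total_a: int, total_b: int) -> Tuple[float, float]:
    """Return (log2 fold change, -log10 q) for a volcano plot."""
    m, _ = ma_coordinates(row.count_a, row.count_b, total_a, total_b)
    return m, -math.log10(max(row.qvalue, 1e-300))


def split_directional(rows: List[BinResult], total_a: int, total_b: int,
                      fdr: float = 0.05) -> Dict[str, List[str]]:
    """Group bin IDs into significant / up / down at the given FDR threshold."""
    out: Dict[str, List[str]] = {"significant": [], "up": [], "down": []}
    for row in rows:
        if row.qvalue >= fdr:
            continue
        m, _ = ma_coordinates(row.count_a, row.count_b, total_a, total_b)
        out["significant"].append(row.bin_id)
        if m > 0:
            out["up"].append(row.bin_id)
        elif m < 0:
            out["down"].append(row.bin_id)
    return out


rows = [
    BinResult("chr1:0-1000", 10, 40, 0.001),
    BinResult("chr1:1000-2000", 40, 10, 0.01),
    BinResult("chr1:2000-3000", 20, 21, 0.8),
]
print(split_directional(rows, total_a=1000, total_b=1000))
```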

EBV Annotation (Optional)

If EBV contigs are present in the reference genome, I-SAGE can run the following optional analyses.

Capabilities

Annotate bins as EBV vs. non-EBV
Quantify EBV enrichment among significant bins
Report enrichment statistics in summary files

Enable EBV Analysis

Add the following to your configs/iblesse.yaml:

stats:
  ebv_regex: "(?i)^chrEBV$"
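The sketch below shows one way the annotation and enrichment summary could be computed. It assumes `ebv_regex` is matched against each bin's chromosome name with Python's `re` module, where the leading `(?i)` flag makes the match case-insensitive. Because of the `^...$` anchors, names such as `EBV` or `chrEBV_alt` would not match. The fold-enrichment statistic and all function names are illustrative assumptions; the statistic I-SAGE actually reports may differ.

```python
import re
from typing import Sequence, Tuple

# Same pattern as stats.ebv_regex above; (?i) makes the match case-insensitive.
EBV_PATTERN = re.compile(r"(?i)^chrEBV$")


def is_ebv(chrom: str) -> bool:
    """True if the contig name matches the EBV pattern."""
    return EBV_PATTERN.search(chrom) is not None


def ebv_enrichment(chroms: Sequence[str], significant: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (EBV fraction among significant bins, EBV fraction among all bins, fold enrichment).

    `chroms` and `significant` are parallel per-bin sequences.
    """
    flags = [is_ebv(c) for c in chroms]
    n_sig = sum(significant)
    n_sig_ebv = sum(1 for f, s in zip(flags, significant) if f and s)
    frac_sig = n_sig_ebv / n_sig if n_sig else 0.0
    frac_all = sum(flags) / len(flags) if flags else 0.0
    fold = frac_sig / frac_all if frac_all else float("nan")
    return frac_sig, frac_all, fold


# Hypothetical bins: chromosome of each bin and whether it passed the FDR cut-off.
chroms = ["chr1", "chr1", "chrEBV", "chrEBV", "chr2"]
significant = [True, False, True, True, False]
print(ebv_enrichment(chroms, significant))
```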

Outputs

Results are organized under outdir/ as follows.

Output Directory Structure

outdir/
├── viz/              # bedGraph tracks (per bin size)
├── stats/            # Differential statistics, plots, summaries
├── validation/       # Robustness and sensitivity analyses
├── reports/          # HTML reports and execution traces
└── logs/             # Pipeline logs

Key Output Files

Visualization tracks – BigWig/bedGraph format for genome browsers
Statistical tables – TSV files with bin-level results
Plots – Volcano plots, MA plots, and heatmaps (PNG + PDF)
Summary reports – HTML and text-based summaries

Documentation Sources

The sources for the hosted documentation live in the repository's docs/ directory (site configuration in mkdocs.yaml). Topics still being expanded include:

Pipeline architecture and design
Module-level descriptions
Statistical methods and validation
Configuration guide and best practices
Output interpretation and usage

Project Status

Phase: Active development (Month 4)
Stability: Core APIs and outputs are stabilizing
Development roadmap: See CHANGELOG.md for recent updates

Citation / Acknowledgment

If this tool supports your work, please cite the repository (citation metadata is provided in CITATION.cff) and include the following acknowledgment: “Developed by Pranjul Mishra, under the guidance of Dr. Joanna Borkowska and Prof. Dariusz Plewczyński (Structural and Functional Genomics Laboratory).”

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support & Contribution

For issues, questions, or contributions, please refer to the project repository.
