I-SAGE is a reproducible, Nextflow-based pipeline for analyzing iBLESS sequencing data and performing genome-wide differential DNA double-strand break (DSB) analysis, particularly under replication stress conditions.
The pipeline integrates alignment, break calling, normalization, visualization, differential statistics, validation, and sensitivity analyses into a single configurable workflow.
- Key Features
- Pipeline Overview
- Requirements
- Quick Start
- Configuration
- Differential Statistics
- EBV Annotation
- Outputs
- Documentation
- Project Status
- License
- Per-base iBLESS break calling
- Strand-aware break aggregation
- Configurable binning for visualization and statistics
- Replicate-aware differential break analysis
- Genome-wide or region-restricted testing (BED)
- Automated validation via downsampling and spike-in
- Bin-size sensitivity analysis
- Optional EBV contig annotation and enrichment
- Publication- and genome-browser-ready outputs
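As a rough illustration of what strand-aware aggregation with a configurable bin size means, the sketch below counts per-base break calls in fixed-width bins, separately for each strand. The function name `bin_breaks` and the `(chrom, position, strand)` input layout are assumptions made for this example, not the pipeline's actual interface.

```python
from collections import Counter


def bin_breaks(breaks, bin_size):
    """Count per-base break calls in fixed-width, strand-specific bins.

    `breaks` is an iterable of (chrom, position, strand) tuples with
    0-based positions (hypothetical layout chosen for this sketch).
    Returns a Counter keyed by (chrom, bin_start, strand).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter()
    for chrom, pos, strand in breaks:
        # Snap the position down to the start of its bin.
        bin_start = (pos // bin_size) * bin_size
        counts[(chrom, bin_start, strand)] += 1
    return counts


# Toy input: four break calls on chr1, two per strand.
example = [
    ("chr1", 5, "+"),
    ("chr1", 999, "+"),
    ("chr1", 1000, "-"),
    ("chr1", 1500, "-"),
]
binned = bin_breaks(example, bin_size=1000)
```

Keeping the strand in the key means the same counts can later be summed across strands for statistics or kept separate for strand-specific tracks.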
The full documentation for I-SAGE is available as a hosted website:
https://sfglab.github.io/I-SAGE/
The documentation includes:
- Getting started guide
- Full configuration reference
- Pipeline module descriptions
- Statistical methods and assumptions
- Output interpretation
- Developer and contribution guidelines
FASTQ
↓
Alignment & Deduplication
↓
Break Calling (per-base)
↓
Visualization Tracks (binned bedGraph)
↓
Normalization
↓
Differential Break Statistics
↓
Validation & Sensitivity Analyses
I-SAGE is designed to run on HPC systems and relies on a combination of workflow, bioinformatics, and Python-based tools.
- Nextflow ≥ 22
- Java ≥ 11
- Bash / core UNIX utilities
The following tools must be available in the execution environment (typically via modules or Conda):
- samtools: BAM processing, sorting, indexing
- pysam: Python bindings for BAM/CRAM access
- bwa (or equivalent aligner): read alignment
- bedGraph / UCSC utilities: bedGraph and bigWig handling
Exact tool versions are managed via the active Nextflow profile (e.g. eden_local) and HPC environment.
- Python ≥ 3.9
- Python packages: numpy, pandas, scipy, matplotlib
Note:
I-SAGE assumes these tools are provided by the execution environment (HPC modules, Conda, or container).
The pipeline does not install system-level dependencies automatically.
nextflow run workflows/iblesse_month2/main.nf \
  -profile eden_local \
  -params-file configs/iblesse.yaml
For detailed setup instructions, see the Configuration section below.
Pipeline behavior is controlled via a YAML configuration file (e.g., configs/iblesse.yaml).
- Per-base or binned break-calling parameters
- Visualization bin size or bin-size sweep settings
- Contrasts specification
- Replicate handling
- FDR thresholds
- EBV annotation options
- Downsampling parameters
- Spike-in validation settings
A fully working example is provided in configs/iblesse.yaml.
The pipeline performs robust statistical analysis across genomic bins:
- Per-bin Fisher exact test for break count differences
- Benjamini-Hochberg FDR correction for multiple testing
- Replicate-aware analysis via Fisher meta-analysis
- All tested bins (full results)
- Significant bins (filtered by FDR threshold)
- Upregulated / downregulated bins (directional results)
- Volcano and MA plots (PNG + PDF formats)
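The three statistical steps listed above (per-bin Fisher exact test, Benjamini-Hochberg correction, and Fisher's method for combining replicate p-values) can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's implementation: the function names are hypothetical, and the 2x2 table layout (breaks inside the bin versus all other breaks, per condition) is an assumption made for the example. In practice `scipy.stats.fisher_exact` and `scipy.stats.combine_pvalues` provide equivalent, better-tested routines.

```python
from math import comb, exp, log


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible top-left cell value.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum all tables that are no more likely than the observed one.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as `pvalues`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form for
    even degrees of freedom.
    """
    k = len(pvalues)
    half_stat = -sum(log(max(p, 1e-300)) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, exp(-half_stat) * total)


# Toy counts for three bins (assumed layout: breaks per bin per condition).
treated = [30, 5, 12]
control = [10, 6, 11]
t_total, c_total = sum(treated), sum(control)
pvals = [fisher_exact_two_sided(t, t_total - t, c, c_total - c)
         for t, c in zip(treated, control)]
qvals = benjamini_hochberg(pvals)
```

For replicate-aware testing, one plausible arrangement is to compute a p-value per replicate pair for each bin, combine them with `fisher_combine`, and then apply `benjamini_hochberg` across bins.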
If EBV contigs are present in the reference genome, I-SAGE can:
- Annotate bins as EBV vs. non-EBV
- Quantify EBV enrichment among significant bins
- Report enrichment statistics in summary files
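The sketch below shows one way contig names could be classified with the `ebv_regex` pattern from this README's configuration example, followed by a simple enrichment test. The one-sided hypergeometric test and the helper names `is_ebv` and `hypergeom_upper_tail` are assumptions for illustration; the statistic I-SAGE actually reports may differ.

```python
import re
from math import comb

# Pattern taken from the README's configuration example.
EBV_PATTERN = re.compile(r"(?i)^chrEBV$")


def is_ebv(contig):
    """Return True if the contig name matches the EBV pattern."""
    return EBV_PATTERN.search(contig) is not None


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n significant bins are drawn from N tested bins,
    K of which lie on EBV contigs (assumed enrichment model)."""
    denom = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, upper + 1)) / denom


# Toy data: contig of each tested bin, and whether the bin was significant.
bin_contigs = ["chr1", "chr1", "chrEBV", "chrEBV", "chr2", "chr2"]
significant = [False, False, True, True, False, True]

N = len(bin_contigs)
K = sum(is_ebv(c) for c in bin_contigs)
n = sum(significant)
k = sum(is_ebv(c) and s for c, s in zip(bin_contigs, significant))
p_enrich = hypergeom_upper_tail(k, K, n, N)
```

Because the pattern is anchored with `^` and `$`, alternate contigs such as `chrEBV_alt` would not match; a looser pattern would be needed if the reference names EBV sequences differently.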
Add the following to your configs/iblesse.yaml:
stats:
  ebv_regex: "(?i)^chrEBV$"
Results are organized under outdir/:
outdir/
├── viz/ # bedGraph tracks (per bin size)
├── stats/ # Differential statistics, plots, summaries
├── validation/ # Robustness and sensitivity analyses
├── reports/ # HTML reports and execution traces
└── logs/ # Pipeline logs
- Visualization tracks: BigWig/bedGraph format for genome browsers
- Statistical tables: TSV files with bin-level results
- Plots: Volcano plots, MA plots, and heatmaps (PNG + PDF)
- Summary reports: HTML and text-based summaries
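For readers unfamiliar with the bedGraph format used for the visualization tracks, the sketch below turns binned counts into bedGraph lines: four tab-separated columns (chromosome, 0-based start, exclusive end, value). The function name, the `(chrom, bin_start)` key layout, and the `chrom_sizes` mapping are hypothetical and only illustrate the format, not how I-SAGE writes its tracks.

```python
def to_bedgraph_lines(bin_counts, bin_size, chrom_sizes):
    """Format binned values as bedGraph lines.

    `bin_counts` maps (chrom, bin_start) to a numeric value and
    `chrom_sizes` maps chrom to its length (both assumed layouts).
    """
    lines = []
    # Sort by chromosome name, then by start coordinate.
    for (chrom, start), value in sorted(bin_counts.items()):
        # Clip the final bin so it does not run past the chromosome end.
        end = min(start + bin_size, chrom_sizes[chrom])
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return lines


# Toy example: a 2,500 bp contig with 1 kb bins.
counts = {("chr1", 2000): 7, ("chr1", 0): 3, ("chr1", 1000): 0}
track = to_bedgraph_lines(counts, bin_size=1000, chrom_sizes={"chr1": 2500})
```

Clipping the last bin matters when converting to bigWig, since UCSC tools reject intervals that extend beyond the chromosome length.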
Full documentation is under development and will be available in the documentation/ directory, including:
- Pipeline architecture and design
- Module-level descriptions
- Statistical methods and validation
- Configuration guide and best practices
- Output interpretation and usage
- Phase: Active development (Month 4)
- Stability: Core APIs and outputs are stabilizing
- Development roadmap: See CHANGELOG.md for recent updates
If this tool supports your work, please cite the repository and acknowledge: "Developed by Pranjul Mishra, under the guidance of Dr. Joanna Borkowska and Prof. Dariusz Plewczyński (Structural and Functional Genomics Laboratory)."
This project is licensed under the MIT License. See the LICENSE file for details.
For issues, questions, or contributions, please refer to the project repository.