SFGLab/I-SAGE

I-SAGE – iBLESS Analysis Pipeline

I-SAGE is a reproducible, Nextflow-based pipeline for analyzing iBLESS sequencing data and performing genome-wide differential DNA double-strand break (DSB) analysis, particularly under replication stress conditions.

The pipeline integrates alignment, break calling, normalization, visualization, differential statistics, validation, and sensitivity analyses into a single configurable workflow.


Table of Contents

  1. Key Features
  2. Pipeline Overview
  3. Requirements
  4. Quick Start
  5. Configuration
  6. Differential Statistics
  7. EBV Annotation
  8. Outputs
  9. Documentation
  10. Project Status
  11. License

Key Features

  • Per-base iBLESS break calling
  • Strand-aware break aggregation
  • Configurable binning for visualization and statistics
  • Replicate-aware differential break analysis
  • Genome-wide or region-restricted testing (BED)
  • Automated validation via downsampling and spike-in
  • Bin-size sensitivity analysis
  • Optional EBV contig annotation and enrichment
  • Publication- and genome-browser–ready outputs

Documentation

The full documentation for I-SAGE is available as a hosted website:

👉 https://sfglab.github.io/I-SAGE/

The documentation includes:

  • Getting started guide
  • Full configuration reference
  • Pipeline module descriptions
  • Statistical methods and assumptions
  • Output interpretation
  • Developer and contribution guidelines

Pipeline Overview

FASTQ
  ↓
Alignment & Deduplication
  ↓
Break Calling (per-base)
  ↓
Visualization Tracks (binned bedGraph)
  ↓
Normalization
  ↓
Differential Break Statistics
  ↓
Validation & Sensitivity Analyses
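
As an illustration of the break-calling and binning stages in the overview, the following is a minimal sketch and not the pipeline's own code. It assumes that a break is placed at the 5' end of each aligned read (leftmost base for forward-strand reads, rightmost base for reverse-strand reads) and that bins are fixed-width and strand-specific. The function names and the read tuple layout are hypothetical.

```python
from collections import Counter


def break_position(start, end, strand):
    """Return the 0-based break coordinate for a read aligned to [start, end).

    Assumption: the break sits at the read's 5' end, i.e. the leftmost base
    for '+' reads and the rightmost base for '-' reads.
    """
    return start if strand == "+" else end - 1


def bin_breaks(reads, bin_size):
    """Count breaks per (chrom, bin_start, strand).

    `reads` is an iterable of (chrom, start, end, strand) tuples.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        pos = break_position(start, end, strand)
        bin_start = (pos // bin_size) * bin_size
        counts[(chrom, bin_start, strand)] += 1
    return counts


# Toy alignments (hypothetical data)
reads = [
    ("chr1", 100, 150, "+"),
    ("chr1", 120, 180, "+"),
    ("chr1", 950, 1001, "-"),
    ("chr1", 1200, 1250, "-"),
]
binned = bin_breaks(reads, bin_size=1000)
for (chrom, bin_start, strand), n in sorted(binned.items()):
    # bedGraph-like row: chrom, start, end, strand, count
    print(chrom, bin_start, bin_start + 1000, strand, n, sep="\t")
```

Keeping the strand in the bin key lets the two strands be reported as separate tracks or summed later, depending on what the downstream step needs.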

Requirements

I-SAGE is designed to run on HPC systems and relies on a combination of workflow, bioinformatics, and Python-based tools.

Workflow & Runtime

  • Nextflow ≥ 22
  • Java ≥ 11
  • Bash / core UNIX utilities

Bioinformatics Tools

The following tools must be available in the execution environment (typically via modules or Conda):

  • samtools – BAM processing, sorting, indexing
  • pysam – Python bindings for BAM/CRAM access
  • bwa (or equivalent aligner) – read alignment
  • bedGraph / UCSC utilities – bedGraph and bigWig handling

Exact tool versions are managed via the active Nextflow profile (e.g. eden_local) and HPC environment.

Python

  • Python ≥ 3.9

Python Packages

  • numpy
  • pandas
  • scipy
  • matplotlib

Note:
I-SAGE assumes these tools are provided by the execution environment (HPC modules, Conda, or container).
The pipeline does not install system-level dependencies automatically.


Quick Start

nextflow run workflows/iblesse_month2/main.nf \
  -profile eden_local \
  -params-file configs/iblesse.yaml

For detailed setup instructions, see the Configuration section below.


Configuration

Pipeline behavior is controlled via a YAML configuration file (e.g., configs/iblesse.yaml).

Main Configuration Sections

1. Break Calling (break_calling)

  • Controls per-base or binned break calling parameters

2. Visualization (viz)

  • Visualization bin size or bin-size sweep settings

3. Statistics (stats)

  • Contrasts specification
  • Replicate handling
  • FDR thresholds
  • EBV annotation options

4. Validation (validation)

  • Downsampling parameters
  • Spike-in validation settings

A fully working example is provided in configs/iblesse.yaml.
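
To give a feel for what the downsampling validation configured under validation might do, here is a hedged sketch that thins per-bin break counts binomially. This is an assumption about the approach; the pipeline may instead subsample reads or BAM files. The names `downsample_counts`, `fraction`, and `seed` are illustrative and are not configuration keys from configs/iblesse.yaml.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Binomially thin per-bin counts.

    Each break is kept independently with probability `fraction`.
    A fixed `seed` makes the result reproducible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    thinned = {}
    for bin_id, n in counts.items():
        thinned[bin_id] = sum(1 for _ in range(n) if rng.random() < fraction)
    return thinned


# Toy per-bin counts (hypothetical data)
counts = {"chr1:0-1000": 40, "chr1:1000-2000": 12, "chr2:0-1000": 0}
half = downsample_counts(counts, fraction=0.5, seed=42)
print(half)
```

Rerunning the differential test on thinned counts at several fractions shows how stable the set of significant bins is as sequencing depth drops.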


Differential Statistics

The pipeline tests each genomic bin for a difference in break counts between conditions:

Method

  • Per-bin Fisher exact test for break count differences
  • Benjamini–Hochberg FDR correction for multiple testing
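
The two steps listed above can be sketched in pure Python as follows. This is an illustrative sketch and not the pipeline's implementation. It assumes each bin is tested with a 2x2 table of breaks inside the bin versus breaks elsewhere, for treated versus control; the table layout and the function names are assumptions. The exact test enumerates hypergeometric probabilities directly, which is only practical for small totals such as these toy counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table that is no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_bins(treated, control):
    """Test each bin; both dicts must share the same bin keys."""
    total_t, total_c = sum(treated.values()), sum(control.values())
    bins = sorted(treated)
    pvals = [
        fisher_exact_two_sided(
            treated[b], total_t - treated[b], control[b], total_c - control[b]
        )
        for b in bins
    ]
    return dict(zip(bins, zip(pvals, benjamini_hochberg(pvals))))


# Toy per-bin break counts (hypothetical data)
treated = {"bin1": 30, "bin2": 12, "bin3": 50}
control = {"bin1": 10, "bin2": 11, "bin3": 52}
results = compare_bins(treated, control)
for name, (p, q) in results.items():
    print(f"{name}\tp={p:.4g}\tq={q:.4g}")
```

For genome-scale counts, a library routine such as `scipy.stats.fisher_exact` would be the more practical choice, since scipy is already among the listed Python packages.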

Advanced Options

  • Replicate-aware analysis via Fisher meta-analysis
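
One common reading of "Fisher meta-analysis" is Fisher's combined probability test, in which the per-replicate p-values for a bin are merged through the statistic X = -2 Σ ln(p_i), compared against a chi-square distribution with 2k degrees of freedom. The sketch below assumes that reading; the function names and the `floor` parameter are hypothetical, and the pipeline's actual replicate handling may differ.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Chi-square survival function for even `df`, using the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return min(1.0, exp(-x / 2) * total)


def fisher_combine(pvalues, floor=1e-300):
    """Combine independent p-values for one bin with Fisher's method.

    `floor` guards against log(0) when a p-value underflows to zero.
    """
    stat = -2.0 * sum(log(max(p, floor)) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


# Per-replicate p-values for a single bin (hypothetical data)
replicate_pvalues = [0.05, 0.05]
print(fisher_combine(replicate_pvalues))
```

Fisher's method ignores the direction of the effect, so replicates that disagree in sign can still produce a small combined p-value. Checking the per-replicate fold changes alongside the combined value guards against that.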

Output Files

  • All tested bins (full results)
  • Significant bins (filtered by FDR threshold)
  • Upregulated / downregulated bins (directional results)
  • Volcano and MA plots (PNG + PDF formats)
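
To make the directional tables and plot axes concrete, the following sketch derives MA and volcano coordinates for one bin and labels it as up, down, or not significant. It assumes counts-per-million scaling with a pseudocount and a default FDR threshold of 0.05; the pipeline's normalization and thresholds may differ, and all names here are hypothetical.

```python
from math import log2, log10


def plot_coordinates(treated, control, total_t, total_c, qvalue, pseudocount=0.5):
    """Return MA and volcano coordinates for one bin.

    Counts are scaled to counts per million after adding a pseudocount,
    so bins with zero breaks still get a finite log ratio.
    """
    cpm_t = (treated + pseudocount) / total_t * 1e6
    cpm_c = (control + pseudocount) / total_c * 1e6
    return {
        "log2fc": log2(cpm_t / cpm_c),                     # M axis, volcano x
        "mean_log2": 0.5 * (log2(cpm_t) + log2(cpm_c)),    # A axis
        "neg_log10_q": -log10(max(qvalue, 1e-300)),        # volcano y
    }


def direction(log2fc, qvalue, fdr=0.05):
    """Label a bin 'up', 'down', or 'ns' (not significant)."""
    if qvalue > fdr or log2fc == 0:
        return "ns"
    return "up" if log2fc > 0 else "down"


# One bin with hypothetical counts and library totals
coords = plot_coordinates(30, 10, 1000, 1000, qvalue=0.01)
print(coords, direction(coords["log2fc"], 0.01))
```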

EBV Annotation (Optional)

If EBV contigs are present in the reference genome, I-SAGE can:

Capabilities

  • Annotate bins as EBV vs. non-EBV
  • Quantify EBV enrichment among significant bins
  • Report enrichment statistics in summary files

Enable EBV Analysis

Add the following to your configs/iblesse.yaml:

stats:
  ebv_regex: "(?i)^chrEBV$"
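
To check which contig names the pattern above would select before running the pipeline, a short Python test like the one below can help. How I-SAGE applies the regex internally is not documented here, so the use of `re.search` is an assumption; with both `^` and `$` anchors present, `search` and `match` behave the same way. The contig list is hypothetical.

```python
import re

# Same pattern as in the YAML snippet; (?i) makes it case-insensitive
EBV_REGEX = re.compile(r"(?i)^chrEBV$")


def is_ebv(contig):
    """Return True when the contig name matches the EBV pattern."""
    return EBV_REGEX.search(contig) is not None


# Hypothetical contig names from a reference genome
contigs = ["chr1", "chrEBV", "chrebv", "chrEBV_alt", "EBV"]
for name in contigs:
    print(name, "EBV" if is_ebv(name) else "non-EBV")
```

Because the pattern is anchored at both ends, names such as chrEBV_alt or a bare EBV are not matched. If your reference uses a different contig name, adjust the regex accordingly.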

Outputs

Results are organized under outdir/:

Output Directory Structure

outdir/
├── viz/              # bedGraph tracks (per bin size)
├── stats/            # Differential statistics, plots, summaries
├── validation/       # Robustness and sensitivity analyses
├── reports/          # HTML reports and execution traces
└── logs/             # Pipeline logs

Key Output Files

  • Visualization tracks – BigWig/bedGraph format for genome browsers
  • Statistical tables – TSV files with bin-level results
  • Plots – Volcano plots, MA plots, and heatmaps (PNG + PDF)
  • Summary reports – HTML and text-based summaries

Documentation

Full documentation is under development and will be available in the documentation/ directory, including:

  • Pipeline architecture and design
  • Module-level descriptions
  • Statistical methods and validation
  • Configuration guide and best practices
  • Output interpretation and usage

Project Status

  • Phase: Active development (Month 4)
  • Stability: Core APIs and outputs are stabilizing
  • Development roadmap: See CHANGELOG.md for recent updates

Citation / Acknowledgment

If this tool supports your work, please cite the repository and acknowledge: “Developed by Pranjul Mishra, under the guidance of Dr. Joanna Borkowska and Prof. Dariusz Plewczyński (Structural and Functional Genomics Laboratory).”

License

This project is licensed under the MIT License. See the LICENSE file for details.


Support & Contribution

For issues, questions, or contributions, please refer to the project repository.
