This repo is for backup only. Please check the parent repo for details.

hlilab/breakinator

 
 

The Breakinator

The Breakinator identifies and flags putative artifact reads (foldback and chimeric reads) by parsing SAM/BAM/CRAM or PAF alignment files.

Installation

Prebuilt Binaries

Prebuilt binaries can be downloaded from the Releases page.

wget https://github.com/jheinz27/breakinator/releases/download/v{x.y.z}/breakinator-v{x.y.z}-{system}.tar.gz
tar -xvzf breakinator-v{x.y.z}-{system}.tar.gz
breakinator-v{x.y.z}-{system}/bin/breakinator --help

Bioconda

conda install -c bioconda -c conda-forge breakinator=1.1.1

Install from source

git clone https://github.com/jheinz27/breakinator
cd breakinator/breakinator
cargo build --release
./target/release/breakinator --help

Prerequisites

  • Rust programming language >= v1.70
  • clap = "4.0"
  • rust-htslib = "0.46.0"

Breakinator Usage

Usage: breakinator [OPTIONS] --input <FILE>

Options:
  -i, --input <FILE>       SAM/BAM/CRAM file sorted by read IDs
      --paf                Input file is PAF
  -q, --min-mapq <INT>     Minimum mapping quality [default: 10]
  -a, --min-map-len <INT>  Minimum alignment length (bps) [default: 200]
      --no-sym             Report all foldback reads, not just those with breakpoint within margin of middle of read
  -g, --genome <FASTA>     Reference genome FASTA used (must be provided for CRAM input)
  -m, --margin <FLOAT>     [0-1], Proportion from center of read on either side to be considered sym foldback artifact [default: 0.1]
      --rcoord             Print read coordinates of breakpoint in output
  -o, --out <FILE>         Output file name [default: breakinator_out.txt]
  -c, --chim <INT>         Minimum distance to be considered chimeric [default: 1000000]
  -f, --fold <INT>         Max distance to be considered foldback [default: 200]
      --tabular            Print a TSV table instead of the default report (useful if evaluating multiple samples)
  -t, --threads <INT>      Number of threads to use for BAM/CRAM I/O [default: 2]
  -h, --help               Print help
  -V, --version            Print version
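
To make the roles of --chim and --fold concrete, here is a minimal Python sketch of how a junction between two adjacent alignment segments of one read might be labeled. It is not the Breakinator's implementation: the function name classify_junction, the rule that segments on different reference sequences count as chimeric, and the requirement that a foldback switches strand are all assumptions made for illustration. Only the default thresholds come from the help text above.

```python
def classify_junction(ref_a, pos_a, strand_a, ref_b, pos_b, strand_b,
                      chim=1_000_000, fold=200):
    """Label the junction between two adjacent alignment segments of a read.

    Hypothetical helper; the rules below are assumptions, not the tool's code.
    """
    # Assumption: segments on different reference sequences are chimeric.
    if ref_a != ref_b:
        return "chimeric"
    distance = abs(pos_a - pos_b)
    # Assumption: a foldback flips strand and lands within `fold` bp.
    if strand_a != strand_b and distance <= fold:
        return "foldback"
    # Same reference, but at least `chim` bp apart.
    if distance >= chim:
        return "chimeric"
    return "other"


near_flip = classify_junction("chr1", 1_000, "+", "chr1", 1_100, "-")
far_apart = classify_junction("chr1", 1_000, "+", "chr1", 5_000_000, "+")
ordinary = classify_junction("chr1", 1_000, "+", "chr1", 50_000, "+")
```

Under these assumptions, raising --fold or lowering --chim widens the set of junctions that would be flagged.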

Example Usage

Breakinator currently supports only name-sorted files, that is, files in which all alignments of a read sit on consecutive lines (the default output of minimap2). To avoid reading the whole file into memory, it parses one sequential group of lines with the same read ID at a time, so run Breakinator before any sorting of the file.
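
The streaming approach described above can be illustrated with a short Python sketch that groups consecutive PAF lines by read ID. This is only an illustration of the idea, not the Rust code used by the Breakinator; the function name iter_read_groups and the example records are made up.

```python
from itertools import groupby


def iter_read_groups(lines):
    """Yield (read_id, records) for each run of consecutive lines sharing a read ID.

    Only one group is held in memory at a time, which is why the input
    must keep all alignments of a read on adjacent lines.
    """
    records = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for read_id, group in groupby(records, key=lambda fields: fields[0]):
        yield read_id, list(group)


# Made-up PAF records: two segments for read1, one for read2.
paf_lines = [
    "read1\t1000\t0\t500\t+\tchr1\t5000\t100\t600\t480\t500\t60",
    "read1\t1000\t500\t1000\t-\tchr1\t5000\t100\t600\t470\t500\t60",
    "read2\t800\t0\t800\t+\tchr2\t9000\t50\t850\t790\t800\t60",
]
groups = list(iter_read_groups(paf_lines))

# If the file were coordinate-sorted, read1's lines could be separated
# and would show up as two incomplete groups.
shuffled = list(iter_read_groups([paf_lines[0], paf_lines[2], paf_lines[1]]))
```

In the shuffled case read1 is seen twice with one segment each, so the junction between its segments would never be examined.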

For SAM/BAM/CRAM

minimap2 -ax map-ont genome.fa reads.fastq > alignments.sam
./breakinator -i alignments.sam -o breakinator_out.txt

For PAF (include --paf flag)

minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf
./breakinator -i alignments.paf --paf -o breakinator_out.txt

Generating PAF files

The Breakinator also accepts PAF files as input. To generate these, we recommend running minimap2 with the -c and --secondary=no parameters. The Breakinator ignores secondary alignments, but including them increases the processing time.

Example:

minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf

The PAF can also be generated by converting a SAM file to a PAF with paftools.js using the -p parameter.

Example:

paftools.js sam2paf -p alignments.sam > alignments.paf

Optional: turn off symmetry filter for foldback artifacts

If you want to investigate all potential foldback events in a sample, we recommend turning off the symmetry filter with the --no-sym flag.

./breakinator -i alignments.paf --no-sym
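
For intuition about what the symmetry filter keeps, the following Python sketch checks whether a breakpoint lies within --margin of the read's midpoint. The function name is_symmetric and the exact comparison are assumptions based on the option description ("proportion from center of read on either side"), not the tool's source.

```python
def is_symmetric(breakpoint_pos, read_len, margin=0.1):
    """Return True if the breakpoint lies within `margin` of the read center.

    Hypothetical helper: positions are expressed as a fraction of read
    length, so margin=0.1 accepts breakpoints between 40% and 60% of the read.
    """
    return abs(breakpoint_pos / read_len - 0.5) <= margin


centered = is_symmetric(5_000, 10_000)    # breakpoint exactly mid-read
near_center = is_symmetric(5_900, 10_000) # 59% along the read
off_center = is_symmetric(7_000, 10_000)  # 70% along the read
```

With --no-sym, a foldback like the off-center one here would be reported as well.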

Preprocessing for alignment to diploid genome assemblies with The Diploidinator

Minimap2 was not designed for diploid assemblies (e.g. HG002). When reads are aligned to a diploid assembly, their mapping quality may be lower because a read can align well to multiple locations. We have developed a simple Rust program for this case: align the reads to each haplotype of the diploid assembly separately, then parse both PAF files and choose the better alignment of each read based on the alignment score.


Diploidinator Installation

git clone https://github.com/jheinz27/breakinator
cd breakinator/diploidinator
cargo build --release
./target/release/diploidinator

Diploidinator Example Usage

NOTE: It is important to use the --secondary=no and --paf-no-hit flags when aligning with minimap2. The Diploidinator currently only works on PAF files.

minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.MATERNAL.fasta reads.fastq > out_mat.paf
minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.PATERNAL.fasta reads.fastq > out_pat.paf
diploidinator out_mat.paf out_pat.paf > out_haps_merge.paf
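
The selection step can be sketched in Python as follows. This is a rough illustration, not the Diploidinator's Rust code: it assumes the alignment score is read from the AS:i tag that minimap2 writes in the optional PAF columns, that a line without that tag is an unmapped read, and that ties go to the maternal record. The helper names are hypothetical.

```python
def alignment_score(paf_line):
    """Return the AS:i alignment score of a PAF line, or None if absent."""
    # Optional tags start at the 13th column of a PAF record.
    for field in paf_line.rstrip("\n").split("\t")[12:]:
        if field.startswith("AS:i:"):
            return int(field[5:])
    return None


def pick_better(mat_line, pat_line):
    """Choose the record with the higher alignment score.

    Assumption: ties and missing paternal scores fall back to the maternal record.
    """
    mat_score = alignment_score(mat_line)
    pat_score = alignment_score(pat_line)
    if pat_score is not None and (mat_score is None or pat_score > mat_score):
        return pat_line
    return mat_line


# Made-up records for the same read aligned to each haplotype.
mat = "read1\t1000\t0\t1000\t+\tchr1_MATERNAL\t5000\t0\t1000\t990\t1000\t60\tNM:i:10\tAS:i:950"
pat = "read1\t1000\t0\t1000\t+\tchr1_PATERNAL\t5000\t0\t1000\t995\t1000\t60\tNM:i:5\tAS:i:980"
best = pick_better(mat, pat)
```

Because both PAF files come from the same reads file aligned with --secondary=no and --paf-no-hit, a real implementation can walk the two files in parallel and compare records read by read.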

Merging Breakpoints Into Consensus Locations

To evaluate how many unique breakpoints a sample contains and how much read support each has, we developed a simple script that merges breakpoints occurring within 100 bp of each other (default -w). At least 2 supporting reads (default -s) are required to report a consensus breakpoint location.

usage: merge_breaks.py [-h] -i <breakpoints.txt> [-w --merge_window] [-s --min_support] > merged_breaks.txt

Merge Break Points from Breakinator output

optional arguments:
  -h, --help            show this help message and exit
  -i <breakpoints.txt>  input breakinator stdout
  -w --merge_window     Size of window to merge break points in
  -s --min_support      minimum reads supporting breakpoint
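
As an illustration of the merging idea, the Python sketch below clusters breakpoints on the same chromosome whose sorted positions are no more than a window apart and keeps clusters with enough support. It is not the code in merge_breaks.py: the chained clustering rule, the use of the median as the consensus position, and the (chromosome, position) input format are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def merge_breakpoints(breakpoints, window=100, min_support=2):
    """Merge nearby breakpoints into (chrom, consensus_pos, support) tuples.

    Hypothetical helper: `breakpoints` is an iterable of (chrom, pos) pairs.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)

    merged = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                # Close enough to the previous breakpoint: same cluster.
                cluster.append(pos)
            else:
                if len(cluster) >= min_support:
                    merged.append((chrom, int(median(cluster)), len(cluster)))
                cluster = [pos]
        # Flush the last cluster on this chromosome.
        if len(cluster) >= min_support:
            merged.append((chrom, int(median(cluster)), len(cluster)))
    return merged


calls = [("chr1", 1000), ("chr1", 1050), ("chr1", 1120),
         ("chr1", 5000), ("chr2", 300), ("chr2", 350)]
consensus = merge_breakpoints(calls)
```

Here the lone breakpoint at chr1:5000 is dropped because it has only one supporting read, which mirrors the default -s of 2.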

Citation

If the Breakinator has helped you in your research, please cite our preprint at: https://www.biorxiv.org/content/10.1101/2025.07.15.664946v2.abstract
