The Breakinator identifies and flags putative artifact reads (foldback and chimeric) by parsing SAM/BAM/CRAM or PAF alignment files.
Prebuilt binaries can be downloaded from the Releases page.
wget https://github.com/jheinz27/breakinator/releases/download/v{x.y.z}/breakinator-v{x.y.z}-{system}.tar.gz
tar -xvzf breakinator-v{x.y.z}-{system}.tar.gz
breakinator-v{x.y.z}-{system}/bin/breakinator --help
conda install -c bioconda -c conda-forge breakinator=1.1.1
git clone https://github.com/jheinz27/breakinator
cd breakinator/breakinator
cargo build --release
./target/release/breakinator --help
- Rust programming language >= v1.70
- clap = "4.0"
- rust-htslib = "0.46.0"
Usage: breakinator [OPTIONS] --input <FILE>
Options:
-i, --input <FILE> SAM/BAM/CRAM file sorted by read IDs
--paf Input file is PAF
-q, --min-mapq <INT> Minimum mapping quality [default: 10]
-a, --min-map-len <INT> Minimum alignment length (bps) [default: 200]
--no-sym Report all foldback reads, not just those with breakpoint within margin of middle of read
-g, --genome <FASTA> Reference genome FASTA used (must be provided for CRAM input)
-m, --margin <FLOAT> [0-1], Proportion from center of read on either side to be considered sym foldback artifact [default: 0.1]
--rcoord Print read coordinates of breakpoint in output
-o, --out <FILE> Output file name [default: breakinator_out.txt]
-c, --chim <INT> Minimum distance to be considered chimeric [default: 1000000]
-f, --fold <INT> Max distance to be considered foldback [default: 200]
--tabular Print a TSV table instead of the default report (useful if evaluating multiple samples)
-t, --threads <INT> Number of threads to use for BAM/CRAM I/O [default: 2]
-h, --help Print help
-V, --version Print version
Note that the Breakinator currently supports only name-sorted files, meaning all alignments of a read sit on consecutive lines (the default output of minimap2). To avoid reading the whole file into memory, it parses one sequential group of lines with the same read ID at a time. Run the Breakinator before any re-sorting of the file (for example, coordinate sorting).
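A quick way to confirm that a file meets this requirement is to check that no read ID reappears after a different read has started. The sketch below is a hypothetical helper, not part of the Breakinator; it assumes tab-separated SAM or PAF text with the read ID in the first column, and it keeps every read ID in memory, so it is meant for spot checks rather than very large files.

```python
def first_ungrouped_read(lines, paf=False):
    """Return the first read ID that reappears after another read, or None.

    Hypothetical helper: assumes the read ID is the first tab-separated column.
    """
    seen = set()
    previous = None
    for line in lines:
        # Skip blank lines and, for SAM input, header lines starting with '@'.
        if not line.strip() or (not paf and line.startswith("@")):
            continue
        read_id = line.split("\t", 1)[0]
        if read_id != previous:
            if read_id in seen:
                return read_id  # this read was already closed by another read
            seen.add(read_id)
            previous = read_id
    return None


# Toy records: only the first columns matter for this check.
grouped = ["r1\t0\tchr1\n", "r1\t2048\tchr2\n", "r2\t0\tchr1\n"]
coordinate_sorted = ["r1\t0\tchr1\n", "r2\t0\tchr1\n", "r1\t2048\tchr2\n"]

print(first_ungrouped_read(grouped))
print(first_ungrouped_read(coordinate_sorted))
```

For a real file, pass an open text handle, for example `first_ungrouped_read(open("alignments.sam"))`; BAM and CRAM would need to be converted to SAM text first.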
minimap2 -ax map-ont genome.fa reads.fastq > alignments.sam
./breakinator -i alignments.sam -o breakinator_out.txt
minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf
./breakinator -i alignments.paf --paf -o breakinator_out.txt
The Breakinator also accepts PAF files as input. To generate them, we recommend running minimap2 with the -c and --secondary=no parameters. The Breakinator ignores secondary alignments, but including them increases processing time.
Example:
minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf
The PAF can also be generated by converting a SAM file to a PAF with paftools.js using the -p parameter.
Example:
paftools.js sam2paf -p alignments.sam > alignments.paf
If you want to investigate all potential foldback events in a sample, we recommend turning off the symmetry filter with the --no-sym flag.
./breakinator -i alignments.paf --paf --no-sym
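To make the symmetry filter concrete, here is a minimal sketch of one plausible reading of the --margin option: a foldback counts as symmetric when its breakpoint lies within `margin` (as a fraction of read length) of the read's midpoint. The function name and the exact comparison are assumptions made for illustration, not the Breakinator's actual code.

```python
def is_symmetric_foldback(break_pos, read_len, margin=0.1):
    """Return True if the breakpoint lies within `margin` of the read centre.

    Assumed interpretation of --margin: with the default of 0.1, breakpoints
    between roughly 40% and 60% of the read length count as symmetric.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    relative_position = break_pos / read_len
    return abs(relative_position - 0.5) <= margin


# A breakpoint near the middle of a 10 kb read versus one near its start.
print(is_symmetric_foldback(5200, 10000))
print(is_symmetric_foldback(2000, 10000))
```

Under this reading, --no-sym corresponds to skipping the check, so foldbacks with off-centre breakpoints are reported as well.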
Minimap2 was not designed for diploid assemblies (e.g. HG002). When reads are aligned to a diploid assembly, their mapping quality may be lower because a read can align well to multiple locations. We have developed a simple Rust program, the diploidinator, for this case: align the reads to each haplotype of the diploid assembly separately, then let the diploidinator parse both PAF files and choose the better alignment of each read based on the alignment score.
git clone https://github.com/jheinz27/breakinator
cd breakinator/diploidinator
cargo build --release
./target/release/diploidinator
NOTE: Use the --secondary=no and --paf-no-hit flags when aligning with minimap2. The diploidinator currently only works on PAF files.
minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.MATERNAL.fasta reads.fastq > out_mat.paf
minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.PATERNAL.fasta reads.fastq > out_pat.paf
diploidinator out_mat.paf out_pat.paf > out_haps_merge.paf
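The selection step can be illustrated with a short Python sketch. It is not the diploidinator's implementation: it assumes the alignment score is read from the `AS:i:` tag that minimap2 writes to PAF, that a read with no `AS` tag (for example an unmapped read emitted by --paf-no-hit) loses to any scored alignment, that ties go to the first (maternal) file, and that all lines of a read are kept from the winning file. All function names are hypothetical.

```python
def alignment_score(paf_line):
    """Return the AS:i tag value of a PAF line, or None if the tag is absent."""
    for field in paf_line.rstrip("\n").split("\t")[12:]:
        if field.startswith("AS:i:"):
            return int(field[5:])
    return None


def best_scores(paf_lines):
    """Map read ID -> [best AS seen (or None), list of that read's lines]."""
    reads = {}
    for line in paf_lines:
        if not line.strip():
            continue
        read_id = line.split("\t", 1)[0]
        entry = reads.setdefault(read_id, [None, []])
        score = alignment_score(line)
        if score is not None and (entry[0] is None or score > entry[0]):
            entry[0] = score
        entry[1].append(line)
    return reads


def _key(entry):
    """Sort key: missing reads and unscored reads rank below any real score."""
    if entry is None or entry[0] is None:
        return float("-inf")
    return entry[0]


def pick_haplotype(mat_lines, pat_lines):
    """Keep, for each read, the lines from the file with the higher best AS."""
    mat, pat = best_scores(mat_lines), best_scores(pat_lines)
    merged = []
    for read_id in list(mat) + [r for r in pat if r not in mat]:
        m, p = mat.get(read_id), pat.get(read_id)
        chosen = p if m is None or _key(p) > _key(m) else m
        merged.extend(chosen[1])
    return merged


def paf(read, target, score):
    """Build a toy PAF line (12 mandatory columns plus an optional AS tag)."""
    cols = [read, "1000", "0", "1000", "+", target, "5000",
            "0", "1000", "950", "1000", "60"]
    if score is not None:
        cols.append(f"AS:i:{score}")
    return "\t".join(cols) + "\n"


mat = [paf("r1", "chr1_MATERNAL", 900), paf("r2", "*", None)]
pat = [paf("r1", "chr1_PATERNAL", 950), paf("r2", "chr2_PATERNAL", 700)]
merged = pick_haplotype(mat, pat)
print([line.split("\t")[5] for line in merged])
```

In this toy input both reads score higher against the paternal haplotype, so both paternal lines are kept.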
To evaluate how many unique breakpoints a sample contains and how much read support each has, we developed a simple script that merges breakpoints occurring within 100 bp of each other (the default for -w). At least 2 supporting reads (the default for -s) are required to report a consensus breakpoint location.
usage: merge_breaks.py [-h] -i <breakpoints.txt> [-w --merge_window] [-s --min_support] > merged_breaks.txt
Merge Break Points from Breakinator output
optional arguments:
-h, --help show this help message and exit
-i <breakpoints.txt> input breakinator stdout
-w --merge_window Size of window to merge break points in
-s --min_support minimum reads supporting breakpoint
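The idea behind the merge window and support threshold can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code of merge_breaks.py: breakpoints are given as hypothetical `(chromosome, position)` tuples, one per supporting read; neighbouring breakpoints are chained when the gap to the previous one is at most the window; and the consensus position is taken as the median of the cluster.

```python
from statistics import median


def merge_breakpoints(breakpoints, window=100, min_support=2):
    """Cluster nearby breakpoints and report (chrom, consensus_pos, support).

    Assumptions: chaining by distance to the previous breakpoint, and the
    median as the consensus position.
    """
    clusters = []  # list of (chrom, [positions])
    for chrom, pos in sorted(breakpoints):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and pos - last[1][-1] <= window:
            last[1].append(pos)  # close enough: extend the current cluster
        else:
            clusters.append((chrom, [pos]))  # start a new cluster
    return [
        (chrom, int(median(positions)), len(positions))
        for chrom, positions in clusters
        if len(positions) >= min_support
    ]


breaks = [
    ("chr1", 1000), ("chr1", 1050), ("chr1", 1120),  # one cluster of 3
    ("chr1", 5000),                                  # singleton, dropped
    ("chr2", 1010), ("chr2", 1020),                  # one cluster of 2
]
print(merge_breakpoints(breaks))
```

Chaining lets a cluster span more than one window if breakpoints are densely spaced; anchoring the window to the first breakpoint of each cluster would be a stricter alternative.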
If the Breakinator has helped you in your research, please cite our preprint at: https://www.biorxiv.org/content/10.1101/2025.07.15.664946v2.abstract
