Kssdtree: an interactive Python package for phylogenetic analysis based on sketching technique

Kssdtree is a versatile Python package for phylogenetic analysis, offering three distinct pipelines: the Routine Pipeline, the Reference Subtraction Pipeline, and the GTDB-based Phylogenetic Placement Pipeline.

(1) Routine Pipeline: A general-purpose tool for phylogenetic analysis of user genomic data.

(2) Reference Subtraction Pipeline: Designed for intra-species phylogenomic analysis.

(3) GTDB-based Phylogenetic Placement Pipeline: Searches the Genome Taxonomy Database (GTDB) for genomes similar to the input, conducts phylogenetic analysis alongside these genomes, and positions the input genomes within the entire prokaryotic tree of life.

Kssdtree also provides one-stop tree construction and visualization. It handles DNA sequences in both FASTA and FASTQ formats, whether gzipped or not. Additionally, Kssdtree is compatible with multiple platforms (Linux, MacOS, and Windows) and can be run in Jupyter notebooks.
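Before pointing Kssdtree at a directory, it can help to confirm which files in it look like sequence files. The following is a minimal sketch, not part of the kssdtree API: the helper names and the list of file extensions are assumptions chosen for illustration, and Kssdtree itself may accept a different set of extensions.

```python
from pathlib import Path

# Assumed extensions for FASTA/FASTQ files; adjust to match your data.
SEQUENCE_SUFFIXES = (".fa", ".fasta", ".fna", ".fq", ".fastq")


def is_sequence_file(path):
    """Return True if the file name looks like FASTA/FASTQ, gzipped or not."""
    name = Path(path).name.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # strip the gzip suffix before checking the format
    return name.endswith(SEQUENCE_SUFFIXES)


def list_sequence_files(directory):
    """Return the sorted names of files in `directory` that look like sequences."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and is_sequence_file(p)
    )
```

Calling `list_sequence_files('your input genomes path')` gives a quick view of which files in the directory match these assumed extensions.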

1. Installation

Kssdtree requires Python 3 and the packages pandas, pyqt5, ete3, and requests. When kssdtree is installed with pip, these dependencies are installed automatically. On MacOS, Python 3.8 or higher is required. On Windows, Python 3.6 is required, along with the gzip tool (https://gnuwin32.sourceforge.net/packages/gzip.htm) for sequence decompression.
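If an installation misbehaves, a quick check of the interpreter version and the listed dependencies can narrow down the cause. The sketch below uses only the standard library; the function names are hypothetical, and the import names are an assumption based on the package list above (the pyqt5 package is imported as `PyQt5`).

```python
import importlib.util
import sys

# Import names for the dependencies listed above (pyqt5 is imported as PyQt5).
REQUIRED_MODULES = ("pandas", "PyQt5", "ete3", "requests")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def python_is_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info >= (major, minor)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Missing modules:", missing_modules() or "none")
```

For example, `python_is_at_least(3, 8)` can be used to confirm the MacOS requirement stated above.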

1.1 Linux

pip install kssdtree

1.2 MacOS

# (Optional) Install gcc (/opt/homebrew/bin/gcc-12) 
brew install gcc@12

# Create a virtual environment
conda create --name=kssdtree python=3.10

# Activate the virtual environment
conda activate kssdtree

# Install kssdtree
pip install kssdtree

1.3 Windows

# Create a virtual environment
conda create --name=kssdtree python=3.6.13

# Activate the virtual environment
conda activate kssdtree

# (Optional) Install libpython and m2w64-toolchain
conda install libpython m2w64-toolchain -c msys2

# Install kssdtree
pip install kssdtree

2. Quick-Tutorial

2.1 Routine Pipeline

import kssdtree
kssdtree.quick(shuf_file='./shuf_files/L3K10.shuf', genome_files='your input genomes path', output='output.newick', method='nj', mode='r')
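The tree is written in Newick format, so a quick sanity check is to list the leaf names it contains. This is a rough sketch under stated assumptions: `leaf_names` is a hypothetical helper, it assumes unquoted labels and a tree with at least two leaves, and it is not how Kssdtree reads trees. For full parsing, the ete3 dependency provides a `Tree` class that loads Newick files.

```python
import re

# A leaf label is a run of label characters directly after "(" or ",".
# Internal node labels follow ")" and are therefore not matched.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return leaf labels from a Newick string (unquoted labels only)."""
    return _LEAF.findall(newick)


def leaf_names_from_file(path):
    """Read a Newick file and return its leaf labels."""
    with open(path, encoding="utf-8") as handle:
        return leaf_names(handle.read())
```

After running the pipeline, `leaf_names_from_file('output.newick')` should list one label per input genome if the tree was built as expected.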

2.2 Reference Subtraction Pipeline

import kssdtree
kssdtree.quick(shuf_file='./shuf_files/L3K10.shuf', genome_files='your input genomes path', output='output.newick', reference='your reference genome path', method='nj', mode='r')

2.3 GTDB-based Phylogenetic Placement Pipeline

import kssdtree
kssdtree.quick(shuf_file='./shuf_files/L3K9.shuf', genome_files='your input genomes path', output='your output path', database='gtdbr214', method='nj', mode='r', N=30)

If you set shuf_file='L3K10.shuf' or shuf_file='L3K9.shuf', kssdtree downloads the file automatically before running the quick or sketch function. If the automatic download fails, you can download the files manually from https://zenodo.org/records/12699159 or from the shuf_files directory of this repository. Other '*.shuf' files, such as 'L2K8.shuf', are generated automatically by the shuffle function. For more on using Kssdtree, see the Kssdtree documentation (https://kssdtree.readthedocs.io/en/latest).
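When working offline or after a manual download, it may be useful to check whether a shuf file is already on disk before calling kssdtree. The helper below is a hypothetical sketch, not part of kssdtree; the default search locations (the working directory and a local `shuf_files` folder) are assumptions based on the paths used in the tutorial above.

```python
from pathlib import Path


def find_shuf_file(name, search_dirs=(".", "./shuf_files")):
    """Return the path of the first existing `name` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return str(candidate)
    return None
```

For example, `find_shuf_file('L3K10.shuf')` returns a path that can be passed as `shuf_file`, or `None` if the file still needs to be downloaded.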

3. How to cite

Hang Yang, Xiaoxin Lu, Jiaxing Chang, Qing Chang, Wen Zheng, Zehua Chen, Huiguang Yi, Kssdtree: an interactive Python package for phylogenetic analysis based on sketching technique, Bioinformatics, Volume 40, Issue 10, October 2024, btae566, https://doi.org/10.1093/bioinformatics/btae566
