blockcopy

Copy large files (VM devices, LVM snapshots, ...) efficiently over the network.

Designed for copying from and to NVMe disks over a gigabit network. Uses a thread pool to compute hashes so that copy speed is not limited by single-core CPU performance.

Installation

You can download the file blockcopy.py to /usr/local/bin, make it executable, and run it:

curl -fsSL https://raw.githubusercontent.com/messa/blockcopy/v0.0.2/blockcopy.py \
  -o /usr/local/bin/blockcopy
chmod +x /usr/local/bin/blockcopy

Alternatively, install the package with pip and then run the script as blockcopy:

python3 -m pip install https://github.com/messa/blockcopy/archive/refs/tags/v0.0.2.zip

Usage

When the source is on a remote host (srchost) and the destination is local:

blockcopy checksum /dev/destination \
  | ssh srchost blockcopy retrieve /dev/source \
  | blockcopy save /dev/destination

Or, when the source is local and the destination is on a remote host (dsthost):

ssh dsthost blockcopy checksum /dev/destination \
  | blockcopy retrieve /dev/source \
  | ssh dsthost blockcopy save /dev/destination

Options

checksum:

--progress - show progress info
--start OFFSET - start reading from this byte offset
--end OFFSET - stop reading at this byte offset

retrieve:

--lzma - compress blocks using LZMA before sending (useful for slow networks)
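
To illustrate the trade-off behind --lzma, here is a minimal sketch using Python's standard lzma module. The helper names pack_block and unpack_block are hypothetical, and how blockcopy actually frames compressed blocks on the wire is not described in this README.

```python
import lzma
import os

BLOCK_SIZE = 128 * 1024  # block size taken from "How it works"


def pack_block(block):
    """Compress one block before sending (hypothetical helper)."""
    return lzma.compress(block)


def unpack_block(payload):
    """Restore the original block on the receiving side (hypothetical helper)."""
    return lzma.decompress(payload)


zero_block = bytes(BLOCK_SIZE)         # e.g. unused disk space
random_block = os.urandom(BLOCK_SIZE)  # e.g. encrypted or already compressed data

# Zeros shrink to a tiny payload; random data does not shrink at all.
print(len(pack_block(zero_block)), len(pack_block(random_block)))
```

Compression costs CPU time on both ends, so it tends to pay off when the network is the bottleneck and the data is compressible.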

save:

--truncate - truncate destination file to match source file size
-t, --times - preserve timestamps (atime, mtime) from source file
-p, --perms - preserve permissions from source file
-o, --owner - preserve owner from source file
-g, --group - preserve group from source file
--numeric-ids - use numeric uid/gid instead of looking up user/group names
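
At the filesystem level, --truncate, --times and --perms roughly correspond to the standard library calls below. This is only a local sketch under that assumption: apply_metadata is a hypothetical helper, and in a real blockcopy run the source and destination usually live on different hosts, so the metadata has to travel through the pipeline instead of being read with os.stat on the spot. Owner and group are left out because changing them normally requires root.

```python
import os
import stat


def apply_metadata(src_path, dst_path, truncate=False, times=False, perms=False):
    """Copy selected metadata from src_path to dst_path (hypothetical helper)."""
    st = os.stat(src_path)
    if truncate:
        # Make the destination exactly as long as the source.
        os.truncate(dst_path, st.st_size)
    if times:
        # Set atime and mtime with nanosecond precision; done after
        # truncating, because truncation itself updates mtime.
        os.utime(dst_path, ns=(st.st_atime_ns, st.st_mtime_ns))
    if perms:
        # Copy only the permission bits, not the file type bits.
        os.chmod(dst_path, stat.S_IMODE(st.st_mode))
```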

How it works

The tool uses a three-stage pipeline connected via stdin/stdout:

checksum - Reads the destination file/device, splits it into 128 KB blocks, computes a SHA3-512 hash of each block, and outputs the hashes as a binary stream.
retrieve - Reads hashes from stdin while simultaneously reading the source file. For each block, it compares the destination hash with the source block's hash. Only blocks that differ are sent to the output stream.
save - Reads block data from stdin and writes each block to the destination file/device at the correct position.
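
The three stages above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the actual blockcopy code: the functions checksum, retrieve and save only mirror the subcommand names, they pass Python tuples around instead of a binary stream, and they hash sequentially.

```python
import hashlib
import io

BLOCK_SIZE = 128 * 1024  # 128 KB blocks, as described above


def checksum(dest):
    """Yield (offset, digest) for every block of the destination."""
    offset = 0
    while True:
        block = dest.read(BLOCK_SIZE)
        if not block:
            break
        yield offset, hashlib.sha3_512(block).digest()
        offset += len(block)


def retrieve(src, dest_hashes):
    """Yield (offset, data) for source blocks whose hash differs."""
    dest_hashes = dict(dest_hashes)
    offset = 0
    while True:
        block = src.read(BLOCK_SIZE)
        if not block:
            break
        # A block missing on the destination side has no hash and is sent too.
        if dest_hashes.get(offset) != hashlib.sha3_512(block).digest():
            yield offset, block
        offset += len(block)


def save(dest, blocks):
    """Write each received block at its offset; return the number written."""
    written = 0
    for offset, data in blocks:
        dest.seek(offset)
        dest.write(data)
        written += 1
    return written


# Demo: the destination differs from the source in a single block.
source = bytes(BLOCK_SIZE) * 3 + b"tail"
stale = bytearray(source)
stale[BLOCK_SIZE + 5] = 0xFF  # corrupt one byte in the second block
src, dst = io.BytesIO(source), io.BytesIO(bytes(stale))
hashes = list(checksum(dst))
sent = save(dst, retrieve(src, hashes))
print(sent, dst.getvalue() == source)  # 1 True
```

Only the one changed block crosses the "network" here; the other three are recognized as identical by their hashes.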

Each stage uses a ThreadPoolExecutor with multiple worker threads for hash computation, so the copy speed is not limited by single-core CPU performance. This is especially important for fast NVMe disks where sequential read speed can exceed what a single CPU core can hash.
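
A minimal sketch of hashing blocks in a thread pool follows. The names hash_block and hash_blocks_parallel and the default of 4 workers are assumptions made for illustration, not taken from blockcopy.py. Threads help here because, according to the hashlib documentation, CPython releases the GIL while hashing data larger than 2047 bytes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 128 * 1024  # 128 KB blocks


def hash_block(block):
    """Return the SHA3-512 digest of one block."""
    return hashlib.sha3_512(block).digest()


def hash_blocks_parallel(blocks, workers=4):
    """Hash blocks in worker threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map preserves the order of the input sequence.
        return list(executor.map(hash_block, blocks))


blocks = [bytes([i]) * BLOCK_SIZE for i in range(8)]
digests = hash_blocks_parallel(blocks)
```

Note that executor.map submits every item up front, so a tool that processes a whole device would need to limit how many blocks are in flight at once; this sketch does not do that.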

Alternative software

rsync
- Some versions of rsync do not support syncing block device contents.
- The rolling hash algorithm can become too slow on large files (or large block devices). I've experienced slowdowns to 8-15 MB/s when 100 MB/s bandwidth was available.
https://github.com/bscp-tool/bscp/blob/master/bscp
- Slow hash computation (no thread pool)
https://github.com/theraser/blocksync
- Slow hash computation (no thread pool)

Internet discussions I found relevant to the topic of copying block devices over network:
