Capybara is designed with three goals:
- Lightweight default install: `pip install capybara-docsaid` installs only the core `utils`/`structures`/`vision` modules, without forcing heavy inference dependencies.
- Inference backends as opt-in extras: install ONNX Runtime / OpenVINO / TorchScript only when you need them via extras.
- Lower risk: enforce quality gates with `ruff`/`pyright`/`pytest` and target 90% line coverage for the core codebase.
What you get:
- Image tools (`capybara.vision`): I/O, color conversion, resize/rotate/pad/crop, and video frame extraction.
- Geometry structures (`capybara.structures`): `Box`/`Boxes`, `Polygon`/`Polygons`, `Keypoints`, plus helper functions like IoU.
- Inference wrappers (optional): `capybara.onnxengine` / `capybara.openvinoengine` / `capybara.torchengine`.
- Feature extras (optional): `visualization` (drawing tools), `ipcam` (simple web demo), `system` (system info tools).
- Utilities (`capybara.utils`): `PowerDict`, `Timer`, `make_batch`, `download_from_google`, and other common helpers (see the sketch just below).
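As a quick illustration, here is a minimal sketch of a few of these helpers. It assumes they are importable from the top-level `capybara` namespace, that `Timer` works as a context manager, and that `make_batch` takes an iterable plus a `batch_size`; check the API docs for the exact signatures:

```python
from capybara import PowerDict, Timer, make_batch

# PowerDict: a dict with attribute-style access (assumed behavior).
cfg = PowerDict({"name": "demo", "size": 320})
print(cfg.name, cfg.size)

# Timer as a context manager to time a block (assumed usage).
with Timer():
    batches = list(make_batch(range(10), batch_size=4))
print(batches)  # e.g. [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```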
```bash
pip install capybara-docsaid
python -c "import capybara; print(capybara.__version__)"
```

To learn more about installation and usage, see Capybara Documents.
The documentation includes detailed guides and common FAQs for this project.
```bash
pip install capybara-docsaid
```

```bash
# ONNX Runtime (CPU)
pip install "capybara-docsaid[onnxruntime]"

# ONNX Runtime (GPU)
pip install "capybara-docsaid[onnxruntime-gpu]"

# OpenVINO runtime
pip install "capybara-docsaid[openvino]"

# TorchScript runtime
pip install "capybara-docsaid[torchscript]"

# Install everything
pip install "capybara-docsaid[all]"
```

```bash
# Visualization (matplotlib/pillow)
pip install "capybara-docsaid[visualization]"

# IPCam app (flask)
pip install "capybara-docsaid[ipcam]"

# System info (psutil)
pip install "capybara-docsaid[system]"
```

If you want OpenVINO inference and the IPCam features, install:

```bash
# OpenVINO + IPCam
pip install "capybara-docsaid[openvino,ipcam]"
```

To install directly from the GitHub repository:

```bash
pip install git+https://github.com/DocsaidLab/Capybara.git
```

Some features require OS-level codecs / image I/O / PDF tools (install as needed):

- `PyTurboJPEG` (faster JPEG I/O): requires the TurboJPEG library.
- `pillow-heif` (HEIC/HEIF support): requires libheif.
- `pdf2image` (PDF to images): requires Poppler.
- Video frame extraction: installing `ffmpeg` is recommended (more stable OpenCV video decoding).

```bash
# Ubuntu / Debian
sudo apt install ffmpeg libturbojpeg libheif-dev poppler-utils
```

```bash
# macOS (Homebrew)
brew install jpeg-turbo ffmpeg libheif poppler
```

If you're using `onnxruntime-gpu`, also install the CUDA/cuDNN versions compatible with your ORT version.
- Capybara images are represented as `numpy.ndarray`. By default, they follow OpenCV conventions: BGR channel order, with shape typically `(H, W, 3)`.
- If you prefer working in RGB, use `imread(..., color_base="RGB")` or convert with `imcvtcolor(img, "BGR2RGB")`.
```python
from capybara import imread, imwrite

img = imread("your_image.jpg")
if img is None:
    raise RuntimeError("Failed to read image.")
imwrite(img, "out.jpg")
```

Notes:
- `imread` returns `None` when it fails to decode an image (if the path doesn't exist, it raises `FileExistsError`).
- `imread` also supports `.heic` (requires `pillow-heif` plus OS-level libheif), as sketched below.
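For example, here is a minimal sketch combining both notes; the file name is hypothetical, and it uses the `color_base` keyword described above:

```python
from capybara import imread

# Read a HEIC file directly in RGB (requires pillow-heif + libheif).
# "photo.heic" is a hypothetical input path.
img = imread("photo.heic", color_base="RGB")
if img is None:
    raise RuntimeError("Failed to decode photo.heic.")
print(img.shape)  # (H, W, 3), RGB channel order
```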
With `imresize`, you can pass `None` for one dimension of `size` to keep the aspect ratio; the other dimension is inferred automatically.
```python
import numpy as np

from capybara import BORDER, imresize, pad

img = np.zeros((480, 640, 3), dtype=np.uint8)
img = imresize(img, (320, None))  # size is (height, width)
img = pad(img, pad_size=(8, 8), pad_mode=BORDER.REPLICATE)
```
```python
import numpy as np

from capybara import imcvtcolor

img = np.zeros((240, 320, 3), dtype=np.uint8)  # BGR
gray = imcvtcolor(img, "BGR2GRAY")  # grayscale
rgb = imcvtcolor(img, "BGR2RGB")    # RGB
```
```python
import numpy as np

from capybara import Polygon, imrotate, imwarp_quadrangle

img = np.zeros((240, 320, 3), dtype=np.uint8)
rot = imrotate(img, angle=15, expand=True)  # angle matches OpenCV: positive rotates counterclockwise
poly = Polygon([[10, 10], [200, 20], [190, 120], [20, 110]])
patch = imwarp_quadrangle(img, poly)  # 4-point perspective warp
```
```python
import numpy as np

from capybara import Box, Boxes, imcropbox, imcropboxes

img = np.zeros((240, 320, 3), dtype=np.uint8)
crop1 = imcropbox(img, Box([10, 20, 110, 120]), use_pad=True)
crop_list = imcropboxes(
    img,
    Boxes([[0, 0, 10, 10], [100, 100, 400, 300]]),
    use_pad=True,
)
```

Morphology operators live in `capybara.vision.morphology` (not in the top-level `capybara` namespace).
```python
import numpy as np

from capybara import imbinarize
from capybara.vision.morphology import imopen

img = np.zeros((240, 320, 3), dtype=np.uint8)
mask = imbinarize(img)        # Otsu thresholding + binarization
mask = imopen(mask, ksize=3)  # opening to remove small noise
```
```python
import numpy as np

from capybara import Box, Boxes, pairwise_iou

boxes_a = Boxes([[10, 10, 20, 20], [30, 30, 60, 60]])
boxes_b = Boxes(np.array([[12, 12, 18, 18]], dtype=np.float32))
print(pairwise_iou(boxes_a, boxes_b))

box = Box([0.1, 0.2, 0.9, 0.8], is_normalized=True).convert("XYWH")
print(box.numpy())
```
```python
from capybara import Polygon, polygon_iou

p1 = Polygon([[0, 0], [10, 0], [10, 10], [0, 10]])
p2 = Polygon([[5, 5], [15, 5], [15, 15], [5, 15]])
print(polygon_iou(p1, p2))
```
```python
import numpy as np

from capybara import img_to_b64str, npy_to_b64str
from capybara.vision.improc import b64str_to_img, b64str_to_npy

img = np.zeros((32, 32, 3), dtype=np.uint8)
b64_img = img_to_b64str(img)  # JPEG bytes -> base64 string
if b64_img is None:
    raise RuntimeError("Failed to encode image into base64.")
img2 = b64str_to_img(b64_img)  # base64 string -> numpy image

vec = np.arange(8, dtype=np.float32)
b64_vec = npy_to_b64str(vec)
vec2 = b64str_to_npy(b64_vec, dtype="float32")
```
```python
from capybara.vision.improc import pdf2imgs

pages = pdf2imgs("file.pdf")  # list[np.ndarray]; each page is a BGR image
if pages is None:
    raise RuntimeError("Failed to decode PDF.")
print(len(pages))
```

Install first: `pip install "capybara-docsaid[visualization]"`.
```python
import numpy as np

from capybara import Box
from capybara.vision.visualization.draw import draw_box

img = np.zeros((240, 320, 3), dtype=np.uint8)
img = draw_box(img, Box([10, 20, 100, 120]))
```

`IpcamCapture` itself does not depend on Flask; you only need the `ipcam` extra to use `WebDemo`.
```python
from capybara.vision.ipcam.camera import IpcamCapture

cap = IpcamCapture(url=0, color_base="BGR")  # or provide an RTSP/HTTP URL
frame = next(cap)
```

Web demo (install first: `pip install "capybara-docsaid[ipcam]"`):
```python
from capybara.vision.ipcam.app import WebDemo

WebDemo("rtsp://<ipcam-url>").run(port=5001)
```

Install first: `pip install "capybara-docsaid[system]"`.
```python
from capybara.utils.system_info import get_system_info

print(get_system_info())
```
```python
from capybara import video2frames_v2

frames = video2frames_v2("demo.mp4", frame_per_sec=2, max_size=1280)
print(len(frames))
```

Inference backends are optional; install the corresponding extras before importing the relevant engine modules.
Note: the TorchScript runtime is named `Runtime.pt` in code (corresponding extra: `torchscript`).
| Runtime (`capybara.runtime.Runtime`) | Backend name | Provider / device |
|---|---|---|
| `onnx` | `cpu` | `["CPUExecutionProvider"]` |
| `onnx` | `cuda` | `["CUDAExecutionProvider" (device_id), "CPUExecutionProvider"]` |
| `onnx` | `tensorrt` | `["TensorrtExecutionProvider" (device_id), "CUDAExecutionProvider" (device_id), "CPUExecutionProvider"]` |
| `onnx` | `tensorrt_rtx` | `["NvTensorRTRTXExecutionProvider" (device_id), "CUDAExecutionProvider" (device_id), "CPUExecutionProvider"]` |
| `openvino` | `cpu` | `device="CPU"` |
| `openvino` | `gpu` | `device="GPU"` |
| `openvino` | `npu` | `device="NPU"` |
| `pt` | `cpu` | `torch.device("cpu")` |
| `pt` | `cuda` | `torch.device("cuda")` |
```python
from capybara.runtime import Runtime

print(Runtime.onnx.auto_backend_name())      # priority: cuda -> tensorrt_rtx -> tensorrt -> cpu
print(Runtime.openvino.auto_backend_name())  # priority: gpu -> npu -> cpu
print(Runtime.pt.auto_backend_name())        # priority: cuda -> cpu
```
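If you want an engine to pick the best available backend at construction time, one option is to feed the auto-selected name straight in; this is a sketch assuming the `backend` argument accepts the string returned by `auto_backend_name()`:

```python
from capybara.onnxengine import ONNXEngine
from capybara.runtime import Runtime

# Pick the best available ONNX backend on this machine (e.g. "cuda" or "cpu"),
# assuming backend= accepts the name returned by auto_backend_name().
backend = Runtime.onnx.auto_backend_name()
engine = ONNXEngine("model.onnx", backend=backend)
```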
```python
import numpy as np

from capybara.onnxengine import EngineConfig, ONNXEngine

engine = ONNXEngine(
    "model.onnx",
    backend="cpu",
    config=EngineConfig(enable_io_binding=False),
)
outputs = engine.run({"input": np.ones((1, 3, 224, 224), dtype=np.float32)})
print(outputs.keys())
print(engine.summary())
```
```python
import numpy as np

from capybara.openvinoengine import OpenVINOConfig, OpenVINODevice, OpenVINOEngine

engine = OpenVINOEngine(
    "model.xml",
    device=OpenVINODevice.cpu,
    config=OpenVINOConfig(num_requests=2),
)
outputs = engine.run({"input": np.ones((1, 3), dtype=np.float32)})
print(outputs.keys())
```
```python
import numpy as np

from capybara.torchengine import TorchEngine

engine = TorchEngine("model.pt", device="cpu")
outputs = engine.run({"image": np.zeros((1, 3, 224, 224), dtype=np.float32)})
print(outputs.keys())
```

All engines provide `benchmark(...)` for quick throughput/latency measurements.
```python
import numpy as np

from capybara.onnxengine import ONNXEngine

engine = ONNXEngine("model.onnx", backend="cpu")
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
print(engine.benchmark({"input": dummy}, repeat=50, warmup=5))
```

`EngineConfig` / `OpenVINOConfig` / `TorchEngineConfig` are passed through to the underlying runtime as-is.
```python
from capybara.onnxengine import EngineConfig, ONNXEngine

engine = ONNXEngine(
    "model.onnx",
    backend="cuda",
    config=EngineConfig(
        provider_options={
            "CUDAExecutionProvider": {
                "enable_cuda_graph": True,
            },
        },
    ),
)
```

Before merging, this project requires:
```bash
ruff check .
ruff format --check .
pyright
python -m pytest --cov=capybara --cov-config=.coveragerc --cov-report=term
```

Notes:
- The coverage gate is 90% line coverage (rules defined in `.coveragerc`).
- Heavy / environment-dependent modules are excluded from the default coverage gate to keep CI reproducible and maintainable.
```bash
git clone https://github.com/DocsaidLab/Capybara.git
cd Capybara
bash docker/build.bash
```

Run:

```bash
docker run --rm -it capybara_docsaid bash
```

If you need GPU access inside the container, use the NVIDIA container runtime (e.g. `--gpus all`), as shown below.
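For example, a sketch of the same command with GPU access; it assumes the NVIDIA Container Toolkit is installed on the host:

```bash
# Expose all host GPUs to the container (requires the NVIDIA Container Toolkit).
docker run --rm -it --gpus all capybara_docsaid bash
```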
```bash
python -m pytest -vv
```

Apache-2.0, see LICENSE.
```bibtex
@misc{lin2025capybara,
  author       = {Kun-Hsiang Lin* and Ze Yuan*},
  title        = {Capybara: An Integrated Python Package for Image Processing and Deep Learning},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/DocsaidLab/Capybara}},
  note         = {* equal contribution}
}
```