Image Compression with Multicut

Uses deep learning (DL) and reinforcement learning (RL) to predict the edge weights for a Multicut solver, which segments an image into slices that compress well. The position of each slice in the original image is stored as well, so the original image can be reassembled.

Setup and Dependencies

Execute ./setup.sh. This creates the necessary directories, then fetches and patches the dependencies.

Download the ImageNet dataset via Kaggle. Put the images in a new directory called /dataset. The images can be converted to the correct format with the image_converter executable.

Libraries

opencv (4.12.0) (+ ximgproc module)
libtorch (2.9.1)

Configuration

See include/configuration.h for configuration options.

Build

NOTE: The build has only been tested with CUDA architecture 8.9 and CUDA version 12.6, but other versions may work as well. To try a different version, change CMakeLists.txt.

./build.sh

or build a single target only: ./build.sh <target>, e.g. ./build.sh image_converter

Execute

./build/compress
./build/reassemble
./build/image_converter
./build/pretraining
./build/training

Training

In pre-training, the network learns the edge costs for multicut segmentation from the output of a segmentation algorithm that is set in configuration.h. The actual training uses the total file size of the compressed slices as the reward for online reinforcement learning. For this, the costs passed to the multicut solver are sampled from probability distributions parametrized by the edge weights that the network predicts.
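
The sampling step can be illustrated with a minimal, self-contained sketch. It assumes one Gaussian per edge, with the predicted weight as mean and a fixed standard deviation; the distribution the project actually uses is not stated here. All names (SampledCosts, sample_edge_costs, reinforce_loss, sigma, baseline) are hypothetical, and the sketch uses only the standard library rather than libtorch.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Result of sampling solver costs for one image (hypothetical type).
struct SampledCosts {
    std::vector<double> costs;  // one sampled cost per edge
    double log_prob;            // log-probability of the whole sample
};

// Draws one cost per edge from N(mean, sigma^2), where mean is the edge
// weight predicted by the network. The Gaussian policy is an assumption.
SampledCosts sample_edge_costs(const std::vector<double>& predicted_weights,
                               double sigma, std::mt19937& rng) {
    const double pi = std::acos(-1.0);
    SampledCosts out{{}, 0.0};
    out.costs.reserve(predicted_weights.size());
    for (double mean : predicted_weights) {
        std::normal_distribution<double> dist(mean, sigma);
        const double c = dist(rng);
        const double z = (c - mean) / sigma;
        // Log-density of a Gaussian, accumulated over all edges.
        out.log_prob += -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * pi);
        out.costs.push_back(c);
    }
    return out;
}

// REINFORCE surrogate loss: minimizing -(reward - baseline) * log_prob
// makes samples with an above-baseline reward more likely.
double reinforce_loss(double log_prob, double reward, double baseline) {
    return -(reward - baseline) * log_prob;
}
```

In such a setup the sampled costs would be handed to the multicut solver, the resulting slices compressed, and the reward derived from their total size, for example as the negative size so that smaller output is rewarded. That sign convention is an assumption as well.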

Multicut

This project uses RAMA, which solves the multicut problem on the GPU.
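
For readers unfamiliar with the multicut problem: given a graph with a real-valued cost per edge, the task is to partition the nodes so that the total cost of the edges running between different components is minimal. The sketch below only evaluates that objective for a given node labeling; it does not solve the problem and does not call RAMA. It assumes the common sign convention in which positive costs are attractive and negative costs are repulsive. The Edge struct and the function name are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// One weighted edge of the graph (hypothetical type).
struct Edge {
    std::size_t u;
    std::size_t v;
    double cost;  // assumed: positive = prefer same component, negative = prefer cut
};

// Sums the costs of all edges whose endpoints carry different labels.
// Deriving the cut edges from node labels guarantees a consistent partition.
double multicut_objective(const std::vector<Edge>& edges,
                          const std::vector<int>& labels) {
    double total = 0.0;
    for (const Edge& e : edges) {
        if (labels[e.u] != labels[e.v]) {
            total += e.cost;
        }
    }
    return total;
}
```

On a triangle with costs 2, -3 and -1 on the edges (0,1), (1,2) and (0,2), the labeling {0, 0, 1} cuts the two repulsive edges and yields an objective of -4, which is lower than the -2 obtained by separating all three nodes.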

Future Work

To make the project work, a few improvements are necessary:

Fix the oversegmentation of the pre-trained model. The recall of predicted "cut" edges looks good (>0.9), but the precision is bad (<0.25). This leads to oversegmentation which the subsequent RL training can't work with.
Implement an actor-critic RL pattern. Right now, the RL training is a primitive stateless REINFORCE adaptation, which does not converge.
Use a different image format. At the moment, PNG is used to encode the images. Since the whole pipeline runs on the GPU, a custom PNG file size estimator had to be implemented. To encode images directly on the GPU, one could use nvJPEG2k (or nvPNG, if it is released by then).
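
The precision and recall figures in the first item refer to per-edge "cut" predictions. As a minimal sketch of how such metrics can be computed, assuming one boolean per edge for the prediction and for the reference segmentation (the names CutMetrics and cut_edge_metrics are hypothetical, not taken from the project):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Precision and recall of the "cut" class (hypothetical type).
struct CutMetrics {
    double precision;
    double recall;
};

// Compares predicted cut edges against reference cut edges, edge by edge.
CutMetrics cut_edge_metrics(const std::vector<bool>& predicted_cut,
                            const std::vector<bool>& reference_cut) {
    std::size_t tp = 0, fp = 0, fn = 0;
    const std::size_t n = std::min(predicted_cut.size(), reference_cut.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (predicted_cut[i] && reference_cut[i]) ++tp;   // correctly predicted cut
        if (predicted_cut[i] && !reference_cut[i]) ++fp;  // spurious cut
        if (!predicted_cut[i] && reference_cut[i]) ++fn;  // missed cut
    }
    // Report 0 when a denominator is empty to avoid dividing by zero.
    const double precision = (tp + fp) ? static_cast<double>(tp) / (tp + fp) : 0.0;
    const double recall = (tp + fn) ? static_cast<double>(tp) / (tp + fn) : 0.0;
    return {precision, recall};
}
```

A precision below 0.25 means that fewer than one in four predicted cut edges is a cut in the reference segmentation, so most predicted cuts split regions that should stay together.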
