GitHub - pedbrgs/PyCCEA: A Python package of cooperative co-evolutionary algorithms for feature selection in high-dimensional data.

💡 Overview

PyCCEA is an open-source package developed as part of ongoing doctoral research. It provides cooperative co-evolutionary strategies tailored for feature selection in large-scale and high-dimensional problems. The framework adopts a modular, decomposition-based approach and is intended for researchers and practitioners tackling complex feature selection tasks.

Note: PyCCEA is a work in progress. Stay tuned for improvements and new algorithm implementations.

💻 Installation

To install the PyCCEA package directly from PyPI, use the following command in a Python ≥ 3.10 environment:

pip install pyccea

Alternatively, to install the latest version directly from GitHub:

pip install git+https://github.com/pedbrgs/pyccea.git

Ensure you have pip and an active internet connection to download dependencies.
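
To confirm the installation from Python, a small check along these lines may help. It is only a sketch: `installed_version` is a hypothetical helper written for this example, built on the standard-library `importlib.metadata` module, and it assumes the distribution is named `pyccea`, as in the pip command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# PyCCEA targets Python >= 3.10
print("Python OK:", sys.version_info >= (3, 10))
# Prints the version string if the package is installed, otherwise None
print("pyccea version:", installed_version("pyccea"))
```

If the second line prints `None`, the package is not visible in the active environment.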

🔆 Quickstart

This quickstart demonstrates how to use the CCFSRFG1 algorithm — a CCEA variant with random feature grouping — to perform feature selection on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.

In this example, you will:

Load the dataset using the DataLoader utility.
Configure the dataset and algorithm from .toml files.
Run the optimization process.

import toml
import importlib.resources
from pyccea.coevolution import CCFSRFG1
from pyccea.utils.datasets import DataLoader

# Load dataset parameters
with importlib.resources.open_text("pyccea.parameters", "dataloader.toml") as toml_file:
    data_conf = toml.load(toml_file)

# Initialize the DataLoader with the specified dataset and configuration
dataloader = DataLoader(dataset="wdbc", conf=data_conf)
# Prepare the dataset for the algorithm (e.g., preprocessing, splitting)
dataloader.get_ready()

# Load algorithm-specific parameters
with importlib.resources.open_text("pyccea.parameters", "ccfsrfg.toml") as toml_file:
    ccea_conf = toml.load(toml_file)

# Initialize the cooperative co-evolutionary algorithm
ccea = CCFSRFG1(data=dataloader, conf=ccea_conf, verbose=False)
# Start the optimization process
ccea.optimize()

The best feature subset found is stored in the attribute best_context_vector, a binary array where 1 indicates a selected feature and 0 indicates an unselected one.
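
As a minimal sketch of how such a binary vector can be read, the snippet below maps a mask to the selected feature indices and names. The `mask` values, the `feature_names` list, and the `selected_indices` helper are made-up placeholders for illustration, not output or API from PyCCEA; in practice the mask would come from `ccea.best_context_vector`.

```python
def selected_indices(mask):
    """Return the positions whose mask entry equals 1."""
    return [i for i, bit in enumerate(mask) if int(bit) == 1]


# Hypothetical values for illustration only
mask = [1, 0, 0, 1, 1]
feature_names = ["f0", "f1", "f2", "f3", "f4"]

idx = selected_indices(mask)
chosen = [feature_names[i] for i in idx]

print(idx)     # [0, 3, 4]
print(chosen)  # ['f0', 'f3', 'f4']
```

Because the helper only iterates over the mask, it should work with a plain list as well as a one-dimensional array.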

📁 Custom datasets

Custom datasets are supported as long as they conform to the PyCCEA input schema (a .parquet file with feature columns and a label column). To register a custom dataset at runtime, add an entry to DataLoader.DATASETS and run the standard preprocessing, splitting, and normalization pipeline:

from pyccea.utils.datasets import DataLoader

# Path to your dataset in PyCCEA schema
data_path = "./custom_data.parquet"
dataset_name = "custom_data"

# Register the dataset path and task under its name
DataLoader.DATASETS[dataset_name] = {
    "task": "classification",  # or "regression"
    "file": data_path
}

# Load and prepare the dataset
# (data_conf is the dataloader configuration loaded in the Quickstart)
dataloader = DataLoader(
    dataset=dataset_name,
    conf=data_conf
)
dataloader.get_ready()

If you prefer ready-to-use data, additional datasets already normalized to the PyCCEA format are available in the High-Dimensional datasets repository.

📚 Documentation

Full documentation, including a comprehensive user guide, step-by-step tutorials, an API reference, and contribution guidelines, is available at PyCCEA docs.

📜 Citation info

If you use this code in any way, please cite the following paper:

@article{PyCCEA,
    title = {PyCCEA: A Python package of cooperative co-evolutionary algorithms for feature selection in high-dimensional data},
    author = {Venancio, Pedro Vinicius A. B. and Batista, Lucas S.},
    journal = {Journal of Open Source Software},
    volume = {10},
    number = {112},
    pages = {8348},
    year = {2025}
}

📫 Contact

Please submit bug reports, questions, or suggestions directly in the repository.
