A machine learning pipeline for analyzing genetic syndrome embeddings from image data. Developed for Apollo Solutions' ML Developer Practical Test.
- Clone the repository:
git clone https://github.com/yourusername/apollo-genetic-analysis.git
- Install dependencies:
pip install -r requirements.txt

genetic-syndrome-analysis/
├── data/
│   └── mini_gm_public_v0.1.p      # Raw dataset (embeddings)
├── results/
│   ├── plots/                     # Generated visualizations
│   │   ├── auc_comparison.png
│   │   ├── class_distribution.png
│   │   └── tsne_visualization.png
│   ├── flattened_data.pkl         # Processed dataset
│   └── knn_results.json           # Classification metrics
├── scripts/
│   ├── data_processing.py         # Data loading & preprocessing
│   ├── eda.py                     # Exploratory data analysis
│   ├── tsne_visualization.py      # Dimensionality reduction
│   ├── knn_classification.py      # KNN implementation
│   └── generate_plots.py          # Metric visualizations
├── main.py                        # Main pipeline controller
├── requirements.txt               # Dependency list
└── README.md                      # This document

# Run the full pipeline
python main.py

# Or run individual stages:
# Data preprocessing
python scripts/data_processing.py
# Generate EDA visualizations
python scripts/eda.py
# Create t-SNE plot
python scripts/tsne_visualization.py
# Run KNN classification
python scripts/knn_classification.py
# Generate performance plots
python scripts/generate_plots.py
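
The contents of main.py are not shown here, so the following is only a minimal sketch of how a pipeline controller could chain the five stages in the order listed above. The names `STAGES`, `build_commands`, and `run_pipeline` are hypothetical and are not taken from the repository; the script paths are the ones from the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Stage order copied from the commands listed in this README.
STAGES = [
    "scripts/data_processing.py",
    "scripts/eda.py",
    "scripts/tsne_visualization.py",
    "scripts/knn_classification.py",
    "scripts/generate_plots.py",
]


def build_commands(stages=STAGES, python=sys.executable):
    """Return one argv list per stage, using the current interpreter."""
    return [[python, stage] for stage in stages]


def run_pipeline(stages=STAGES, root=Path(".")):
    """Run each stage in order from `root`, stopping at the first failure."""
    for cmd in build_commands(stages):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run on missing or partial outputs.
        subprocess.run(cmd, cwd=root, check=True)
```

Calling `run_pipeline()` from the repository root would execute the stages sequentially. Failing fast seems a sensible default here, since the later stages presumably depend on files that earlier stages write to results/.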