This project explores the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository. The aim is to apply a range of visualisation techniques to better understand the dataset, highlight patterns, and examine how well the features distinguish benign from malignant diagnoses.
The project demonstrates how visualisation can support exploratory data analysis by:
- Highlighting correlations between features.
- Showing class separability (benign vs malignant) across multiple attributes.
- Identifying trends, outliers, and redundancies in the dataset.
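Class separability can also be given a rough number to go alongside the plots. The sketch below ranks features by Cohen's d, a standardised mean difference. That choice is made here for illustration and is not taken from this project. The sketch then overlays per-class histograms of one feature on shared bins.

It assumes a pandas DataFrame with a `diagnosis` column holding "B"/"M" and numeric feature columns. The function names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def cohens_d(benign: pd.Series, malignant: pd.Series) -> float:
    """Standardised mean difference (malignant minus benign) with a pooled SD."""
    n_b, n_m = len(benign), len(malignant)
    pooled_var = ((n_b - 1) * benign.var(ddof=1)
                  + (n_m - 1) * malignant.var(ddof=1)) / (n_b + n_m - 2)
    return float((malignant.mean() - benign.mean()) / np.sqrt(pooled_var))


def rank_by_separation(df: pd.DataFrame, target: str = "diagnosis") -> pd.Series:
    """Rank numeric features by |Cohen's d| between the B and M classes."""
    benign = df[df[target] == "B"]
    malignant = df[df[target] == "M"]
    features = df.drop(columns=[target]).select_dtypes(include="number").columns
    scores = {col: cohens_d(benign[col], malignant[col]) for col in features}
    # Sort by magnitude so the strongest separators come first, whatever the sign.
    return pd.Series(scores).sort_values(key=np.abs, ascending=False)


def plot_class_histograms(df, feature, target="diagnosis", bins=30):
    """Overlay benign and malignant histograms of one feature on shared bins."""
    fig, ax = plt.subplots(figsize=(6, 4))
    edges = np.histogram_bin_edges(df[feature], bins=bins)  # same edges for both classes
    for label, colour in (("B", "tab:blue"), ("M", "tab:red")):
        values = df.loc[df[target] == label, feature]
        ax.hist(values, bins=edges, alpha=0.5, color=colour, label=label)
    ax.set_xlabel(feature)
    ax.set_ylabel("count")
    ax.legend(title=target)
    return fig
```

Drop any numeric ID column before calling `rank_by_separation`, otherwise it is ranked like a feature. Shared bin edges keep the two classes' bars aligned, so overlap is easier to judge by eye.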
The workflow includes:
- Heatmap (Correlation Plot) – Uses Pearson correlation coefficients to examine linear relationships between pairs of features.
- Beeswarm Plot (Quasirandom) – Helps avoid overplotting and makes the separation between benign and malignant cases clearer.
- Histogram – Displays feature distributions, highlighting which features best separate diagnoses.
- Scatter Plots with Ellipses – Provides confidence regions to show overlap or separability between classes.
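The heatmap step could look roughly like the sketch below. It computes a Pearson correlation matrix with pandas, draws it with Matplotlib's `imshow` (seaborn's `heatmap` is an alternative), and lists feature pairs whose absolute correlation meets a threshold as candidates for redundancy.

The `diagnosis` column name, the 0.9 threshold and the function names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df: pd.DataFrame, target: str = "diagnosis"):
    """Draw a Pearson correlation heatmap of the numeric features."""
    corr = df.drop(columns=[target]).select_dtypes(include="number").corr(method="pearson")
    fig, ax = plt.subplots(figsize=(10, 8))
    # A diverging colormap fixed to [-1, 1] shows both the strength and the sign of r.
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    return corr, fig


def redundant_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> list:
    """List (feature_a, feature_b, r) for pairs with |r| >= threshold, upper triangle only."""
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```

Scanning only the upper triangle reports each pair once and skips the diagonal, where every feature correlates perfectly with itself.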
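The dataset contains 569 cases described by 30 diagnostic features. These are ten base measurements computed from digitised images of fine needle aspirates (FNA) of breast masses, each reported in three variations: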
- Mean
- Standard error (SE)
- Worst (the mean of the three largest values)
The target variable indicates whether a case is Benign (B) or Malignant (M).
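A small helper can group the columns by variation so that each set of ten features can be plotted separately. The sketch below is hypothetical. It assumes column names follow a `<feature>_<variation>` pattern such as `radius_mean`, `radius_se` and `radius_worst`, which is how some CSV copies of the dataset are labelled. Adjust it if your copy uses different names.

```python
import pandas as pd

# Assumed suffixes for the three variations of each base measurement.
SUFFIXES = ("mean", "se", "worst")


def group_features(df: pd.DataFrame) -> dict:
    """Map each variation suffix to its columns, keeping the original column order."""
    groups = {suffix: [] for suffix in SUFFIXES}
    for col in df.columns:
        # rpartition splits on the last underscore, so 'fractal_dimension_mean'
        # yields base 'fractal_dimension' and suffix 'mean'.
        _base, sep, suffix = col.rpartition("_")
        if sep and suffix in groups:
            groups[suffix].append(col)
    return groups
```

Columns without a recognised suffix, such as an ID or the diagnosis label, are ignored.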
Each visualisation highlights the following:
- Heatmap: Shows the strength and direction of correlations between features.
- Beeswarm plots: Display individual data points without overlap, aiding visual comparison across 30 features.
- Histograms: Reveal the spread of values within each feature and show how well it separates the diagnoses.
- Scatter plots with ellipses: Illustrate 95% confidence regions to highlight areas of overlap and separation between classes.
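The sketch below shows one way to put all features on a single beeswarm-style panel. It z-scores each feature so the features share an axis, which is a design choice made here and not taken from the project. It then draws them with seaborn's `swarmplot`.

Note that `swarmplot` uses a non-overlapping swarm layout, which is related to but not the same as quasirandom jitter. The `diagnosis` column and the function names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def to_long_zscores(df: pd.DataFrame, target: str = "diagnosis") -> pd.DataFrame:
    """Z-score each numeric feature, then reshape to one row per (case, feature)."""
    features = df.drop(columns=[target]).select_dtypes(include="number")
    z = (features - features.mean()) / features.std(ddof=0)
    z[target] = df[target].to_numpy()
    return z.melt(id_vars=target, var_name="feature", value_name="z")


def plot_beeswarm(df: pd.DataFrame, target: str = "diagnosis"):
    """Beeswarm of standardised features, with benign and malignant side by side."""
    long = to_long_zscores(df, target)
    fig, ax = plt.subplots(figsize=(12, 6))
    # swarmplot shifts points sideways so they do not overlap; dodge splits by class.
    sns.swarmplot(data=long, x="feature", y="z", hue=target, dodge=True, size=2, ax=ax)
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    return fig
```

With all 30 features and several hundred cases, seaborn may warn that some points could not be placed. A smaller `size` or plotting one variation group at a time reduces this.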
Key findings:
- Visualisation helps identify redundant, highly correlated features.
- Certain features clearly separate benign and malignant cases, while others show heavy overlap.
- Scatter plots with confidence ellipses provide a more nuanced view of class separability.
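The 95% ellipses can be built from each class's sample covariance. Under a bivariate normal assumption, the region (x - μ)ᵀ Σ⁻¹ (x - μ) ≤ χ²₂(0.95) is expected to contain about 95% of that class's points. For 2 degrees of freedom the quantile has the closed form -2 ln(0.05) ≈ 5.99. Each half-axis is the square root of that quantile times an eigenvalue of Σ.

This is a sketch of one common construction, not necessarily the one used in this project. It describes the spread of the points and is not a confidence region for the class mean. Column names and function names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Ellipse


def ellipse_params(x, y, level: float = 0.95):
    """Return centre, full width, full height and angle (degrees) of a normal-theory ellipse."""
    cov = np.cov(x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    chi2 = -2.0 * np.log(1.0 - level)  # chi-square quantile for 2 degrees of freedom
    # Full axis lengths are 2 * sqrt(chi2 * eigenvalue); the largest eigenvalue gives the width.
    width, height = 2.0 * np.sqrt(chi2 * eigvals[::-1])
    vx, vy = eigvecs[:, -1]  # eigenvector of the largest eigenvalue = major-axis direction
    angle = np.degrees(np.arctan2(vy, vx))
    return (float(np.mean(x)), float(np.mean(y))), float(width), float(height), float(angle)


def scatter_with_ellipses(df: pd.DataFrame, x_col: str, y_col: str,
                          target: str = "diagnosis", level: float = 0.95):
    """Scatter two features and outline each class with its ellipse."""
    fig, ax = plt.subplots(figsize=(6, 5))
    for label, colour in (("B", "tab:blue"), ("M", "tab:red")):
        sub = df[df[target] == label]
        ax.scatter(sub[x_col], sub[y_col], s=10, alpha=0.6, color=colour, label=label)
        centre, w, h, angle = ellipse_params(sub[x_col], sub[y_col], level)
        ax.add_patch(Ellipse(centre, w, h, angle=angle, fill=False,
                             edgecolor=colour, linewidth=1.5))
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title=target)
    return fig
```

Where the two ellipses overlap heavily, the pair of features separates the classes poorly. Where they barely touch, the pair is a strong candidate for distinguishing diagnoses.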
Tools and libraries:
- Python
- Matplotlib / Seaborn
- Pandas / NumPy
To run the project locally:
- Clone this repository and change into its directory:
  git clone https://github.com/yourusername/breast-cancer-visualisation.git
  cd breast-cancer-visualisation