Data-Visualization-project

This project explores the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository. The aim is to apply a range of visualisation techniques to better understand the dataset, highlight patterns, and distinguish between benign and malignant diagnoses.

Project Overview

The project demonstrates how visualisation can support exploratory data analysis by:

Highlighting correlations between features.
Showing class separability (benign vs malignant) across multiple attributes.
Identifying trends, outliers, and redundancies in the dataset.

The workflow includes:

Heatmap (Correlation Plot) – Uses Pearson correlation to examine relationships between features.
Beeswarm Plot (Quasirandom) – Helps avoid overplotting and makes the separation between benign and malignant cases clearer.
Histogram – Displays feature distributions, highlighting which features best separate diagnoses.
Scatter Plots with Ellipses – Provide confidence regions to show overlap or separability between classes.
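As a rough illustration of what the histogram step looks for, the sketch below bins two classes on shared edges and sums the bin-wise minimum of their normalised counts. A value near 0 means the feature separates the classes well; a value near 1 means heavy overlap. This overlap score is an added illustration rather than part of the project, `histogram_overlap` is a hypothetical helper, and the data is synthetic.

```python
import numpy as np

def histogram_overlap(benign, malignant, bins=30):
    """Overlap of two class histograms built on shared bin edges (0 = none, 1 = identical)."""
    benign = np.asarray(benign, dtype=float)
    malignant = np.asarray(malignant, dtype=float)
    lo = min(benign.min(), malignant.min())
    hi = max(benign.max(), malignant.max())
    edges = np.linspace(lo, hi, bins + 1)  # same edges for both classes
    counts_b, _ = np.histogram(benign, bins=edges)
    counts_m, _ = np.histogram(malignant, bins=edges)
    share_b = counts_b / counts_b.sum()  # fraction of each class per bin
    share_m = counts_m / counts_m.sum()
    return float(np.minimum(share_b, share_m).sum())

# Synthetic stand-ins, not values from the real dataset.
rng = np.random.default_rng(0)
separated = histogram_overlap(rng.normal(0, 1, 2000), rng.normal(10, 1, 2000))
overlapping = histogram_overlap(rng.normal(0, 1, 2000), rng.normal(0, 1, 2000))
print(f"well-separated feature: {separated:.2f}")
print(f"overlapping feature:    {overlapping:.2f}")
```

Using shared bin edges matters here: if each class were binned separately, the bars would not line up and the comparison would be meaningless.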

Dataset

The dataset describes 569 breast cancer cases using ten measurements of cell nuclei. Each measurement is reported in three variants, giving 30 features in total:

Mean
Standard error (SE)
Worst

The target variable indicates whether a case is Benign (B) or Malignant (M).
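The feature layout described above can be sketched as a list of column names. The ten base measurements are those listed in the UCI dataset description. The `base_variant` naming pattern (for example `radius_mean`) is an assumption, since the raw UCI file ships without a header row.

```python
# The ten base measurements listed in the UCI dataset description.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Variant suffixes; the exact spelling is an assumption for illustration.
VARIANTS = ["mean", "se", "worst"]

def feature_columns():
    """Return the 30 feature names: all means, then all SEs, then all worst values."""
    return [f"{base}_{variant}" for variant in VARIANTS for base in BASE_FEATURES]

columns = feature_columns()
print(len(columns))  # 30
print(columns[:3])
```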

Visualisations

Heatmap: Shows the strength and direction of correlations between features.
Beeswarm plots: Display individual data points without overlap, aiding visual comparison across 30 features.
Histograms: Reveal the spread of each feature and show how well it separates the two diagnoses.
Scatter plots with ellipses: Illustrate 95% confidence regions to highlight areas of overlap and separation between classes.
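The geometry behind a 95% ellipse can be sketched with NumPy. The sketch assumes each class is roughly bivariate normal, in which case the region containing 95% of the probability mass is set by the chi-square quantile with two degrees of freedom, which equals -2 ln(0.05). The project's plotting library may compute its ellipses differently. `confidence_ellipse` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = -2.0 * np.log(0.05)  # about 5.99

def confidence_ellipse(x, y, chi2_value=CHI2_95_2DOF):
    """Return (centre, width, height, angle_degrees) of a normal-theory ellipse."""
    cov = np.cov(x, y)                      # 2x2 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    width = 2.0 * np.sqrt(chi2_value * eigvals[1])   # full length of major axis
    height = 2.0 * np.sqrt(chi2_value * eigvals[0])  # full length of minor axis
    vx, vy = eigvecs[:, 1]                  # direction of the major axis
    angle = float(np.degrees(np.arctan2(vy, vx)))
    centre = (float(np.mean(x)), float(np.mean(y)))
    return centre, float(width), float(height), angle

# Synthetic stand-in for two correlated features of one class.
rng = np.random.default_rng(1)
pts = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=5000)
centre, width, height, angle = confidence_ellipse(pts[:, 0], pts[:, 1])
print(centre, width, height, angle)
```

The returned values map directly onto the arguments of `matplotlib.patches.Ellipse` (centre, width, height, angle), so one ellipse per class can be overlaid on a scatter plot.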

Key Findings

The correlation heatmap helps identify redundant features, which show up as pairs with high correlation.
Certain features clearly separate benign and malignant cases, while others show heavy overlap.
Scatter plots with confidence ellipses provide a more nuanced view of class separability.
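The first finding, redundancy among highly correlated features, can be sketched with NumPy. The data here is synthetic (a radius, a perimeter derived from it, and an unrelated texture column), the 0.9 threshold is an arbitrary assumption, and `redundant_pairs` is a hypothetical helper, not code from this project.

```python
import numpy as np

def redundant_pairs(X, names, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in data, not values from the real dataset.
rng = np.random.default_rng(0)
radius = rng.normal(14.0, 3.5, size=200)
perimeter = 2 * np.pi * radius + rng.normal(0.0, 0.5, size=200)  # nearly a function of radius
texture = rng.normal(19.0, 4.0, size=200)                        # independent of radius

X = np.column_stack([radius, perimeter, texture])
names = ["radius", "perimeter", "texture"]
pairs = redundant_pairs(X, names)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

The same correlation matrix is what a heatmap displays; listing the pairs above a threshold simply turns the visual impression into an explicit shortlist.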

Technologies Used

Python
Matplotlib / Seaborn
Pandas / NumPy

How to Run

Clone this repository:

git clone https://github.com/yourusername/breast-cancer-visualisation.git
cd breast-cancer-visualisation

