Fraud Detection System

A comprehensive machine learning project for detecting fraudulent financial transactions using Python and scikit-learn.

📊 Project Overview

This project implements a fraud detection system that analyzes financial transaction data to identify potentially fraudulent activity. A Random Forest classifier labels each transaction as fraudulent or legitimate, reaching 96% precision and 76% recall on the fraud class.

🎯 Key Features

Data Analysis: Comprehensive exploratory data analysis of 6.3M+ transactions
Feature Engineering: Creation of relevant features like balance changes and zero balance indicators
Machine Learning: Random Forest classifier for fraud detection
Performance Metrics: 99.97% accuracy, with 96% precision and 76% recall on the fraud class
Visualization: Multiple charts and plots for data insights

📈 Dataset Information

Total Transactions: 6,362,620
Fraudulent Transactions: 8,213 (0.13% fraud rate)
Features: 11 columns including transaction type, amount, balances, and fraud indicators
Data Quality: Clean dataset with no missing values or duplicates
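
The class imbalance in these counts is worth making explicit, because it sets the bar for any accuracy figure. The short calculation below uses only the two counts listed above; the "all-legitimate" baseline is an illustrative comparison, not something taken from the notebook.

```python
# Counts taken from the dataset summary above.
total_transactions = 6_362_620
fraud_transactions = 8_213

fraud_rate = fraud_transactions / total_transactions

# Accuracy of a trivial baseline that labels every transaction as legitimate.
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.2%}")                             # 0.13%
print(f"All-legitimate baseline accuracy: {baseline_accuracy:.2%}")  # 99.87%
```

Since a model that never predicts fraud already scores 99.87% accuracy, precision and recall on the fraud class are the more informative metrics for this dataset.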

Dataset Features

Feature	Description
`step`	Time step (hour)
`type`	Transaction type (PAYMENT, TRANSFER, CASH_OUT, DEBIT, CASH_IN)
`amount`	Transaction amount
`nameOrig`	Origin customer ID
`oldbalanceOrg`	Origin account balance before transaction
`newbalanceOrig`	Origin account balance after transaction
`nameDest`	Destination customer ID
`oldbalanceDest`	Destination account balance before transaction
`newbalanceDest`	Destination account balance after transaction
`isFraud`	Fraud indicator (target variable)
`isFlaggedFraud`	Flagged fraud indicator

🔍 Key Insights

Transaction Types Analysis

CASH_OUT: Most common transaction type (2.2M transactions)
TRANSFER: Highest fraud rate (0.77%)
CASH_IN, DEBIT, PAYMENT: Zero fraud rate
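
A per-type breakdown like the one above can be produced with a pandas group-by. This is a minimal sketch on a tiny made-up sample that reuses the dataset's `type` and `isFraud` column names; the values are invented for illustration, and the notebook's actual code may differ.

```python
import pandas as pd

# Hypothetical sample rows; the real analysis would read Fraud.csv instead.
sample = pd.DataFrame({
    "type": ["CASH_OUT", "CASH_OUT", "TRANSFER", "TRANSFER", "PAYMENT", "CASH_IN"],
    "isFraud": [0, 1, 1, 0, 0, 0],
})

# Transaction count and fraud rate (mean of the 0/1 label) per transaction type.
by_type = (
    sample.groupby("type")["isFraud"]
    .agg(transactions="count", fraud_rate="mean")
    .sort_values("fraud_rate", ascending=False)
)

print(by_type)
```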

Fraud Patterns

Average Amount: Fraudulent transactions average $1.47M vs $178K for legitimate
Balance Changes: Moderate correlation (0.36) with the `isFraud` label
Time Patterns: Fraud occurs in specific time intervals

🛠️ Technical Implementation

Data Preprocessing

Handled categorical variables using Label Encoding
Created engineered features:
- balance_change: Difference between old and new balance
- is_zero_balance: Indicator for zero balance transactions
Applied StandardScaler for feature normalization
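
The preprocessing steps above might look roughly like the following. This is a sketch under stated assumptions: the README does not give the exact formulas, so `balance_change` is assumed to be the origin balance before minus after, and `is_zero_balance` is assumed to flag an empty origin account after the transaction. The sample rows are made up.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample rows using the dataset's column names.
sample = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"],
    "amount": [100.0, 5000.0, 250.0, 900.0],
    "oldbalanceOrg": [1000.0, 5000.0, 250.0, 0.0],
    "newbalanceOrig": [900.0, 0.0, 0.0, 0.0],
})

# Label-encode the categorical transaction type (classes are sorted alphabetically).
sample["type_encoded"] = LabelEncoder().fit_transform(sample["type"])

# Assumed definition: amount that left the origin account.
sample["balance_change"] = sample["oldbalanceOrg"] - sample["newbalanceOrig"]

# Assumed definition: 1 if the origin account is empty after the transaction.
sample["is_zero_balance"] = (sample["newbalanceOrig"] == 0).astype(int)

# Standardize the numeric columns to zero mean and unit variance.
numeric_cols = ["amount", "oldbalanceOrg", "newbalanceOrig", "balance_change"]
scaled = StandardScaler().fit_transform(sample[numeric_cols])

print(sample[["type_encoded", "balance_change", "is_zero_balance"]])
```

On real data, fitting the scaler on the training split only and reusing it on the test split avoids leaking test-set statistics into training.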

Model Architecture

Algorithm: Random Forest Classifier
Parameters: n_estimators=2, random_state=42
Train/Test Split: 80/20 with stratification
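
As an illustration of this configuration, the sketch below trains a Random Forest with the listed parameters on synthetic data. The data, the 5% positive rate, and the `random_state` passed to the split are assumptions made for the example; only `n_estimators=2`, `random_state=42` for the model, and the stratified 80/20 split come from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features: 1,000 rows, 5% positives.
rng = np.random.default_rng(0)
n_rows = 1000
X = rng.normal(size=(n_rows, 4))
y = np.zeros(n_rows, dtype=int)
y[:50] = 1
X[y == 1] += 3.0  # shift the positive class so the toy problem is learnable

# Stratified 80/20 split keeps the positive rate equal in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model parameters as listed above.
model = RandomForestClassifier(n_estimators=2, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Test rows: {len(X_test)}, positives in test: {int(y_test.sum())}")
```

Stratification matters here because, with a fraud rate near 0.13%, an unstratified split could leave the test set with noticeably more or fewer fraud cases than expected.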

Performance Metrics

Accuracy: 99.97%
Precision: 96% (fraud class)
Recall: 76% (fraud class)
F1-Score: 85% (fraud class)
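
The F1-score is the harmonic mean of precision and recall, so the figure above can be checked directly from the other two numbers:

```python
# Fraud-class precision and recall as reported above.
precision = 0.96
recall = 0.76

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1-score: {f1:.0%}")  # 85%
```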

📊 Visualizations

The project includes several visualizations:

Transaction type distribution
Amount distribution histograms
Fraud rate over time
Balance distribution plots
Correlation heatmaps

🚀 Usage

Prerequisites

pip install pandas numpy matplotlib seaborn scikit-learn

Running the Project

Ensure you have the Fraud.csv dataset in the project directory
Open Fraud.ipynb in Jupyter Notebook or JupyterLab
Run all cells to execute the complete analysis

Expected Output

Data exploration results
Feature engineering insights
Model training and evaluation
Performance metrics and predictions

📋 Project Structure

Fraud/
├── Fraud.ipynb          # Main Jupyter notebook
├── Fraud.csv            # Dataset (not included in repo)
└── README.md            # This file

🔧 Model Performance

The Random Forest model's results on the 20% test split:

High Accuracy: 99.97% overall accuracy
Low False Positives: Only 47 false alarms out of 1.27M predictions
Good Recall: Captures 76% of actual fraud cases
Strong Precision: 96% of predicted fraud cases are actual fraud
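
These figures are mutually consistent, which can be verified with a back-of-the-envelope reconstruction of the confusion matrix. The calculation below is an approximation: it assumes the stratified test split holds exactly 20% of the fraud cases and rounds intermediate counts, so the derived counts are estimates rather than values read from the notebook.

```python
# Figures reported in this README.
total_transactions = 6_362_620
fraud_transactions = 8_213
test_fraction = 0.20
recall = 0.76
false_positives = 47

# Approximate test-set composition under a stratified 80/20 split.
test_size = round(total_transactions * test_fraction)
test_fraud = round(fraud_transactions * test_fraction)

# Estimated fraud cases caught and missed, derived from the reported recall.
true_positives = round(test_fraud * recall)
false_negatives = test_fraud - true_positives

precision = true_positives / (true_positives + false_positives)
accuracy = 1 - (false_positives + false_negatives) / test_size

print(f"Test size: {test_size:,}")                 # 1,272,524
print(f"Estimated missed fraud cases: {false_negatives}")
print(f"Implied precision: {precision:.0%}")       # 96%
print(f"Implied accuracy: {accuracy:.2%}")         # 99.97%
```

Under these assumptions, roughly 390 fraud cases in the test split go undetected, which is the cost behind the 76% recall figure.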

🎯 Business Impact

This fraud detection system can:

Reduce Financial Losses: Early detection of fraudulent transactions
Improve Customer Trust: Minimize false positives that affect legitimate customers
Enhance Security: Provide a foundation for real-time monitoring (listed under Future Enhancements)
Scale Operations: Handle large transaction volumes efficiently

🔮 Future Enhancements

Potential improvements include:

Real-time transaction monitoring
Integration with existing banking systems
Additional ML algorithms (XGBoost, Neural Networks)
API development for production deployment
Advanced feature engineering
Ensemble methods for improved performance

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and enhancement requests.

Note: The dataset (Fraud.csv) is not included in this repository due to size constraints. Please ensure you have the dataset before running the notebook.

ManivardhanDonuri/Fraud-Detection
