ManivardhanDonuri/Fraud-Detection

Fraud Detection System

A comprehensive machine learning project for detecting fraudulent financial transactions using Python and scikit-learn.

📊 Project Overview

This project implements a fraud detection system that analyzes financial transaction data to identify potentially fraudulent activity. The system uses a Random Forest classifier, which reaches 99.97% accuracy on a held-out 20% test split.

🎯 Key Features

  • Data Analysis: Comprehensive exploratory data analysis of 6.3M+ transactions
  • Feature Engineering: Creation of relevant features like balance changes and zero balance indicators
  • Machine Learning: Random Forest classifier for fraud detection
  • Performance Metrics: 99.97% accuracy, with 96% precision and 76% recall on the fraud class
  • Visualization: Multiple charts and plots for data insights

📈 Dataset Information

  • Total Transactions: 6,362,620
  • Fraudulent Transactions: 8,213 (0.13% fraud rate)
  • Features: 11 columns including transaction type, amount, balances, and fraud indicators
  • Data Quality: Clean dataset with no missing values or duplicates

Dataset Features

Feature          Description
step             Time step (hour)
type             Transaction type (PAYMENT, TRANSFER, CASH_OUT, DEBIT, CASH_IN)
amount           Transaction amount
nameOrig         Origin customer ID
oldbalanceOrg    Origin account balance before transaction
newbalanceOrig   Origin account balance after transaction
nameDest         Destination customer ID
oldbalanceDest   Destination account balance before transaction
newbalanceDest   Destination account balance after transaction
isFraud          Fraud indicator (target variable)
isFlaggedFraud   Flagged fraud indicator
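
The checks behind the dataset figures above (row count, fraud rate, missing values, duplicates) could look roughly like the sketch below. The `summarize` helper and the four-row sample are illustrative assumptions, not code taken from the notebook; only the column names come from the table.

```python
import pandas as pd

# Column names as listed in the feature table above.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]


def summarize(df: pd.DataFrame) -> dict:
    """Return basic quality and class-balance figures (hypothetical helper)."""
    missing_cols = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing columns: {missing_cols}")
    n_rows = len(df)
    n_fraud = int(df["isFraud"].sum())
    return {
        "rows": n_rows,
        "fraud_rows": n_fraud,
        "fraud_rate_pct": 100.0 * n_fraud / n_rows if n_rows else 0.0,
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Tiny made-up sample so the sketch runs without Fraud.csv.
sample = pd.DataFrame({
    "step": [1, 1, 2, 2],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [100.0, 5000.0, 5000.0, 50.0],
    "nameOrig": ["C1", "C2", "C3", "C4"],
    "oldbalanceOrg": [1000.0, 5000.0, 5000.0, 200.0],
    "newbalanceOrig": [900.0, 0.0, 0.0, 150.0],
    "nameDest": ["M1", "C5", "C6", "M2"],
    "oldbalanceDest": [0.0, 0.0, 100.0, 0.0],
    "newbalanceDest": [0.0, 5000.0, 5100.0, 0.0],
    "isFraud": [0, 1, 1, 0],
    "isFlaggedFraud": [0, 0, 0, 0],
})

summary = summarize(sample)
print(summary)
```

With the real data, the call would be `summarize(pd.read_csv("Fraud.csv"))`.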

🔍 Key Insights

Transaction Types Analysis

  • CASH_OUT: Most common transaction type (2.2M transactions)
  • TRANSFER: Highest fraud rate (0.77%)
  • CASH_IN, DEBIT, PAYMENT: Zero fraud rate
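
Per-type fraud rates like the ones above can be computed with a single group-by. This is a minimal sketch on made-up rows; the `fraud_rate_by_type` name is an assumption for illustration, and the toy rates do not reflect the real dataset.

```python
import pandas as pd


def fraud_rate_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Count transactions and fraud cases per type, then derive the rate in percent."""
    grouped = df.groupby("type")["isFraud"].agg(transactions="count", fraud="sum")
    grouped["fraud_rate_pct"] = 100.0 * grouped["fraud"] / grouped["transactions"]
    return grouped.sort_values("fraud_rate_pct", ascending=False)


# Made-up sample: 4 TRANSFER (1 fraud), 5 CASH_OUT (1 fraud), 3 PAYMENT (0 fraud).
toy = pd.DataFrame({
    "type": ["TRANSFER"] * 4 + ["CASH_OUT"] * 5 + ["PAYMENT"] * 3,
    "isFraud": [1, 0, 0, 0] + [1, 0, 0, 0, 0] + [0, 0, 0],
})

rates = fraud_rate_by_type(toy)
print(rates)
```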

Fraud Patterns

  • Average Amount: Fraudulent transactions average $1.47M vs $178K for legitimate
  • Balance Changes: Moderate positive correlation (0.36) between balance_change and the isFraud label
  • Time Patterns: Fraud occurs in specific time intervals
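
The amount and correlation figures above could be reproduced along these lines. The sketch assumes that balance_change is the origin balance before the transaction minus the balance after it; the README does not say which account is used, so treat that definition as a guess. The rows are made up.

```python
import pandas as pd

# Made-up sample; the real figures come from Fraud.csv.
toy = pd.DataFrame({
    "amount": [100.0, 200.0, 300.0, 1000.0, 2000.0],
    "oldbalanceOrg": [500.0, 500.0, 500.0, 1000.0, 2000.0],
    "newbalanceOrig": [400.0, 300.0, 200.0, 0.0, 0.0],
    "isFraud": [0, 0, 0, 1, 1],
})

# Mean transaction amount for legitimate (0) and fraudulent (1) rows.
mean_amount = toy.groupby("isFraud")["amount"].mean()

# Assumed definition: how much the origin balance dropped.
balance_change = toy["oldbalanceOrg"] - toy["newbalanceOrig"]

# Pearson correlation between the engineered feature and the label.
corr = balance_change.corr(toy["isFraud"])

print(mean_amount)
print(f"correlation: {corr:.2f}")
```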

🛠️ Technical Implementation

Data Preprocessing

  • Handled categorical variables using Label Encoding
  • Created engineered features:
    • balance_change: Difference between old and new balance
    • is_zero_balance: Indicator for zero balance transactions
  • Applied StandardScaler for feature normalization
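
The preprocessing steps above can be sketched as follows. The `preprocess` function, the `type_encoded` column, and the exact definitions of balance_change and is_zero_balance (both computed on the origin account here) are assumptions; the notebook may define them differently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame):
    """Encode the type column, add engineered features, and scale (hypothetical helper)."""
    out = df.copy()
    # Label Encoding maps each transaction type to an integer code.
    out["type_encoded"] = LabelEncoder().fit_transform(out["type"])
    # Assumed definitions of the two engineered features.
    out["balance_change"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    out["is_zero_balance"] = (out["newbalanceOrig"] == 0).astype(int)
    feature_cols = [
        "type_encoded", "amount", "oldbalanceOrg", "newbalanceOrig",
        "balance_change", "is_zero_balance",
    ]
    # StandardScaler rescales each column to zero mean and unit variance.
    # For brevity it is fitted on all rows here; fitting on the training
    # split only avoids leaking test-set statistics.
    X = StandardScaler().fit_transform(out[feature_cols])
    y = out["isFraud"].to_numpy()
    return X, y, feature_cols


# Made-up rows for illustration.
toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [100.0, 5000.0, 5000.0, 50.0],
    "oldbalanceOrg": [1000.0, 5000.0, 5000.0, 200.0],
    "newbalanceOrig": [900.0, 0.0, 0.0, 150.0],
    "isFraud": [0, 1, 1, 0],
})

X, y, feature_cols = preprocess(toy)
print(X.shape)
```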

Model Architecture

  • Algorithm: Random Forest Classifier
  • Parameters: n_estimators=2, random_state=42
  • Train/Test Split: 80/20 with stratification
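
A minimal sketch of this setup with scikit-learn is shown below. The parameters match the ones listed above; the synthetic data, and the use of `random_state=42` for the split as well as the model, are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled feature matrix, with a rare positive class.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 2.0).astype(int)

# 80/20 split; stratify=y keeps the fraud share equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Parameters as listed above.
model = RandomForestClassifier(n_estimators=2, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"test rows: {len(X_test)}, positives in test: {int(y_test.sum())}")
```

Stratification matters here because only 0.13% of the real transactions are fraudulent; an unstratified split can leave the two parts with noticeably different fraud shares.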

Performance Metrics

Accuracy: 99.97%
Precision: 96% (fraud class)
Recall: 76% (fraud class)
F1-Score: 85% (fraud class)
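
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported 85% follows from the two figures above. The helper name is illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.96, 0.76)
print(f"{f1:.3f}")  # 0.848, which rounds to the reported 85%
```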

📊 Visualizations

The project includes several visualizations:

  • Transaction type distribution
  • Amount distribution histograms
  • Fraud rate over time
  • Balance distribution plots
  • Correlation heatmaps
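
As an illustration of the first plot in the list, a transaction type distribution can be drawn with matplotlib as below. This is a sketch on made-up rows, not the notebook's plotting code; the function name and styling are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_type_distribution(df: pd.DataFrame):
    """Bar chart of the number of transactions per type."""
    counts = df["type"].value_counts()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(counts.index), counts.to_numpy())
    ax.set_xlabel("Transaction type")
    ax.set_ylabel("Number of transactions")
    ax.set_title("Transaction type distribution")
    fig.tight_layout()
    return ax


# Made-up sample for illustration.
toy = pd.DataFrame({"type": ["CASH_OUT"] * 5 + ["PAYMENT"] * 3 + ["TRANSFER"] * 2})
ax = plot_type_distribution(toy)
```

In a notebook, the backend line can be dropped so the figure renders inline.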

🚀 Usage

Prerequisites

pip install pandas numpy matplotlib seaborn scikit-learn

Running the Project

  1. Ensure you have the Fraud.csv dataset in the project directory
  2. Open Fraud.ipynb in Jupyter Notebook or JupyterLab
  3. Run all cells to execute the complete analysis
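
Before step 2, a short pre-flight check can confirm that the dataset is in place and has the expected header. This is an optional sketch using only the standard library; `check_dataset` is a hypothetical helper, and the column list is taken from the feature table.

```python
import csv
from pathlib import Path

# Column names as listed in the Dataset Features table.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]


def check_dataset(path: str = "Fraud.csv") -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} not found"]
    # Read only the header row, so the check stays fast on a large file.
    with p.open(newline="") as fh:
        header = next(csv.reader(fh), [])
    return [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in header]


problems = check_dataset("Fraud.csv")
print("OK" if not problems else "\n".join(problems))
```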

Expected Output

  • Data exploration results
  • Feature engineering insights
  • Model training and evaluation
  • Performance metrics and predictions

📋 Project Structure

Fraud/
├── Fraud.ipynb          # Main Jupyter notebook
├── Fraud.csv            # Dataset (not included in repo)
└── README.md            # This file

🔧 Model Performance

The Random Forest model performs well on the 20% held-out test split (about 1.27M transactions):

  • High Accuracy: 99.97% overall accuracy
  • Low False Positives: Only 47 false alarms out of 1.27M predictions
  • Good Recall: Captures 76% of actual fraud cases
  • Strong Precision: 96% of predicted fraud cases are actual fraud
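
To put the accuracy figure in context, the short calculation below uses the dataset counts given earlier to show what a trivial model that labels every transaction as legitimate would score. It is arithmetic on the README's own numbers, not output from the notebook.

```python
total_transactions = 6_362_620
fraud_transactions = 8_213

# Accuracy of always predicting "not fraud": every legitimate row is correct.
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions
print(f"{100 * baseline_accuracy:.2f}%")  # 99.87%
```

Because this baseline already sits at 99.87%, the precision and recall figures for the fraud class say more about the model than the overall accuracy does.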

🎯 Business Impact

This fraud detection system can:

  • Reduce Financial Losses: Early detection of fraudulent transactions
  • Improve Customer Trust: Minimize false positives that affect legitimate customers
  • Enhance Security: Provides a basis for real-time monitoring (listed under Future Enhancements)
  • Scale Operations: Handle large transaction volumes efficiently

🔮 Future Enhancements

Potential improvements include:

  • Real-time transaction monitoring
  • Integration with existing banking systems
  • Additional ML algorithms (XGBoost, Neural Networks)
  • API development for production deployment
  • Advanced feature engineering
  • Ensemble methods for improved performance

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and enhancement requests.


Note: The dataset (Fraud.csv) is not included in this repository due to size constraints. Please ensure you have the dataset before running the notebook.
