This project implements a CART (Classification and Regression Tree) classifier in Python for classification tasks. It provides a framework for training, testing, and evaluating decision trees using either the Gini index or entropy as the splitting criterion. A Streamlit web interface offers a user-friendly way to interact with the classifier.
- Custom CART Implementation:
  - Supports both the Gini index and entropy as splitting criteria.
  - Handles dataset partitioning, threshold calculation, and weighted impurity computations.
- Visualization:
  - Displays the confusion matrix and other metrics after evaluation.
- Interactive Tree Traversal:
  - Allows users to visualize the decision tree traversal logic.
- User Interface:
  - Upload datasets via an intuitive Streamlit web app.
  - Configure parameters such as `max_depth`, `min_samples`, and `criterion` through sliders and dropdowns.
- Real-Time Evaluation:
  - Displays metrics such as precision, recall, and F1-score immediately after running the classifier.
  - Visualizes the decision tree structure and confusion matrix.
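
The splitting criteria listed above can be sketched as follows. This is a minimal standalone illustration, not the project's `CartClassifier` code, and the function names are hypothetical. Gini impurity is $1 - \sum_k p_k^2$ and entropy is $-\sum_k p_k \log_2 p_k$, where $p_k$ is the fraction of samples in class $k$; a split is scored by weighting each side's impurity by its share of the samples. Because the thresholds in the generated example further down look like observed feature values, this sketch tries each distinct observed value as a candidate threshold (midpoints between sorted values are another common choice).

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, criterion=gini):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)


def best_threshold(values, labels, criterion=gini):
    """Return (threshold, impurity) of the best split `x <= threshold` on one feature."""
    best_t, best_score = None, float("inf")
    # Every distinct observed value except the largest is a candidate;
    # splitting at the largest would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = weighted_impurity(left, right, criterion)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    print(gini([0, 0, 1, 1]))     # 0.5
    print(entropy([0, 0, 1, 1]))  # 1.0
    print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (3, 0.0)
```

Passing `criterion=entropy` to `best_threshold` switches the criterion without changing the search.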
- Python 3.8 or above

- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/cart-classifier.git
  cd cart-classifier
  ```
- Install the required libraries, for example with `pip install streamlit numpy pandas scikit-learn matplotlib`. The key libraries are:
  - Streamlit
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
- Modify the script `CART_Classifier.py` to load and train your dataset.
- Train and evaluate the classifier by calling its methods:
  ```python
  from CART_Classifier import CartClassifier
  from DataHandler import DataHandler

  # Load the CSV and name the column that holds the class labels
  data = DataHandler("path_to_dataset.csv", label_column="target_column")

  classifier = CartClassifier(max_depth=5, min_samples=2, criterion="gini")
  classifier.fit(data)

  # "conf_matrix.png" is the file the confusion-matrix plot is saved to.
  # Note: this evaluates on the same data used for training, so the scores are optimistic.
  precision, recall, f1 = classifier.evaluate(data, "conf_matrix.png")
  print(f"Precision: {precision}, Recall: {recall}, F1-Score: {f1}")
  ```
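
The README does not say how `evaluate` computes its metrics or whether it averages them over classes. As an illustration only, assuming binary labels with `1` as the positive class, the confusion matrix and the three scores can be computed like this (function names are hypothetical):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 for the `positive` class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(confusion_matrix(y_true, y_pred, labels=[0, 1]))  # [[2, 1], [1, 2]]
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"Precision: {p:.3f}, Recall: {r:.3f}, F1-Score: {f:.3f}")
```

For more than two classes, the same counts are usually computed per class and then averaged (macro or weighted averaging); which variant the project uses is not stated.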
- Start the Streamlit server:
  ```bash
  streamlit run app/main.py
  ```
- Open your browser and navigate to http://localhost:8501.
- Use the app to:
  - Upload a dataset.
  - Configure training parameters.
  - Train, evaluate, and visualize the decision tree.
The output is generated Python code that represents the decision tree, together with an image of the model's confusion matrix.
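
The generated `predict` function is a direct transcription of the fitted tree into nested `if`/`else` statements. As a rough sketch of how such code can be emitted, assume a hypothetical tree representation in which internal nodes are dicts with `feature`, `threshold`, `left`, and `right` keys and leaves are `{"label": ...}` dicts; this is not necessarily how the project stores its trees.

```python
def emit_predict(tree):
    """Return Python source for a predict(x) function mirroring a nested-dict tree."""
    lines = ["def predict(x):"]

    def walk(node, depth):
        pad = "    " * depth
        if "label" in node:  # leaf: return the stored class
            lines.append(f"{pad}return {node['label']}")
            return
        # Internal node: samples with x[feature] <= threshold go left, the rest go right
        lines.append(f"{pad}if x[{node['feature']}] <= {node['threshold']}:")
        walk(node["left"], depth + 1)
        lines.append(f"{pad}else:")
        walk(node["right"], depth + 1)

    walk(tree, 1)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tree = {"feature": 0, "threshold": 48.0, "left": {"label": 1}, "right": {"label": 0}}
    source = emit_predict(tree)
    print(source)
    namespace = {}
    exec(source, namespace)  # compile the generated function to check it runs
    print(namespace["predict"]([40.0]), namespace["predict"]([50.0]))  # 1 0
```

Writing the tree out as plain code keeps the saved model dependency-free: the generated file only needs the Python standard library to make predictions.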
Example Code
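```python
import ast
import sys


# Decision tree emitted by the classifier: each test compares one feature x[i]
# against a learned threshold, and each leaf returns a class label.
def predict(x):
    if x[5] <= 9.795053004:
        if x[1] <= 9.0:
            if x[3] <= 21.0:
                if x[0] <= 65.0:
                    return 0
                else:
                    return 0
            else:
                if x[0] <= 48.0:
                    return 1
                else:
                    return 0
        else:
            if x[3] <= 13.0:
                return 0
            else:
                if x[3] <= 24.0:
                    return 1
                else:
                    return 0
    else:
        if x[1] <= 7.0:
            if x[4] <= 18.0:
                if x[4] <= 2.0:
                    return 1
                else:
                    return 0
            else:
                if x[0] <= 68.0:
                    return 1
                else:
                    return 0
        else:
            if x[1] <= 11.0:
                if x[4] <= 11.0:
                    return 1
                else:
                    return 1
            else:
                if x[3] <= 53.0:
                    return 1
                else:
                    return 0


if __name__ == "__main__":
    # Parse the feature list passed as a Python literal on the command line.
    # ast.literal_eval only accepts literals, unlike eval, which runs arbitrary code.
    x = ast.literal_eval(sys.argv[1])
    result = predict(x)
    print(result)
```

- Dataset Upload: Allows users to upload a CSV dataset.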
- Model Configuration:
  - Adjust `max_depth`, `min_samples`, and `criterion`.
- Results Visualization:
  - View metrics (precision, recall, F1-score).
  - Display and save the confusion matrix.
  - Visualize the decision tree traversal logic.
- Upload Dataset: Select a dataset via the Streamlit app.
- Configure Parameters: Choose training parameters (e.g., `max_depth` = 5).
- Train the Model: Run the CART classifier.
- View Results: Check metrics, visualize the tree, and download outputs.
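
To show how `max_depth` and `min_samples` constrain training, here is a minimal recursive CART builder using the Gini criterion. It is a sketch, not the project's implementation. It assumes `min_samples` means the minimum number of samples a node needs before it may be split (the README does not define it), and it stores the tree as hypothetical nested dicts: internal nodes hold `feature`, `threshold`, `left`, and `right`, and leaves hold `label`.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent class; ties go to the class seen first."""
    return Counter(labels).most_common(1)[0][0]


def build(rows, labels, depth=0, max_depth=5, min_samples=2):
    # Stop when the depth limit is hit, too few samples remain, or the node is pure.
    if depth >= max_depth or len(labels) < min_samples or len(set(labels)) == 1:
        return {"label": majority(labels)}
    n = len(labels)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        # Candidate thresholds: distinct observed values except the largest
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant, so no split is possible
        return {"label": majority(labels)}
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([rows[i] for i in li], [labels[i] for i in li],
                      depth + 1, max_depth, min_samples),
        "right": build([rows[i] for i in ri], [labels[i] for i in ri],
                       depth + 1, max_depth, min_samples),
    }


def predict(tree, x):
    """Follow x[feature] <= threshold tests from the root down to a leaf."""
    while "label" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]


if __name__ == "__main__":
    rows = [[1], [2], [3], [10], [11], [12]]
    labels = [0, 0, 0, 1, 1, 1]
    tree = build(rows, labels, max_depth=5, min_samples=2)
    print(tree)
    print(predict(tree, [2]), predict(tree, [11]))  # 0 1
```

Smaller `max_depth` or larger `min_samples` values make the builder stop earlier, which yields simpler trees that are less likely to overfit.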
- Tree Visualization: Implement graphical tree visualization using libraries like Graphviz.
- Multi-Classifier Support: Add support for other decision tree algorithms.
- Model Export: Enable saving and loading of trained models.
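
For the planned Graphviz visualization, one possible starting point is to emit DOT source text, which Graphviz's `dot` tool can render (for example, `dot -Tpng tree.dot -o tree.png`). This sketch is not part of the current project and assumes a hypothetical nested-dict tree: internal nodes with `feature`, `threshold`, `left`, and `right` keys, and leaves with a `label` key.

```python
import itertools


def to_dot(tree):
    """Return Graphviz DOT source for a nested-dict decision tree."""
    lines = ["digraph tree {", "  node [shape=box];"]
    ids = itertools.count()  # unique node ids in depth-first order

    def walk(node):
        node_id = next(ids)
        if "label" in node:  # leaf node shows its predicted class
            lines.append(f'  n{node_id} [label="class {node["label"]}"];')
            return node_id
        lines.append(f'  n{node_id} [label="x[{node["feature"]}] <= {node["threshold"]}"];')
        left_id = walk(node["left"])
        right_id = walk(node["right"])
        # "yes" edges follow the x <= threshold branch, "no" edges the other one
        lines.append(f'  n{node_id} -> n{left_id} [label="yes"];')
        lines.append(f'  n{node_id} -> n{right_id} [label="no"];')
        return node_id

    walk(tree)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    tree = {"feature": 0, "threshold": 48.0, "left": {"label": 1}, "right": {"label": 0}}
    with open("tree.dot", "w") as fh:
        fh.write(to_dot(tree))
```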
For queries or contributions, reach out at aswajith707@gmail.com.