Skip to content

A Python script analyzing price data by computing comprehensive technical indicators using TA-Lib, then applying correlation analysis and heatmap visualization to identify redundant features. Implements automated filtering of highly correlated indicators (>0.5 threshold) to optimize feature sets for quantitative trading models.

Notifications You must be signed in to change notification settings

AllFatherZA/FeatureEngineering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Stock Price Correlation Analysis and Feature Selection This Python script performs correlation analysis on stock price data and conducts feature selection based on correlation thresholds. It utilizes libraries such as pandas, numpy, talib, seaborn, and matplotlib to achieve this.

Prerequisites Before running the script, ensure you have the following installed:

Python 3.x pandas library numpy library talib library seaborn library matplotlib library ta library You can install the required libraries using pip:

bash Copy code pip install pandas numpy talib seaborn matplotlib technical-analysis Getting Started Clone this repository or download the analyze_stock_data.py file.

Prepare your stock price data in a CSV file named GasPrices2023.csv. The CSV file should contain columns for 'Open', 'High', 'Low', 'Close', and 'Volume'.

Open the Python script in your preferred Python environment or IDE.

Run the script, and it will load the stock price data, calculate various technical indicators using TA-Lib, plot a correlation heatmap, conduct feature selection, and plot the updated correlation heatmap.

How to Use Ensure your CSV file GasPrices2023.csv is in the same directory as the Python script.

Open the script and locate the following section:

python Copy code

Load stock price data from CSV file

df = pd.read_csv('GasPrices2023.csv')

Calculate technical indicators using TA-Lib

df = add_all_ta_features( df, open="Open", high="High", low="Low", close="Close", volume="Volume", fillna=True) If your CSV file columns have different names, modify the arguments passed to add_all_ta_features() accordingly.

Run the script, and it will display a correlation heatmap, highlighting the relationships between various features.

Based on the correlation heatmap, the script will drop features that are highly correlated (above a specified threshold) using the function drop_correlated_features(). By default, it uses a threshold of 0.5.

After feature selection, the script will plot the updated correlation heatmap.

Note The script uses the ta library to calculate various technical indicators based on the provided stock price data. Make sure the data contains the required columns for the 'Open', 'High', 'Low', 'Close', and 'Volume' prices.

The correlation heatmap is useful for identifying relationships between features. However, removing correlated features is a basic feature selection technique and may not be sufficient for complex modeling tasks.

Feel free to experiment with different correlation thresholds and feature selection methods to suit your specific data analysis needs.

Disclaimer: This script is for educational and informational purposes only. It does not provide financial advice. Always do your own research and consult with a qualified financial advisor before making any investment decisions.

Happy analyzing! 📊

About

A Python script analyzing price data by computing comprehensive technical indicators using TA-Lib, then applying correlation analysis and heatmap visualization to identify redundant features. Implements automated filtering of highly correlated indicators (>0.5 threshold) to optimize feature sets for quantitative trading models.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages