Centralized data analysis platform for the Cornell Innovation and Entrepreneurship Lab. This repository contains scripts for data collection, data cleaning, and data analysis.
- Python 3.9
- pip
- virtualenv
- Cornell Email
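The prerequisites above can be checked with a short preflight script before starting the setup. This is a minimal sketch, not part of the repository: the helper names `python_ok` and `missing_tools` are hypothetical, it assumes Python 3.9 is a minimum rather than an exact pin, and it also looks for `git` and `npm` because the setup steps call them.

```python
import shutil
import sys

# Assumption: 3.9 is treated as a minimum version, not an exact requirement.
MIN_PYTHON = (3, 9)
# pip and virtualenv come from the prerequisites list;
# git and npm are included because the setup steps invoke them.
REQUIRED_TOOLS = ("pip", "virtualenv", "git", "npm")


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.9 or newer expected, found", sys.version.split()[0])
    for tool in missing_tools():
        print("Missing from PATH:", tool)
```

The script prints nothing when every check passes. The Cornell email requirement cannot be verified this way and still has to be confirmed manually.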
- Clone the repository
    git clone
- Create a virtual environment
    virtualenv venv
- Activate the virtual environment
    source venv/bin/activate
- Change into the server directory
    cd server
- Install the dependencies
    pip install -r requirements.txt
- Create a .env file in the server directory
    touch .env
- Add the following environment variables to the .env file
    export CORNELL_NETID="your_cornell_netid"
    export CORNELL_PASSWORD="your_cornell_password"
    export CAPITAL_IQ_USERNAME="your_capital_iq_username"
    export CAPITAL_IQ_PASSWORD="your_capital_iq_password"
- Source the .env file
    source .env
- Run the server
    python app.py
- Open a new terminal window and change into the client directory
    cd cornell-data
- Install the dependencies
    npm install
- Run the client
    npm start

The platform can be used to collect company data in the following ways:
- Collect data for a list of companies from the Capital IQ, Mergent Intellect, or Guidestar websites individually.
    cd scraping
    python index.py --source
- Collect data for a list of companies from the Capital IQ, Mergent Intellect, or Guidestar websites in bulk.
    python index.py
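The commands above presumably depend on the credentials exported from `.env`, so a small check run beforehand can catch a missing variable early. The following is a minimal sketch under that assumption. The variable names are taken from the setup steps, but the helper `missing_env_vars` is hypothetical and not part of the repository.

```python
import os

# Variable names as listed in the .env setup step.
REQUIRED_VARS = (
    "CORNELL_NETID",
    "CORNELL_PASSWORD",
    "CAPITAL_IQ_USERNAME",
    "CAPITAL_IQ_PASSWORD",
)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If the check reports missing variables after `source .env`, a likely cause is whitespace around the `=` in the `export` lines, which the shell does not accept as an assignment.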