Socat is a collection of functions to stream and analyze social media contributions.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python >= 3.7.2 with
- pip >= 19.0.1 (for local installation without a virtualenv; not recommended)
- virtualenvwrapper >= 4.8.0 (recommended if you don't want to use Docker)
- Docker >= 18.09.2 (recommended if you want to use Docker)
If you want to install the project without the use of virtualenv you just need to execute
pip install -r requirments.txt
which will install all required Python packages.
If you want to install the project with the use of virtualenv or virtualenvwrapper you first need to create a new environment. For example with virtualenvwrapper you just need to execute
mkvirtualenv -p python3 YOUR-ENV
Don't forget to specify the Python version with -p if python3 is not the default version.
Then check that the environment is activated. You should see the name of your created environment on the left side of your shell prompt.
(YOUR-ENV)
For further information, for example how to enter and leave an environment, check out the website of either virtualenv or virtualenvwrapper. Now you should be able to install the packages just as mentioned before.
pip install -r requirments.txt
This time the dependencies are saved in the environment you created before.
With docker you just need to run
docker build -t YOUR-TAG .
Docker will now create a new image with all dependencies. The default execution behavior is to run all unit tests. If you want to enter the Docker container, you can execute the following command
docker run -ti YOUR-TAG bash
For further information on how to use and create Docker containers, check out the Docker website.
Socat comes with a command line interface. To see how to use it just type in:
python src/socat.py
You should see an error message with information about the options you can use.
To stream social media entries you could run, for example, the following command
python src/socat.py stream tweets -m 100 -l -v
which starts streaming tweets with an upper limit of 100 entries and with log and verbose mode enabled.
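To make the shape of this command line more concrete, here is a minimal argparse sketch that accepts the same arguments. It is not Socat's actual parser: the function name `build_parser`, the long option names (`--max-entries`, `--log`, `--verbose`) and the positional name `source` are assumptions made for illustration; only `stream`, `tweets`, `-m`, `-l` and `-v` come from the command above.

```python
import argparse


def build_parser():
    """Build a parser that accepts e.g. `stream tweets -m 100 -l -v`."""
    parser = argparse.ArgumentParser(prog="socat.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    stream = subparsers.add_parser("stream", help="stream social media entries")
    # "tweets" is the only source shown in this README; others may exist.
    stream.add_argument("source", choices=["tweets"])
    # The long option names are hypothetical and only added for readability.
    stream.add_argument("-m", "--max-entries", type=int, default=None,
                        help="upper limit of entries to stream")
    stream.add_argument("-l", "--log", action="store_true",
                        help="enable log mode")
    stream.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose mode")
    return parser


# Parse the same arguments as in the example command.
args = build_parser().parse_args(["stream", "tweets", "-m", "100", "-l", "-v"])
print(args.command, args.source, args.max_entries, args.log, args.verbose)
```

Because `-l` and `-v` are plain flags, they can be given in any order after the source name, and omitting `-m` leaves the limit unset in this sketch.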
The received entries are written to the data folder in your current working directory. If no data folder exists, one will be created.
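The "create the data folder if it is missing" behavior can be sketched as follows. This is an illustration only: the helper name `append_entry`, the file name `tweets.jsonl` and the one-JSON-object-per-line format are assumptions, not a description of how Socat stores its entries.

```python
import json
import os
import tempfile


def append_entry(entry, filename, base_dir=None):
    """Append one entry to <base_dir>/data/<filename>, creating data/ if needed."""
    base_dir = base_dir or os.getcwd()
    data_dir = os.path.join(base_dir, "data")
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, filename)
    # Hypothetical storage format: one JSON document per line.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return path


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    written = append_entry({"text": "hello"}, "tweets.jsonl", base_dir=workdir)
    print(os.path.exists(written))
```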
For each social media source you need to put your credentials in the .env file. Check .env.example to see what it should look like. Also check out config.py inside the stream directory. This file provides default parameters; in the Twitter case these are bounding boxes, languages, and words to track.
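Once the values from .env are available as environment variables (python-dotenv, listed among the dependencies, is one way to load them), a quick sanity check might look like the sketch below. The variable names `TWITTER_CONSUMER_KEY` and `TWITTER_CONSUMER_SECRET` and the helper `read_credentials` are hypothetical; the real names are the ones given in .env.example.

```python
import os

# Hypothetical variable names; use the ones from .env.example instead.
REQUIRED_KEYS = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET")


def read_credentials(env=os.environ, required=REQUIRED_KEYS):
    """Return the required credentials or fail early with a clear message."""
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in required}


# A plain dict stands in for the process environment in this example.
example_env = {"TWITTER_CONSUMER_KEY": "abc", "TWITTER_CONSUMER_SECRET": "xyz"}
print(sorted(read_credentials(example_env)))
```

Failing before the stream starts is usually easier to debug than an authentication error raised later by the remote API.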
To analyze and plot logfiles just run the following command
python socat.py analyze -p /path/to/logfiles logs
The -p path option is required and must point to a directory containing valid logfiles. You can override the PLOT_CONF dict in src/analyze/config.py to change the log configuration.
To start the topic detection process just run the following command
python socat.py analyze -p /path/to/social_media_entries text -m KM -lang de
The -p path option is required and must point to a directory containing valid social media entries. Optional arguments include -m, which restricts the topic detection process to a specific method such as k-means. You can also prefilter the entries by language, for example -lang de. The list of supported languages can be found in the langdetect documentation. The results of the topic detection process are printed to stdout. You can also override the default configuration of the individual steps and methods; have a look inside src/analyze/config.py.
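For readers unfamiliar with k-means based topic detection, the following is a generic sketch using scikit-learn, which is listed among the dependencies. It is not Socat's pipeline: the example texts, the TF-IDF vectorization step, the number of clusters and all variable names are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up entries standing in for streamed social media texts.
entries = [
    "the train to berlin is delayed again",
    "train delays in berlin this morning",
    "great football match tonight",
    "what a goal in the football match",
]

# Turn each text into a TF-IDF weighted bag-of-words vector.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(entries)

# Group the vectors into two clusters; each cluster is read as one "topic".
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(matrix)

# Map column indices back to terms and show the strongest terms per cluster.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for cluster_id, center in enumerate(model.cluster_centers_):
    top = center.argsort()[::-1][:3]
    print(cluster_id, [terms[i] for i in top])
```

The terms with the largest weights in each cluster center give a rough label for the topic; the number of clusters has to be chosen up front, which is one of the main tuning decisions with k-means.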
To run all test cases, first make sure you are in the src directory, then just type in:
python -m unittest discover
This command executes all tests inside the tests directory of the src folder.
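By default, unittest discovery collects test modules whose file names match `test*.py`. A minimal module that would be picked up could look like the sketch below; the helper `clamp_limit` and the test class are hypothetical and only show the expected layout, they are not part of Socat.

```python
import unittest


def clamp_limit(requested, upper=100):
    """Hypothetical helper: cap a requested entry count at an upper limit."""
    return min(requested, upper)


class ClampLimitTest(unittest.TestCase):
    """Test methods must start with `test` to be collected."""

    def test_below_limit_is_unchanged(self):
        self.assertEqual(clamp_limit(10), 10)

    def test_above_limit_is_capped(self):
        self.assertEqual(clamp_limit(500), 100)
```

Saved under a name such as `test_limits.py` (an assumed file name) inside the tests directory, both test methods would run as part of the discover command.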
- Python - Python programming language
- pip - Package installer for Python
- scikit-learn - Machine learning in Python
- langdetect - Language detection library
- matplotlib - Matplotlib is a Python 2D plotting library
- Docker - Operating-system-level virtualization (containerization)
- argparse - Write user-friendly command-line interfaces
- python-dotenv - Reads key-value pairs from the .env file and adds them to the environment variables
- Frederik Aulich - Initial work - Kiesen
This project is licensed under the MIT License - see the LICENSE.md file for details