SberMegaMarket parser that is controlled through a Telegram bot and can store the collected data in PostgreSQL.
A parser controlled via a Telegram bot: you collect data and handle captchas from the Telegram app by sending commands to the bot.
python = "^3.11.0"
poetry = "^1.2.0"
To deploy the bot (on render.com or a similar service), set the PYTHON_VERSION and POETRY_VERSION environment variables, or install Poetry yourself. Support for a requirements.txt file is planned as an alternative.
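In the requirements above, `^` is Poetry's caret constraint: `^3.11.0` accepts any version from 3.11.0 up to, but not including, 4.0.0, and `^1.2.0` likewise means `>=1.2.0,<2.0.0`. The minimal sketch below reproduces that rule so you can check an interpreter before deploying. `caret_bounds` and `satisfies` are hypothetical helpers written for illustration; they are not part of this project or of Poetry.

```python
import sys


def caret_bounds(spec: str) -> tuple:
    """Return (lower, upper) version tuples for a Poetry-style caret spec like '^3.11.0'."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Missing components count as zero: '^3.11' -> lower bound (3, 11, 0).
    lower = tuple(parts + [0] * (3 - len(parts)))
    # The left-most non-zero component is bumped; if every given component is zero,
    # the last given one is bumped instead ('^0.0' -> upper bound (0, 1, 0)).
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    bumped = parts[:idx] + [parts[idx] + 1]
    upper = tuple(bumped + [0] * (3 - len(bumped)))
    return lower, upper


def satisfies(version: tuple, spec: str) -> bool:
    """True if (major, minor, patch) lies in the half-open range [lower, upper)."""
    lower, upper = caret_bounds(spec)
    return lower <= tuple(version) < upper


if __name__ == "__main__":
    # Check the running interpreter against the project's Python requirement.
    print(satisfies(sys.version_info[:3], "^3.11.0"))
```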
- Clone the repository:
  git clone https://github.com/malvere/SberRight.git
  cd SberRight
- Install dependencies:
  poetry install
- Set BOT_TOKEN to the token obtained from the @BotFather bot in Telegram. You can also set DB_URL to store the data in PostgreSQL; otherwise a .csv file is generated.
- Run the bot:
  poetry run python main.py
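The README does not show how the bot reads BOT_TOKEN and DB_URL. As a minimal sketch, assuming both are read as environment variables, the startup code could fail fast when the token is missing and choose the storage backend from DB_URL. `Settings`, `load_settings`, and the `"postgres"`/`"csv"` labels are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    bot_token: str
    db_url: Optional[str]

    @property
    def storage(self) -> str:
        # PostgreSQL when DB_URL is provided, otherwise fall back to a .csv file.
        return "postgres" if self.db_url else "csv"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read BOT_TOKEN (required) and DB_URL (optional); defaults to os.environ."""
    env = os.environ if env is None else env
    token = env.get("BOT_TOKEN")
    if not token:
        raise RuntimeError("BOT_TOKEN is not set; create a bot with @BotFather to get one")
    # An empty DB_URL is treated the same as a missing one.
    return Settings(bot_token=token, db_url=env.get("DB_URL") or None)


if __name__ == "__main__":
    settings = load_settings()
    print(f"Storage backend: {settings.storage}")
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.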
- Send /init to the bot. The command launches a Playwright instance and starts the scraping process.
- If a captcha appears, the bot sends you a screenshot of it. Solve it and send the answer back to the bot with the /captcha <deciphered_text> command.
- Once the captcha is entered successfully, the bot triggers a Go script that parses the pages' HTML content. The Go script's source can be found here.
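The steps above do not say how the typed answer reaches the running Playwright session. One possible design, sketched here with only the standard library, assumes the Telegram handler and the scraper run in the same asyncio event loop and share an `asyncio.Queue`: the handler parses `/captcha <text>` and enqueues the answer, while the scraper awaits it with a timeout. `parse_captcha_command`, `CaptchaBroker`, and `demo` are hypothetical names; the bot's actual internals may differ.

```python
import asyncio
from typing import Optional


def parse_captcha_command(message: str) -> Optional[str]:
    """Return the solution from a '/captcha <text>' message, or None if it is malformed."""
    command, _, argument = message.strip().partition(" ")
    # In group chats Telegram may address commands as /captcha@BotName.
    if command.split("@", 1)[0] != "/captcha":
        return None
    return argument.strip() or None


class CaptchaBroker:
    """Passes captcha solutions from the bot's command handler to the waiting scraper task."""

    def __init__(self) -> None:
        self._solutions: "asyncio.Queue[str]" = asyncio.Queue()

    def submit(self, message: str) -> bool:
        # Called by the /captcha handler; False lets the bot reply with usage help.
        solution = parse_captcha_command(message)
        if solution is None:
            return False
        self._solutions.put_nowait(solution)
        return True

    async def wait_for_solution(self, timeout: float) -> str:
        # Called by the scraper after sending the screenshot; raises
        # asyncio.TimeoutError if nobody answers within `timeout` seconds.
        return await asyncio.wait_for(self._solutions.get(), timeout)


async def demo() -> str:
    broker = CaptchaBroker()
    # Simulate the user replying in Telegram shortly after the screenshot was sent.
    asyncio.get_running_loop().call_later(0.01, broker.submit, "/captcha x7kq2")
    return await broker.wait_for_solution(timeout=5)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A queue lets the scraper simply await the next answer instead of polling a shared variable, and the timeout keeps an abandoned session from blocking forever.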
Feel free to contribute to this project.
License: MIT