iMCrazyDev/ycombinator-parser

YC S25 Live Parser

Author: Arsen Andrian @DeadPonySkye
Time spent: 2 hours total
Idea: Came up with the idea myself, implemented it with the help of ChatGPT


run:

streamlit run yc_streamlight_app.py

The report was also generated with a ChatGPT prompt. I reviewed the entire code and the report myself.

The quality could be improved by handling edge cases, but as far as I understand, the task did not ask for this.

How parsing works

1. YC Company List

Y Combinator's company directory is served through the Algolia search infrastructure (algolia.net).
I extracted their API key from a network request in the browser and reused it, so the app parses the ready-made JSON response directly.

Filter for search:

facetFilters=[[\"batch:Summer 2025\"

Why not use Selenium or Playwright?
Because it adds overhead and unnecessary dependencies. Why emulate a browser when you can just replicate the exact same request?
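
As a rough illustration of replicating the request, here is a minimal sketch that builds an Algolia multi-query request and flattens the hits from a response. The application ID, API key, and index name are placeholders, not the values the app uses, and `build_algolia_request` and `extract_hits` are hypothetical helper names. The sketch only constructs the request; sending it (for example with `requests.post(url, headers=headers, json=body)`) is left to the caller.

```python
import json
from urllib.parse import urlencode


def build_algolia_request(app_id, api_key, index_name, batch, page=0):
    """Build URL, headers and body for an Algolia multi-query search.

    app_id, api_key and index_name are placeholders (assumptions);
    take the real values from the browser's network tab.
    """
    url = f"https://{app_id}-dsn.algolia.net/1/indexes/*/queries"
    headers = {
        "X-Algolia-Application-Id": app_id,
        "X-Algolia-API-Key": api_key,
        "Content-Type": "application/json",
    }
    # Algolia expects search parameters as a URL-encoded string;
    # facetFilters itself is a JSON-encoded nested list.
    params = urlencode({
        "facetFilters": json.dumps([[f"batch:{batch}"]]),
        "hitsPerPage": 100,  # assumed page size
        "page": page,
    })
    body = {"requests": [{"indexName": index_name, "params": params}]}
    return url, headers, body


def extract_hits(response_json):
    """Flatten the 'hits' arrays of a multi-query response into one list."""
    hits = []
    for result in response_json.get("results", []):
        hits.extend(result.get("hits", []))
    return hits
```

Keeping request construction separate from the HTTP call makes the logic easy to check without network access.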


2. Extract LinkedIn URL

All the required company metadata is embedded in each YC company page, inside a data-page="..." attribute.
I locate that attribute using a regex, then HTML-unescape its contents and extract the LinkedIn URL using another regex:

https://(www.)?linkedin.com/company/[^"\s<]+
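
The two-regex approach can be sketched as follows. This is a minimal illustration, not the app's exact code: `extract_linkedin_url` is a hypothetical helper name, and the pattern below escapes the dots in the domain, which is slightly stricter than the regex shown above.

```python
import html
import re

# Captures the raw (HTML-escaped) value of the data-page attribute.
DATA_PAGE_RE = re.compile(r'data-page="([^"]*)"')
# Same shape as the pattern above, with the dots escaped.
LINKEDIN_RE = re.compile(r'https://(www\.)?linkedin\.com/company/[^"\s<]+')


def extract_linkedin_url(page_html):
    """Return the first LinkedIn company URL found in data-page, or None."""
    attr = DATA_PAGE_RE.search(page_html)
    if attr is None:
        return None
    # The attribute value is HTML-escaped (e.g. &quot; instead of ").
    payload = html.unescape(attr.group(1))
    link = LINKEDIN_RE.search(payload)
    return link.group(0) if link else None
```

Unescaping before the second search matters: without it, the URL would run into the following `&quot;` entity, because the character class only stops at a literal quote, whitespace, or `<`.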


3. Check for YC S25 Mention

The company name appears in the <title> tag of the LinkedIn page.
Scanning the full page content is unreliable: LinkedIn shows recommended companies, so “YC S25” can appear on a page even when it does not belong to the current profile.
So I only check the <title> tag.

I also ran into cases where the LinkedIn URL is either broken or private, for example:

https://www.linkedin.com/company/96694490/

For those links, I don’t return true or false; they are treated as inaccessible (private).
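
A minimal sketch of the three-state check described above: `True` or `False` when a title is available, `None` when the page is inaccessible. The function name `mentions_yc_s25` and the exact spelling matched (`YC S25`, case-insensitive, optional whitespace) are assumptions for illustration.

```python
import re
from typing import Optional

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumed spelling of the mention; adjust if profiles use other variants.
YC_S25_RE = re.compile(r"\bYC\s*S25\b", re.IGNORECASE)


def mentions_yc_s25(page_html: Optional[str]) -> Optional[bool]:
    """Check only the <title> tag for a YC S25 mention.

    Returns None when there is no page or no title, which is treated
    as an inaccessible (private or broken) profile.
    """
    if not page_html:
        return None
    title = TITLE_RE.search(page_html)
    if title is None:
        return None
    return bool(YC_S25_RE.search(title.group(1)))
```

Mentions in the page body, such as recommended companies, are ignored because only the captured title text is searched.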


Tech Notes

  • No proxy or advanced User-Agent tricks were used.
  • Everything is wired into a real-time Streamlit app, as requested in the task.
  • Deduplication implemented.
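
For illustration, deduplication can be done with an order-preserving pass over the parsed records. This is a sketch under assumptions: the key field `slug` and the helper name `dedupe_companies` are hypothetical, and the app may deduplicate on a different field.

```python
def dedupe_companies(companies, key="slug"):
    """Drop repeated records, keeping the first occurrence of each key.

    `key` is an assumed field name; records without it are kept as-is.
    """
    seen = set()
    unique = []
    for company in companies:
        value = company.get(key)
        if value is not None and value in seen:
            continue  # already emitted a record with this key
        if value is not None:
            seen.add(value)
        unique.append(company)
    return unique
```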
