Author: Arsen Andrian @DeadPonySkye
Time spent: 2 hours total
Idea: Came up with the idea myself, implemented it with the help of ChatGPT
run:
streamlit run yc_streamlight_app.py
The report was also generated with a ChatGPT prompt. I reviewed all of the code and the report myself.
Quality could be improved by handling more corner cases, but as far as I understand, this was not requested in the task.
Y Combinator uses the algolia.net AI search infrastructure.
I extracted their API key from a network request and reused it, parsing the ready-made JSON response directly in the app.
Filter for search:
facetFilters=[[\"batch:Summer 2025\"
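A minimal sketch of how such a request could be assembled, not the app's actual code. The function name, the index name, and the paging parameters are hypothetical. The filter above is truncated, so the nested-list shape used here is an assumption based on Algolia's documented facetFilters syntax. The application ID and API key are placeholders to be taken from the observed network request.

```python
import json
from urllib.parse import urlencode


def build_algolia_request(app_id, api_key, index_name, batch,
                          page=0, hits_per_page=100):
    """Build (url, headers, body) for an Algolia single-index search.

    Nothing is sent here; the caller passes the result to an HTTP client.
    """
    # Algolia accepts search parameters as a URL-encoded string in "params".
    params = urlencode({
        "query": "",
        # Assumed filter shape: a list of OR-groups, one facet per group.
        "facetFilters": json.dumps([[f"batch:{batch}"]]),
        "hitsPerPage": hits_per_page,
        "page": page,
    })
    url = f"https://{app_id}-dsn.algolia.net/1/indexes/{index_name}/query"
    headers = {
        "X-Algolia-Application-Id": app_id,
        "X-Algolia-API-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps({"params": params})
```

The tuple can then be posted with any HTTP client, for example `requests.post(url, headers=headers, data=body)`, and the companies read from the `hits` array of the JSON response.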
Why not use Selenium or Playwright?
Because it’s overhead: an unnecessary dependency. Why emulate a browser when you can just replicate the exact same request?
All the required company metadata is stored in a JavaScript variable on each YC company page.
I locate the data-page="..." attribute using a regex, then HTML-unescape its contents and extract the LinkedIn URL using another regex:
https://(www\.)?linkedin\.com/company/[^"\s<]+
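The two regex steps can be sketched roughly as follows. The helper name and the sample markup are hypothetical. The patterns follow the ones described above, with the dots escaped so they match literal dots only.

```python
import html
import re
from typing import Optional

# Captures the raw (HTML-escaped) value of the data-page attribute.
DATA_PAGE_RE = re.compile(r'data-page="([^"]*)"')
# LinkedIn company URL; stops at a quote, whitespace, or "<".
LINKEDIN_RE = re.compile(r'https://(?:www\.)?linkedin\.com/company/[^"\s<]+')


def extract_linkedin_url(page_html: str) -> Optional[str]:
    """Return the first LinkedIn company URL in data-page, or None."""
    match = DATA_PAGE_RE.search(page_html)
    if match is None:
        return None
    # The attribute value is HTML-escaped (&quot; etc.), so unescape first.
    payload = html.unescape(match.group(1))
    found = LINKEDIN_RE.search(payload)
    return found.group(0) if found else None
```

For example, a page containing `data-page="{&quot;linkedin_url&quot;:&quot;https://www.linkedin.com/company/acme/&quot;}"` would yield `https://www.linkedin.com/company/acme/`.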
The company name appears in the <title> tag of the LinkedIn page.
You can’t just scan the full page content: LinkedIn shows recommended companies, so “YC S25” might appear even if it doesn’t belong to the current profile.
So I only check the <title> tag.
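A rough sketch of the title-only check, assuming the marker is compared case-insensitively. The function names and the sample pages are hypothetical.

```python
import html
import re
from typing import Optional

# Matches the contents of the first <title> element.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page_html: str) -> Optional[str]:
    """Return the unescaped, stripped <title> text, or None if absent."""
    match = TITLE_RE.search(page_html)
    if match is None:
        return None
    return html.unescape(match.group(1)).strip()


def title_has_marker(page_html: str, marker: str = "YC S25") -> bool:
    """True only if the marker appears in <title>, never in the body."""
    title = extract_title(page_html)
    return title is not None and marker.lower() in title.lower()
```

With this approach, a page whose body lists a recommended company tagged "YC S25" is not counted unless the marker is also in that page's own title.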
I also ran into cases where the LinkedIn URL is either broken or private, for example:
https://www.linkedin.com/company/96694490/
For those links, I don’t return true or false; they’re considered inaccessible (private).
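One way to model the three outcomes is a small enum instead of a boolean. This is only a sketch: the names are hypothetical, and treating any non-200 status or a missing title as inaccessible is an assumption, not necessarily how the app decides.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    INACCESSIBLE = "inaccessible"


def classify(status_code: int, title: Optional[str],
             marker: str = "YC S25") -> Verdict:
    """Map a fetch result to one of three outcomes."""
    # Assumption: a failed fetch or a page without a title is inaccessible.
    if status_code != 200 or not title:
        return Verdict.INACCESSIBLE
    if marker.lower() in title.lower():
        return Verdict.MATCH
    return Verdict.NO_MATCH
```

Keeping the third state explicit avoids reporting a private or broken profile as a definite "no".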
- No proxy or advanced User-Agent tricks were used.
- Everything is wired into a real-time Streamlit app, as requested in the task.
- Deduplication implemented.
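The notes do not say which field the deduplication uses, so the following is only an illustrative sketch: an order-preserving dedupe over an arbitrary key. The function name and the "slug" key are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedupe(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item seen for each key, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique
```

For example, `dedupe(companies, key=lambda c: c["slug"])` would drop repeated search hits for the same company while keeping the original order.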