
This was inevitably going to get more popular after the new API pricing was announced. Unfortunately this also means they're going to start being more aggressive on bot detection.

We may soon find ourselves needing to run headless browsers to scrape data, like with TikTok: https://nullpt.rs/reverse-engineering-tiktok-vm-1



These Puppeteer and Playwright wrappers make bots so easy to build, and so easy to distribute too. Back in my day (IE6) it wasn't quite so simple. Making your site unscrapable while keeping it readable is (still) impossible.


You can still do a few things to make your site super annoying to scrape: Cloudflare bot detection, CAPTCHAs, 2FA, obfuscated JS, API/IP rate limits, geo-restrictions, anti-replay tokens, and not exposing API resources that can be enumerated. These and a few other techniques make most people not care enough to bother. You can stop 50% of scrapers with only a little effort, 90% with more effort, and 99% if you wanna be a try-hard.
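To make two of those concrete, here is a rough sketch of per-IP rate limiting plus single-use anti-replay tokens, using only the Python standard library. Everything in it is my own assumption for illustration: the names (`allow_request`, `issue_token`, `redeem_token`), the limits, and the in-memory dicts are hypothetical, and a real deployment would more likely keep this state in a shared store behind a proxy.

```python
import hashlib
import hmac
import secrets
import time

# All values below are illustrative assumptions, not recommendations.
SECRET = secrets.token_bytes(32)  # hypothetical per-server signing key
TOKEN_TTL = 30.0                  # seconds a token stays valid
RATE = 1.0                        # bucket refill, in requests per second per IP
BURST = 5.0                       # bucket capacity per IP

_buckets = {}      # ip -> (tokens_left, last_seen_timestamp)
_used_nonces = {}  # nonce -> expiry timestamp


def allow_request(ip, now=None):
    """Token-bucket rate limit: True if this IP may make a request now."""
    now = time.monotonic() if now is None else now
    tokens, last = _buckets.get(ip, (BURST, now))
    # Refill in proportion to elapsed time, capped at the burst size.
    tokens = min(BURST, tokens + (now - last) * RATE)
    if tokens < 1.0:
        _buckets[ip] = (tokens, now)
        return False
    _buckets[ip] = (tokens - 1.0, now)
    return True


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return a signed 'nonce:expiry_ms:signature' token to embed in a page."""
    now = time.time() if now is None else now
    payload = f"{secrets.token_hex(16)}:{int((now + TOKEN_TTL) * 1000)}"
    return f"{payload}:{_sign(payload)}"


def redeem_token(token, now=None):
    """Accept a token once: it must be well formed, signed, fresh, and unused."""
    now = time.time() if now is None else now
    try:
        nonce, expiry_ms, sig = token.split(":")
        expires_at = int(expiry_ms) / 1000.0
    except ValueError:
        return False
    expected = _sign(f"{nonce}:{expiry_ms}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    # Forget nonces whose tokens have expired so the table stays bounded.
    for old in [n for n, exp in _used_nonces.items() if exp < now]:
        del _used_nonces[old]
    if now > expires_at or nonce in _used_nonces:
        return False
    _used_nonces[nonce] = expires_at
    return True
```

The idea is that the page embeds a fresh token from `issue_token()` and the API only answers when `allow_request(ip)` and `redeem_token(token)` both pass, so a captured request can't simply be replayed. A headless browser that loads the page for every request still gets through, which is the point made upthread.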



