
I doubt it. People who write scrapers are talented engineers. Talented engineers understand the limitations of scraping compared to using an official API, and know to try to get things done with the official API first.

The Twint README calls out reasons for going beyond the API at the start - things like Twitter's increasingly strict rate limits and the limit of only 3,200 historic tweets for a user account.



I've been maintaining a browser extension[1] that has used various versions of the unofficial API over a period of six years -- from literally navigating the site as the user and grabbing HTML, to the various incarnations of JSON and HTML hybrid (SSR) APIs that their web and mobile clients have used internally over that time.

Believe me, I would LOVE it if the official API supported threads. I have tried several times to make it work, but the official APIs are stuck in a circa-2012 idea of how Twitter works. Replies just aren't a thing to them.

[1] https://github.com/paulgb/Treeverse


I don't think this program actually gets around the 3,200-tweet limit you mentioned, though I'm not sure.


People who write robust, performant, maintainable systems that don’t collapse under their own weight are talented engineers.

Most scrapers, including this one, lack these characteristics.


I totally agree; writing scrapers doesn't necessarily make one a talented engineer. It is often easier to write a scraper than to work through the API documentation or a series of authentication procedures.



