This was inevitably going to get more popular after the new API pricing was announced. Unfortunately this also means they're going to start being more aggressive on bot detection.
These Puppeteer and Playwright wrappers make bots so easy. And you can distribute them so easily too. Back in my day (IE6) it wasn't quite so simple. Trying to make your site unscrapeable while still readable is (still) impossible.
You can still do a few things to make it super annoying to scrape your site. Cloudflare bot detection, captchas, 2FA, obfuscated JS, API/IP rate limits, geo-restrictions, anti-replay tokens, not exposing API resources that can be enumerated, and a few other techniques can make most people not care enough... You can prevent 50% of scrapers with only a little bit of effort, 90% with more effort, and 99% if you wanna be a try-hard.
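To make the "API/IP rate limits" item concrete, here is a minimal sketch of a per-IP token bucket. The class name, the parameters, and the injectable clock are all my own assumptions for illustration, not anything a particular site is known to run.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.buckets = {}   # client_ip -> (tokens_left, last_seen_timestamp)

    def allow(self, client_ip):
        now = self.clock()
        # A client we have never seen starts with a full bucket.
        tokens, last = self.buckets.get(client_ip, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_ip] = (tokens - 1, now)
            return True
        self.buckets[client_ip] = (tokens, now)
        return False
```

A real deployment would presumably keep the buckets in shared storage and expire idle entries; this only shows the accounting.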
A lot of sites use GraphQL under the hood for the official frontend but expose more restrictive REST APIs for third party users. I suppose this is understandable because it's easier to document REST APIs and people generally can't cause quite as much unexpected trouble with them.
Pretty sure overusing this will get your account banned, if not now, then eventually.
If you want to use it, you should seriously limit your usage. The rate limits are there to protect the servers, but there are also systems that will notice you are using the service at a rate no actual person would, and they will ban you.
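For what "seriously limit the usage" could look like on the client side, here is a rough sketch: a minimum gap between requests plus a hard cap on the total. The numbers are made-up defaults and `PoliteClient` is a hypothetical helper; nobody outside the company knows what thresholds actually trigger a ban.

```python
import random
import time


class PoliteClient:
    """Space requests out and stop after a fixed budget (all values are guesses)."""

    def __init__(self, min_gap=5.0, jitter=3.0, budget=500,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap  # seconds to wait between requests, at minimum
        self.jitter = jitter    # extra random delay so the timing is not perfectly regular
        self.budget = budget    # total requests allowed for this object's lifetime
        self.clock = clock
        self.sleep = sleep
        self.used = 0
        self.last = None

    def acquire(self):
        """Block until the next request may go out; return False once the budget is spent."""
        if self.used >= self.budget:
            return False
        if self.last is not None:
            gap = self.min_gap + random.uniform(0, self.jitter)
            remaining = gap - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
        self.used += 1
        return True
```

Call `acquire()` before every request and stop when it returns False.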
The fact that people will pay for what's already possible for free (with not that much effort) says a lot about what's wrong with the state of the world today.
Around the turn of the century and before, there was no such sentiment. People just RE'd like it was completely natural, and in general weren't "afraid to read" what they had access to. As the saying goes, "no source, no problem." As a result, multiple alternative clients for IM and other services flourished.
> The fact that people will pay for what's already possible for free (with not that much effort) says a lot about what's wrong with the state of the world today.
There are these places near me where they just heap up a bunch of food. You can literally walk in, grab something to eat and walk out. Nobody will stop you. Yet people for some reason queue up to pay for the food. I believe this is what is wrong with the state of the world today.
Just to spell it out: read it as sarcasm. Do you just steal stuff unless they keep it under lock and key? If usually not, what makes this case different?
I'm sure companies wouldn't mind if a single person was doing single-person things with the API, but it's when someone uses this API to behave like more than one person that companies get annoyed.
To use an annoying analogy: it's take a penny, leave a penny, except someone shoves their hand in there and takes all of it.
It costs money to service requests, and Twitter et al. make that money back through advertising and subscriptions. If someone uses an unauthorized API, that usage could cost more than they've budgeted for.
Actually they'd also get annoyed if everyone just used something like this for themselves. They want to control your experience. And as usual, remember you are not the customer, you are the product.
> As a result, multiple alternative clients for IM and other services flourished.
Feels like a lot of RE'ing has piped down these days. The fact that I can download Pidgin, but none of the major proprietary services I'd want to use on it have first-party support, only third-party plugins, says it all imho. I miss the old days of MSN being easy to use on Linux and everyone else having it as well.
In between then and now there were some significant court cases establishing that it's very, very tricky to legally RE networked services in the USA. That had a chilling effect, especially on commercial reverse engineering.
For some reason Twitter’s frontend uses a hard-coded bearer token, at least for anonymous users. You’ll see exactly the same string if you load a Twitter page and look at the XHR requests in your own browser. (It seems to change occasionally, but old ones keep working in my experience.)
FWIW, I have never logged in to Twitter and I have always been able to retrieve all tweets. At first, I used mobile.twitter.com in a text-only browser, no token required. Since they started using GraphQL, I retrieve tweets as JSON. They have changed the token once. The current one is
YouTube does the same thing. I never run JavaScript from YouTube. I use neither youtube-dl nor its JS interpreter written in Python. I search YouTube and retrieve YouTube JSON from the command line.
It's funny how people commenting on HN often automatically assume the presence of a token is some sort of "security".
For YouTube search and browse I use "WEB" key AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8
For YouTube player I use "ANDROID" key AIzaSyA8eiZmM1FaDVjRy-df2KTyQ_vz_yYM39w
It's like how web pages used to (and probably still do) use "type=hidden" in HTML forms to submit some value that the user does not enter. Hidden does not mean "secret"; it just means not visible on the rendered page.
There's an obvious expectation that some users look at HTTP response headers and HTML when there are headers like "If you're reading this, we're hiring" and silly ASCII art in the HTML that's obviously meant for an external audience. YouTube even has some nonsensical line about a "robot uprising in the year 2000" in its robots.txt.
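To show how un-secret "type=hidden" is, here is a small sketch that pulls every hidden field out of a page using only the standard library. The helper names and the sample form are invented for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        # The parser lowercases attribute names but not values, so normalize the type.
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    parser = HiddenFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Anyone who can load the page can read these values, which is the whole point.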
If the backend is going to perform operations in the context of an identity, it makes sense to consistently give one to all users, including anonymous ones.
I do this a lot. Good ol' 0xDEADBEEF makes it easier to track whether the header is actually missing (e.g. misconfigured) or just undefined but coming through correctly.
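A minimal sketch of that sentinel trick, assuming the identity travels in an Authorization header (the header choice and function names are mine, purely for illustration): anonymous clients always send the placeholder, so a request with no header at all stands out as a plumbing problem.

```python
ANON_SENTINEL = "0xDEADBEEF"  # placeholder identity for users with no real token


def build_headers(token=None):
    """Always send the header; fall back to the sentinel for anonymous users."""
    return {"Authorization": f"Bearer {token or ANON_SENTINEL}"}


def classify(headers):
    """Tell a dropped header apart from a deliberately anonymous request."""
    value = headers.get("Authorization")
    if value is None:
        return "missing"      # header was stripped or never set: likely misconfiguration
    if value == f"Bearer {ANON_SENTINEL}":
        return "anonymous"    # header arrived intact, user just has no identity
    return "identified"
```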
As each big entity/celebrity quits twitter or starts having serious conflicts, approach them to cross-post their content to the new clone site.
In a year the masses will follow.
Once Musk misses a few billion-dollar-per-year payments to the Saudis, they will own Twitter, and then it will be like TikTok censorship the next time they murder a journalist they disagree with.
We may soon find ourselves needing to run headless browsers to scrape data, like with TikTok: https://nullpt.rs/reverse-engineering-tiktok-vm-1