Show HN: Twitter API Reverse Engineered

TobyTheDog123 · on April 13, 2023

This was inevitably going to get more popular after the new API pricing was announced. Unfortunately this also means they're going to start being more aggressive on bot detection.

We may soon find ourselves needing to run headless browsers to scrape data, like with TikTok: https://nullpt.rs/reverse-engineering-tiktok-vm-1

djbusby · on April 13, 2023

These Puppeteer and Playwright wrappers make bots so easy. And you can distribute them so easily, too. Back in my day (IE6) it wasn't quite so simple. Trying to make your site unscrapeable while still readable is (still) impossible.

pocket_cheese · on April 13, 2023

You can still do a few things to make it super annoying to scrape your site. Cloudflare bot detection, CAPTCHAs, 2FA, obfuscated JS, API/IP rate limits, georestrictions, anti-replay tokens, not exposing API resources that can be enumerated, and a few other techniques can make most people not care enough... You can stop 50% of scrapers with only a little effort, 90% with more effort, and 99% if you wanna be a try-hard.
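To make the "API/IP rate limits" item a bit more concrete, here is a minimal after-the-fact sketch: count requests per client IP in an access log and flag any IP above a threshold. It assumes a log where the client IP is the first field (as in common log format); the function name `flag_heavy_ips` and the threshold are hypothetical, and a real limiter would count per time window and enforce the limit at the edge rather than offline.

```shell
# flag_heavy_ips LIMIT [FILE...]
# Hypothetical helper: prints "IP COUNT" for every client IP with more than
# LIMIT requests. Assumes the IP is the first whitespace-separated field of
# each log line. Reads stdin when no FILE is given.
flag_heavy_ips() {
    limit=$1
    shift
    awk -v limit="$limit" '
        { count[$1]++ }                      # tally requests per IP
        END {
            for (ip in count)
                if (count[ip] > limit + 0)   # report only IPs over the limit
                    print ip, count[ip]
        }' "$@" | sort
}
```

For example, `flag_heavy_ips 1000 access.log` would list the IPs that made more than 1000 requests in that file.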

simonw · on April 13, 2023

Huh, TIL Twitter uses GraphQL under the hood: https://github.com/trevorhobenshield/twitter-api-client/blob...

hobenshield · on April 13, 2023

https://blog.twitter.com/engineering/en_us/topics/infrastruc...

dmatech · on April 16, 2023

A lot of sites use GraphQL under the hood for the official frontend but expose more restrictive REST APIs for third party users. I suppose this is understandable because it's easier to document REST APIs and people generally can't cause quite as much unexpected trouble with them.
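For readers who haven't seen GraphQL on the wire: a GraphQL-over-HTTP request is conventionally a POST to a single endpoint, with a JSON body carrying the query text and its variables, whereas a REST API exposes a separate URL per resource. The sketch below only builds such a body. The helper name `graphql_body`, the `user` field, and the endpoint URL are hypothetical, and none of this describes Twitter's actual schema or request format.

```shell
# graphql_body QUERY VARIABLES_JSON
# Hypothetical helper: wraps a query string and a JSON object of variables in
# the {"query": ..., "variables": ...} body that GraphQL servers conventionally
# accept over HTTP POST. No JSON escaping is done, so QUERY must not contain
# double quotes, backslashes, or newlines.
graphql_body() {
    printf '{"query":"%s","variables":%s}' "$1" "$2"
}

# Example against a placeholder endpoint (not a real API):
# graphql_body 'query($id: ID!) { user(id: $id) { name } }' '{"id":"42"}' |
#   curl -s -H 'content-type: application/json' --data-binary @- \
#   https://api.example.com/graphql
```

In a REST-style API the same lookup would typically be its own resource URL, such as a hypothetical `GET /users/42`, which is part of why REST endpoints are easier to document and to rate-limit individually.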

halfmatthalfcat · on April 13, 2023

I believe it's running on Sangria (Scala's biggest GraphQL lib).

icelancer · on April 13, 2023

Perhaps the reaction to this is responsible for tons of people being unable to post tweets from the website. Fun stuff.

dawnerd · on April 13, 2023

Nothing new. Nitter for example has been using these endpoints for a while.

hobenshield · on April 13, 2023

yup, I'm just accessing /i, /1.1, and some /2 endpoints in addition to /graphql

JeremyNT · on April 13, 2023

It continues to surprise me that Elon hasn't shut out Nitter yet.

Maybe there's just not enough engineering talent left at Twitter to do it.

v0idzer0 · on April 13, 2023

There’s more engineering talent now, not less.

bob-09 · on April 14, 2023

I have yet to see evidence of that. I've seen more issues than improvement.

Dave3of5 · on April 13, 2023

Pretty sure overusing this will get your account banned, if not now then eventually.

If you want to use it, you should seriously limit your usage. The rate limits are there to protect the servers, but there are also systems that will notice when your overall usage exceeds what an actual person would do, and they will ban you.
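A minimal sketch of client-side throttling, assuming `curl` is available: fetch URLs one at a time with a fixed pause in between. The function `polite_fetch` and the `FETCH` override are hypothetical, and no particular delay is known to be safe; as the comment above says, detection can look at your overall usage, not just the spacing of individual requests.

```shell
# polite_fetch DELAY URL...
# Hypothetical helper: fetches each URL in turn and sleeps DELAY seconds
# between requests. The fetch command defaults to "curl -s" and can be
# overridden through the FETCH variable (handy for dry runs).
polite_fetch() {
    delay=$1
    shift
    first=1
    for url in "$@"; do
        # pause before every request except the first
        [ "$first" -eq 1 ] || sleep "$delay"
        first=0
        ${FETCH:-curl -s} "$url"
    done
}
```

For example, `polite_fetch 5 https://example.com/a https://example.com/b` fetches two placeholder URLs five seconds apart, and `FETCH=echo polite_fetch 0 a b` just prints the URLs without touching the network.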

TheObviousOne · on April 13, 2023

Is this a replacement for paying for Twitter's new API subscription?

userbinator · on April 13, 2023

The fact that people will pay for what's already possible for free (with not that much effort) says a lot about what's wrong with the state of the world today.

Around the turn of the century and before, there was no such sentiment. People just RE'd like it was completely natural, and in general weren't "afraid to read" what they had access to. As the saying goes, "no source, no problem." As a result, multiple alternative clients for IM and other services flourished.

krisoft · on April 13, 2023

> The fact that people will pay for what's already possible for free (with not that much effort) says a lot about what's wrong with the state of the world today.

There are these places near me where they just heap up a bunch of food. You can literally walk in, grab something to eat and walk out. Nobody will stop you. Yet people for some reason queue up to pay for the food. I believe this is what is wrong with the state of the world today.

Just to spell it out: read it as sarcasm. Do you just steal stuff unless they keep it under lock and key? If usually not, what makes this case different?

userbinator · on April 13, 2023

How is it "stealing" to access a service you already have access to for free?

Go to twitter.com in a browser.

This is just a different type of browser.

Physical analogies never work right with digital data.

xeromal · on April 13, 2023

I'm sure companies wouldn't mind if a single person was doing single-person things with the API, but companies get annoyed when someone uses this API to behave like more than one person.

To use an annoying analogy or two: it's take-a-penny, leave-a-penny, but someone shoves their hand in there and takes all of it.

It costs money to service requests, and Twitter et al. make that money back through advertising and subscriptions. If someone uses an unauthorized API, their usage could exceed what the company has budgeted for.

dariusj18 · on April 13, 2023

Actually they'd also get annoyed if everyone just used something like this for themselves. They want to control your experience. And as usual, remember you are not the customer, you are the product.

giancarlostoro · on April 13, 2023

> As a result, multiple alternative clients for IM and other services flourished.

Feels like a lot of RE'ing has died down these days. The fact that I can download Pidgin, but the major proprietary services I'd want to use with it have no first-party support, only third-party plugins, says it all imho. I miss the old days of MSN being easy to use on Linux and everyone else supporting it as well.

iudqnolq · on April 13, 2023

In between then and now, there were some significant court cases establishing that it's very, very tricky to legally RE networked services in the USA. That had a chilling effect, especially on commercial reverse engineering.

_5hxt · on April 13, 2023

Related but archived (scraper): https://github.com/twintproject/twint

banditelol · on April 13, 2023

I noticed it uses a fixed Auth header. Is it from your session, or is it consistent across sessions/users?

wolfgang42 · on April 13, 2023

For some reason Twitter’s frontend uses a hard-coded bearer token, at least for anonymous users. You’ll see exactly the same string if you load a Twitter page and look at the XHR requests in your own browser. (It seems to change occasionally, but old ones keep working in my experience.)

1vuio0pswjnm7 · on April 13, 2023

FWIW, I have never logged in to Twitter and I have always been able to retrieve all tweets. At first, I used mobile.twitter.com in a text-only browser, no token required. Since they started using GraphQL, I retrieve tweets as JSON. They have changed the token once. The current one is

Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

IME, the old token will not work.

YouTube does the same thing. I never run Javascript from YouTube. I do not use youtube-dl nor its JS interpreter written in Python. I search YouTube and retrieve YouTube JSON from the command line.

It's funny how people commenting on HN often automatically assume the presence of a token is some sort of "security".

For YouTube search and browse I use "WEB" key AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8

For YouTube player I use "ANDROID" key AIzaSyA8eiZmM1FaDVjRy-df2KTyQ_vz_yYM39w

It's like how web pages used to (and probably still do) use "type=hidden" in HTML forms to submit some value that the user does not enter. Hidden does not mean "secret"; it just means not visible on the rendered page.
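To illustrate that hidden fields are only hidden from the rendered view: the values sit in the HTML, and ordinary text tools can pull them out. A rough sketch using `grep` and `sed`; the function name `hidden_fields` is hypothetical, and this is not an HTML parser.

```shell
# hidden_fields: read HTML on stdin and print name=value for each
# <input type="hidden" ...> tag. Assumes each tag sits on a single line,
# attributes are double-quoted, and name comes before value.
hidden_fields() {
    grep -o '<input[^>]*type="hidden"[^>]*>' |
        sed -n 's/.*name="\([^"]*\)".*value="\([^"]*\)".*/\1=\2/p'
}
```

For example, `curl -s https://example.com/form | hidden_fields` (placeholder URL) would print every hidden field the page's form would submit.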

There's an obvious expectation that some users look at HTTP response headers and HTML when there are headers like "If you're reading this, we're hiring" and silly ASCII art in the HTML that's obviously meant for an external audience. YouTube even has some nonsensical line about a "robot uprising in the year 2000" in its robots.txt.

1vuio0pswjnm7 · on April 14, 2023

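Here's an example of a site that serves JSON without any login token: a simple HN search script that fetches Algolia JSON (Algolia is a JSON search API rather than GraphQL, and the only credential is the public search key in the URL). No need to be logged in to HN.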

   #!/bin/sh
   test $# -gt 0||exec echo usage: $0 query
   # no JSON escaping: assumes a plain ASCII query without " or \
   # (${#DATA} below may count characters rather than bytes)
   DATA='{"query":"'"$*"'","analyticsTags":["web"],"page":0,"hitsPerPage":30,"minWordSizefor1Typo":4,"minWordSizefor2Typos":8,"advancedSyntax":true,"ignorePlurals":false,"clickAnalytics":true,"minProximity":7,"numericFilters":[],"tagFilters":["story",[]],"typoTolerance":"min","queryType":"prefixNone","restrictSearchableAttributes":["title","comment_text","url","story_text","author"],"getRankingInfo":true}'
   HOST=uj5wyc0l7x-3.algolianet.com
   _PATH="/1/indexes/Item_production_sort_date/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.0.2)%3B%20Browser%20(lite)&x-algolia-api-key=8ece23f8eb07cd25d40262a1764599b1&x-algolia-application-id=UJ5WYC0L7X"
   # HTTP client (curl)
   #curl -A "" -d "$DATA" "https://$HOST$_PATH"
   # TCP client
   #echo "
   #foreground=no
   #[x]
   #accept=127.0.0.8:80
   #client=yes
   #connect=167.114.119.142:443
   #options=NO_TICKET
   #options=NO_RENEGOTIATION
   #renegotiation=no
   #sni=
   #sslVersion=TLSv1.3
   #" |stunnel -fd 0;
   #tr @ '\r' <<eof|openssl s_client -connect $HOST:443 -ign_eof
   #tr @ '\r' <<eof|bssl s_client -connect $HOST:443 
   #tr @ '\r' <<eof|nc -vvn 127.8 80
   # verify=0 disables TLS certificate checking
   tr @ '\r' <<eof|socat stdio,ignoreeof ssl:$HOST:443,verify=0
   POST $_PATH HTTP/1.1@
   host: $HOST@
   content-length: ${#DATA}@
   content-type: application/x-www-form-urlencoded@
   connection: close@
   @
   $DATA
   eof
   #x=$(ps ax|sed -n "/stunnel.-fd.0/{s/ *//;s/ .*//p;q}")
   #test ! $x||kill $x

1vuio0pswjnm7 · on April 15, 2023

Anyone who monitors what is being sent from their own computers over their own networks sees the Bearer token.

Everyone, including any member of the public, who visits twitter.com gets the same Bearer token.

No need to have an "account" with Twitter or to be "logged in".

One can simulate this with cURL.

   js=$(curl -sA "" https://twitter.com|grep -m1 -o "https://abs.twimg.com/responsive-web/client-web-legacy/main[^\"]*");
   curl -A "" "$js"|tr , '\n'|grep -o \"AAAA.*\"

The same Bearer token value is used by people around the web for retrieving public tweets. It's public information. For example,

https://stackoverflow.com/questions/61140863/python-download...

https://github.com/twintproject/twint/raw/master/twint/run.p...

https://pypi.org/project/ScrapeTweets/

https://stackoverflow.com/questions/67137294/twitter-scrapin...

https://github.com/m4fn3/pytweetdeck/blob/master/pytweetdeck...

https://github.com/jonbakerfish/TweetScraper/issues/127

https://github.com/JustAnotherArchivist/snscrape/issues/536

https://gist.github.com/codemasher/67ba24cee88029a3278c87ff9...

https://github.com/HoloArchivists/twspace-dl/issues/26

https://gist.github.com/AzureFlow/01cff883b9f1b22e8d0c094df9...

https://greasyfork.org/hu/scripts/454409-video-downloader-fo...

https://gist.github.com/moxak/ed83dd4169112a0b1669500fe85510...

https://gist.github.com/ceres-c/7c16a40c10cb476cce2c4b902334...

https://gist.github.com/theowenyoung/d4a62746025f7af8cdd8bfb...

userbinator · on April 13, 2023

I believe YouTube does the same thing.

If the backend is going to perform operations in the context of an identity, it makes sense to consistently give one to all users, including anonymous ones.
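A minimal client-side sketch of that idea: every request carries some identity, falling back to one shared anonymous token when the user has none. `USER_TOKEN`, `ANON_TOKEN`, and `auth_header` are hypothetical names, and the token value is a placeholder, not a real credential.

```shell
# Shared token for everyone who is not logged in (placeholder value).
ANON_TOKEN='anonymous-placeholder-token'

# auth_header: print the Authorization header for the current caller, using
# the per-user token if USER_TOKEN is set and non-empty, else the shared one.
auth_header() {
    printf 'Authorization: Bearer %s\n' "${USER_TOKEN:-$ANON_TOKEN}"
}
```

The backend then never has to special-case "no identity": anonymous traffic is just one more (heavily rate-limited) identity.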

jaggederest · on April 13, 2023

I do this a lot. Good ol' 0xDEADBEEF makes it easier to tell whether the header is actually missing (e.g. misconfigured) or just undefined but coming through correctly.
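One way to picture the sentinel trick, assuming a CGI-style server where a request header named `X-Auth` (hypothetical) shows up as the environment variable `HTTP_X_AUTH`: the client always sends `0xDEADBEEF` when it has nothing real to send, so the server can tell a header that was dropped along the way from one that arrived but carried no value. `header_state` is a hypothetical helper.

```shell
# header_state: classify the (hypothetical) X-Auth request header as a
# CGI-style script would see it, via the HTTP_X_AUTH environment variable.
header_state() {
    if [ -z "${HTTP_X_AUTH:-}" ]; then
        echo missing     # unset or empty: the header never made it through
    elif [ "$HTTP_X_AUTH" = "0xDEADBEEF" ]; then
        echo sentinel    # arrived intact, but the client had no real value
    else
        echo present     # a real value came through
    fi
}
```

Because the client never omits the header, "missing" points at something in between (a proxy stripping it, a misconfigured client) rather than at a user who simply has no credential.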

thereisnoself · on April 13, 2023

Has anyone built a JS version of this?

ck2 · on April 13, 2023

If this can be built, build a Twitter clone.

As each big entity/celebrity quits twitter or starts having serious conflicts, approach them to cross-post their content to the new clone site.

In a year the masses will follow.

Once Musk misses the billion-dollar-per-year payments to the Saudis a few times, they will own Twitter, and then it will be like TikTok censorship the next time they murder a journalist they disagree with.

v0idzer0 · on April 13, 2023

Do you really expect the richest man in the world to miss payments? He has enough money to buy Twitter 4 more times

jjulius · on April 13, 2023

https://www.cnn.com/2023/01/03/tech/twitter-landlord-lawsuit...

dylan604 · on April 13, 2023

I don't care if twitter dies, but humans DO NOT need to die for that to happen.

wartijn_ · on April 13, 2023

The parent post isn’t saying people need to die; they’re referring to the Saudi government murdering a journalist.

https://en.m.wikipedia.org/wiki/Assassination_of_Jamal_Khash...