Join a browser fingerprinting study (fau.de)
62 points by elsombrero on Feb 5, 2021 | hide | past | favorite | 32 comments


This [0] is a great set of configurations for the Firefox browser that remove many fingerprinting vectors. It turns on several fingerprint-homogenization features that the TBB project upstreamed into Firefox. Check it out!

(By using this, G--gle will hound you with impossible captchas and privacy-forward search engines will think that you're a bot.)

Most of these fingerprinting vectors come from the browser and the scourge of JS. I wonder about the footprint left by a user who doesn't use a browser, or instead uses some kind of parsing client that just fetches HTTP or data from an API endpoint. Aside from the IP address, there are other ways to fingerprint a user based on network requests via various protocol leaks, many of which are presented here [1][2]. Are there any leak vectors missing from this list?

[0]: https://github.com/pyllyukko/user.js

[1]: https://www.whonix.org/wiki/Protocol-Leak-Protection_and_Fin...

[2]: https://www.whonix.org/wiki/Data_Collection_Techniques
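To make the no-browser case concrete, here's a toy Python sketch (all values made up) of how a server can fingerprint any HTTP client purely from protocol-level traits: the exact set and order of headers plus TLS handshake details, which is roughly what JA3-style TLS fingerprinting hashes.

```python
import hashlib

def client_fingerprint(headers: list[tuple[str, str]],
                       tls_version: str,
                       cipher_suites: list[str]) -> str:
    """Hash the protocol-level traits a server can observe from any HTTP
    client, browser or not: header names/order/values, TLS version, and
    the offered cipher-suite list."""
    material = "|".join(f"{k}:{v}" for k, v in headers)
    material += "|" + tls_version + "|" + ",".join(cipher_suites)
    return hashlib.sha256(material.encode()).hexdigest()[:16]

# Two clients making the "same" request still differ in header order
# and TLS stack details, so their fingerprints diverge.
curl_like = client_fingerprint(
    [("Host", "example.com"), ("User-Agent", "curl/8.5.0"), ("Accept", "*/*")],
    "TLSv1.3", ["TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256"])
script_like = client_fingerprint(
    [("Host", "example.com"), ("Accept", "*/*"), ("User-Agent", "python-requests/2.31")],
    "TLSv1.2", ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"])
```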


>By using this, G--gle will hound you with impossible capchas and privacy-forward search engines will think that you're a bot

A friend of mine had an alternative theory: by making it very, very hard to track you, you have shown yourself to be somewhat intelligent and to know more about computers than the average Joe. Goolag, knowing you're a somewhat intelligent human, gives you the very hard captchas it needs solved to train its models.


I wonder if the answer is in-between.

If Google has a way to verify the correctness of an answer, which they seemingly would have to, then would it matter? Smart person or smart bot: if they are given a tough question and answer correctly, Google gains confidence in the answer _(or however it works haha)_.

Because really I don't think Google cares at all about bots. They just want data. And the captcha system is an impressive way to pull training data out of intelligent beings/code.


Doesn't turning off all this make you stand out more?

Surely if you want to hide, you need to look like everyone else and hide in the crowd.


It's a tough call. You're right that blending in is better but also turning off JS cuts off access to a lot of fingerprinting vectors. That lack of extra information may result in you looking like many others.


Firefox makes a fuss about anti-fingerprinting, and I have "Enhanced Tracking Protection" set to Strict, yet I see uncommon values for WebGL Vendor and WebGL Renderer in https://amiunique.org/fp . Unless this study feeds into changing things like that, it seems a bit pointless.


> yet I see uncommon values for WebGL Vendor and WebGL Renderer

I don’t know how Firefox does it, but instead of trying to make the WebGL fingerprint the same for everyone every time, they could also try to make it unique for everyone every time, and it would have the same effect.

If your WebGL fingerprint differed every time you loaded a page, then a website couldn’t use it to tell whether the same browser loaded the same page previously, or any other page anywhere else.

(Assuming that the WebGL fingerprint anonymization was so good that it could indeed not be correlated between different fingerprints in any meaningful way.)
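The randomize-instead-of-homogenize idea, as a toy Python sketch (the vendor/renderer strings are made up; this is not how any real browser implements it):

```python
import secrets

def randomized_webgl_strings() -> tuple[str, str]:
    """Return a fresh, random (vendor, renderer) pair for each page load.
    If the values never repeat, a site cannot use them to link two visits."""
    token = secrets.token_hex(8)
    return (f"Vendor-{token[:8]}", f"Renderer-{token[8:]}")

# Every "page load" sees a different WebGL identity:
first_load = randomized_webgl_strings()
second_load = randomized_webgl_strings()
```

The catch, as noted elsewhere in the thread, is that a never-before-seen renderer string is itself unusual, so the random values would have to look plausible to avoid standing out.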


Part of the problem with either approach is that certain identifying information is also necessary just to make the web work. For example, the EFF's Cover Your Tracks page (basically a fingerprinting demonstration) shows that the screen resolution of my monitor conveys 16.25 bits of information. I use a particularly wonky landscape display mounted in portrait mode, which doesn't help the matter, but there's a deeper problem: we can't lie about it.
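For reference, those "bits of information" are just self-information; a quick sketch of the arithmetic, assuming the EFF-style surprisal measure:

```python
from math import log2

def bits_of_information(fraction_of_users: float) -> float:
    """Self-information of an observed trait: rarer values carry more bits."""
    return -log2(fraction_of_users)

# 16.25 bits corresponds to a screen resolution shared by roughly one
# visitor in 2**16.25, i.e. about 1 in 78,000.
rarity = 2 ** -16.25
```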

You see, a while back we decided to allow writing CSS that changes the design of a website based on the size of its containing viewport. This is called "responsive design" and is very useful; however, it also means that websites rely on having a correct window size in order to display content correctly. We cannot be inconsistent about our lies: if we were to, say, lie about the screen resolution but still handle media queries faithfully, then not only can the fingerprinter see through our lie, it can use the fact that we lied as extra information. (Remember how DNT served as an effective tracking indicator?) So browsers would have to start, say, snapping browser windows to certain common viewports or capping the number of distinct breakpoints a website's CSS is allowed to have, both of which have UX or compatibility implications.
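In other words, a fingerprinter can treat the inconsistency itself as a feature. A toy Python sketch (hypothetical feature names):

```python
def lie_aware_features(reported: str, measured: str) -> dict:
    """A fingerprinter need not believe the reported screen resolution;
    it can measure the real one (e.g. via CSS media queries), and the
    mismatch itself becomes one more distinguishing bit."""
    return {
        "reported": reported,
        "measured": measured,
        "lied": reported != measured,
    }

obs = lie_aware_features("1200x1600", "2560x1600")
```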


A better long-term solution would be to make websites behave more like print has for the past 500 years, and only allow them to pull info from the server while greatly restricting what they can transmit back. Only allow POST buttons to send home strictly textual data the users themselves typed out. Yeah, this would break certain cool legit websites like those that run HTML5 games that need to constantly transmit info back to the server. But this wouldn't have to break sites like HN/Reddit/Twitter/news/science/whatever. If you wanted a web that could give you beautiful images and typography without fingerprinting, such a thing could be built. It would break big Adtech tracking though, so we'll never see this simplifying improvement in a big-Adtech-funded browser (like Chromium).


>and only allow them to pull info from the server while greatly restricting what they can transmit back

The first forms of user tracking involved 1px GIFs that existed purely so that the server could log the request. If you allow any code execution at all, then the client can send data back to the server by asking for data from the server. Reads are just bidirectional writes.
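The reads-are-writes point in miniature, as a Python sketch (the tracker URL is hypothetical): any data the client can put in a request URL is data the server can log.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def tracking_pixel_url(base: str, payload: dict) -> str:
    """Smuggle client data to the server inside an ordinary GET request:
    the server never needs a POST, it just logs the query string."""
    return base + "?" + urlencode(payload)

url = tracking_pixel_url("https://tracker.example/p.gif",
                         {"session": "abc123", "viewport": "1280x720"})
# The server recovers the payload straight from its access log:
logged = parse_qs(urlparse(url).query)
```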


Those 1px GIFs were so that some server other than the one you are currently interacting with can track you. So if I go to nytimes.com I might get served a 1px GIF from BigAdTechCorp.com. The proposal is that all images, text, and data only come from the server you are currently pointing your browser at. So if you go to nytimes.com then only nytimes.com can send you text and images, only the nytimes.com server sees what content you request and when you request it. Once upon a time people purchased printed newspapers and magazines and there weren't all the invasive ways to spy on how long readers engaged with articles and images in said periodicals and yet we all managed. Marketing firms made ad buys all the same with this old tech and many successful ad campaigns happened, all without the invasive tracking.


> Only allow POST buttons to send home strictly textual data the users themselves typed out

I’m almost with you but in that case even shopping online would not work. Unless you force the user to manually type in the SKU of each item they want to add to their cart and so on. That’s not gonna happen :p


Good point. There would be a few details like this to work out, of course. My first suggestion would be to allow text boxes to be pre-populated with form data (like QTY and SKU), subject to tight restrictions. (There would be no script code sniffing the user's fingerprint before the text boxes are populated with SKU data, for example.) So a web dev could still create a shopping page with Add-to-Cart buttons such that clicking the button tells the server the SKU and QTY added to cart for session SESSION_ID. And the user transparently observes exactly which sequence of Unicode codepoints is transmitted to the server upon clicking the Add-to-Cart button.
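A toy sketch of what such a whitelist might look like, in Python on the receiving side (field names and the API are entirely hypothetical): only declared, plain-text fields survive.

```python
def validate_form_submission(fields: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Reject any submission that is not plain text in a declared field,
    so the user can see exactly what leaves the browser."""
    clean = {}
    for name, value in fields.items():
        if name not in allowed:
            raise ValueError(f"undeclared field: {name}")
        if not isinstance(value, str) or not value.isprintable():
            raise ValueError(f"non-textual value in field: {name}")
        clean[name] = value
    return clean

# A legitimate Add-to-Cart submission passes through unchanged:
cart_update = validate_form_submission(
    {"sku": "B00-1234", "qty": "2", "session_id": "s-778"},
    allowed={"sku", "qty", "session_id"})

# A smuggled fingerprint field is rejected:
try:
    validate_form_submission({"fp_hash": "xyz"}, allowed={"sku", "qty"})
    rejected = False
except ValueError:
    rejected = True
```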


Yep, despite good browser hygiene my most unique browser identifier is my display size and color depth (2560x1600x24). Guess my original Apple Cinema HD display will clearly identify me as a dumb cheapskate who pays for expensive things and then uses them until they break. I'm a terrible advertising target :)


You can lie about screen size. If you don't have your browser fullscreened, the browser can claim the screen size is the browser window size. The Tor Browser handles this by having certain fixed sizes meant for different sized displays, so at worst you fall into one of a few categories.


If you're claiming the screen size is the browser size, you've provided more fingerprintable information. Browser viewport sizes are more numerous than screen sizes.

I'm assuming Tor Browser doesn't actually alter the CSS layout engine, so here's a quick way around that: construct a CSS stylesheet that styles a div differently based on the media query size. Encode the success or failure of the media queries in the width of the element, then read out the width of the element to tell if the browser is lying to you. Even if the browser throttles CSS queries, you'll still know that a lie happened, which is an extra fingerprinting bit.

If you're thinking of prohibiting dynamic CSS, you'll break all sorts of harmless JS - and still not fix the problem. You could maintain two sets of computed layouts, of course, but that would break JavaScript layout scripts (e.g. masonry.js). The option that breaks no scripts is to lock browser widths to rendering at specific viewports - if Tor Browser does that, then it probably has adequately resisted this particular fingerprinting vector, at the expense of some user experience.


Just tested on the latest Tor Browser. It limits the viewport sizes to a small set (multiples of 200px, I think). The JS `screen.width` etc. are undefined. The media queries can only query the viewport size, not the screen size.
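That quantization could be sketched like so in Python (the 200px step is from the observation above; Tor Browser's actual letterboxing logic may well differ):

```python
def letterbox(width: int, height: int, step: int = 200) -> tuple[int, int]:
    """Round the viewport down to a coarse grid so that many differently
    sized windows report the same dimensions, with a floor of one step."""
    return (max(step, width - width % step),
            max(step, height - height % step))

# A 1366x768 window and a 1399x795 window report the same viewport:
a = letterbox(1366, 768)
b = letterbox(1399, 795)
```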


> We cannot be inconsistent about our lies: if we were to, say, lie about the screen resolution but still handle media queries faithfully, then not only can the fingerprinter see through our lie, it can use the fact that we lied as extra information.

Depends on to whom we lie. The OS usually[1] should not lie to the browser and the browser has to know the truth to do its duty and for example render the page.

The browser lying to the server is a different story. Not a simple story, but in principle there is no reason the server needs to learn rendering details from the client.

[1] Location faking apps are one example where OS lying works with little downsides.


If the browser doesn't lie while rendering the page, then JavaScript can deduce the window size by measuring elements on the page. And obviously JavaScript can send this information to the server.


Unfortunately, I think these studies are a bit naive because the proprietary, data-driven, probabilistic fingerprinting models used by Facebook and LinkedIn (to name two of the most elaborate fingerprinters) are years ahead of anything a few researchers could come up with.

1. Get a gazillion users on your site

2. Require a user account tied to a real person

3. Log IP, host, geolocation, and as many JavaScript/browser APIs as you can (there are hundreds at this point)

4. Among the fields you track, find the ones that are the most stable and unique over time

5. Assign some probabilities to these fields to eliminate false positives

6. Generate personas for users for when they are at home, at work, on their phone, etc.
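Steps 4-6 amount to weighted probabilistic matching. A toy Python sketch (field names, weights, and the scoring scheme are all invented for illustration, not taken from any real system):

```python
def match_score(candidate: dict, persona: dict, weights: dict) -> float:
    """Weighted agreement between an observed fingerprint and a stored
    persona; stable fields (GPU, fonts) weigh more than volatile ones (IP)."""
    total = sum(weights.values())
    agree = sum(w for field, w in weights.items()
                if candidate.get(field) == persona.get(field))
    return agree / total

weights = {"gpu": 3.0, "fonts_hash": 3.0, "timezone": 1.0, "ip_prefix": 0.5}
persona = {"gpu": "Iris Xe", "fonts_hash": "a91f",
           "timezone": "UTC+1", "ip_prefix": "84.12"}
observed = {"gpu": "Iris Xe", "fonts_hash": "a91f",
            "timezone": "UTC+1", "ip_prefix": "10.0"}
# Same device seen from a new network still scores high:
score = match_score(observed, persona, weights)
```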


> Log IP, host, geolocation, and as many JavaScript/browser APIs as you can (there are hundreds at this point)

That's fingerprinting, traditionally. Hence the "Cookieless tracking" header right there on the page. If you are tying in other data, that's data aggregation for your business case and is fundamentally unrelated.

I mean, generating personas and whatever "eliminating false positives" means have nothing to do with fingerprinting. If you can't differentiate one anonymous user from another, that's data too.


There's more subtle signals too. Luis Leiva is a researcher who has some work on mouse cursor trails as a fingerprinting technique, and ways to counteract it: https://www.researchgate.net/publication/348739714_My_Mouse_...
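A crude illustration in Python of turning a cursor trail into behavioral features, loosely in the spirit of that line of work (the feature choice here is made up):

```python
from math import hypot

def trail_features(points: list[tuple[float, float, float]]) -> dict:
    """Summarize a cursor trail of (x, y, t) samples into coarse
    behavioral features such as mean movement speed."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(hypot(x1 - x0, y1 - y0) / dt)
    return {"mean_speed": sum(speeds) / len(speeds) if speeds else 0.0,
            "n_samples": len(points)}

# Three samples moving at a constant 50 px per tick:
features = trail_features([(0, 0, 0), (30, 40, 1), (60, 80, 2)])
```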


7. Try to figure out what to do with the data generated by step 3 through 6.

8. Give up and stop at 2 instead.


Hah, yes. Although, once you have a user account tied to a real person, it becomes much easier to analyze the data you generate with the fingerprinter.


> Facebook and LinkedIn (to name two of the most elaborate fingerprinters) are years ahead of anything a few researchers could come up with.

Not only them. It is available to the masses[1] and I am afraid GDPR has given this trend a boost.

[1] https://fingerprintjs.com/


This title led me to believe I'd be finding the results of a study, not receiving a sales pitch. There have been studies before [0], with available tools [1] and advice [2]; perhaps this should be linked to the results [3], OR the title updated to "Join a browser fingerprinting study".

[0]: https://coveryourtracks.eff.org/static/browser-uniqueness.pd...

[1]: https://coveryourtracks.eff.org/

[2]: https://restoreprivacy.com/browser-fingerprinting/

[3]: https://browser-fingerprint.cs.fau.de/statistics?lang=en


Ok, we'll put that in the title above.

Since the same URL was posted as long ago as 2016 (https://news.ycombinator.com/item?id=11266172), it's not clear how current this is.


I've signed up to see what it could find, but when I open the fingerprinting page in Firefox I just get loading spinners for most of the JS based tracking.

I think Privacy Possum has prevented fingerprinting. I'll disable it for now, but I do wonder what this would mean for the results of the study.


It seems the intention of the study is precisely to determine the impacts that these plugins have on prevention (or facilitation) of fingerprinting.

I think it should eventually complete, since mine did and I use a similar constellation of privacy plugins -- I think there is a timeout that eventually occurs once the various fingerprinting methods fail.


I noticed several fingerprints without JS in my account when I browsed the results. I suppose some of the fingerprinting works, but even after waiting a minute the JS version still hadn't loaded for me.


I think we killed it.


I am sorry. Not going to tie my email address to something trying to fingerprint my browser.



