Somehow torvalds/linux is in Fronterra, next to JS projects, awesome-X lists, an...

anvaka · on Dec 16, 2024

Jaccard similarity is not particularly good for "celebrity" projects.

They are similar because they are popular, not because there is a semantic relationship.

It's the same problem I faced with the map of reddit (https://anvaka.github.io/map-of-reddit/) - all popular subreddits are just "similar" to each other.

Still works great for smaller, non-celebrity projects :D
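
To make the "celebrity" effect concrete, here is a minimal sketch of Jaccard similarity over stargazer sets. The repo names, user ids, and set sizes are invented for illustration and are not taken from the actual map data.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical stargazer sets (user ids are made up).
# Two unrelated "celebrity" repos starred by almost everyone:
linux = set(range(0, 900))
react = set(range(100, 1000))

# Two small, related repos with a focused audience:
graph_layout = {1, 2, 3, 4, 5, 6}
graph_viewer = {4, 5, 6, 7, 8, 9}

celebrity_sim = jaccard(linux, react)            # 800 shared / 1000 total
niche_sim = jaccard(graph_layout, graph_viewer)  # 3 shared / 9 total

print(f"linux vs react:               {celebrity_sim:.2f}")
print(f"graph_layout vs graph_viewer: {niche_sim:.2f}")
```

In this toy setup the two unrelated popular repos score higher than the two related small ones, purely because their audiences overlap by sheer size.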

supriyo-biswas · on Dec 16, 2024

I wonder if code embeddings might have been a better way to organize the projects, although probably infeasible given the amount of resources required to download and compute embeddings for each file.

machiaweliczny · on Dec 16, 2024

Embeddings are super cheap to compute

dataviz1000 · on Dec 16, 2024

Perhaps for the same reason heat maps often just show the underlying population map: https://xkcd.com/1138/

wodenokoto · on Dec 16, 2024

That’s why in NLP we use term frequency times inverse document frequency (TF-IDF). It gives you a measure of how common uncommon things are.

Wonder how you’d implement that in a heat map. Just call each pixel a document and see where it takes you?

bravura · on Dec 16, 2024

People have been critiquing the collaborative filtering aspect of this work vs content analysis ("[why use stars instead of code similarity]") but there's something elegant about the simplicity of using fewer priors here.

A tf*idf matrix could be applied to the star-feature matrix too. Document = github repo. Term = name of user who starred it.

THUS, users who overstar are simply less important for computing similarities.

This would mitigate the phenomenon of massively popular github repos being clustered together because of folks who blithely star the most well known stuff.
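
A rough sketch of that idea, assuming binary term frequency (a user either starred a repo or did not) and a plain log IDF. The repos, users, and helper names below are hypothetical and only illustrate the weighting.

```python
import math
from collections import Counter

# Hypothetical data: repo ("document") -> users who starred it ("terms").
stars = {
    "linux":  {"ann", "bob", "cat", "dan"},
    "react":  {"ann", "bob", "cat", "eve"},
    "d3":     {"ann", "eve", "fay"},
    "ngraph": {"ann", "fay", "gus"},
}

def idf_weights(stars):
    """IDF per user: log(number of repos / repos starred by that user)."""
    n_repos = len(stars)
    df = Counter(user for users in stars.values() for user in users)
    return {user: math.log(n_repos / count) for user, count in df.items()}

def weighted_cosine(a, b, idf):
    """Cosine similarity of two repos using IDF-weighted binary star vectors."""
    dot = sum(idf[u] ** 2 for u in a & b)
    norm_a = math.sqrt(sum(idf[u] ** 2 for u in a))
    norm_b = math.sqrt(sum(idf[u] ** 2 for u in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

idf = idf_weights(stars)

# "ann" starred every repo, so her weight is log(4/4) = 0.
# linux and ngraph share only "ann", so their similarity drops to 0.
sim_linux_ngraph = weighted_cosine(stars["linux"], stars["ngraph"], idf)
sim_linux_react = weighted_cosine(stars["linux"], stars["react"], idf)
```

A user who stars everything ends up with zero weight, so overlap that comes only from such users no longer counts toward similarity.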

supriyo-biswas · on Dec 16, 2024

Winsorize the data points to remove outliers and then divide it by the population count for the case of the heatmap?
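
Something like the sketch below, where winsorizing means clamping extreme values to chosen quantiles rather than dropping them. The cell counts, populations, and quantile cutoffs are invented for illustration.

```python
def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values outside the [lower, upper] quantiles to the quantile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def per_capita(counts, population):
    """Divide each cell's count by its population; empty cells map to 0."""
    return [c / p if p > 0 else 0.0 for c, p in zip(counts, population)]

# Hypothetical heat map cells: raw event counts and the population of each cell.
counts = [10, 12, 11, 500, 9, 13, 10, 11, 12, 10]
population = [100, 120, 110, 130, 90, 130, 100, 110, 120, 100]

clipped = winsorize(counts, lower=0.1, upper=0.9)  # the 500 outlier is clamped to 13
density = per_capita(clipped, population)
```

After both steps every cell in this toy example comes out at the same rate, so the map would stop echoing either the outlier or the population distribution.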

revskill · on Dec 15, 2024

Because of React?

jensenbox · on Dec 16, 2024

That was my first reaction.

moffkalast · on Dec 16, 2024

What's your angle?

dbrans · on Dec 16, 2024

Can you elaborate?

odo1242 · on Dec 16, 2024

Wait, why React?