I love this and wanted to build it myself, but https://www.alphaxiv.org/ already exists and sees almost no social activity (hardly any papers have comments), which makes me doubtful about the idea.
I'd be interested to hear if anyone knows why the format may not resonate with researchers, or with people reading papers in general.
My own reason is that to get value from a "social" site, interactions have to be frequent and fast-paced for people to stay engaged, which is maybe not possible to hit with research papers.
People will not flock somewhere unless they sense some potential return on investment. If a website looks like it will disappear in a few months, it does not make sense for a user to invest time and effort into it.
You have to either invest a lot to get a critical mass to join your site, or make it extremely entertaining to be there from the start. Apart from all the criticism, this is what Facebook, Instagram, Twitter, and LinkedIn got right from the start. For their intended audiences, it is either useful or fun to be on their platforms.
I don't see much added value in most arXiv extensions, except for Semantic Scholar [1], which might have been lucky to be one of the first.
> My own reason is ... maybe not possible to hit on research papers.
I think fancy people with appropriate credentials and .edu emails are all using OpenReview? So the audience is what, the unwashed masses who also happen to be doing some light reading at the bleeding edge of knowledge? Surely there are dozens of us, I tell you, dozens! =P But yeah, maybe not enough to sustain a social network.
Never heard of alphaxiv, will try it. I would also love for this to work, but I'm probably not willing to risk slogging through science twitter/bluesky/mastodon. Honestly, HN would be the obvious place if it added a pretty simple tagging system, as most of the people interested are probably already here. I don't think we'll see that, though, because if we had filters no one would go to the front page, and that would be a bad thing for certain interests.
Personally, I think the timescales, rewards, and social conventions of social media and academic publishing don't match very well. Social networking feels transient and impersonal. I would like to take some time to form my opinion about a paper, not jump in with a post. Maybe a comment box would be fine for writing a couple of nice things about a paper, but it doesn't feel like the place for harsh criticism or complex discussion where things could be misunderstood. Rather than write in the public record, if I think a paper has a deep flaw I would prefer to contact the authors first; this can be followed up by discussion in your own papers. Others may have different opinions, of course.
I could see the author using GenAI video creation to summarize each paper in a short video. I believe this format could do wonders for paper discovery: say you choose "Computer science" and flip through 20 papers in a few minutes, getting an idea of what research has recently been published.
Other formats are dense and require reading and internalizing the content.
This is a fair question, but not one I feel we can let people self-answer.
I doubt many people will honestly admit that they did no design or testing, and that they believe the code is subpar.
It does give me the idea that maybe we need a third-party system that can try to answer some of the questions you are asking… of course, it too would be LLM-driven and quite subjective.
> I doubt many people will honestly admit that they did no design or testing, and that they believe the code is subpar
I'd doubt any engineer who doesn't call most of their own code subpar when looking back at it a week or two later. "Hacking" also famously involves little design or (automated) testing, so sharing something like that doesn't mean much unless you're trying to launch a business, and I see no evidence of that for this project.
> I doubt many people will honestly admit that they did no design or testing, and that they believe the code is subpar.
Well, no. But if people want to see a statement like this, and given that most people will want to be at least halfway honest and not admit to slop, maybe it will help nudge things in the right direction.
Fair play for launching this, it looks like a neat project.
However, I feel it will be an uphill battle competing with OpenAI and Anthropic; I doubt your harness can be better, since they see so much traffic through theirs.
So is this for those who care about the harness running on their own infra? I'm not sure why anyone would, since the LLM call means you are sending your code to the lab anyway.
Sorry, I don't want to sound negative; I am just trying to understand the market for this.
We are not trying to compete with OpenAI and Anthropic! We open-sourced it because there's interest from other startups.
Teams would use Anthropic and OpenAI, but they shouldn't just use Anthropic or OpenAI. We see much better results from calling the models independently and doing adversarial review and response.
This doesn't replace your need for the models, but you certainly don't need to rely on any of the cloud agent solutions out there that call these models under the hood.
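To make "calling the models independently and doing adversarial review" concrete, here is a minimal sketch of such a loop. Everything here is hypothetical (the function names, the round structure, and the "LGTM" convention are mine, not the project's actual implementation), and the model calls are stubbed as plain callables so the orchestration itself runs without any API keys:

```python
# Hypothetical adversarial-review loop: one model drafts, an independent
# model critiques, and the critique is fed back for a revision.
# Not the project's real API; model endpoints are stubbed as callables.

def adversarial_review(task, author_model, reviewer_model, max_rounds=2):
    """Draft with one model, critique with a second, revise until the
    reviewer is satisfied or max_rounds is reached."""
    draft = author_model(task)
    for _ in range(max_rounds):
        critique = reviewer_model(f"Review this solution critically:\n{draft}")
        if "LGTM" in critique:  # assumed convention: reviewer signals approval
            break
        draft = author_model(f"Task: {task}\nCritique: {critique}\nRevise.")
    return draft

# Toy stand-ins for two independently called model endpoints:
author = lambda prompt: "def add(a, b): return a + b"
reviewer = lambda prompt: "LGTM"

result = adversarial_review("write an add function", author, reviewer)
print(result)
```

In a real setup the two callables would wrap separate provider SDKs, which is the point of calling the models independently rather than through one vendor's agent product.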
I think this used to be the trick, but now that AI is such a general-purpose technology, I am not sure it makes sense anymore. Users can "niche down" within a generic app using prompts and configs.
I think if you believe that, you're either lying or experiencing psychosis. LLMs are the greatest innovation in information retrieval since PageRank, but they are not capable of thought any more than PageRank is.
I actually noticed the same. Having it work on Mithril.js instead of React seems (I know it's all just hearsay) to generate much cleaner code. Maybe it's just because I know and like Mithril better, but it's also likely because of the project's ethos and its being used in the wild by people who really want to use Mithril. I've seen the same with other slightly more exotic stacks, like Bottle vs. Flask, or telling it to generate Scala or Erlang.
That makes sense. There's less training data, but it is better training data. LLMs were trained on a lot of really bad pandas code, so they're really good at generating bad pandas. With Elixir, there's less of it, but what exists is higher quality, so what the model outputs is of higher quality too.