Hacker News: singularity2001's comments

Most importantly, the reinforcement loop is used during training. I don't agree with Sutton's original hypothesis, but it holds even less after reinforcement learning.

RLVR still does not expand beyond the base distribution, though; it only mode-seeks within it.

I.e., evaluation and retention, yes; variation or "planning," no.
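Here is a toy sketch of the mode-seeking point. It assumes the standard KL-regularized objective (maximize expected reward minus β·KL to the base model), whose optimum reweights the base distribution as π*(y) ∝ π_base(y)·exp(r(y)/β). Practical RLVR training only approximates this optimum. The candidate answers, probabilities, and rewards below are made up for illustration. Lowering β piles probability onto rewarded answers the base model already produces, but an answer with zero base probability stays at zero no matter how high its reward:

```python
import math

def reward_tilted_policy(base, reward, beta):
    """Closed-form optimum of max_pi E_pi[r] - beta * KL(pi || base).

    pi*(y) = base(y) * exp(r(y) / beta) / Z, so any y with base(y) == 0
    keeps probability 0: the policy can only reweight existing support.
    """
    weights = {y: p * math.exp(reward[y] / beta) for y, p in base.items()}
    z = sum(weights.values())  # normalizing constant Z
    return {y: w / z for y, w in weights.items()}

# Hypothetical toy "base model" over three answers; it never produces "c".
base = {"a": 0.7, "b": 0.3, "c": 0.0}
# Hypothetical verifiable reward: "b" and "c" are both correct.
reward = {"a": 0.0, "b": 1.0, "c": 1.0}

for beta in (1.0, 0.1):
    pi = reward_tilted_policy(base, reward, beta)
    print(beta, {y: round(p, 3) for y, p in pi.items()})
```

The correct-but-unseen answer "c" never gains mass. Getting there would need a variation step outside the reweighting, such as the kind of external planner described below.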

That is not to say you cannot use LLMs. AlphaEvolve does exactly that: it uses a simple external evolutionary planner. The overarching point he's making is that our planner is still "dumb" and we need to work on it.

When you iteratively guide an LLM in Claude Code, you are the external planner. That also works.


Or could it be the other way around: that actual privacy is forbidden in Europe because they want to read your messages?

This. They come up with so many laws, under so many pretences, that would take away the freedom of private communication.

It's not about how it is but how they made it sound. Let's not get ideological here.

Do they not have data centers in Europe yet?

They have some for sure for iCloud. Do they have enough to handle this volume of compute AND is Gemini allowed to be run on those? That was more what I was questioning/curious about.

Some of the greatest ideas are ridiculed when first proposed.

"allows the important stuff like LLMs writing code so long as you disclose."

Are you sure? It says:

"It's fine to use LLMs to answer questions, analyze, distill, refine, check, suggest, review. But not to *create*."


Yes. The policy is pretty clear on what the rules are for LLM generated code. You need a reviewer to agree to review LLM generated code, you need to read the code yourself, etc.


It's a pretty strict ban, with an exception.

That exception is experimental and somewhat limited: it only allows "well tested, high-quality" PRs on parts of the codebase that have a low probability of causing soundness issues, and it has a separate review process with much higher standards.

And it requires the reviewer to agree to the use of LLMs ahead of time, before the PR is opened.

IMO, it is likely to degrade into a closed system, where some programmers with a good track record have little trouble merging LLM-generated PRs, while anyone without a reputation will struggle to even open an AI-assisted PR.


Interesting. When I read the book, I wanted to rename it "How to Win Fake Friends and Manipulate People." Maybe I missed the humble passages.


Formal proofs are made to be done by AI.

If a green checkmark goes away, so be it. The AI may or may not understand how to fix it, but it's no burden to the user/developer.


Flagged for not defining what RF engineering is


The US has massacred millions of people of other countries, is that better?


You don't even have to look abroad; the USA kills its own citizens all the time. Police brutality is a huge issue here. We had some large protests, and they ended with the country concluding that nothing can be done about it. Kids get shot in school all the time in the US and, once again, nothing ever gets done about it. The USA has a gigantic prison population and, you guessed it, nothing gets done about it.


I thought cursor became mostly obsolete with Claude Code and Codex TUIs?


> I thought cursor became mostly obsolete with Claude Code and Codex TUIs?

I wouldn't think so. At work I have both Cursor and Claude Code, and while I use both, Cursor is by far the more pleasant to use. If I had to give one up, I'd let Claude go.


Are TUIs not yesterday’s hot thing?

The way I work now in the Codex desktop app is that I spin up 3-5 conversations which work in their dedicated git worktree.

So while the agent works and runs the test suite I can come back to other conversations to address blockers or do verification.

What matters is that I can see which conversation has an update and get desktop notifications.

Maybe I could set this up with tabs in the Terminal, but it does not sound like the best UX.


That's probably more personal preference than objective measurement. A lot of people already spend most of their dev time in the terminal, so for someone like me who uses Neovim, Claude Code or the Codex CLI is much easier than the GUIs.


The solution is to use both; each has its use cases. Cursor has autocomplete and lets you quickly highlight a few lines and throw them into context. It also has a very good file index/API (which burns far fewer tokens than Claude's grepping), plus whatever else they do under the hood to optimize it for coding.

Claude is still the gold standard if you're not in an IDE, though.


Grepping doesn't use tokens; it uses grep.


Reading files is always the biggest token burn when coding. If it can't find things quickly, or has to use less and head to trim the output before it finds what it needs, then you're just wasting context window.

Cursor both lets you highlight specific lines multiple times per chat and is much quicker at finding stuff.


Claude has to use more tokens to read the grep output.


That matches my anecdatal experience with a couple dozen devs. Many went hard on the Cursor train and have mostly gotten off now that CC and the Codex TUIs are available.

