Eye-catching: the success rate of Claude Code sessions on "open-ended problems" jumped from 20% (before the Opus 4.5 release) to 70% some time after Opus 4.6 was released.
Yeah, this seems true. Claude Code is widely dubbed the best AI coding agent, but I guess Google doesn't care about that niche. Somehow, I still rely on Google Search, since they have diversified it.
If you ask a question, it shows an "AI Overview", but if you search for a particular entity or platform like "Google stock" or "bbc news", it gives you the old classic search experience, and you don't need to swallow the "AI Overview" pill in that case.
> This is the opposite of the “10x productivity” slop-cannon style of development that most people imagine when they think of vibe coding, but I find it very satisfying.
I can relate to this. When I spend time writing a unit test, even one that adds just 1% of code coverage, it's honestly a wholesome moment, because I can ship with confidence.
As you said, it's distributed across people, conversations, AI agents, tooling, etc. Can't an LLM knowledge base/wiki (a.k.a. the org's "second brain") solve this? I think if a second brain exists, no one needs to pay the cognitive debt.
I used to do this and then test manually to validate that everything worked as expected in my small open-source project. But over time I noticed that some bugs crept in that I couldn't track, since I was only testing manually. So I wrote some e2e tests with Playwright, and I think that gives a bit of relief (at least).