steve_adams_86's comments | Hacker News

What if this is modeled around the premise that in any situation where reasoning can be used, someone would have access to super-human reasoning?

How would the human in the loop somehow manage to utilize super-human reasoning better than another person could?

I'm not suggesting it's impossible, so much as wondering if we can reach a place where the human is truly irrelevant to the process, and can't make a better decision than the superhuman entity.

I'm not sure this is ever possible. It's more of a thought experiment. What's between here and there? Right now we can use pseudo-intelligence from silicon to our advantage, and being smarter than average is clearly a massively outsized advantage. It's similar to how being able to automate tasks gives you an outsized advantage, yet in so many more ways. But what if that advantage thins or even vanishes?


But what if people putting their energy into ensuring society adapts to the technology safely and positively would be better than focusing on finding ways to capitalize on whatever happens to occur instead?

I'm not saying one person can do that alone, but if we collectively believe we should focus on capitalization instead, then there's no one present to influence a more constructive, pro-social, sustainable course for society.

So I don't think it's ridiculous to think it's acting against their interests. Money won't get your kids very far if the thing that made you wealthy also pulled the rug from under them. There needs to be more of a strategy than capital.


I can't seem to find any evidence of that. The oldest reference I can find is less than a week ago: https://www.reddit.com/r/TeslaFSD/comments/1shqut4/hw3_vehic...

Maybe I'm not sleuthing hard enough. Most reporting I can find on it is from today.


So if you read the link that you posted - there's a link to the poster's profile:

https://www.facebook.com/wildduce22/posts/pfbid0reXge89aqZGa...

April 10 at 3:34 AM


That's still 5 days ago (less than a week) as I mentioned, not weeks.

Oh right. Brain fart.

Reading it though, he said he floored it himself.


"Cognitive inbreeding" is an interesting (though maybe not entirely accurate) term for something I dislike a lot about LLMs. It really is a thing. You're recycling the same biases over and over, and it can be very difficult to tell if you don't review and distill the contents of your discourse with LLMs. Especially true if you're only using one.

I do think there's a solution to this (kind of) which dramatically reduces the probability of that recycling while still allowing for broad inductive biases. And that's to ask questions with narrower scopes, and to ensure you're the one driving the conversation.

It's true with programming as well. When you clearly define what you need and how things should be done, the biases are less evident. When you ask broad questions and only define desired outcomes in ambiguous terms, biases will be more likely to take over.

When people ask LLMs to build the world, they will do it in extremely biased ways. This makes sense. When you ask them specifics about narrow topics, this is still a problem, but greatly mitigated.

I suppose what's happening is an inversion of cognitive load, so the human is taking on more and selecting bias such that the LLM is less free to do so. This is roughly in line with the article's premise (maybe not the entire article, though), which is fine; I think I generally agree that these are cognitive muscles that need exercising, and allowing an LLM to do it all for you is potentially harmful. But I don't think we're trapped with the outcome, we do have agency, and with care it's a technology that can be quite beneficial.


One of my "let's try out this vibecoding thing" toy projects was a custom programming language. At the time, I felt like it was my design, which I iterated on through collaborative conversations with Claude.

Then I saw someone's Show HN post for their own vibecoded programming language project, and many of the feature bullet points were the same. Maybe it was partly coincidence (all modern PLs have a fair bit of overlap), but it really gave me pause, and I mostly lost interest in the project after that.


That's the thing about a normalization system: it is going to normalize outputs because it's not built to output uniqueness, it's built to winnow uniqueness to a baseline. That is good in some instances, assuming that baseline is correct, but it also closes the aperture of human expression.

I agree in a "the purpose of a system is what it does" sense but I'm not sure they're inherently normalization systems.

Token selection is based off normalization. Even if you train a model to produce outlier answers, in that process you are biasing to a subset of outliers, which is inherently normalizing.

Could you elaborate on "token selection is based off normalization"?

Sure;

https://arxiv.org/pdf/1607.06450

Depending on the model architecture, there is normalization taking place in multiple different places in order to save compute and ensure (some) consistency in output. Training, by its very nature, also is a normalization function, since you are telling the model which outputs are and are not valid, shaping weights that define features.
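To make the two senses of "normalization" here concrete, a rough sketch in TypeScript (illustrative only, not any particular model's code) of layer normalization as described in the linked paper, plus the softmax step that sits under token selection:

```typescript
// 1. Layer normalization (per Ba et al.'s linked paper, minus the learned
//    gain/bias terms): rescale a hidden vector to zero mean and unit
//    variance so activations stay in a consistent range between layers.
function layerNorm(x: number[], eps = 1e-5): number[] {
  const mean = x.reduce((a, b) => a + b, 0) / x.length;
  const variance =
    x.reduce((a, b) => a + (b - mean) ** 2, 0) / x.length;
  return x.map((v) => (v - mean) / Math.sqrt(variance + eps));
}

// 2. Token selection: raw logits are squashed through softmax, so every
//    candidate token's probability is defined relative to the whole
//    distribution -- even "outlier" tokens are scored against the baseline.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

The softmax half is the point being made above: because probabilities must sum to 1, pushing one token up necessarily pulls others down, which is a normalizing operation by construction.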


Such a great idea. It's nice that with cannabis, despite there being so many cultivars, it's such a large industry based around essentially one plant. And while some varieties can look quite different, I think your API should generally be effective.

I've been thinking about similar systems for tissue cultures, but I can't seem to find a way to generalize and still get good training data or effective results. Once you lose track of white balance, species, optical clarity, distortion from the vessel, and so on, results decline quite a bit in my experience. It makes it a neat yet fairly useless system outside of itself.

Granted, I have no idea what I'm doing and these could be solvable problems. Certainly much easier to solve by focusing on a single species.

I'm impressed with how well it classifies based on the image examples. A little over a million images is probably what makes it possible. My experiments have been much smaller. Maybe with more material I could overcome those limitations I mentioned, but I have a feeling the multi-species pipeline really drags it down.

Have you found that light temperature no longer skews feedback after so much training data? For me it really matters, causing classification to confuse light sources with actual plant condition (hence the colour card for white balance helping so much).


Thanks! Yeah, the single-species focus does a lot of the work. Under the hood it's not one big model - there's a cannabis verification gate, then routing into disease vs pest vs deficiency, then narrower classifiers from there. Each one has a simpler job so accuracy stays high.

Early on the photography thing was a real problem. Training data was mostly decent shots, then inference would come in as some blurry phone photo under purple LEDs.

Confident misclassifications. The fix wasn't clever - just more data that looks like how people actually take photos of their plants. Messy, badly lit, half the leaf out of frame. Once there was enough of that in the training set the models stopped caring about white balance. About 1.1 million augmented images now and light temperature just isn't a factor. No color card needed.

For tissue culture - I'd bet the multi-species part is what's killing you. I'd pick the single highest-value species, collect a probably-uncomfortable amount of well-labeled data for just that one, and see if things change. Right now you might not be able to tell what's a data problem vs a fundamental limitation, because the generalization overhead masks both.


> there's a cannabis verification gate, then routing into disease vs pest vs deficiency, then narrower classifiers from there. Each one has a simpler job so accuracy stays high.

That never occurred to me. That's a great insight.

> I'd pick the single highest-value species, collect a probably-uncomfortable amount of well-labeled data for just that one

I think you're right. If I want to move forward with it I think it's the only feasible way to validate a proof of concept. Generalizing can't produce a useful tool at my scale.

Thank you! I think this was a helpful nudge. Narrow classifiers could make some things a lot easier. Do you know of any reading materials about routing like this? Is it just programmatic decision tree stuff, or is there something more clever I'm unaware of?


Glad it helps. As for narrow classifiers, it's decision tree logic as you say, and it's best done via trial and error rather than over-engineering and theory. Cleverness comes from your own experience :)
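A minimal sketch of that gate-then-route structure, assuming the three-stage pipeline described above. All names and the stub classifiers are hypothetical; in a real system each stage wraps its own narrow model:

```typescript
type Category = "disease" | "pest" | "deficiency";

interface Diagnosis {
  category: Category;
  label: string;
}

// Stage 1: cheap gate -- reject images that aren't the target species.
// Stand-in for a real binary classifier.
function verifyTargetSpecies(image: Uint8Array): boolean {
  return image.length > 0;
}

// Stage 2: coarse router picks which narrow classifier should run.
// Stand-in for a real multi-class model.
function routeCategory(_image: Uint8Array): Category {
  return "deficiency";
}

// Stage 3: narrow classifiers, each with one simple, well-scoped job.
const classifiers: Record<Category, (img: Uint8Array) => string> = {
  disease: () => "powdery mildew",
  pest: () => "spider mites",
  deficiency: () => "nitrogen deficiency",
};

function diagnose(image: Uint8Array): Diagnosis | null {
  if (!verifyTargetSpecies(image)) return null; // gate rejects early
  const category = routeCategory(image);
  return { category, label: classifiers[category](image) };
}
```

The appeal of this shape is that each stage can be trained, evaluated, and replaced independently, and a failure at the gate never reaches the expensive narrow models.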

Why do you think it sucks?

I used to dislike JavaScript a lot after learning it and PHP, then using languages like C#. Then TypeScript came along, making JS much easier to live with, and it has actually become quite nice in some ways.

If you use deno as your default runtime, it's almost Go-like in its simplicity when you don't need much. Simple scripts, piping commands into the REPL, built-in linting, testing, etc. It's not that bad!
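For instance, the kind of single-file utility that stays pleasant in deno: no package.json, no tsconfig, no build step. The file names and invocation below are illustrative:

```typescript
// Count words per line of a text file. With deno this whole file is the
// program, run directly as:
//
//   deno run --allow-read wc.ts input.txt
//
// Pure logic kept as a function so it's trivially testable with `deno test`.
function wordCounts(text: string): number[] {
  return text
    .split("\n")
    .map((line) => line.split(/\s+/).filter(Boolean).length);
}

// The entry point is shown as a comment to keep this sketch runtime-agnostic:
//   const text = await Deno.readTextFile(Deno.args[0]);
//   console.log(wordCounts(text).join("\n"));
```

That's roughly the Go-like experience: one file, standard tooling, nothing to install beyond the runtime itself.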

Of course you're welcome to your opinion and we'd likely agree about a lot of what's wrong with it, but I guess I feel a bit more optimistic about TS lately. The runtime is improving, they've got great plans for it, it's actually happening, and LLMs aren't bad at using it either. It's a decent default for me.


I think it sucks because it transpiles to JavaScript and is an interpreted language. Users have to resolve the dependencies themselves and have the correct runtime. I definitely prefer my CLI tools be written in a compiled language with a single binary.

I agree, though one cool thing arriving lately (albeit with some major shortcomings) is the ability to compile binaries with deno or bun (and nodejs experimentally, I think).

With Go you can compile binaries with bindings for native libraries like duckdb or sqlite and so on. With deno or bun, you're out of luck. It's such a drag. Regardless, it's been quite useful at my work to be able to send CLI utilities around and know they'll 'just work'. I maintain a few for scientific data processing and gardening (parsing, analysis, cleaning, etc.), which is why the lack of duckdb bundling is such a thorn. I do wish I could use Go instead and pack everything directly into the binary.


You can already "compile" TS binaries with deno, but it'll include the runtime and so on, so it'll take some disk space. I think these days that's less of a concern than before.

Totally, it's inconsequential for our use cases.

I think the binaries wind up being somewhere around 70 MB. That's insane, but these are disposable tools and the cost is negligible in practice.


Not OP. I use TS, but only because it's the only option. TS is a build-your-own-typing sandbox: more than enough rope to hang yourself.

Coming from typing systems that are opinionated, first class citizens of their languages, it doesn’t stand up.


This is one of my dislikes as well.

You look at libraries like Effect, and it's genuinely incredible work, but you can't help feeling like... Man, so many languages partially address these problems with first-class primitives and control flow tooling.

I'm grateful for their work and it's an awesome project, but it's a clear reflection of the deficiencies in the language and runtime.


Sometimes it's fine to be content with trivial things. Sometimes that's all you've got. It isn't wrong to be grateful and happy when small things happen for you. A lot of us should practice appreciating it more, in my opinion.

And frankly, the bigger things, the more substantial things: those are fewer and farther between. They're harder to populate a map like this with. They're certainly preferable in some ways, but realistically, they're not the primary stuff of surveys like this.



Haha, speaking of simple pleasures. One of my favourite experiences to have these days is reading these with my son.

Some of my top strips are the ones where Calvin and Susie Derkins are grown up and Calvin is having successive crises about everything she says or does.

I brought a surprise!

Let's hope it's a divorce...

https://i.redd.it/myocdlddt02d1.jpeg


It's also great that those are wonderful parodies of soap opera comics like Rex Morgan, especially for Comics Curmudgeon enjoyers: https://joshreads.com

I generally agree that the harness isn't good, but it works and gets the job done and that seems to be the singular goal of the top 4 or 5 companies building them.

We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre work, but the takeaway seemed to be 'yeah but it works and they've got crazy revenue'.

That's where we're at. The harness is kind of buggy. The LLM still wanders and cycles in it sometimes. It's a monolithic LLM herding machine. The underlying model is awesome and the harness works well enough to make it super effective.

We can do so much better but we could also do worse. It's a turbulent time. I'm not super pleased with it all the time, but it's hard to criticize in many ways. They're doing a good job under the circumstances.

I see it kind of like they're at war. If they slow down to perfect anything, they will begin to lose battles, and they will lose ground. It's a highly contentious space. The harness isn't as good as it could be under better circumstances, but it's arguably a necessary trade off Anthropic needs to make.


> We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre work

Based on this, are there any open source harnesses that have objectively good-to-excellent work in their code?


I was using OpenCode until yesterday (with some plugin to let me use their model, until they implemented what seems to be very sophisticated detection to reject you).

It just has a sane workflow: it's easy to use, and it doesn't bother you with 1000 questions about whether to allow this or that to run. Since yesterday, now that I have to use Claude Code, it generally feels like the model is dumber and makes more mistakes.


pi.dev

very minimal, extensible.


Agreed, this is the best I've seen so far.

> We saw what Claude Code looks like inside, and it's objectively bad-to-mediocre

Do you have an example to contrast it with, or some measure of what's good besides your word?


I don't love seeing slop everywhere and I don't feel good about models being trained on people's hard work, but... I also have a hard time believing my work was ever much different. I've always regurgitated and synthesized existing solutions. I took them from open source examples. I read people's blogs. I'm basically a really slow LLM most of the time. Is that a form of deception too? I really wonder how much of a difference it is sometimes. Maybe LLMs are just a shortcut of sorts to get where we've previously gotten using very similar means. Just absorbing and recycling ideas, learning by reinforcement, so on.

No, building on top of other peoples work is fine. Taking credit for work you didn't do is not the same.

That's a valid take. I think there's substance to that claim. Maybe what I've been struggling with lately is how blurry the lines seem to be. When am I building on top of something, and when am I claiming credit I don't deserve?

Along these lines, an interesting category of work is when I have an LLM do something I could do myself. I totally understand the code, I instruct it all the way, I have it redo things, revise, rejig, etc... But I don't actually write any code. How responsible am I for any of that?

At work there are a ton of small scripts I use for piping data around ad-hoc, and this is often how I do it. Claude can make dumb pipes really well and remarkably quickly with reasonably clear specs given to it. I compose all kinds of specs, reports, plans, etc. using this workflow. And I find myself wondering... How much of this is me? How much credit do I deserve? The code is gone, the outputs remain, and I can't quite tell how responsible I am for the end product. It's a strange experience.


To me it's about effort more than anything. If you're sharing your work with others, people want to see things the author put effort into. Likewise, if I put effort into an endeavour, I can feel good about the result.

In my experience no, but I don't think that's a problem.

It's fascinating to see so many ideas and so much enthusiasm. I sometimes wonder if the fervor will die down as people realize it's still really hard to make truly fantastic software, but it's hard to say. There's a ton of inertia behind the vibe coding rush.

I also wonder if vibe coding is actually somewhat incompatible with the states of mind and contemplation that are often required to figure out how to solve problems properly. It isn't clear that you can brute force great solutions without putting in the initial domain distillation, idea incubation, and so on. I'm sure there are exceptions, but I have a feeling it'll never be trivial to come up with truly good and novel ideas for software, and vibing your way there might not make it any easier.


Without giving away exactly how old I am…

I am old enough to remember old programmers complaining about the wave of new shareware/freeware apps that people made with Visual Basic when that came out. Many of the apps were visually awful because it opened up desktop app development to people with no aesthetic experience.

I don’t see that awful style any more despite those tools for rapid UI creation still existing, did those people get better or did they get bored and move on to other things?

I guess the same will happen with vibe-coders, they’ll get the experience to make better software or their poor quality apps won’t give them what they want and they’ll move on.

