does this mean im actually able to try object detection in opencv now? i mean i know basic image processing techniques, and i know "in theory" how ML works but ive never really seen a case where i can just say "heres an image now detect all the apples". theres always 1. find a model that has the knowledge, 2. hook it up to an inference engine, 3. do something useful. i always get stuck at 1.
YOLO has basically solved that for my use cases for a couple years now. If you want labels that are not in the pretrained labels it's also easy to fine-tune, provided you're willing to label 200 or so images
If you need something less restricted to existing labels (say wanting all the red apples, or all cardboard signs) SAM3 is great, as the sibling comment says
Yep- this is what I do. I use a high quality VLM to generate labelled boxes (in my case, around tardigrades in a microscope image), do some light editing to fix the small number of errors, and then train YOLO26 with it. Works great, saved me tens of hours of labelling. It's a bit scary that there is a VLM that works as well as my fine-tuned model (although much slower).
thats a fantastic strategy thank you, and thanks to all the other helpful posters as well here. do you have any tips for how to choose the base yolo model? or just any generic one will do?
How do you handle object disambiguation with YOLO? All the examples I've played with have the problem where if two "cars" get too close to each other then the tracking IDs keep switching between them, meaning we'd need an additional kinetic model for disambiguation.
Large general models have taken over in NLP, and (outside of embedded/low latency applications) it seems like they are coming for CV next.
So you should soon be able to have large generic model that can detect whatever for you.
It's already pretty much possible with open-vocabulary detectors like SAM3, where you could just prompt it with "Apple": https://ai.meta.com/research/sam3/
this is so true, its hard to make a decision when that actually means you're losing a bunch of other possibilities. so no-decision becomes a decision and then you're left with nothing.
For the cognoscenti it is—like Linux, but for the vast majority it's not. If you've ever run an IT department in a large operation (which I have) then you'd never say that.
People insist on Windows at work because it's so ubiquitous, when they go home their modus operandi doesn't have to change.
Forcing workers to change OSes against their will only puts one's job on the line (management will side with workers as it's the path of least resistance). QED.
This article is wrong. LLM's encode all the domain knowledge you could possibly want. As a software dev I can query an LLM, become a domain expert in a short amount of time, and then code up a solution. If people think their niche is safe from automation, think again. Even the people who think theyre the masterminds at the top.
Edit: Yes "expert" was too strong a word. Proficient would be better. A lot of the barrier to entry in a field is just not understanding the domain.
I won't over-generalize here, because maybe your statement is true in some cases, but I will provide a counterpoint: this is not true (in my experience) in real estate title insurance and escrow services.
I've consulted for and led large teams for real estate title insurance and escrow companies for many years, and the domain expertise is so incredibly deep, nuanced, and multivariate (especially depending on jurisdiction) that building valuable and viable products in the space is incredibly difficult - before LLMs, and even now, with LLMs.
Without getting too deep into it, I'm pretty bullish on AI (and have been very close to it and deep in it for a long time, while also very apprehensive about the effects it'll have on society), and I can tell you, from extensive attempts from myself and many on my teams to leverage the latest frontier LLMs to bring deep domain experience to bear to help drive valuable products: we have not yet seen success. It's not helping engineering folks, it's not helping product folks. It's creating a ton of questionable output and hasn't resulted in real ROI, and it's not capable of accurately answering deep domain questions without hallucinations or assuming what works in one jurisdiction works in all.
I've seen success in many other areas, but not this domain - and, importantly, the regulatory environment in which title insurance operates is incredibly complex and strict, meaning you can't just YOLO LLM output into production (as much as we'd love to try so we can learn at a faster clip).
And the kicker: we've found the way for us to build the best products is still going out into the field, sitting with escrow and title folks, watching them work, asking them questions, and designing for the real world, the regulatory nuances, the local client nuances, etc. You can't get that from an LLM.
Agree with you there. What you are working on and the commenter below talking about surgery, they are all valid counter examples where the degree of expertise is quite extreme. But most people are not living on the edge of domain expertise. Im guessing 80% of the domain knowledge out there is up for grabs. For example: I dont have to go get a job at a security software company to figure out how security camera systems work and the principles involved, I can ask probing questions of an LLM and get most of it. The domain knowledge is embedded in the model.
We have put lots of effort at documenting the domain, creating precise unambiguous language, glossaries, E2Es written as user stories etc, etc.
And still models are simply not able to translate Jira tasks to clear specs, even for this well understood and common use case.
Also, they don't understand how changes in one part of the business domain will impact other parts. They can get it right 9 times out of 10, but even that is too little and compounds to deeply wrong implementations.
And they don't understand or know the people involved in these processes and what they REALLY care for or what the real priorities are. Very often political.
And that's not even mentioning the code, that ends up with the lack of proper abstractions or harness.
Or the lack of push back against bad ideas at business or code level.
Once someone taught me "you can do xyz reading a book, but you cant do surgery by reading a book". Now replace the book with LLM. This is what "domain expertise" look like for some domain.
LLM? Any verbose, struggling to focus article now looks generated rather than the work of somebody with better ideas than technique. Or they’re being paid by the word. I wonder whether there is a jargon problem with the word “great”…. These’s no way Trump or Cameron would be considered “great” but the world changed through their direct actions. One could argue that they just happen to be there when underlying forces interact and that the lone actor model of history is naive.
Many of us have written the “was Hitler inevitable” paper at uni and elsewhere. His particular phobias were extensive, but that time and place was ripe for such rule to appear.
reply