Hacker News | EmanuelB's comments

I can't notice any difference from 4.6 three weeks ago, except that this model burns way more tokens and produces much longer plans. To me it seems like this model is just 4.6 with a bigger token budget at every effort level. I guess this is one way Anthropic plans to make their business profitable.

During the past weeks of lobotomized Opus, I tried a few different open weight models side by side with "Opus 4.6" on the same issue. The open weight models outperformed Opus 4.6, and did it way faster and cheaper. I tried the same problem against Opus 4.7 today and it did manage to find one additional edge case that is not critical, but should be logged. So based on my experience, the open weight models solved the exact problem I needed fixed, while Opus 4.7 seems to think a bit more freely about the bigger picture. However, Opus 4.7 also consumed way more tokens at a higher price, so it ended up costing 10-20x more than the open weight models. I will use Opus for code review and minor final fixes, and let the open weight models do the heavy lifting from now on. I need a coding setup I can rely on, and clearly Anthropic is not it.

Why pay $200 to randomly get rug-pulled with no warning, when I can pay $20 for 90% of the intelligence with reliable, higher performance?


It's funny to think that with a model release Anthropic can slide in some instructions ("be a bit more detailed" or something similar) that increase token output by a few percent, say 5-10%, which would not be noticeable to most users but over the course of a year would bring solid growth (once the VC craze is over, if ever) and increase income.

"Regular companies" would love to have growth like that without effectively doing anything.


I like how some people are accusing them of reducing overall token usage to screw over Claude Code users, while other people are accusing them of deliberately increasing token usage to screw over API users (or maybe to get subscription users to upgrade, I'm not really sure)

I suspect the real issue is that they just change stuff "randomly" and the experience gets worse/better and cheaper/more expensive.

Since you have no way of knowing when they change stuff, you can't really know if they did change something or it's just bias.

I've experienced that so many times in the last month that I switched to codex. The worst part is, it could be entirely in my head. It's so hard to quantify these changes, and the effort it takes isn't worth it to me. I just go by "feeling".


They don't even need to do anything. LLMs are effectively random anyway. Even ignoring temperature and inadvertent nondeterminism in inference, the change in outputs from a change in inputs is unpredictable and basically pseudorandom. That's not to say they aren't useful, just that Anthropic could make zero changes and people would still see variations that they'd attribute to malice.

The issue is business and transparency. Transparency is often in the customer's interest at the individual business's expense.

There are very, very few things that can be completely transparent without giving competitors an advantage. The nice solution to this is to be better and faster than your competitors, but sometimes it's easier just to remove transparency.


I expect "model transparency" to become the new "SSO" enterprise feature differentiator.

Enterprise use cases have to have it (or else pawn the YOLO off on their users), so it will be a key way to bucket customers into non-enterprise vs enterprise pricing.


Nobody is accusing them of making the models more efficient.

People are complaining they are changing how many tokens you get on a subscription plan.

Why would anyone dislike getting more service for less (or the same) amount of money?


> People are complaining they are changing how many tokens you get on a subscription plan.

They didn't change this. It's the same number of tokens just a different tokenizer.


They absolutely do change this all the time - session limits vary wildly. The most damning proof of this is that there's absolutely no information about how many tokens you get per session with each subscription level, it's just terms like 5x, 20x. But 5x what? Who knows?

That's not proof of anything. Also the usage is not solely based on tokens because you also have to factor in things like prompt caching costs (and savings). So it's based on the actual API cost.

You and I have no way of knowing that.

Except that the API cost is literally logged on disk for every session and it's easy to analyze those logs.

We aren't talking about API costs or number of tokens consumed, we are talking about number of tokens in a monthly subscription.

Again, it is not based on number of tokens. If it was solely based on number of tokens then things like cache misses would not impact the usage so much. It's based on the actual cost which includes things like the caching costs.
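
If the meter really tracks dollar cost rather than raw token counts, cache behavior dominates. As a rough sketch of that accounting (the prices here are illustrative placeholders, not Anthropic's actual rates):

```python
# Rough sketch of a usage meter based on dollar cost rather than raw
# token count. Prices are illustrative placeholders, not real rates.

PRICE_PER_MTOK = {
    "input": 15.00,        # uncached input tokens
    "cache_write": 18.75,  # tokens written to the prompt cache
    "cache_read": 1.50,    # tokens served from the cache
    "output": 75.00,       # generated tokens
}

def request_cost(tokens: dict) -> float:
    """Dollar cost of one request given token counts per category."""
    return sum(tokens.get(k, 0) / 1_000_000 * p
               for k, p in PRICE_PER_MTOK.items())

# Same total token count, very different cost depending on cache hits:
cache_hit  = {"cache_read": 90_000, "input": 10_000, "output": 2_000}
cache_miss = {"cache_write": 90_000, "input": 10_000, "output": 2_000}

print(f"cache hit:  ${request_cost(cache_hit):.4f}")   # $0.4350
print(f"cache miss: ${request_cost(cache_miss):.4f}")  # $1.9875
```

Under a scheme like this, two sessions with identical token counts can burn through a subscription at very different speeds, which would explain why usage "feels" inconsistent.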

I think this is the case. In the early GPT-4 days I tested the same model side by side across the subscription and API. The API always produced a longer, better answer. To me it felt like the API model was working how it was supposed to, while the subscription model tried to reduce its token usage. From a business perspective that would make sense. I then switched to API only because I felt it was worth the extra cost.

I did a similar test with Sonnet about 6 months ago and noticed no difference, except that the subscription was way cheaper than API access. This is not the case anymore, at least not for me. The subscription these days only lasts for a few requests before it hits the usage limit and rolls over to "extra usage" billing. Last week I burned through my entire subscription budget and $80 worth of extra usage in about 1h. That is not sustainable for me, and the reason I started looking at alternatives.

From a business perspective it all makes sense. Anthropic recently gave away a ton of extra usage for free. Now people have balance on their accounts that Anthropic needs to pay for with compute, and suddenly they release a model that seems to burn those tokens faster than ever. Last week I felt like the model did the opposite: it was stopping mid-implementation and forgetting things after only 2 turns. Based on the responses I got, it seemed like they were running out of compute, lobotomized their model and made it think less, give shorter answers, etc. Probably they are also doing A/B testing on every change, so my experience might be wildly different from someone else's.


The UIs all bake in system prompts and other tunable configs that the API leaves open, and so do Claude Code and other harnesses. So anything you notice different over the API when you're controlling the client is almost certainly that. Note that this is kind of something they have to do, because consumer UI users will do stuff like ask models their name or the date, or want them to respond politely and compassionately, and get upset/confused when they just get what's in the weights.

The problem with subscriptions for this kind of stuff is that it's just incompatible with their cost structure. The worst part is that subscription usage follows a diurnal pattern that overlaps with business/API users, so it has to be offloaded to compute partners who most likely charge by the resource-second. And also, it's a competitive market; anybody who wants usage-based pricing can just get that.

So you basically end up with adverse selection with consumer subscription models. It's just kind of an incoherent business model that only works when your value proposition is more than just compute (which has a usage-based, pretty fungible market)


> In the early GPT-4 days I tested the same model side by side across the subscription and API. The API always produced a longer better answer.

If you are comparing responses in ChatGPT to the API, it's apples and oranges, since one applies a very opinionated system prompt and the other does not.

Since you haven't figured that out in 3 years, I didn't bother reading the rest of your comment.


this comment feels pretty rude and disrespectful for no real reason?

I don’t know about ChatGPT, but in Claude Code I _have_ been able to do a side-by-side comparison of API-based metered billing vs subscription billing, in the same UI. You just switch from one to the other using /login.

You should probably not be so quick to dismiss what people say as nonsense.


It's almost as if there are different people with different motivations and ideas about how the world should work

They have switched to a tokenizer that generates 1-1.35x (i.e. up to 35% more) tokens for the same input.

They have changed default CC effort to xhigh.

They have said that Opus 4.7 will generate more tokens than 4.6 at same effort level.

They have increased their image input resolution meaning more tokens per image.

etc.

Maybe they are also extracting another 5% of tokens from you by prompting it not to talk like a caveman, but that would hardly be noticeable.
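
Naively compounding those multipliers gives a feel for the scale. The tokenizer and verbosity numbers are the ones claimed above; the effort multiplier is my own assumption:

```python
# Back-of-the-envelope: how the changes listed above could compound
# into higher billed token counts. Multipliers are illustrative and
# applied naively; real usage depends heavily on workload.

tokenizer_inflation = 1.35  # new tokenizer: up to 35% more tokens (claimed above)
effort_inflation    = 1.20  # assumed effect of defaulting CC to xhigh effort
verbosity_inflation = 1.05  # a barely-noticeable 5% from prompt tweaks

combined = tokenizer_inflation * effort_inflation * verbosity_inflation
print(f"combined multiplier: {combined:.2f}x")  # combined multiplier: 1.70x
```

Even if each individual change is defensible on its own, multiplied together they could plausibly add up to the "burns way more tokens" experience people are reporting.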



If open weight models are sufficient for your engineering problems, then you should absolutely use them. But I haven't seen a single open weight model that can get even close to the complexity in my projects. They sometimes work for small toy examples or leetcode puzzles, but not for any real project. Really curious what models you've found that could replace the current state of the art.

I've been using Devstral 2 with great success for a few months now. The hosted version, not running one locally. Devstral is open.

Devstral is good, Opus is better. But not by much. For me, "good" is "good enough". The difference, IME, lies in context engineering: skills, agents.md, subagents, tools, prompts. A Devstral with good skills performs far better than a "blank" Claude Code. Claude with good skills performs even better, but the difference is hardly noticeable, IME.

I am convinced I've plateaued. Better performance comes from improving skills and other "memory", smarter prompting, better context management and, above all, from the tooling around it and the stability of the services.

I do still run Claude with Opus alongside Mistral with Devstral 2. Sometimes just to compare outputs, often to double-check, but mostly to double-check my claim that the difference between Devstral 2 and Opus is marginal and easily covered by better context engineering.


Perhaps. I’d like to like Devstral because I’d rather give my money to a European business.

My experience with it in an existing codebase has been that it gets to results much more reliably than Gemini Flash or Haiku, but it will cut corners and write incomprehensible code even with a good Opus plan to boot.

It’s true that the context and tooling might help, but setting everything up and finding the arcane mix of correct MCPs/skills is a job in itself right now. What I do see is that I’ve wasted months trying to get good code out of Gemini and Devstral 2, and a good experience out of stuff like OpenCode and everything under the sun.


> is a job in itself right now.

Yes, exactly. I consider this the core of my job now: herding agents.

It reminds me very much of the time I "herded" juniors, interns and new hires.

And my experience is that OpenCode et al. don't do a "good enough" job. It's better than e.g. Devstral 2, but without guidance, still not sufficient. I think that mostly has to do with a combination of my experience and standards, and of my languages and niches.

All of them are good enough for throwing out React spaghetti, the kind you'd expect from Fiverr or from an intern: don't look under the hood, just drive it (launch it and leave it). Claude is far better in such a "benchmark" than e.g. Devstral 2.

But when I need a hexagonal-architected, TDD- and BDD-covered microservice in Python with zero type warnings, all models fail spectacularly out of the box. I presume their training body isn't "used" to such patterns: it's statistically unlikely to not ignore type warnings in Python (wink). Just like it's statistically unlikely to write a few files of TypeScript for a feature instead of pulling in a node package. Turns out, esp. with Claude Code, it's statistically likely to comment out failing tests if the rule is "ensure all tests pass", and this one is hard to fix¹.

So to get to the level we require, I need tons of rules, guidelines, skills and whatnot. On every model. So I might just as well - indeed - pipe my money into an EU company that's cheaper and has the option of self-hosting when s* starts hitting fans.

--- ¹ I think I finally found the "context" to fix this, though. What I used to tell my interns/juniors is to take a step back and re-think the shape of things: a difficult or complex test usually means the code it is testing needs re-architecting. Something most agents will refuse: and good, because it's side-tracking them. My solution is to tell agents to stop, document the problem, and if obvious, document the solution as well, in a dedicated "technical debt" markdown file. Then in future I'll direct another agent at this file and tell it to start fixing them one at a time.


I agree with all you’ve said.

Gemini loves deleting tests as well, and all of them will relentlessly stub things to make unit tests ‘easy’.

What experience brought me is knowing where to steer them, e.g. scrapping all their shitty glue code and hand-holding Sonnet into implementing classes, DI, and unit tests that aren’t brittle at all. In that way, the agents have been nice to work with: they remind us of why cleaner code and good practices make for maintainable code. I hate their React spaghetti, but most places I’ve worked had tons of React spaghetti anyway…

All of this said: I actually miss steering juniors instead. Humans are frustrating to work with, but they are also adaptable, grow with time, and are… you know, human.

Mentoring Claude isn’t exactly fun or rewarding, in the way mentoring a colleague would be. And thankfully we have memory MCP servers, otherwise it would be like mentoring a brand new intern every time you fire up Claude.


Someone just asked me what I dislike most about Mistral and about Claude Code.

I run both in the Zed editor. Claude Code's integration is subpar - its ACP does not report tasks, doesn't give diffs and so on.

Mistral has rate limits that I hit just too often. I'm now using Mistral Pro, where this is worse; pay-as-you-go is better but costs me 10x the Pro price. The agent then just stops with an error.


I find the most value to be in eval loops and multi-agent setups where a specialized or cheap model gets tasks that take load off the smarter model.

Most of the value in agentic development IMO is in the feedback loop/ability for the model itself to intelligently pull in context, but if you want to push a lot of context or have steps that are more prescribed, it's kind of a waste of money to have the big model do that. Much better to use the cheap model as a kind of pre-processing/noise-reduction step that filters out junk context.

I would say that right now the benefits are largest for this kind of work with medium-sized multimodal models. For example I have hooks/automation that use https://github.com/accretional/chromerpc to automatically screenshot UIs and then feed it into qwen-family models. It's more that I don't want to pay Opus to look at them or remember/be instructed to do that unless it goes through QA first.
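
The shape of that pattern is roughly this sketch, where `call_model` is a hypothetical stand-in for whatever provider client you actually use (not a real API):

```python
# Sketch of the "cheap model as noise filter" pattern described above.
# `call_model(model_name, prompt) -> str` is a hypothetical stand-in
# for your actual provider client; wire it up to your own setup.

def answer_with_filtered_context(question, chunks, call_model):
    # Step 1: a cheap model vets each context chunk, so the expensive
    # model never has to read (or be billed for) the junk.
    kept = [
        c for c in chunks
        if call_model(
            "cheap-model",
            f"Is this relevant to '{question}'? Answer YES or NO.\n\n{c}",
        ).strip().upper().startswith("YES")
    ]
    # Step 2: only the filtered context reaches the smart model.
    context = "\n---\n".join(kept)
    return call_model("smart-model", f"{question}\n\nContext:\n{context}")
```

The same skeleton works for the screenshot/QA case: a cheap multimodal model looks at every screenshot, and only escalates to the expensive model when something worth fixing shows up.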


> I find the most value to be in eval loops and multi-agent setups where a specialized or cheap model gets tasks that take load off the smarter model.

Yes, in theory, this should hold up, at least according to evaluations.

According to real, practical use though, none of the open weight models are generally strong enough to handle coding and programming in a professional environment, unless you have tightly controlled scope and specialized models for those scopes, which generally I don't think you have, but maybe it's just me jumping around a lot.

Even with feedback loops, harnesses and what not, even the strongest local models I can run with 96GB of VRAM don't seem to come close to what OpenAI offered in the last year or so. I'm sure it'll be ready at one point, but today it isn't.

With that said, if you know specific models you think work well as general and local programming models, please share which ones; happy to be shown wrong. The latest I've tried was Qwen3.6-35B-A3B, which gets a bit further, but instruction following is still a far cry from what OpenAI et al. have offered for years.



Fundamentally they're the same technology with the same exact algorithms under the hood; only the post-training alignment differs.

That is, the difference you see is either placebo effect or you being lucky and aligning better with the model's post-training bias.


Sorry, I was not specific enough. I did not mean that open source itself is not enough; I meant that an open source model that can actually run locally on my machine is not enough. A 32B model cannot compete with a 250B+ state of the art model, at least that's my experience, and it seems to be the experience of many others as well.

Yes, they're not as powerful; that means you need to feed them smaller tasks and rely more on plan mode.

Saying it "cannot compete" is like saying that a Kia cannot compete with a BMW.

Technically true in some sense, but fundamentally the two are the same exact thing and it's highly unlikely you have a task that actually requires a BMW.


Also my experience

Which open weights model?

It goes to a different school, you wouldn't know it

> Which open weights model?

Yes, I'm also wondering!

Currently I'm testing out gemma4:26b and qwen3.6:35b-a3b-q4_K_M locally on my M2 Max MacBook Pro.

Not the fastest, but reasonable.

However, I am also interested in getting as close as possible in performance to Opus 4.6 while minimizing my costs.


> I am also interested in getting as close as possible in performance to Opus 4.6 while minimizing my costs.

Aren’t we all? ;)


Remember, open weight doesn't necessarily mean local. They are probably running a larger version online, closer to Claude specs. (lol, and probably distilled from Claude)

Gemma 4 on an M2? That sounds promising. I have an M3 Max, going to try that today.

I'm actually seeing a similar thing when comparing 4.6 and 4.5. It burns a lot more tokens and shows more of how it is thinking along the way, but I don't see a strong difference in the end result. Occasionally 4.6 even seems to get stuck in its 'processing' phase, while 4.5 doesn't on the same task.

Yeah, my rate limits are getting exhausted way faster now. It's also way slower and overplans unless you steer it closely.

I can’t rely on this anymore.


Which open weights models did you use for this comparison, and how are you running them?

I just don't believe you.

The vast gulf between open weights and frontier models that existed 6 months ago has suddenly disappeared?

It's far more likely you're just bad at assessing model output.


Or that gulf doesn't exist for the problems they are trying to solve?

Their problem space may be just fine with open weight models regardless, but yes, the releases of Gemma 4, GLM 5.1 and Qwen 3.5 (and now 3.6!) have all happened in the last 6 months

> Why pay $200 to randomly get rug-pulled with no warning, when I can pay $20 for 90% of the intelligence with reliable, higher performance?

Then go do that. Good luck!


Solo project for 4+ years: https://kastanj.ch/en?mid=hn47741527

The goal is to make every recipe foolproof on the first try: similar to when you walk into a restaurant, you just pick what you want to eat without thinking about the details, and the recipe tells you exactly what to do with no magic involved.

Technically it is probably very different from other recipe apps. The database is a huge graph that captures the relations between ingredients and processes. Imagine 'raw potato'->'peeled potato'->'boiled potato'->'mashed potato'. It is all the same ingredient but with different processing. The edges between the nodes define the process, and the nodes are physical things. Recipes are defined as subsets of the graph. The graph can also wrap around into itself, which is apparently needed to properly define some European dishes in this system. The graph also has multiple layers to capture different relationships that are not process related.

Why was it designed this way? Because food/cooking is complex to define. This design is the only way I have found that captures enough of these complex relationships that the computer can also 'understand' what is going on.

My favourite thing about this is that each recipe is strictly defined in the graph. If the recipe skips a step, or something is undefined, the computer knows that the recipe is incomplete. It won't ask you to do 10 things at the same time and then have something magically appear out of nowhere. It is like compile time checking but for recipes.

It also enables some other superpowers, for example:

• Exclude the meat part of the graph = vegetarian. The same thing works with allergies.

• Include the meat part of the graph = only show me recipes that contain meat.

• Recursive search: search for 'potato' and the computer will know that french fries are made from potato. It can therefore tell you that you could make the hamburger meal, but you will need to complete the french fries recipe first, which should take 60 minutes.

• Adjustable recipe difficulty (experimental): it knows which steps can be done in parallel, and which can't, based on how the nodes connect. A beginner can get a slower paced recipe with breathing room between steps, while someone more experienced can go at a faster pace and do more things in parallel.
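
As a minimal sketch of how such a state graph and the recursive search might look (the node and edge names here are hypothetical, not Kastanj's actual data model):

```python
# Minimal sketch of an ingredient/process graph like the one described
# above. Names and structure are my guesses, not the real Kastanj schema.

# Directed edges: (ingredient state, process) -> resulting state
EDGES = {
    ("raw potato", "peel"): "peeled potato",
    ("peeled potato", "boil"): "boiled potato",
    ("boiled potato", "mash"): "mashed potato",
    ("peeled potato", "fry"): "french fries",
}

def derived_from(target, source, seen=None):
    """Recursive search: can `target` be produced starting from `source`?
    The `seen` set guards against graphs that wrap around into themselves."""
    if target == source:
        return True
    seen = seen if seen is not None else set()
    if source in seen:
        return False
    seen.add(source)
    return any(
        derived_from(target, result, seen)
        for (state, _process), result in EDGES.items()
        if state == source
    )

# A search for 'raw potato' can surface french fries...
print(derived_from("french fries", "raw potato"))     # True
# ...but not the other way around.
print(derived_from("mashed potato", "french fries"))  # False
```

The vegetarian/allergy filters would then just be reachability with certain subgraphs excluded, and "compile time checking" falls out of verifying that every node in a recipe's subgraph has a defined path from its raw ingredients.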

If I had known what it would take to build this, I would never have gotten started. I completely underestimated the complexity of the problem I was trying to solve. But here we are, and now it is basically done and working.

The website captures the key points from a non-technical point of view, and you can enter your email to get notified when it launches in your country.


Sounds interesting, but if I may: the website is exceedingly sluggish, something like 1-2s interval between re-renders when trying to scroll the page (Firefox Linux). Not seeing any reason to explain this based on the page content but it's not happening with anything else on my system atm

edit: maybe `WARNING: Falling back to CPU-only rendering. Reason: webGLVersion is -1` from the console explains why, although I don't get why the page would need webGL


I took another look at this and it seems like the issue is a bug where Firefox on Linux for some reason fails to enable hardware acceleration for WebGL. The landing page requires WebGL to function properly and will fall back to non-accelerated rendering if it fails. I could not reproduce this on Arch Linux with the latest Firefox build, plain default install. So it could be that some setting or extension in your browser triggers this issue. But thank you for letting me know about this :)

I can't reproduce this on stock Firefox on Linux. Will have to take a deeper look into this. Thank you for letting me know about it!

Solo project for 4+ years. Zero AI, but algorithms that will probably make people believe otherwise: https://kastanj.ch/en?mid=hn47700460

The goal is to make every recipe foolproof on the first try: similar to when you walk into a restaurant, you just pick what you want to eat without thinking about the details, and the recipe tells you exactly what to do with no magic involved.

Technically it is probably the most advanced recipe app ever made. The database is a huge graph that captures the relations between ingredients and processes. Imagine 'raw potato'->'peeled potato'->'boiled potato'->'mashed potato'. It is all the same ingredient but with different processing. The edges between the nodes define the process, and the nodes are physical things. Recipes are defined as subsets of the graph. The graph can also wrap around into itself, which is apparently needed to properly define some European dishes in this system. The graph also has multiple layers to capture different relationships that are not process related.

Why was it designed this way? Because food/cooking is extremely complex. This design is the only way I have found that captures enough of these complex relationships that the computer can also 'understand' what is going on.

My favourite thing about this is that each recipe is strictly defined in the graph. If the recipe skips a step, or something is undefined, the computer knows that the recipe is incomplete. It won't ask you to do 10 things at the same time and then have something magically appear out of nowhere. It is like compile time checking but for recipes.

It also enables some other superpowers, for example:

• Exclude the meat part of the graph = vegetarian. The same thing works with allergies.

• Include the meat part of the graph = only show me recipes that contain meat.

• Recursive search: search for 'potato' and the computer will know that french fries are made from potato. It can therefore tell you that you could make the hamburger meal, but you will need to complete the french fries recipe first, which should take 60 minutes.

• Adjustable recipe difficulty (experimental): it knows which steps can be done in parallel, and which can't, based on how the nodes connect. A beginner can get a slower paced recipe with breathing room between steps, while someone more experienced can go at a faster pace and do more things in parallel.

If I had known what it would take to build this, I would never have gotten started. I completely underestimated the complexity of the problem I was trying to solve. But here we are, and now it is basically done and working.

The website is slightly outdated but captures the key points from a non-technical point of view, and you can enter your email to get notified when it launches in your country.


I am working on Kastanj. It aims to make cooking as foolproof as it can get. Anyone should be able to cook any recipe and get it right on the first try. Clear step-by-step images and instructions for everything, etc.

It also features a recipe manager with family/friends sync. This makes it possible to upload your grandmother’s cookbook and share it with your whole family.

https://kastanj.ch/


https://kastanj.ch/

A recipe app you will actually want to use. No bloat, no ads, very minimalistic, but everything works well and bugs get fixed.

Why? Because most recipe apps and websites are frankly painful to use. I am trying to create the absolute best cooking/recipe experience possible. Something that just works.


https://kastanj.ch/

A recipe app I built primarily for me and my wife, but I realized along the way that others might find it useful too. I tried to make the whole cooking experience as smooth as possible. The recipes are made for the app, and the app is made for the recipes: tight integration which enables some really cool features. Currently working on algorithmic optimization of recipes based on how fast you work and how many things you can do in parallel, user configurable. This makes it possible to either do the very beginner friendly one-thing-at-a-time, or speedrun recipes and do multiple tasks at the same time for more skilled people.

First launch will be in 2026 in Swedish. Later in 2026 English launch planned, and then based on demand other languages.


I think the future is both bright and dark. It has never been this easy to create anything yourself. Anything from software to hardware: you can buy and build the tools and make something amazing in your spare time that only 20 years ago would have taken a small team with some funding.

There are 2 kinds of companies: 1. The greedy kind that always want more. They see extracting money out of their customers as their sole purpose. 2. The kind that want to build good stuff and help people.

A lot of companies start out as nr 2, but with time and growth, greedy people have a tendency to climb the ladder and turn the nr 2 companies into nr 1, unless the original team knows about this and resists such change. This also means that the founders must be okay living their whole life without owning a Bugatti. VC money makes it hard to stay a nr 2, because even if you are good, if you make a deal with a VC firm that wants to 100x their investment through you, then you have already let the greedy devil through the front door.

A nr 1 company will over time turn into a parasite. Once they extract more value than they give, it is a downward spiral of destruction. A big part of the US tech economy (as seen from Europe) has "evolved" into parasites. They say they fuel the economy. What they actually mean is that they cause a lot of money to flow around. A parasite that sucks a lot of blood will also make a lot of blood flow, so that is not a good measure of health.

The good news is that parasites die eventually. A lot of people (especially outside the US) are very much aware of how toxic American companies have become. In Europe there is right now a whole sector growing rapidly that is doing "X but European", and it looks very promising. This is not only a Europe thing; the same is happening in Asia, but European laws and culture have accelerated it.

What this means is that you will see a lot of destruction and downfall as these giant parasites have to die or kill their hosts. Don't be the host. Don't rely on them. Don't do business with them. Don't work for them. Avoid them like the plague, and don't stand in their way when they fall. They will cause collateral damage; regulators should have stepped in a long time ago, but greed prevented that. These companies are already leaving big billion dollar holes in the market. They can't win those markets back, because in business trust is everything and they have lost the trust of these markets. As they continue falling, they need to keep sucking more blood out of their remaining hosts, which will further erode trust and create new, bigger markets looking for a non-toxic alternative. Be that alternative. It has never been easier. The future is bright if you want it to be.

Random info: "money is the root of all evil" is false. It is actually "the love of money is the root of all kinds of evil". If you dig into the original Bible texts, it is clear that it is not talking about money itself as evil. It is talking about a spirit (mentality) that in English would be more accurately translated as "greed". It is very clear that greed causes destruction and suffering on all levels. From companies managing to erode the middle class to Putin wanting more land. There are companies that steal (legally, with greedy corrupt leaders) water only to sell it back to the local population. Why? Greed. It caused Boeing planes to fall from the sky, it caused the 2008 crash and it will cause the AI crash. No matter how much they eat, they remain hungry, without limits. This is what greed does. Greed weaponizes good companies with good ideas and turns them into money sucking machines with no limits. If you want to resist this, you have to start with yourself. Everything starts with one person. Be that person.


Samsung did something similar 10 years ago with their phones that had a pulse oximeter sensor. It could show, on some scale from 0-100, how stressed you were and compare that to previous days. Probably more useful for most people than raw values for many kinds of data.


I had the exact same problem and have been working for the past 4 years on solving this issue.

https://kastanj.ch/

I think it is insane how much time people collectively spend on feeding themselves. It should be much simpler. Currently your options are something like this:

Option 1: Buy a cooking machine that can do some kinds of food quite well. It will be expensive and can't cook all types of food, but works okay according to my colleagues who use them.

Option 2: Learn how to cook 5 recipes well. You will gain speed over time, but you will eat the same 5 things over and over for the rest of your life. This was my personal solution to this problem. It worked great until I met my wife. I am someone who can eat the exact same meal every day for months (yes, I have done that) and not get tired of it. My wife is not this kind of person. Therefore option 2 stopped being an option after getting married.

Option 3: Learn how to cook for real. This will take a lot of time, failed meals, frustration, etc. But over time you can save good money because you learn how things work from the ground up. You will also gain speed over time; however, you constantly need to learn new things, otherwise you will be back at option 2 but with the 20 meals you memorized. Consider this a lifestyle if you want to do it well.

Option 4: Only eating pre-made meals. Very expensive. Not good for long term health.

Option 5: Kastanj. An app that helps you cook good food without having to learn everything. If you just know how to hold a knife and what a pan is, then you have sufficient knowledge. The app will guide you through everything step-by-step with pictures. It is as fool-proof as cooking can get. Beta launch is planned in 2026.

The core ideas behind the app:

- Instructions need to be idiot proof so younger me could understand them.

- All instructions need pictures, because "cut the carrot into (fancy word)" meant nothing to me.

- I am the "robot". The app tells me what to do. I should not have to think and understand; just following along needs to be enough to succeed.

- Better to have 100 recipes that work 100% of the time than 1000 recipes that work 50% of the time.

We take recipe quality very seriously. Every recipe is developed and photographed in house. Every recipe is tested at least 3 times with some variation to account for user errors. The app and all content are constantly improving to maximize success. For example, an alpha user recently managed to fail with one recipe (the consistency was a bit off) despite following the instructions. The recipe was soft-banned, and we set up a test where we cooked that recipe 9 times until we managed to pinpoint what went wrong and updated the recipe accordingly. We do not accept bad recipes. This means we can't brag about having the biggest recipe collection, because developing recipes like this is slow. However, the benefit is that our users can simply scroll the app like a restaurant menu and feel confident that anything they see, they can make.


https://kastanj.ch/

A recipe app that is not dependent on AWS or Cloudflare. When everything else goes down, at least you can cook :)

Launch is planned in 2026.



