I used to mix remote and local MiniMax 2.7 (q3) on my Strix Halo; it ran at about 30 tokens/s generation (tg) and 220 tokens/s prompt processing (pp)... It was painfully slow, but it felt good knowing I could stay offline. Unfortunately M3, which is at Opus .8 levels, is 460B parameters and doesn't even fit in 128GB of memory, let alone with a big context. Strix Halo feels like a toy for AI purposes. https://kyuz0.github.io/amd-strix-halo-toolboxes/
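For a rough sense of why 460B parameters won't fit in 128GB, here is a back-of-the-envelope sketch. It assumes the footprint is just parameters times bits per weight; the function name is made up for illustration, real quant formats usually carry a bit more than their nominal bit width, and KV cache and runtime overhead are ignored entirely.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantization formats add scales/metadata, so treat this as a lower bound.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    memory_gb = 128  # unified memory mentioned in the comment above
    for bits in (3, 4, 8):
        size = weight_footprint_gb(460, bits)
        verdict = "fits" if size < memory_gb else "does not fit"
        print(f"460B at {bits} bits/weight: ~{size:.1f} GB, {verdict} in {memory_gb} GB")
```

Even at a nominal 3 bits per weight the weights alone come out well above 128GB, before any context is allocated.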
My Strix Halo board is feeling more useful and less toylike with the recent performance gains from MTP, better quantization, and general performance improvements across the stack. For example, I can run Unsloth's Gemma4-31B 4-bit QAT model at around 30 tg and 200 pp. I don't find that too slow at all, particularly because it's nearly full accuracy and good enough for a lot of the different stuff I throw at it.
I think it also helps that I'm using my machine to do home server stuff. It excels at all of the traditional workloads. Then I can lean on the AI to help with automation here and there. I find it deeply satisfying.
You can absolutely use it for some workloads, but as soon as you add some complexity, like a big repo, it'll take forever, and the economics are silly to the point that the electricity bill would be comparable to a subscription. I love having the option of running things locally if some random dude decides to pull the plug, and it gives me solace that I can have 100% private inference, but as the main driver during the day? Shoot me.
>It's such a weird "Gotcha" that seems to only assume that Chinese LLMs might censor something.
We are not assuming anything; it is illegal, and you will get prison time just for talking about it. Yeah, sure, everyone distorts reality, but there is a huge gap between hiding and enforcing.
So yeah, having models respond accordingly is unexpected. There are probably multiple variants tuned differently.
This is probably a semantic issue seeing as we don't have a widely agreed definition of it.
I like to think about it in terms of self-reflective, subjective experience.
I'm not even sure emotions would be a requirement, and I was surprised to see Chiang so hung up on them. Would he consider humans with mental disorders that cause a complete lack of some emotions to not possess consciousness?
I had a subscription before the price was cut; the model kept randomly looping with the same character (burning 30% of the budget in one shot), and the overall performance for agentic purposes is, simply put, terrible.
It finds nonexistent bugs and randomly removes chunks of code to fix them, then even presents that as an "extra fix".
Maybe it's a good generalist model; I haven't tested it in that regard.
MiniMax (currently 2.7), which is a ~270B model tuned exclusively for agentic purposes, performs so MUCH better; it's more reliable and cheaper. Both are still far from Opus 4.7, which I'm using at work. IMO benchmarks are just a very rough estimate; everyone cheats as much as they can get away with. Test the model yourself; do not make any assumptions based on the benchmarks.
I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other non-agentic purposes as well. Why pay $1 for a general model when, for example, you can pay $0.1 for a content-moderator model that you actually need?
Funny, I had the opposite experience with MiniMax and MiMo when using OpenCode. MiniMax got stuck looping through broken tool calls all the time, and MiMo just powered through things and for the most part just worked.
Similarly for me: MiniMax is kind of horrible in that it somewhat regularly falls into loops that I have to rescue it from. DeepSeek and MiMo rarely got stuck. I wonder how you got a completely reversed experience.
I did develop my own agent around MiniMax. I did see weird behavior when I messed up the loop, like omitting pieces or removing the thinking; maybe it's an agent bug. Some models/providers just ignore or normalize the broken input, some don't.