Hacker News | disiplus's comments

Diagnosed with ADHD. Ultimately it does not change anything for me, even though I had the same idea as you. The reason is that I can now start even more stuff in parallel. Some of it does get finished more often than before, because I can just prompt more when I'm in focus, but instead of finishing I add more features.


Same, but you need more than 100k of hardware to run something like kimi k2.6 for a bigger team. On the other hand, there is a ds4 flash that you can run on a MacBook with 128GB of RAM, and that one is perfectly usable for a lot of tasks.

https://github.com/antirez/ds4


I think the quote came out to $107k. 4 AMD MI300As. Around 60k tokens per second, 512GB of GPU memory.

https://www.gigabyte.com/Enterprise/GPU-Server/G383-R80-AAP1


Which model are you running?


The problem is not the website; the problem is discovery, and discovery happens on Instagram, TikTok, and other social networks. You don't have any incentive to build a website for a regular audience. What you might do is build an audience on a social network and then try to move them to a website.

But at that point you're big enough to build it properly.


You can always follow the POSSE pattern [1] (except for platforms that actively punish links to your own site of course). That way you get both the reach and remain independent in terms of content moderation.

[1] https://www.citationneeded.news/posse/


I don't know what you are talking about. I replaced an older gpt4o with a finetuned qwen. There is a huge amount of "AI" that can be done with those models, or partly by those models. A huge number of people would not notice the difference. And if you prepare the context correctly, an even bigger slice of people would not notice.


If it helps, I mean it in a really literal sense. qwen3.6 27b is currently $3.20 per million tokens on openrouter, which is way overpriced. As good as the 27b is, kimi k2.5 is $3.00 and it's just in another league in terms of capability. There's no reason to spend money on it.

And even alibaba's own qwen3.6-plus is $1.95, so it's kinda easy to come to the conclusion that neither alibaba nor anyone else is really interested in hosting that model.

And don't get me wrong, I fully agree with you, qwen3.6 27b is an amazing model. I run it on my own hardware and every day I'm constantly surprised with what it can zero shot.


Genuinely curious, what are you "fine tuning" these smaller models to do reliably? I hear this talked about a lot but very few people actually cough up examples, and I'd love to actually hear of one.


Depends. A super small one is finetuned to do function calling, instead of sending the request to a big model and waiting. For example, you ask for the revenue in the last month, I do a small LLM function call -> show results. Some bigger ones do analysis, summary, classification. What is great with the smaller ones, and I'm looking at 2b and 4b, is that you can get huge throughput with just vllm and a couple of consumer GPUs. What I usually do is basically distillation of a big one onto a smaller one.
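
A rough sketch of the "small LLM function call -> show results" flow, in case it helps picture it. Everything here is made up for illustration: `small_model` is a stub standing in for the finetuned model, and `get_revenue`, `LEDGER`, and the JSON call shape are assumptions, not the actual setup described above.

```python
import json
from datetime import date, timedelta

# Hypothetical in-memory ledger standing in for a real revenue database.
LEDGER = [
    (date(2024, 4, 3), 1200.0),
    (date(2024, 4, 20), 800.0),
    (date(2024, 5, 2), 500.0),
]


def get_revenue(period: str, today: date) -> float:
    """Sum revenue for the requested period (only 'last_month' is supported)."""
    if period != "last_month":
        raise ValueError(f"unsupported period: {period}")
    # Last day of the previous month, relative to `today`.
    last_day = today.replace(day=1) - timedelta(days=1)
    return sum(
        amount
        for day, amount in LEDGER
        if (day.year, day.month) == (last_day.year, last_day.month)
    )


# Registry of functions the model is allowed to call.
TOOLS = {"get_revenue": get_revenue}


def small_model(prompt: str) -> str:
    """Stub for the finetuned small model (assumption): it returns a JSON
    tool call instead of a prose answer."""
    return json.dumps({"name": "get_revenue", "arguments": {"period": "last_month"}})


def answer(prompt: str, today: date) -> float:
    """Ask the small model for a tool call, then run the matching function."""
    call = json.loads(small_model(prompt))
    tool = TOOLS[call["name"]]
    return tool(today=today, **call["arguments"])


result = answer("What was our revenue last month?", today=date(2024, 5, 15))
print(result)
```

The point of the split is that the model only has to pick a function name and arguments, which is a narrow enough task that a 2b or 4b model could plausibly be tuned for it, while the actual numbers come from ordinary code.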


Nice, will run it later against qwen3.6 27b. The speed was one of the reasons why I was running qwen and not gemma. The difference was big; there is some magic that happens when you have more than 100tps.


Depends on how many users you have and what "production grade" means for you, but like 500k gets you an 8x B200 machine.


Was part of the beta. It's a properly good model; in some sense I forgot that I'm not on opus or gpt. Opus is still better. Gpt is the one struggling for me. It has some niche in backend work, but you can get the same from opus with skills, and it's lacking in almost everything else.


Funny, for me Opus is struggling since about February.

4.7 made no difference, so for the first time in many moons I am cancelling my subscription.


It looks like it's called prolite.

https://snipboard.io/jmGKfM.jpg


yet


I have glm and kimi. Kimi was better in most cases and was my replacement for claude when I ran out of tokens. Now I'm finding myself using glm more than kimi. It's funny that glm vs kimi is like codex vs claude, where glm and codex are better for backend and kimi and claude more for frontend.

As kimi did a huge amount of claude distillation, it seems to be somewhat grounded in data.

https://www.anthropic.com/news/detecting-and-preventing-dist...

