Hacker News
Adaptive speculative decoding: picking draft lengths at runtime (fergusfinn.com)
3 points by hasheddan 3 days ago | past | discuss
InfiniBand, RoCE, and All That (fergusfinn.com)
4 points by hasheddan 3 days ago | past | discuss
InfiniBand, RoCE, and All That (fergusfinn.com)
3 points by kjeetgill 5 days ago | past | discuss
InfiniBand, RoCE, and All That (fergusfinn.com)
4 points by kkm 6 days ago | past | discuss
UCCL-EP: DeepEP-style expert parallelism on any NIC, no GPU-initiated comms (fergusfinn.com)
8 points by kkm 10 days ago | past | discuss
Anatomy of a high-performance EP kernel (fergusfinn.com)
16 points by kkm 15 days ago | past | 1 comment
The Economics of Speculative Decoding (fergusfinn.com)
30 points by kkm 17 days ago | past | 6 comments
Speculative KV coding: losslessly compressing KV cache by up to ~4× (fergusfinn.com)
155 points by kkm 21 days ago | past | 48 comments
70x faster cold(ish) starts for SGLang (fergusfinn.com)
1 point by kkm 22 days ago | past
Bringing Up DeepSeek-V4-Flash on AMD MI300X (fergusfinn.com)
120 points by kkm 23 days ago | past | 25 comments
Pushing memory bound CUDA kernels past the speed of light with data compression (fergusfinn.com)
2 points by somnial 28 days ago | past
Speculative KV coding: ~4× losslessly compressed KV cache using a small model (fergusfinn.com)
2 points by somnial 44 days ago | past
In search of wasted bits: how much information do LLM weights carry? (fergusfinn.com)
1 point by gmays 46 days ago | past
Redundant Information in LLM Weights (fergusfinn.com)
5 points by mezark 51 days ago | past
Tans: Precomputing RANS (fergusfinn.com)
3 points by mezark 56 days ago | past
Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com)
25 points by mezark 56 days ago | past
70x faster cold(ish) starts for SGLang (fergusfinn.com)
1 point by somnial 59 days ago | past
70x faster cold(ish) starts for SGLang (fergusfinn.com)
4 points by mezark 62 days ago | past
Parallel Primitives for Multi-Agent Workflows (fergusfinn.com)
1 point by mezark 5 months ago | past
LLM powered data structures: A lock-free binary search tree (fergusfinn.com)
1 point by somnial 5 months ago | past
Parallel Primitives for Multi-Agent Workflows (fergusfinn.com)
1 point by somnial 5 months ago | past
Scheduling in LLM Inference (fergusfinn.com)
1 point by somnial 7 months ago | past
How fast can an LLM go? (fergusfinn.com)
2 points by kkm 7 months ago | past
How fast can an LLM go? (fergusfinn.com)
2 points by gmays 7 months ago | past
How fast can an LLM go? (fergusfinn.com)
2 points by somnial 7 months ago | past
