Hacker News
from
Adaptive speculative decoding: picking draft lengths at runtime
(
fergusfinn.com
)
3 points
by
hasheddan
3 days ago
|
past
|
discuss
InfiniBand, RoCE, and All That
(
fergusfinn.com
)
4 points
by
hasheddan
3 days ago
|
past
|
discuss
InfiniBand, RoCE, and All That
(
fergusfinn.com
)
3 points
by
kjeetgill
5 days ago
|
past
|
discuss
InfiniBand, RoCE, and All That
(
fergusfinn.com
)
4 points
by
kkm
6 days ago
|
past
|
discuss
UCCL-EP: DeepEP-style expert parallelism on any NIC, no GPU-initiated comms
(
fergusfinn.com
)
8 points
by
kkm
10 days ago
|
past
|
discuss
Anatomy of a high-performance EP kernel
(
fergusfinn.com
)
16 points
by
kkm
15 days ago
|
past
|
1 comment
The Economics of Speculative Decoding
(
fergusfinn.com
)
30 points
by
kkm
17 days ago
|
past
|
6 comments
Speculative KV coding: losslessly compressing KV cache by up to ~4×
(
fergusfinn.com
)
155 points
by
kkm
21 days ago
|
past
|
48 comments
70x faster cold(ish) starts for SGLang
(
fergusfinn.com
)
1 point
by
kkm
22 days ago
|
past
Bringing Up DeepSeek-V4-Flash on AMD MI300X
(
fergusfinn.com
)
120 points
by
kkm
23 days ago
|
past
|
25 comments
Pushing memory bound CUDA kernels past the speed of light with data compression
(
fergusfinn.com
)
2 points
by
somnial
28 days ago
|
past
Speculative KV coding: ~4× losslessly compressed KV cache using a small model
(
fergusfinn.com
)
2 points
by
somnial
44 days ago
|
past
In search of wasted bits: how much information do LLM weights carry?
(
fergusfinn.com
)
1 point
by
gmays
46 days ago
|
past
Redundant Information in LLM Weights
(
fergusfinn.com
)
5 points
by
mezark
51 days ago
|
past
Tans: Precomputing RANS
(
fergusfinn.com
)
3 points
by
mezark
56 days ago
|
past
Also-RANS: Asymmetric Numeral Systems for Entropy Coding
(
fergusfinn.com
)
25 points
by
mezark
56 days ago
|
past
70x faster cold(ish) starts for SGLang
(
fergusfinn.com
)
1 point
by
somnial
59 days ago
|
past
70x faster cold(ish) starts for SGLang
(
fergusfinn.com
)
4 points
by
mezark
62 days ago
|
past
Parallel Primitives for Multi-Agent Workflows
(
fergusfinn.com
)
1 point
by
mezark
5 months ago
|
past
LLM powered data structures: A lock-free binary search tree
(
fergusfinn.com
)
1 point
by
somnial
5 months ago
|
past
Parallel Primitives for Multi-Agent Workflows
(
fergusfinn.com
)
1 point
by
somnial
5 months ago
|
past
Scheduling in LLM Inference
(
fergusfinn.com
)
1 point
by
somnial
7 months ago
|
past
How fast can an LLM go?
(
fergusfinn.com
)
2 points
by
kkm
7 months ago
|
past
How fast can an LLM go?
(
fergusfinn.com
)
2 points
by
gmays
7 months ago
|
past
How fast can an LLM go?
(
fergusfinn.com
)
2 points
by
somnial
7 months ago
|
past