yorwba · 2026-06-08T05:22:51 1780896171

Those are somewhat separate concerns. You could have companies making independent hiring decisions while systematically discriminating against demographic groups, and you could also have companies all use the same system that systematically disadvantages certain individuals, but it's unrelated to their demographics and instead based on things like their resume not being easy to OCR.

In this case, the claim is that both are happening: companies aren't making decisions independently and they're doing so in a way that discriminates against certain demographics. But the evidence needed for each half of the claim is different.

yorwba · 2026-06-07T08:48:03 1780822083

An attempt at a summary of the argument:

- Human brains are estimated to have a few hundred trillion synapses. If you tried to replicate this in a neural network model with one parameter per synapse, it would be much larger than the largest models in use today.

- Conventional wisdom in the form of the Chinchilla scaling law suggests that to train such a gargantuan model, you would need an even more gargantuan training corpus.

- But no human has read anywhere near as much as even relatively small Chinchilla-optimal models. In fact, rather than acquiring as much data as possible as efficiently as possible, children might rather rewatch the exact same video for the umpteenth time. When they learn arithmetic, it's from just a paltry few examples provided by the teacher in school.

- Large neural networks trained on such little training data would quickly memorize it perfectly and overfit horribly.

- Individuals with photographic memory demonstrate that human brains indeed have the memorization capacity you would expect based on synapse count, and appear to show difficulties with generalization as a side-effect.

- Speculatively, typical humans forget and generalize instead of memorizing because synaptic strengths are reduced during sleep in an analogue to regularization by weight decay.

- Therefore, maybe we should train extremely large models on little data with extremely strong weight decay to counteract memorization, and hope a large learning rate will quickly "catapult" it to a generalizing solution.

What I'm missing is a discussion of how much this would cost, even if you handle deployment by distillation into smaller, faster, less data-efficient models.
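A rough sketch of that missing cost estimate, for the Chinchilla-optimal baseline the argument starts from. Everything numeric here is an assumption rather than something taken from the article: about 20 training tokens per parameter, about 6 FLOPs per parameter per token, and 200 trillion parameters as a stand-in for "a few hundred trillion synapses".

```python
# Back-of-envelope training budget for a one-parameter-per-synapse model.
# Assumed rules of thumb (not from the article):
TOKENS_PER_PARAM = 20        # Chinchilla-style tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6    # forward + backward pass cost per parameter per token


def chinchilla_budget(n_params: float) -> tuple[float, float]:
    """Return (training tokens, training FLOPs) for a model with n_params parameters."""
    tokens = TOKENS_PER_PARAM * n_params
    flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
    return tokens, flops


synapse_params = 2e14  # assumed stand-in for "a few hundred trillion"
tokens, flops = chinchilla_budget(synapse_params)
print(f"tokens: {tokens:.1e}, FLOPs: {flops:.1e}")
```

The proposal in the last bullet would shrink the token count drastically, but compute per token still scales with the parameter count, which is why the cost question matters.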

logicchains · 2026-06-07T09:48:54 1780825734

>But no human has read anywhere near as much as even relatively small Chinchilla-optimal models

They're missing that humans don't consume raw text. They consume non-stop high resolution, high FPS audio and video imagery. If you tokenized the input to human eyes and ears in the first few years of life, that's more data than even the largest LLMs are trained on.

yorwba · 2026-06-07T10:35:36 1780828536

I didn't include it in my summary (it took me an hour to read the whole thing, obviously a lot had to be cut) but the article does actually address the "high resolution" argument in a three-paragraph bullet point under the "Sample Inefficiency" subheading: https://gwern.net/llm-catapult#sample-inefficiency If you read it on a 4K screen at 120 FPS, you should be able to take in its information content in less than a microsecond.

lostmsu · 2026-06-07T11:56:32 1780833392

They "address" it by making the false statement that the video stream is highly predictable. Sure, you might be able to predict 99% of the video stream (for which you'd need to have a physics model, negating the whole point of babies learning fast), but the remaining 1% is still terabytes if not petabytes per year.

nnoremap · 2026-06-07T13:36:13 1780839373

I think this is addressed in the blog post:

  And on the human side, disabled people are not much less intelligent than normal humans: deaf/blind people are much worse at language tasks, but their fluid intelligence often remains normal. If the sensory bandwidth were so critical, this would be impossible.

throw310822 · 2026-06-07T09:40:52 1780825252

> Human brains are estimated to have a few hundred trillion synapses. If you tried to replicate this in a neural network model with one parameter per synapse...

Note that LLM parameters don't map to synapses in the same naive way they would for a fully connected network. Each attention parameter is applied thousands or millions of times to the inputs at each inference pass, so it's more like each param might code for a neural circuit repeated thousands of times.

I think of attention as a sort of convolution: in a NN, each convolution kernel gets applied repeatedly to all parts of an image, but in the human visual cortex I imagine these circuits are effectively all separate and parallel. The few parameters of a convolution kernel map to thousands of identical circuits in the visual cortex.
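To put a number on the weight-sharing point, here is a minimal sketch that counts how often each kernel weight is reused in one convolution pass. The image and kernel sizes are made up for illustration, and it assumes a single channel, stride 1, and no padding.

```python
def conv_weight_reuse(height: int, width: int, kernel: int = 3) -> int:
    """Number of positions a kernel is applied to in a stride-1, unpadded 2D convolution.

    Every weight in the kernel is used once per position, so this is also
    the number of times each individual weight is applied.
    """
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    return out_h * out_w


kernel_params = 3 * 3                       # 9 shared weights
reuse = conv_weight_reuse(224, 224)         # hypothetical 224x224 input
print(kernel_params, "weights, each applied", reuse, "times")
```

On this reading, a cortex with separate circuits at every position would need one copy of those 9 weights per position, instead of 9 in total.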

yorwba · 2026-06-07T10:09:02 1780826942

A biological synapse's weight takes effect whenever its input changes. So although it cannot be copied and applied in parallel to different inputs at the same time (and hence your visual cortex has a bunch of more-or-less identical edge-detection circuits) it can still be applied sequentially to different inputs at different times. And when LLMs do operate in sequential mode, generating tokens one at a time, they typically access each parameter at most once per forward pass.

Though there are things like looped transformers that reuse the same parameters multiple times even for a single token, so maybe those will finally give us AGI if scaled up to a trillion parameters and looped hundreds of times. (Sounds expensive!)

RaftPeople · 2026-06-07T17:54:02 1780854842

> A biological synapse's weight takes effect whenever its input changes.

I don't think it makes sense to try to compare our brains to ANNs; they are apples and oranges.

A synapse's weight is dynamically modulated by the astrocyte on multiple time scales (millisecond, sub-second, minutes), and the astrocyte itself is receiving inputs and performing computation (in addition to impacting the neural network).

yorwba · 2026-06-06T10:54:38 1780743278

The &emdash; is probably human error; other parts of the HTML correctly use &mdash; or Unicode em-dashes. Also: https://github.com/alexispurslane/rsync-analysis/commit/740b...

yorwba · 2026-06-06T08:46:46 1780735606

I think you might be confusing the Rokkasho Reprocessing Plant (not yet operational, intended for plutonium extraction from spent fuel) and the Rokkasho Uranium Enrichment plant, which has been running at 75 tSWU/year (I think that should be kSWU or tSW) since 2023-08-24 https://www.jnfl.co.jp/ja/business/about/uran/daily/enrichme... 112.5 tSWU/year since 2025-06-26 https://www.jnfl.co.jp/ja/business/about/uran/daily/enrichme... and 150 tSWU/year since 2025-11-20 https://www.jnfl.co.jp/ja/business/about/uran/daily/enrichme...

It's a bit weird though that they have a graph of tons of uranium hexafluoride shipped that shows the last shipment in 2018 and nothing since then.

yorwba · 2026-06-05T20:37:32 1780691852

Proposition 16. UHATs have polynomially bounded expansion over LTL. In particular, given an LTL formula φ, one can construct in polynomial time a UHAT T such that L(T) = L(φ).

i.e. the blowup is only exponential in one direction.

measurablefunc · 2026-06-05T21:20:35 1780694435

That says every LTL formula can be compiled into UHAT w/ polynomial overhead. It doesn't say that all languages recognizable w/ UHATs necessarily do not have succinct recognizers in LTLs or RNNs.

Edit: Actually, never mind. If UHAT could be compiled into LTL w/ polynomial overhead, then that would also work for the languages that have exponential overhead in LTL, but since they don't, there is a strict separation.

yorwba · 2026-06-05T08:27:04 1780648024

I thought the point of index funds weighting by market cap is that they don't require rebalancing, because the weight of stocks in the index exactly tracks price movements. You just keep holding the exact same number of shares, and more valuable stocks automatically take up more of your portfolio.

dlenski · 2026-06-05T14:30:18 1780669818

Yes, this is one of the benefits of a cap-weighted index fund.

It doesn't eliminate the need for the fund to rebalance, because of companies moving in and out of the index criteria.

But it certainly vastly reduces the need of the fund manager to trade.

(Also, stock buybacks and new share issuance should in principle not change a company's index weight, but in practice they sometimes do.)

baobabKoodaa · 2026-06-05T08:36:27 1780648587

(deleted)

yorwba · 2026-06-05T09:24:24 1780651464

If you pick stocks with the correct weight to track the index, you're effectively running an index fund. And so you don't have to rebalance to keep tracking the index.

pid-1 · 2026-06-05T09:59:33 1780653573

1 If you never rebalance, you're never adding new stocks to the index, nor removing stocks that do not belong to it anymore.

2 You need to rebalance to take corporate events into account: new stocks, buybacks, dividends, etc...

yorwba · 2026-06-05T10:29:13 1780655353

You can add stocks whenever you put money in. Whether that's because you got your paycheck or a dividend or some other income is kind of irrelevant. And you can remove stocks when you take money out. But you probably shouldn't start selling one stock to buy another just because their prices moved, unless you have information that lets you time the market.

malfist · 2026-06-05T12:22:14 1780662134

But then you wind up with a portfolio that isn't balanced and isn't tracking like an index fund. An index fund doesn't simply buy a flat amount of stock and hold it; it buys stock in proportion to each company's relative weight on the exchange, which is always moving.

wbl · 2026-06-05T16:05:30 1780675530

Market cap weighting is special. If company A has 500 shares and company B also has 500, then a fund that has 5 shares of A and 5 of B is market cap weighted.

malfist · 2026-06-06T11:51:24 1780746684

And what happens if company A issues more stock? Company B is delisted? Company C is now listed? Company A and C merge? Company A spins off its most valuable side business into its own independently listed company?

wbl · 2026-06-06T14:46:08 1780757168

For most of those transactions, just accepting the result is all you need.

tionate · 2026-06-06T05:50:04 1780725004

This is only true if those 500 shares had identical value, as market cap is the number of shares x the price.

sobani · 2026-06-06T08:15:15 1780733715

They do have identical value.

500 shares of company A is worth 100% of the market cap of company A.

500 shares of company B is also worth 100% of the market cap of company B.

So if you have 5 shares of each, you'll have 1% of the market cap of each, even if one of those companies finds the cure for cancer or turns out to be a money furnace.
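The 500-share example can be checked numerically. This is a minimal sketch, not how any real fund operates: the prices are invented, and the last part adds a share issuance to show the case where a fixed holding does drift from the index.

```python
def weights(holdings, prices):
    """Return each stock's share of total value: shares * price / total."""
    values = {name: holdings[name] * prices[name] for name in holdings}
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


market = {"A": 500, "B": 500}  # shares outstanding, from the example above
fund = {"A": 5, "B": 5}        # the fund holds 1% of each company

# Hypothetical prices: equal at first, then A soars while B collapses.
for prices in ({"A": 10.0, "B": 10.0}, {"A": 90.0, "B": 1.0}):
    index_w, fund_w = weights(market, prices), weights(fund, prices)
    # The fund never traded, yet its weights still match the index.
    assert all(abs(index_w[n] - fund_w[n]) < 1e-12 for n in market)

# Share issuance breaks the match: A doubles its share count at equal prices.
diluted = weights({"A": 1000, "B": 500}, {"A": 10.0, "B": 10.0})
untouched = weights(fund, {"A": 10.0, "B": 10.0})
print(diluted["A"], untouched["A"])  # index weight of A vs. fund weight of A
```

Price moves alone never require a trade; changes in share count or index membership are what force one.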

jimmydorry · 2026-06-05T09:44:27 1780652667

Indexes rebalance frequently. The "correct weight" today, won't be the correct weight in a year.

UncleDiaz12 · 2026-06-05T09:51:53 1780653113

What are you talking about? Those index funds are constantly rebalancing. This is why you buy an index fund, so you don’t have to constantly rebalance your portfolio.

yorwba · 2026-06-04T09:53:30 1780566810

> number array of x long (not sure how long x is, or if it's variable), which then gets projected down to a token representing that space (3-4 long as alpha-numeric)

There is no such projection step. The array of x numbers is the token. For text, there is a one-to-one correspondence between the textual representation of a token, its index in the vocabulary of the model, and the array of x numbers that is fed into the linear algebra of the model, so people often equivocate between them; but for images or sound, there is no discrete vocabulary and no textual representation, only the array of x numbers.
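A toy sketch of that distinction, with made-up sizes and random weights. The vocabulary, the dimension of 4, and the linear patch projection are all illustrative assumptions, not a description of any particular model.

```python
import random

random.seed(0)

vocab = ["the", "cat", "sat"]  # toy vocabulary (hypothetical)
dim = 4                        # the "x" in "array of x numbers"

# Text: one row of numbers per vocabulary entry, so string, index and
# vector are in one-to-one correspondence.
embedding = [[random.gauss(0, 1) for _ in range(dim)] for _ in vocab]


def embed_text(token: str) -> list[float]:
    """Look up the vector for a text token via its vocabulary index."""
    return embedding[vocab.index(token)]


def embed_patch(pixels: list[float], projection: list[list[float]]) -> list[float]:
    """Map continuous input straight to a vector; no vocabulary is involved."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in projection]


# Assumed: a linear projection from a 6-value image patch to dim numbers.
projection = [[random.gauss(0, 1) for _ in range(6)] for _ in range(dim)]
patch = [0.1, 0.5, 0.9, 0.3, 0.0, 0.7]

text_token = embed_text("cat")
image_token = embed_patch(patch, projection)
print(len(text_token), len(image_token))
```

Both results have the same shape and are fed to the model the same way, but only the text token has an index or a string to go back to.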

yorwba · 2026-06-03T09:40:44 1780479644

This is now limited to only some restricted industries: https://www.ndrc.gov.cn/xxgk/zcfb/ghxwj/202504/P020250424307... Yes, the list is long, but it's a significant improvement to the before times when all industries were off limits save for a few exceptions where foreign investment was allowed. Notably, the car industry has been mostly unrestricted for a few years now.

yorwba · 2026-06-03T08:57:32 1780477052

It's possible to simulate the classical physics of fairly large game worlds using a fairly small classical computer. If you wanted to model it using quantum physics instead (where quantum computers would theoretically have an advantage), said computer would need so many qubits that it would be much larger than the world it's supposed to simulate, while the additional realism would be essentially imperceptible to the player. You'd be better off using analog computing by putting a telepresence robot inside a real-world game arena.

echoangle · 2026-06-03T09:14:04 1780478044

> It's possible to simulate the classical physics of fairly large game worlds using a fairly small classical computer.

Not really; almost everything is faked and not really a physics sim. Imagine a world like GTA but where every material has realistic deformation and destruction.

I’m not saying quantum computers would be able to do that, but it’s not like current video games are at a point where more compute wouldn’t improve them.

tripledry · 2026-06-03T09:40:18 1780479618

> while the additional realism would be essentially imperceptible to the player

Personally for me this is the relevant part.

I can ofc imagine some niche games like Kerbal Space Program with complete realism, but I'm not convinced it makes it more enjoyable to play. Would be interesting to see for sure.

yorwba · 2026-06-03T08:31:23 1780475483

The technical report https://microsoft.ai/wp-content/uploads/2026/06/main_2026060... has a lot of detail about decontaminating their training data and developing new in-house benchmarks to ensure reliable evaluation. If other models were just overfit to public benchmarks while Microsoft produced something that generalizes better to unseen data, they could've used those in-house benchmarks to argue that point.

Instead, they only do cherry-picked comparisons against Anthropic's small models, and not the full spectrum of competitors.

Without evidence to the contrary, I'll interpret this as just what happens when you're late to the party and insist on doing everything from scratch.

Maybe coaxing reasoning behavior out of their base model without kickstarting it by distilling from existing models provided them with valuable experience that will help improve their future models, or maybe it was an unnecessary waste of time.

fmajid · 2026-06-03T13:32:46 1780493566

If their model was trained purely on properly licensed data, the reduced legal liability could be a selling point.