> Matches that occur early enough in π to attain significant compression will not be varied. That is, it isn't possible to use π to compress interesting, real-world data because real-world strings are unlikely to arise early.
> Calculate the number of bits to encode that value using log2(938933556), which is ~29.8
This is roughly the same as saying: "If you rewrite 938933556 as a binary number / usize, it will need 30 bits".
Sanity check: 1101111111|0110111111|0100110100 (| delimits every 10 binary digits).
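If you want to reproduce the sanity check yourself, here is a minimal Python sketch. The variable names are mine, not from the quoted article; it only uses the standard library.

```python
import math

value = 938933556

# Fractional number of bits, as in the quoted log2 calculation.
bits_exact = math.log2(value)

# Whole number of bits needed to write the value in binary.
bits_whole = value.bit_length()

# Binary digits, split into groups of 10 for readability.
binary = format(value, "b")
groups = "|".join(binary[i:i + 10] for i in range(0, len(binary), 10))

print(round(bits_exact, 1))  # 29.8
print(bits_whole)            # 30
print(groups)                # 1101111111|0110111111|0100110100
```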
> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.
This statement is a bit more subtle. As a first-order approximation, we can sort of see pi as an RNG.
If we write pi (ignoring the decimal point) as a binary number, we get:
11011001111111011110010101011110001010101111101101110001001100001...
You can... kind of squint and pretend this is a random sequence of 1s and 0s.
Now suppose you have a file that is 128 bits long (so lots of intermingled 0s and 1s), and each next digit of pi is effectively a coin flip. Pretend 1s are heads and 0s are tails. To get your file back, you basically have to see 128 consecutive coin flips that exactly match your file.
Now imagine pi not as a number, but as a sequence of experiments, each one flipping the coin 128 times.
In expectation, you have to try quite a few times to win this game! You could get lucky, sure. But on average, your chance of winning per attempt is roughly 0.5^128! So how many times do you have to try to win this game? Something like 2^128 times, and you have to consider that each attempt uses 128 bits (2^7) as well. So more like 2^128 × 2^7 = 2^135 bits. But you don't have to start fresh with each attempt; you can see it like this:
- 11011................00100...
- ( 128 flips )
- ( another 128 )
- ( )
- ... so on and so on
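To get a feel for the overlapping-window version of the game, here is a rough simulation sketch in Python. It uses Python's seeded pseudo-random generator as a stand-in for the digits of pi (an assumption, in line with the "squint and pretend" above), and tiny pattern lengths instead of 128, since 2^128 flips is out of reach. The function names are made up for this example.

```python
import random

def first_match_end(pattern, rng):
    """Flip fair coins until the last len(pattern) flips equal pattern.

    Returns the number of flips used. The window slides one flip at a
    time, so attempts overlap instead of starting fresh.
    """
    k = len(pattern)
    window = []
    flips = 0
    while True:
        window.append(rng.randint(0, 1))
        flips += 1
        if len(window) > k:
            window.pop(0)  # keep only the most recent k flips
        if window == pattern:
            return flips

def average_flips(k, trials, seed=0):
    """Average flips needed to hit a random k-bit pattern."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pattern = [rng.randint(0, 1) for _ in range(k)]
        total += first_match_end(pattern, rng)
    return total / trials

# Columns: pattern length k, 2^k, observed average number of flips.
for k in (4, 6, 8):
    print(k, 2**k, round(average_flips(k, trials=2000)))
```

In runs like this the average lands in the neighbourhood of 2^k flips rather than k × 2^k, which is the intuition behind expecting a 128-bit file to show up around the 2^128th bit.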
I do have the Qwen 3.6 (35B) MTP implementation running (in LM Studio; it doesn't need a separate drafter), along with non-MTP Gemma 4 26B, and I can see that Unsloth Studio can run the new QAT, but I can't see how you can run the assistant/drafter. Yet.
It's just a constantly changing landscape. Don't get me wrong, it's fascinating and for various reasons I am pleased I can keep up even slightly, but eeeehhh :-)
Yes. I'm using Gemma-4 31B (gemma-4-31B-it-assistant.Q4_K_M.gguf) with llama.cpp to attribute quotations throughout chapters of my sci-fi novel. I started with Qwen3, but couldn't get it to work. Qwen3 TTS Voice Design, on the other hand, is incredible (Qwen3-TTS-12Hz-1.7B-VoiceDesign). I'm using both for an audiobook generator that produces a variety of voices.
I developed https://keenwrite.com for my hard sci-fi novel. I started with OpenOffice and a spreadsheet and then realized I could combine a character sheet with a Markdown editor. The character sheet became a YAML file with interpolated strings. The editor calls out to ConTeXt for typesetting to PDF. To create an audiobook, the same character sheet identifies the characters for gemma4:31b, which excels at quotation attributions when given a cast of characters and a curated list of emotions. Next, I feed the chapters coupled with JSON-formatted attributions, pronunciation guides, and voice descriptions into qwen3 (VoiceDesign, Base, and 32b) to produce an audiobook with a full cast of characters.
Here's some output (to console, not JSON for brevity here) from gemma:
Unknown (chanting): "Free the food, free the people."
Unknown (chanting): "Border walls trap us all."
Chloé Angelos (focused): "Let's see,"
Yūna Futaba (serious): "The push draws ever nearer,"
Chloé Angelos (commanding): "Yūna, buzz the CDC,"
Unknown (formal): "CDC Emergency Operations Centre. What's your emergency?"
Chloé Angelos (urgent): "Pandora's brew. Populated areas. Releasing soon. Loop in Beale Air Force Base."
I haven't listed all the minor characters yet, which is why the LLM attributed "Unknown" to some quotations.
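For anyone curious what turning console lines like these into JSON might look like, here is a small Python sketch. To be clear, this is not my actual pipeline code: the JSON schema isn't shown above, so the field names (speaker, emotion, text), the regular expression, and the function name are all assumptions made for illustration.

```python
import json
import re

# Matches lines of the form:  Speaker Name (emotion): "quotation"
LINE = re.compile(r'^(?P<speaker>.+?) \((?P<emotion>[^)]+)\): "(?P<text>.*)"$')

def parse_attributions(console_output):
    """Convert console-style attribution lines into a list of dicts."""
    records = []
    for line in console_output.splitlines():
        match = LINE.match(line.strip())
        if match:  # silently skip lines that don't fit the pattern
            records.append(match.groupdict())
    return records

sample = '''Unknown (chanting): "Free the food, free the people."
Chloé Angelos (focused): "Let's see,"
Yūna Futaba (serious): "The push draws ever nearer,"'''

# ensure_ascii=False keeps names like Chloé and Yūna readable.
print(json.dumps(parse_attributions(sample), ensure_ascii=False, indent=2))
```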
I'm in the process of containerizing the solution. If interested, email me.
However, these are likely not the "hard" problems you've mentioned. I feel like I can architect solutions at a higher level now, without having to be completely caught up in many technical nuances. I'd rather not learn the extensive PDF.js API, for example, because it would take weeks of effort to understand.
Dependency-free, performance, FORTRAN, and it would take me more than ten minutes to find and integrate a highlighter that works across all of my code bases.
I searched for PHP-based Git libraries. All of them either invoked "git" using a system call or offered write abilities to the repo. I wanted a pure PHP solution that did not write to any files or invoke executable files (for security purposes). Maybe I didn't search long enough; at some point it becomes faster to tell the LLM what's wanted than to find a solution that fits.
* https://impacts.to/downloads/lowres/impacts.pdf#page=9
* https://impacts.to/bibliography.pdf