The original Kleene star regexes were invented to model neural networks. Have you tried throwing a transformer at the problem? /s Also O(n²), but at least you get hardware acceleration ¯\_(ツ)_/¯
Here's Kleene's "Representation of Events in Nerve Nets and Finite Automata":
I think it would be interesting if frequentist stats could come up with more generative models. Current high-level generative machine learning all relies on Bayesian modeling.
I'm not well versed enough, but what would a frequentist generative model even mean?
The entire generative concept implicitly assumes that parameters themselves have probability distributions, which naturally gives rise to generative models...
You could do frequentist inference on a generative model, sure, but generative modelling seems fundamentally alien to frequentist thinking?
I am more familiar with Bayesian than frequentist stats, but given that they are mathematically equivalent, shouldn't frequentist stats have an answer to e.g. the loss function of a VAE? Or is generative machine learning inherently impossible to model in frequentist stats?
Though if you think about it, a diffusion model is somewhat (partially) frequentist.
I guess you have me thinking more... things like Parzen window estimators or other KDEs are frequentist...
But while it's a probability distribution, to a frequentist you are estimating the fixed parameters of a distribution.
The distribution isn't generative, it just represents uncertainty - and I think that's a bit of the deep core philosophical divide between frequentists and Bayesians - you might use all the same math, but you cannot possibly think of it as being generative.
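For what it's worth, the mechanics are easy to demo either way. A minimal sketch with scipy's gaussian_kde: fit a frequentist density estimate, then sample from it. Whether that sampling step counts as "generative" is exactly the philosophical question:

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=1_000)  # observed data

    kde = gaussian_kde(data)        # frequentist density estimate...
    new_samples = kde.resample(10)  # ...which you can nonetheless sample from
    print(new_samples.round(2))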
50 knots rotation is perfectly fine for a plane that size. A Cessna Skyhawk is certified to rotate at 55 knots fully loaded, and since the stall speed is around 40 knots, rotation for specialty take-offs like soft fields is much lower, so 50 knots is more than enough.
We are building an agentic ad tech system optimized for real time and scale. The process of making an ad, from ideation to distribution, is traditionally exceptionally labor intensive. We are making it possible to target, design, and distribute ads at scale and in real time.
Personalized ads enable personalized lying by advertisers. Politicians in the 2016 election would target their own party's voters with enraging content, while the other side got shadow posts that lied to them about their candidate in ways that would never be seen publicly, to discourage them from voting (source: the book Careless People).
I have been working on the next generation of Canva and Photoshop for highly regulated verticals where there are specific demands placed on the generation and edit flow.
Copilot and many coding agents truncate the context window and use dynamic summarization to keep costs low for themselves. That's how they are able to provide flat-fee plans.
If you want the full capability, use the API with something like opencode. You will find that a single PR can easily rack up three digits of consumption costs.
Getting off of their plans and prompts is so worth it, I know from experience. I'm paying less and getting more so far, paying by token as a heavy gemini-3-flash user. It's a really good model. This is the future (distillations into fast models that are good enough for 90% of tasks), not mega models like Claude. Those will still be created for distillation and for the harder problems.
Maybe not, then. I'm afraid I have no idea what those numbers mean, but it looks like Gemini and ChatGPT 4 can handle a much larger context than Opus, and Opus 4.5 is cheaper than older versions. Is that correct? Because I could be misinterpreting that table.
I don't know about GPT4 but the latest one (GPT 5.2) has 200k context window while Gemini has 1m, five times higher. You'll be wanting to stay within the first 100k on all of them to avoid hitting quotas very quickly though (either start a new task or compact when you reach that) so in practice there's no difference.
I've been cycling between a couple of $20 accounts to avoid running out of quota, and the latest models from all of them are great. I'd give GPT 5.2 codex the slight edge, but not by a lot.
The latest Claude is about the same too but the limits on the $20 plan are too low for me to bother with.
The last week has made me realize how close these are to being commodities already. Even the CLI agents are nearly the same bar some minor quirks (although I've hit more bugs in Gemini CLI, but each time I can just save a checkpoint and restart).
The real differentiating factor right now is quota and cost.
> You'll be wanting to stay within the first 100k on all of them
I must admit I have no idea how to do that or what that even means. I get that bigger context window is better, but what does it mean exactly? How do you stay within that first 100k? 100k what exactly?
Attention-based neural network architectures (on which the majority of LLMs are built) have a unit economic cost that scales roughly as n², i.e. quadratically (for both memory and compute). In other words, the longer the context window, the more expensive it is for the upstream provider. That's one cost.
The second cost is that you have to resend the entire context every time you send a new message. So the context is basically (where a, b, and c are messages): first context: a; second context: a->b; third context: a->b->c. From the point of view of the developer it's a mostly stateless process (there are some short-term caching mechanisms, YMMV by provider; that's why "cached" tokens, especially system prompts, are cheaper). The state, i.e. the context window string, is managed by the end-user application (the coding agent, the IDE, the ChatGPT UI client, etc.).
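To make that concrete, here's a minimal sketch of the stateless pattern (call_model is a hypothetical stand-in for any chat-completion API):

    # The client owns the history and resends all of it on every turn.
    def call_model(messages: list[dict]) -> str:
        # hypothetical stand-in: POST the full message list, return the reply
        return f"(reply to {messages[-1]['content']})"

    history = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_text in ["a", "b", "c"]:       # turns: a -> a,b -> a,b,c
        history.append({"role": "user", "content": user_text})
        reply = call_model(history)         # the entire context, every time
        history.append({"role": "assistant", "content": reply})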
The per-token cost is an amortized (averaged) cost of memory plus compute; the actual cost is mostly quadratic with respect to each marginal token. The longer the context window, the more expensive things are.
Because of the above, AI agent providers (especially those that charge flat fee subscription plans) are incentivized to keep costs low by limiting the maximum context window size.
(And if you think about it carefully, your AI API costs are a quadratic cost curve projected onto a linear price: a flat fee per token. So the model hosting provider may in some cases make more profit if users send in shorter contexts versus constantly saturating the window. YMMV of course, but it's a race to the bottom right now for LLM unit economics.)
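Rough numbers, assuming $3 per million input tokens and ~2k tokens added per turn (both made-up but plausible):

    PRICE_PER_TOKEN = 3e-6    # assumed: $3 per million input tokens
    TOKENS_PER_TURN = 2_000   # assumed: each turn adds ~2k tokens

    total_billed, context = 0, 0
    for turn in range(50):
        context += TOKENS_PER_TURN   # context grows linearly...
        total_billed += context      # ...but you pay for all of it, every turn

    print(f"{total_billed:,} tokens billed")                        # 2,550,000
    print(f"${total_billed * PRICE_PER_TOKEN:.2f} at a flat rate")  # $7.65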
They limit the effective window by interrupting a task halfway through and generating a "summary" of the task progress, then prompting the LLM again with a fresh prompt plus the "summary" so far, and the LLM restarts the task from where it left off. Of course, text is a poor representation of the LLM's internal state, but it's the best option so far for AI applications to keep costs low.
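In sketch form (summarize() and the threshold here are placeholders; real agents are fancier, but the shape is the same):

    MAX_CONTEXT_TOKENS = 100_000

    def count_tokens(messages):
        # crude heuristic: ~4 characters per token in English
        return sum(len(m["content"]) // 4 for m in messages)

    def summarize(messages):
        # placeholder: in practice you'd ask the LLM to compress the history
        return {"role": "system", "content": "Summary of progress so far: ..."}

    def maybe_compact(messages):
        if count_tokens(messages) > MAX_CONTEXT_TOKENS:
            # a text summary is a lossy stand-in for the model's "state",
            # which is why details sometimes vanish after compaction
            return [messages[0], summarize(messages[1:])]  # system prompt + summary
        return messages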
Another thing to keep in mind is that LLMs perform worse the larger the input size. This is due to a variety of factors (mostly, I think, because there isn't enough training data to saturate the massive context window sizes).
There are a bunch of tests and benchmarks (commonly referred to as "needle in a haystack") that measure LLM performance at large context window sizes, but it's still an open area of research.
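A toy version of the test, if you want the flavor (ask_model is a hypothetical stand-in):

    import random

    def build_haystack(needle: str, n_filler: int = 5_000) -> str:
        filler = ["The sky was a pleasant shade of blue that day."] * n_filler
        filler.insert(random.randrange(n_filler), needle)  # bury it at a random depth
        return " ".join(filler)

    needle = "The secret passphrase is 'indigo-falcon-42'."
    prompt = build_haystack(needle) + "\n\nWhat is the secret passphrase?"
    # answer = ask_model(prompt)
    # assert "indigo-falcon-42" in answer   # retrieval quality varies with depth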
The thing is, generally speaking, you will get slightly better performance if you can squeeze all your code and the problem into the context window, because the LLM gets a "whole picture" view of your codebase/problem instead of a bunch of broken-telephone summaries every few tens of thousands of tokens. Take this with a grain of salt, as the field is changing rapidly, so it might not be valid in a month or two.
Keep in mind that if the problem you are solving requires you to saturate the entire context window of the LLM, a single request can cost you dollars. And if you are using a 1M+ context window model like Gemini, you can rack up costs fairly rapidly.
Using Opus 4.5, I have noticed that in long sessions about a complex topic, there often comes a point when Opus starts spouting utter gibberish. One or two questions earlier it was making total sense, and suddenly it seems to have forgotten everything and responds in a way that barely relates to the question I asked, and certainly not to the "conversation" we were having.
Is that a sign of having surpassed that context window size? I guess to keep them sharp, I should start a new session often and early.
From what I understand, a token is roughly a word or a fragment of a word, so I can use about 100k of those before I start running into limits. But I've got the feeling that the complexity of the problem itself also matters.
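(If you want to check actual counts, OpenAI's tiktoken library will tokenize text for you; other providers use different tokenizers, so treat the numbers as ballpark:)

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "Tokenization splits text into subword pieces."
    tokens = enc.encode(text)

    print(len(text.split()), "words")  # 6
    print(len(tokens), "tokens")       # typically a few more tokens than words
    # rule of thumb for English: ~0.75 words per token,
    # so a 100k-token window is very roughly 75k words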
It could have exceeded either its real context window size (or the artificially truncated one), and the dynamic summarization step failed to capture the important bits of information you wanted. Alternatively, the information might be stored in parts of the context window where the model performs poorly at needle-in-a-haystack retrieval.
This is part of the reason why people use external data stores (e.g. vector databases, graph tools like Bead, etc.) in the hope of supplementing the agent's native context window and task management tools.
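A bare-bones sketch of the idea (the toy embed() here is a hashed bag-of-words stand-in for a real embedding model):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # toy stand-in for a real embedding model: hashed bag-of-words
        v = np.zeros(64)
        for word in text.lower().split():
            v[hash(word) % 64] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    notes: list[tuple[str, np.ndarray]] = []   # the "vector database"

    def remember(text: str) -> None:
        notes.append((text, embed(text)))

    def recall(query: str, k: int = 3) -> list[str]:
        # pull back only the k most relevant notes instead of the whole history
        q = embed(query)
        ranked = sorted(notes, key=lambda nv: -float(nv[1] @ q))  # cosine sim on unit vectors
        return [text for text, _ in ranked[:k]]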
The whole field is still in its infancy. Who knows, maybe in another update or two the problem might just be solved. It's not like needle in the haystack problems aren't differentiable (mathematically speaking).
Don't forget volume. Just having well structured content isn't enough, you need large volumes of content such that it makes a statistical difference during the training process.
Google ads are the cheapest, yes, but depending on your audience, they may not be searching on Google anymore.
For ChatGPT (and similar) you need to have a strong FAQ page and lots of content marketing to increase the likelihood of being the suggested answer when a user asks ChatGPT a relevant question (it's a highly probabilistic system, look up AEO/GEO).
Cloudflare, for example, offers an option to block AI scraping bots by default. If you are in the services business, this is the opposite of what you want, because having AI crawlers scrape your site drives traffic your way down the road, when users ask a related question.
I would also suggest having accounts with major chatbot companies and enabling the "allow training on my conversations" option and then talk to it about your services. Ultimately you just want to get your brand into the training data corpus, and the rest is just basic machine learning statistics.
Facebook ads were the cheapest for me ten years ago.
We were marketing a product that many people were happy to know existed. The dashboard gave us tools to really delve into demographics. Of all the ridiculous personal data Facebook collected, the best demographic filter was allowing me to narrow in on pages someone liked or interacted with. We were selling things related to cruising sailboats, and we could target an audience within 30 miles of Fort Lauderdale who also liked Sailing Magazine. Moreover, we could use a pixel so that only people who had also visited our website saw the ads.
Facebook had a policy of rewarding high-quality ad content. If people clicked the ad, or better yet left positive comments and discussion or shared, the price drastically decreased to fractions of a cent per impression and click-through. We were able to get ads shared a lot with people tagging other people about the product suggesting they might be interested in it. That was the holy grail for copy that we always strived for.
Of course, they got rid of all that. But at the time, it was a great way to target an audience based on third-party pages they liked, giving them ad content about products they were generally interested in—and products they were happy to know they could purchase because they had value.
Ads configuration is like gambling in Las Vegas, in that the easier the game, the worse the odds—like slot machines—and the more the player has to interact, like Blackjack, the better the payout. When done well with good configuration, we were getting 1000s of click-throughs for dollars. It was amazing.
The point is that Facebook rewarded ads that people positively interacted with, as it meant the quality of the news feed wasn't hurt by the ad.
There was a time when ads benefitted everyone, the buyer, the seller, and Facebook.
>when ads benefitted everyone, the buyer, the seller, and Facebook
As others have stated in this thread, it's called the acquisition phase. Get people hooked, build the network, make it the place people have to be.
After that comes the exploit phase where said network effects make it hard to leave. You can rake in billions (trillion?) of dollars this way. Who cares if it eventually kills the company, you've made more money than god at this point.
Google Search Ads are usually the most expensive on a CPC basis out of the big platforms, but usually the CPA is much lower (even though Bing Ads can often be better value). This is usually for two reasons:
1. You can target a specific part of the funnel (informational -> purchase intent) in search ads. Targeting on social networks is more about overall user profiles rather than their immediate state of mind.
2. People going to a search engine expect to leave that search engine to go to another website. Whereas people on a social network expect to stay in that portal. So clicking on an ad then doing something after is a more natural flow (and better value for advertisers).
https://hawtads.com
Just launched the blog too
https://blog.hawtads.com/