Parsing HTML and tagsoup is IMHO not the right application for LLMs since these ...

Parsing HTML and tagsoup is IMHO not the right application for LLMs since these are ultimately structured formats. LLM are for NLP tasks, like extracting meaning out of unstructured and ambiguous text. The computational cost of an LLM chewing through even moderately-sized document can be more efficiently spent on sophisticated parser technologies that have been around for decades, which can also to a degree deal with ambiguous and irregular grammars. LLMs should be able to help you write those.