Wednesday, 17th December 2025
firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.
You can see that in action by checking out the ~8GB Firefox repository and running:
cd parser/html/java
make sync
make translate
Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.
I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!
Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:
protected void startClassDeclaration() { printer.print("#define "); printer.print(className); printer.printLn("_cpp__"); printer.printLn(); for (int i = 0; i < Main.H_LIST.length; i++) { String klazz = Main.H_LIST[i]; if (!klazz.equals(javaClassName)) { printer.print("#include \""); printer.print(cppTypes.classPrefix()); printer.print(klazz); printer.printLn(".h\""); } } printer.printLn(); printer.print("#include \""); printer.print(className); printer.printLn(".h\""); printer.printLn(); }
Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.
Gemini 3 Flash
It continues to be a busy December, if not quite as busy as last year. Today’s big news is Gemini 3 Flash, the latest in Google’s “Flash” line of faster and less expensive models.
[... 1,221 words]AoAH Day 15: Porting a complete HTML5 parser and browser test suite (via) Anil Madhavapeddy is running an Advent of Agentic Humps this year, building a new useful OCaml library every day for most of December.
Inspired by Emil Stenström's JustHTML and my own coding agent port of that to JavaScript he coined the term vibespiling for AI-powered porting and transpiling of code from one language to another and had a go at building an HTML5 parser in OCaml, resulting in html5rw which passes the same html5lib-tests suite that Emil and myself used for our projects.
Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:
The question of copyright and licensing is difficult. I definitely did some editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the LICENSE a joint one with Emil Stenström. I did not follow the transitive dependency through to the Rust one, which I probably should.
I'm also extremely uncertain about every releasing this library to the central opam repository, especially as there are excellent HTML5 parsers already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents vs humans territory that I ruled out in my groundrules. Whether or not this agentic code is better or not is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!
I decided to credit Emil in the same way for my own vibespiled project.
