
Idiosyncrasies of the HTML parser

Simon Pieters

About this Book

The HTML parser is a piece of software that processes HTML markup and produces an in-memory tree representation (known as the DOM).

The HTML parser has many strange behaviors. For more than a decade, the HTML standards stated that HTML was an application of SGML, while web browsers used a very different approach to parsing HTML. Then the WHATWG specified an HTML parser that was much closer to what contemporary web browsers did. Today, all browsers have conforming HTML parsers. This book highlights the ins and outs of the HTML parser and contains almost-impossible quizzes.

HTML is used not only by virtually all of the web but also by many modern applications. The HTML parser is part of the foundation of the web platform. HTML parsers can be found in web browsers, but are also implemented in various languages and platforms.
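As a rough illustration of the markup-to-tree step described above, here is a minimal sketch built on Python's standard-library `html.parser` module. Note the assumptions: `html.parser` is an event-based parser and is not a conforming implementation of the WHATWG parsing algorithm, so it performs none of the error recovery this book is about. The `Node`, `TreeBuilder`, `parse_to_tree`, and `dump` names are hypothetical helpers invented for this example, and the list of void elements is deliberately incomplete.

```python
from html.parser import HTMLParser


class Node:
    """A very small stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Turns html.parser's start/end/data events into a tree of Node objects."""

    # Elements that never have an end tag (partial list, for illustration only).
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            # Non-void elements become the insertion point for what follows.
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name and close it.
        # An end tag with no matching open element is ignored.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text to keep the dump short
            self.current.children.append(Node("#text: " + text, self.current))


def parse_to_tree(markup):
    """Parse a markup string and return the root Node of the resulting tree."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return the tree as a list of indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(dump(parse_to_tree("<p>Hello <b>world</b></p>"))))
```

A conforming HTML parser would, among other things, also insert the implied `html`, `head`, and `body` elements for this input; this sketch does not, which is one small example of how far a naive tree builder is from the standard.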

You can buy the eBook on Leanpub. 50% of royalties go to Amazon Watch.

A healthy Amazon rainforest is one of the Earth's best defenses against climate change.

Climate and the Amazon

Table of Contents

  1. Preface
  2. Introduction
  3. The HTML parser
  4. Microsyntaxes
  5. DOM manipulation
  6. Serializing
  7. Implementations
  8. Conformance checkers