Chapter 4. Scripting complications

Revised overview of the HTML parser

TODO

document.write()

TODO

Blocking the parser

TODO

Speculative parsing a.k.a. preload scanner

TODO

Other parser APIs

TODO

Window: DOMParser
Element, ShadowRoot: innerHTML
Element: outerHTML
Element: insertAdjacentHTML
Range: createContextualFragment

innerHTML and friends

TODO some introduction before getting into the weeds...

#htmlpubquiz How do you get a Siamese twins document (i.e. two <head>s and two <body>s) using only innerHTML/outerHTML?

Correct answer:

<!DOCTYPE html>
<script>
document.head.outerHTML = '';
document.body.outerHTML = '';
</script>

When the parser reaches </script>, before running the script, the body element hasn't been created yet:

#document
├── DOCTYPE: html
└── html
    └── head
        └── script
            └── #text:  document.head.outerHTML = ''; document.body.outerHTML = '';

The first line in the script sets document.head.outerHTML to the empty string. outerHTML is like innerHTML, except that it replaces the element itself with the parsed nodes rather than replacing the element's children. The outerHTML setter invokes the fragment parsing algorithm on the given value, and then runs the DOM replace algorithm to replace the context object with the parsed result.

The fragment parsing algorithm then calls the HTML fragment parsing algorithm, with context being the html element (the parent of the head element). This will set up a new instance of the HTML parser, with the state of the HTML parser as appropriate for context. In particular, this step:

  1. Reset the parser's insertion mode appropriately.

...which says:

  1. If node is an html element, run these substeps:

    1. If the head element pointer is null, switch the insertion mode to "before head" and return. (fragment case)

So when this parser parses the given markup (the empty string), it starts in the "before head" insertion mode. It immediately reaches EOF, so it steps through the usual insertion modes and appends a head and a body element.
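The walk through the insertion modes can be sketched as a toy model. This is only an illustration of the steps quoted above, not a real parser: the function names resetInsertionMode and parseEndOfFile are made up for this sketch, only the html-context branch is modelled, and the only token ever fed in is end-of-file.

```javascript
// Toy model of the steps quoted above; it is not a full HTML parser.
// Function and variable names are illustrative, not taken from the spec.

// "Reset the insertion mode appropriately", html-context branch only.
function resetInsertionMode(contextName, headElementPointer) {
  if (contextName === "html") {
    // Fragment case: a freshly created parser has no head element yet.
    return headElementPointer === null ? "before head" : "after head";
  }
  throw new Error("context not modelled: " + contextName);
}

// Feed only an end-of-file token to the tree builder, starting in `mode`,
// and report which elements get created along the way.
function parseEndOfFile(mode) {
  const created = [];
  for (;;) {
    switch (mode) {
      case "before head":
        created.push("head"); // implied head element
        mode = "in head";
        break;
      case "in head":
        mode = "after head"; // head is popped, nothing is created
        break;
      case "after head":
        created.push("body"); // implied body element
        mode = "in body";
        break;
      case "in body":
        return created; // EOF in "in body": stop parsing
      default:
        throw new Error("mode not modelled: " + mode);
    }
  }
}

// The fragment parser for outerHTML = '' with the html element as context.
const fragmentNodes = parseEndOfFile(resetInsertionMode("html", null));
console.log(fragmentNodes.join(", "));
```

Starting the same walk from "in head" instead of "before head" creates only a body element, which is the situation the main parser is in once the script has finished.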

At this point, if we were to inspect the DOM right after the document.head.outerHTML assignment, it would look like this:

#document
├── DOCTYPE: html
└── html
    ├── head
    └── body

The parser-created head has been replaced by the head and body elements that the fragment parser created. Now document.body is no longer null, since a body element exists, even though the still-running main parser hasn't created one yet.

Next, the document.body.outerHTML = '' line does basically the same thing but for the new body element: replace it with new head and body elements:

#document
├── DOCTYPE: html
└── html
    ├── head
    ├── head
    └── body

The first head didn't go away; outerHTML only replaces the element you call it on, not other siblings.

Now the script is done, and the main parser is allowed to continue. The insertion mode is "in head", since the script element was in head. The next token is end-of-file, so the insertion mode switches to "after head", where it inserts a body element and switches to "in body", and then it stops parsing.

#document
├── DOCTYPE: html
└── html
    ├── head
    ├── head
    ├── body
    └── body
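The whole sequence can be replayed with a toy model that tracks only the names of the html element's children. It is a sketch under simplifying assumptions: setOuterHTMLToEmpty is a hypothetical helper, the fragment parser's output for the empty string is hard-coded as head plus body, and document.head and document.body are modelled as "the first child with that name".

```javascript
// Toy model: the children of the html element as a list of element names.
// Each empty-string outerHTML assignment swaps one child for the fragment
// parser's output, which for the empty string is a head and a body.
const FRAGMENT_FOR_EMPTY_STRING = ["head", "body"];

// Hypothetical helper standing in for `document[name].outerHTML = ''`.
function setOuterHTMLToEmpty(children, name) {
  // document.head / document.body resolve to the first matching child.
  const index = children.indexOf(name);
  if (index === -1) throw new TypeError("document." + name + " is null");
  return [
    ...children.slice(0, index),
    ...FRAGMENT_FOR_EMPTY_STRING,
    ...children.slice(index + 1),
  ];
}

let htmlChildren = ["head"]; // state when the script starts running
htmlChildren = setOuterHTMLToEmpty(htmlChildren, "head"); // head, body
htmlChildren = setOuterHTMLToEmpty(htmlChildren, "body"); // head, head, body
htmlChildren = [...htmlChildren, "body"]; // main parser adds its own body at EOF

console.log(htmlChildren.join(", ")); // head, head, body, body
```

In a browser, the real tree can be checked from the console with Array.from(document.documentElement.children, el => el.localName).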

DOM manipulation

TODO

Modifying the DOM during parsing

Scripts can execute during parsing, and those scripts can modify the DOM. This can lead to some interesting effects.

Simple example:

<!doctype html>
<body>
<script>
 document.body.remove();
</script>
Oops.

The resulting DOM is:

#document
├── DOCTYPE: html
└── html
    └── head

At least it didn't lose its head...

Note that the text "Oops.", which the parser processed after running the script, is not in the DOM. It was inserted into the body element, which the script had already removed from the document.
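Why the text ends up in the removed element can be illustrated with a small model of the parser's stack of open elements. This is a sketch, not browser code: ToyNode is a made-up class, and the assumption being shown is that the parser inserts into the current node on its stack without checking whether that node is still attached to the document.

```javascript
// Minimal stand-in for DOM nodes: a name, a parent, and a child list.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.parent = null;
    this.children = [];
  }
  append(child) {
    child.remove(); // a node has at most one parent
    child.parent = this;
    this.children.push(child);
    return child;
  }
  remove() {
    if (this.parent === null) return;
    this.parent.children = this.parent.children.filter((c) => c !== this);
    this.parent = null;
  }
}

const html = new ToyNode("html");
html.append(new ToyNode("head"));
const body = html.append(new ToyNode("body"));

// The parser's stack of open elements after seeing the body start tag.
const openElements = [html, body];

// The script element is parsed into body, then it runs document.body.remove().
body.append(new ToyNode("script"));
body.remove();

// The parser resumes and inserts the text into its current node,
// which is still the (now detached) body at the top of the stack.
const currentNode = openElements[openElements.length - 1];
currentNode.append(new ToyNode("#text: Oops."));

const names = (node) => node.children.map((c) => c.name);
console.log(names(html).join(", ")); // head
console.log(names(body).join(", ")); // script, #text: Oops.
```

In a real page, saving a reference such as const detached = document.body before calling remove() should make it possible to inspect the detached element and its text after parsing has finished.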

#HTMLQuiz what happens?

<iframe id=x></iframe>
<script>
x.contentDocument.body.appendChild(x);
</script>

Correct answer: iframe escapes.

Nothing prevents the iframe element from being moved to its own document (about:blank is same-origin). So the iframe element is removed from its original document.

The spec for appendChild() does have various checks in place to make sure that the resulting DOM wouldn't violate invariants; e.g., if you tried to append an element to a text node, that would throw. But appending an element to another element is allowed, even across (same-origin) documents.
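Two of those checks can be sketched as follows. This is a partial model only: the node objects, checkAppend, and the HierarchyRequestError class are stand-ins invented for the sketch (browsers throw a DOMException named HierarchyRequestError), and the real algorithm has more steps than the two shown here.

```javascript
// Stand-in for the DOMException a browser would throw.
class HierarchyRequestError extends Error {}

// Only documents, document fragments and elements may have children.
const VALID_PARENT_TYPES = new Set(["document", "fragment", "element"]);

// Hypothetical node factory: just a type and a parent pointer.
const makeNode = (type, parent = null) => ({ type, parent });

function isInclusiveAncestor(candidate, node) {
  for (let n = node; n !== null; n = n.parent) {
    if (n === candidate) return true;
  }
  return false;
}

function checkAppend(parent, node) {
  if (!VALID_PARENT_TYPES.has(parent.type)) {
    throw new HierarchyRequestError("this kind of node cannot have children");
  }
  if (isInclusiveAncestor(node, parent)) {
    throw new HierarchyRequestError("a node cannot be inserted into itself or its descendants");
  }
}

// Outer document containing the iframe element.
const outerDocument = makeNode("document");
const outerBody = makeNode("element", makeNode("element", outerDocument));
const iframe = makeNode("element", outerBody);

// The iframe's content document is a separate tree: its parent is null,
// so the iframe element is not an ancestor of anything inside it.
const innerDocument = makeNode("document");
const innerBody = makeNode("element", makeNode("element", innerDocument));

checkAppend(innerBody, iframe); // passes: the move is allowed

let rejected = 0;
try { checkAppend(makeNode("text", innerBody), iframe); } catch (e) { rejected++; }
try { checkAppend(iframe, outerBody); } catch (e) { rejected++; }
console.log(rejected); // 2
```

The point of the model is that the content document forms its own tree, so from the point of view of these checks the iframe element is not an ancestor of the nodes inside the document it hosts.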

When an iframe is removed from a document, its browsing context disappears. So the child document does not have a browsing context when the iframe element is inserted into it. Therefore the iframe, after the move, does not have a new child browsing context (there's no infinite recursion happening). The spec for the iframe element says:

When an iframe element is removed from a document, the user agent must discard the element's nested browsing context, if it is not null, and then set the element's nested browsing context to null.

If the script had saved a reference to the iframe's window, the script would still be able to access it, its document, and the moved iframe element, after the move.
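The lifecycle described above can be modelled in a few lines. This is a toy sketch using the same "browsing context" vocabulary as the quoted spec text: the object shapes and the functions insertIframe and removeIframe are invented for the illustration, and the rule that a nested browsing context is created only when the target document itself has a browsing context is taken as an assumption of the model.

```javascript
// Inserting an iframe creates a nested browsing context only if the
// document it is inserted into has a browsing context of its own.
function insertIframe(iframe, doc) {
  iframe.containingDocument = doc;
  if (doc.browsingContext !== null) {
    const context = { activeDocument: null };
    context.activeDocument = { browsingContext: context }; // about:blank
    iframe.nestedBrowsingContext = context;
  }
}

// Removing an iframe discards its nested browsing context, which leaves
// the child document without a browsing context.
function removeIframe(iframe) {
  const context = iframe.nestedBrowsingContext;
  if (context !== null) {
    context.activeDocument.browsingContext = null;
    iframe.nestedBrowsingContext = null;
  }
  iframe.containingDocument = null;
}

const topDocument = { browsingContext: { activeDocument: null } };
const x = { containingDocument: null, nestedBrowsingContext: null };
insertIframe(x, topDocument);

// Like saving a reference to the iframe's document before the move.
const savedDocument = x.nestedBrowsingContext.activeDocument;

// appendChild() first removes the element from its old parent, then inserts it.
removeIframe(x);
insertIframe(x, savedDocument);

console.log(x.nestedBrowsingContext); // null: no new child browsing context
console.log(x.containingDocument === savedDocument); // true
```

Because the child document has already lost its browsing context by the time the element is inserted into it, no new nested context is created, and the saved reference is the only remaining way to reach that document.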