Using the Objectos HTML Pseudo DOM API. Objectos 0.5.3 released

Marcio Endo, Apr 2, 2023

Welcome to Objectos Weekly issue #020.

I have released Objectos 0.5.3! It introduces the Pseudo DOM (pseudom) API for Objectos HTML. The pseudom API replaces the Visitor API shown in the previous issue of the newsletter. The Visitor API has been removed from Objectos HTML in this release. If you are interested, this is the full list of changes.

In this issue I will show you how to use the new pseudom API.

Let's begin.

Before we begin

I use Objectos in production. The Objectos website is generated using Objectos HTML and other Objectos libraries. The internal Objectos CI process also uses other Objectos libraries, such as Objectos GIT.

However, please know that Objectos 0.5.3 is an alpha release. In particular:

it is not stable. I expect it to fail if you deviate slightly from the use-case shown here;
there may be breaking API changes between releases; and
documentation is a work in progress.

Generating different representations of your template

Suppose you have a blog and wish to provide an Atom feed. This would allow the readers of your blog to be notified of the latest articles by using an RSS reader. Not only that; depending on how you set up your Atom feed, readers can access the full article without leaving their RSS reader.

Providing the full article

An Atom feed is an XML file. I won't go into the details of the Atom format. If you're interested, here is RFC 4287.

Just know that:

the feed element is the root element;
it may contain a number of entry elements. Each represents a distinct article of your blog; and
entry elements may contain a content element. That is where the article content goes.

So it may look like the following:

<feed xmlns="http://www.w3.org/2005/Atom">
  ...
  <entry>
  ...
    <content type="html">content goes here</content>
  ...
  </entry>
  ...
</feed>

Notice that, in our example, the content element has the html value for its type attribute. Let's look into that.

The type attribute

RFC 4287 says the following about the type attribute:

If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML [HTML]. Any markup within MUST be escaped; for example, "<br>" as "&lt;br>".

So if our blog post is something like:

<h1>Post title</h1>

<p>Intro paragraph</p>

The content element will be rendered like the following:

<content type="html">
&lt;h1&gt;Post title&lt;/h1&gt;

&lt;p&gt;Intro paragraph&lt;/p&gt;
</content>

Notice that, while the RFC does not require the '>' symbol to be escaped, we will escape it anyway.
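To make the escaping rule concrete, here is a minimal sketch of a helper that escapes an HTML string so it can be placed inside the content element. The HtmlEscape class and its escape method are hypothetical names used for illustration; they are not part of Objectos HTML. The sketch also escapes '&', which XML requires.

```java
// Hypothetical helper for illustration; NOT part of Objectos HTML.
final class HtmlEscape {
  private HtmlEscape() {}

  // Replaces the characters that an XML parser would otherwise treat as markup.
  static String escape(String html) {
    var sb = new StringBuilder(html.length());
    for (int i = 0; i < html.length(); i++) {
      char c = html.charAt(i);
      switch (c) {
        case '&' -> sb.append("&amp;");
        case '<' -> sb.append("&lt;");
        case '>' -> sb.append("&gt;"); // not required by the RFC, escaped anyway
        default -> sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints: &lt;h1&gt;Post title&lt;/h1&gt;
    System.out.println(escape("<h1>Post title</h1>"));
  }
}
```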

Our example

We will generate the content element for the following Objectos HTML template:

import objectos.html.HtmlTemplate;

public class BlogPost extends HtmlTemplate {
  @Override
  protected final void definition() {
    doctype();
    html(
      head(
        title("A pseudom example")
      ),
      body(
        h1("Title"),
        p("Intro text"),
        h2("Subtitle"),
        p("More text"),
        pre(code(
          "class Foo {}"
        ))
      )
    );
  }
}

Everything that is inside the body tag must be included in our result.

Our entry content writer

We will create a feed entry content writer using the new pseudom API.

The pseudom API provides a DocumentProcessor interface which gives you access to an HtmlDocument. The latter gives you access to the HTML elements defined in our template.

Let's have our writer implement the DocumentProcessor interface:

import static java.lang.System.out;

import objectos.html.pseudom.DocumentProcessor;
import objectos.html.pseudom.HtmlDocument;
...

final class EntryContentWriter implements DocumentProcessor {
  @Override
  public final void process(HtmlDocument document) {
    for (var node : document.nodes()) {
      if (node instanceof HtmlElement element) {
        findBody(element);
      }
    }
  }

  ...
}

Notice that we statically import the out member of java.lang.System. We will write our result directly to it.

Next, let's look at the process method.

The process method

The DocumentProcessor interface defines a single process method which we implemented like so:

@Override
public final void process(HtmlDocument document) {
  for (var node : document.nodes()) {
    if (node instanceof HtmlElement element) {
      findBody(element);
    }
  }
}

We iterate over the nodes of our document. If the node is an HtmlElement then we have to look for the body element. Remember, our result must contain all of the children of the body element.

Searching for the body element

The findBody method is implemented like so:

private void findBody(HtmlElement element) {
  if (element.hasName(StandardElementName.BODY)) {
    consumeBody(element);
  } else {
    for (var node : element.nodes()) {
      if (node instanceof HtmlElement child) {
        findBody(child);
      }
    }
  }
}

If the current element is the body element then we consume it.

Otherwise we keep looking for the body element. We do it by:

iterating over the element's nodes; and
recursively calling the findBody method.

Next, let's look at the consumeBody method.

Consuming the body element

Here is the implementation of the consumeBody method:

private void consumeBody(HtmlElement body) {
  for (var node : body.nodes()) {
    if (node instanceof HtmlElement element) {
      writeElement(element);
    }
  }
}

We know for sure that we are at the body element. So we write all of the elements contained in the body.

Writing the element

The writeElement method is implemented as follows:

private void writeElement(HtmlElement element) {
  var name = element.name();

  writeStartTag(name);

  if (element.isVoid()) {
    return;
  }

  for (var node : element.nodes()) {
    consumeNode(node);
  }

  writeEndTag(name);
}

First, we write the start tag of the element.

If the element is void then it will not have any contents and we can exit the method early. If it is a normal element, then we:

consume its nodes, i.e., any text or child element; and
write the end tag.

Writing the element's contents

For completeness, here's the implementation of the consumeNode method:

private void consumeNode(HtmlNode node) {
  if (node instanceof HtmlElement element) {
    writeElement(element);
  } else if (node instanceof HtmlText text) {
    writeText(text.value());
  }
}

Therefore:

if the node is a child element, we make a recursive call to writeElement; and
if it is a text node, we write its value.
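The writeStartTag, writeEndTag and writeText helpers are not shown above. As a rough sketch only, here is how the whole traversal could look on a stand-in tree model, so that it runs without Objectos on the class path. The Node, Element and Text types, the EntryContentSketch class and the list of void elements are all assumptions made for this example; they are not the pseudom types. The sketch writes to a StringBuilder instead of System.out, ignores attributes, and escapes text twice: once as HTML text and once more for the enclosing XML.

```java
import java.util.List;
import java.util.Set;

// Stand-in model for illustration only; these are NOT the Objectos pseudom types.
final class EntryContentSketch {
  // Small, incomplete set of void elements; enough for this sketch.
  private static final Set<String> VOID = Set.of("br", "hr", "img");

  private final StringBuilder out = new StringBuilder();

  String write(Element root) {
    findBody(root);
    return out.toString();
  }

  private void findBody(Element element) {
    if (element.name().equals("body")) {
      consumeBody(element);
      return;
    }
    for (Node node : element.nodes()) {
      if (node instanceof Element child) {
        findBody(child);
      }
    }
  }

  private void consumeBody(Element body) {
    for (Node node : body.nodes()) {
      if (node instanceof Element element) {
        writeElement(element);
        out.append('\n'); // one top-level element per line (a choice of this sketch)
      }
    }
  }

  private void writeElement(Element element) {
    writeStartTag(element.name());
    if (VOID.contains(element.name())) {
      return; // void elements have no contents and no end tag
    }
    for (Node node : element.nodes()) {
      if (node instanceof Element child) {
        writeElement(child);
      } else if (node instanceof Text text) {
        writeText(text.value());
      }
    }
    writeEndTag(element.name());
  }

  // Tags are written already escaped; attributes are ignored in this sketch.
  private void writeStartTag(String name) {
    out.append("&lt;").append(name).append("&gt;");
  }

  private void writeEndTag(String name) {
    out.append("&lt;/").append(name).append("&gt;");
  }

  // Text is escaped as HTML first, then the result is escaped again for XML.
  private void writeText(String value) {
    out.append(escape(escape(value)));
  }

  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    var doc = Element.of("html",
        Element.of("head", Element.of("title", new Text("A pseudom example"))),
        Element.of("body",
            Element.of("h1", new Text("Title")),
            Element.of("p", new Text("Intro text"))));
    // prints: &lt;h1&gt;Title&lt;/h1&gt;
    //         &lt;p&gt;Intro text&lt;/p&gt;
    System.out.print(new EntryContentSketch().write(doc));
  }
}

sealed interface Node permits Element, Text {}

record Element(String name, List<Node> nodes) implements Node {
  static Element of(String name, Node... nodes) {
    return new Element(name, List.of(nodes));
  }
}

record Text(String value) implements Node {}
```

Unlike the pseudom API, this stand-in model is a real tree: every node is a separate object that stays valid after the traversal moves on.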

You can find the full source code of the processor here.

Running our example

We write a small program to run our example:

public static void main(String... args) {
  var sink = new HtmlSink();

  var post = new BlogPost();

  var writer = new EntryContentWriter();

  sink.toProcessor(post, writer);
}

When executed, it prints:

&lt;h1&gt;Title&lt;/h1&gt;
&lt;p&gt;Intro text&lt;/p&gt;
&lt;h2&gt;Subtitle&lt;/h2&gt;
&lt;p&gt;More text&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Foo {}&lt;/code&gt;
&lt;/pre&gt;

Which then can be used in an Atom feed XML file, like so:

<content type="html">
&lt;h1&gt;Title&lt;/h1&gt;
&lt;p&gt;Intro text&lt;/p&gt;
&lt;h2&gt;Subtitle&lt;/h2&gt;
&lt;p&gt;More text&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Foo {}&lt;/code&gt;
&lt;/pre&gt;
</content>

This is how the Objectos site's Atom feed is generated.

Why is it called Pseudo DOM?

The Pseudo DOM (pseudom) API is named this way because, well, it is not a real DOM API:

internally it is a forward-only event streamer (think StAX Iterator API); and
at any given time, there's a single HtmlElement instance which is reused in iterations (recursive included).

To illustrate, consider the following Objectos HTML template:

public class WhyPseudoDom extends HtmlTemplate {
  @Override
  protected final void definition() {
    h1("Why the pseudom name?");
    p("Just an example");
  }
}

And we write the following DocumentProcessor for it:

public class WhyPseudoDomProc implements DocumentProcessor {
  @Override
  public final void process(HtmlDocument document) {
    var nodes = document.nodes();
    var nodesIter = nodes.iterator();

    assertTrue(nodesIter.hasNext());
    var h1 = (HtmlElement) nodesIter.next();
    assertTrue(h1.hasName(StandardElementName.H1));

    assertTrue(nodesIter.hasNext());
    var p = (HtmlElement) nodesIter.next();
    assertTrue(p.hasName(StandardElementName.P));

    assertTrue(h1 == p);

    assertTrue(h1.hasName(StandardElementName.H1));
  }

  private void assertTrue(boolean expected) {
    if (!expected) {
      throw new AssertionError();
    }
  }
}

This program works fine until the last assertion:

assertTrue(h1.hasName(StandardElementName.H1));

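This assertion fails because the previous one, h1 == p, evaluates to true: both variables refer to the same reused HtmlElement instance, which by then represents the p element.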
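The same effect can be reproduced without Objectos. The following is a minimal sketch of a forward-only iterator that hands out one reused, mutable object. The ElementCursor, ReusingIterator and ReuseDemo names are hypothetical and only mimic the behavior described above; this is not how Objectos HTML is implemented internally.

```java
import java.util.Iterator;
import java.util.List;

// Illustration only; NOT the Objectos HTML implementation.
final class ReuseDemo {
  public static void main(String[] args) {
    var iter = new ReusingIterator(List.of("h1", "p"));

    var h1 = iter.next();
    String h1Name = h1.name(); // copy what you need before advancing

    var p = iter.next();

    System.out.println(h1 == p);          // prints: true
    System.out.println(h1.hasName("h1")); // prints: false
    System.out.println(h1Name);           // prints: h1
  }
}

// A mutable view that is repositioned on each call to next().
final class ElementCursor {
  private String name;

  void moveTo(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }

  boolean hasName(String candidate) {
    return name.equals(candidate);
  }
}

// Forward-only iterator that always returns the same ElementCursor instance.
final class ReusingIterator implements Iterator<ElementCursor> {
  private final Iterator<String> names;
  private final ElementCursor cursor = new ElementCursor();

  ReusingIterator(List<String> names) {
    this.names = names.iterator();
  }

  @Override
  public boolean hasNext() {
    return names.hasNext();
  }

  @Override
  public ElementCursor next() {
    cursor.moveTo(names.next());
    return cursor;
  }
}
```

The practical consequence is that anything you need from an element, such as its name, should be read or copied before moving on to the next node.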

Until the next issue of Objectos Weekly

So that's it for today. I hope you enjoyed reading.

The source code of all of the examples is in this GitHub repository.

Please send me an e-mail if you have comments, questions or corrections regarding this post.