Changelog: Updated Markdown Storage and Output Previews for Bookmarks

In February, I moved my Bookmark/Link content to Markdown (CommonMark) as the canonical stored format. Today, I updated that implementation.

My Bookmarks have three “rich” fields: Title, Quote, and Commentary. Title should only allow inline elements, while Quote and Commentary can contain any HTML. For all of these fields, I wanted to store Markdown and be able to retrieve HTML and plain text versions.

The first attempt

I wanted my plain text version to be somewhat pleasant to look at, kind of like raw markdown but with some modifications. I never found a presentation format that satisfied me, so I went with pandoc/pypandoc and a custom filter.

This turned out to be noticeably slow when parsing on the fly for multiple pieces of content on a single page, so I begrudgingly opted to parse the markdown when it’s saved and store each output format in the database.

Unfortunately, my custom filter still didn’t satisfy me. One of the things I wanted was to convert anchor elements from <a href="someurl">link text</a> to link text (someurl). I also wanted cite elements rendered wrapped with underscore characters, but sometimes this resulted in output like Started _East of Eden (https://en.wikipedia.org/wiki/East_of_Eden_(novel))_ instead of the preferred Started _East of Eden_ (https://en.wikipedia.org/wiki/East_of_Eden_(novel)).
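To make the preferred output concrete, here is a minimal sketch of that conversion using Python's standard html.parser instead of a pandoc filter. It is not the filter described above; PlainTextRenderer and to_plain_text are hypothetical names, and the sketch assumes the simple rule that a link URL found inside a cite is held back until the cite closes.

```python
from html.parser import HTMLParser


class PlainTextRenderer(HTMLParser):
    """Hypothetical sketch: links become 'text (url)', cites become '_text_'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []         # output fragments, joined at the end
        self.pending_urls = []  # hrefs of the <a> elements currently open
        self.deferred = []      # URLs waiting for the enclosing <cite> to close
        self.cite_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.pending_urls.append(dict(attrs).get("href"))
        elif tag == "cite":
            self.cite_depth += 1
            self.parts.append("_")

    def handle_endtag(self, tag):
        if tag == "a" and self.pending_urls:
            url = self.pending_urls.pop()
            if not url:
                return
            if self.cite_depth:
                # Inside a cite: emit the URL only after the closing underscore.
                self.deferred.append(url)
            else:
                self.parts.append(f" ({url})")
        elif tag == "cite" and self.cite_depth:
            self.cite_depth -= 1
            self.parts.append("_")
            if self.cite_depth == 0:
                self.parts.extend(f" ({url})" for url in self.deferred)
                self.deferred.clear()

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(html):
    """Render an HTML fragment to plain text with the rules above."""
    renderer = PlainTextRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.parts)
```

Deferring the URL is what keeps the underscores wrapped around the title only, whichever way the cite and the anchor are nested.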

Over the months, I got tired of these little errors to the point where it was affecting what I was writing.

My current attempt

I decided to largely give up on being particular about this output. Microformats2 has an algorithm for rendering embedded content to a value property:

value: the textContent of the element after:
  • dropping any nested <script> & <style> elements;
  • replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative;
  • removing all leading/trailing spaces

This is nowhere close to my original desire, but it’s at least predictable. I’m already using mf2py, and although it is not part of its documented API, its dom_helpers module has a get_textContent function that does this conversion.
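The algorithm quoted above can be sketched with only the standard library. This is an illustration of the three steps, not mf2py's implementation; ValueExtractor, html_to_value, and the base_url parameter are hypothetical names, and the sketch assumes the surrounding spaces are added only in the src case.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ValueExtractor(HTMLParser):
    """Hypothetical sketch of the Microformats2 'value' text algorithm."""

    def __init__(self, base_url=""):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and not self.skip_depth:
            attrs = dict(attrs)
            if attrs.get("alt") is not None:
                # Prefer the alt text when the attribute is present.
                self.parts.append(attrs["alt"])
            elif attrs.get("src"):
                # Otherwise use the src, resolved against base_url and
                # padded with a space on each side.
                self.parts.append(" " + urljoin(self.base_url, attrs["src"]) + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Drop text that belongs to a <script> or <style> element.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_value(html, base_url=""):
    """Return the plain-text value of an HTML fragment."""
    extractor = ValueExtractor(base_url)
    extractor.feed(html)
    extractor.close()
    # Final step of the algorithm: remove leading and trailing whitespace.
    return "".join(extractor.parts).strip()
```

Note that links are flattened to their text alone here, which is why this output is predictable but loses the link text (someurl) form.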

Since I no longer needed pandoc for the plain text, I also replaced it with mistune for the Markdown-to-HTML conversion; I was already using mistune elsewhere.

These changes sped up the just-in-time parsing to the point where I don’t need to pre-render and store the HTML and plain-text output anymore. I never liked having stored content that had the potential to get out of sync.
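The shape of that arrangement can be sketched as a field that stores only Markdown and derives the other formats on access. RichField and its to_html and to_text callables are hypothetical; in practice they would stand in for the Markdown renderer and the text conversion, which are injected here so the sketch does not depend on any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RichField:
    """Hypothetical field: Markdown is stored, everything else is derived."""

    markdown: str                  # the only value kept in the database
    to_html: Callable[[str], str]  # assumed Markdown-to-HTML renderer
    to_text: Callable[[str], str]  # assumed HTML-to-plain-text converter

    @property
    def html(self) -> str:
        # Rendered on every access, so it cannot drift from the Markdown.
        return self.to_html(self.markdown)

    @property
    def text(self) -> str:
        # Plain text is derived from the HTML, not stored separately.
        return self.to_text(self.html)
```

Because nothing derived is persisted, editing the Markdown is the only write, and the outputs always match it.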

While I was there: updated plain-text-count custom web component

While I was working on that, I also added some features to the plain-text-count component I added in January. I didn’t like having to save and reload other pages to see the rendered output from my markup content.

The endpoint I had configured for the character count was already returning the rendered output content, so I updated the component to include the rendered output alongside character counts for each one.

I plan to live with this iteration for a while, but my end goal is to move all of the post types to this kind of Markdown-authored and stored format with HTML and plain text output formats.