Changelog: Updated Notes

I made some changes to my Notes: - Better HTML and plain text output support - Only enforce plain text character count and only when publishing - See the plain text character count live in the Admin UI

I made a few updates to my Notes functionality on my website:

  • Better HTML and plain text output support
  • Only enforce plain text character count and only when publishing
  • See the plain text character count live in the Admin UI

HTML uses more characters, so allow more characters

A while back I mused on storing all content on this site in HTML. In further thinking, I’ve decided to try storing as Markdown. HTML is valid in Markdown, and I think it would be good to store the content in the same format I intend to edit it.

Notes were already using Markdown for content, but, in addition to HTML, I wanted a plain text output option also. While Markdown should retain HTML elements, this output should convert or strip HTML.

To better handle HTML in a Note’s content, I changed the field from CharField to TextField

- content = models.CharField(max_length=560, help_text="Markdown supported.")
+ plain_text_limit = 560
+ content = models.TextField(help_text="Markdown")

But still enforce a character limit (but only for the plain text output (and only sometimes))

I don’t remember why I had a length limit of 560 characters, but I think it was to allow some characters for markup even though it would be stripped from syndication targets like Mastodon. I still wanted a character limit, but only on the plain text output and only when published.

I added a couple of utility methods to my generic feed.models.py using pypandoc and BeautifulSoup:

import pypandoc
from bs4 import BeautifulSoup

def convert_commonmark_to_plain_text(input):
    html = pypandoc.convert_text(input, 'html', format='commonmark')

    soup = BeautifulSoup(html, 'html.parser')

    for item in soup.find_all("a"):
        item.string = "%s (%s)" % (item.string, item['href'])
        item.unwrap()

    for item in soup.find_all(["cite","i","em"]):
        item.insert_after("_")
        item.insert_before("_")
        item.unwrap()

    for item in soup.find_all(["b", "strong"]):
        item.insert_after("*")
        item.insert_before("*")
        item.unwrap()

    output = pypandoc.convert_text(str(soup), 'plain', format='html')

    return output

def convert_commonmark_to_html(input):
    return pypandoc.convert_text(input, 'html', format='commonmark+autolink_bare_uris')

I was hoping to do some content preprocessing and then convert to markdown to catch anything else for the plain text, but some things ended up getting escaped on the markdown side and then other things I didn’t like the look of, so using Beautiful Soup to convert specific HTML tags to text and the plain-texting the result is my strategy. I consider it a work in progress.

I added/updated these methods on my Notes model to make using the converted content easy.

def content_plain(self):
        return convert_commonmark_to_plain_text(self.content)
    
    def content_plain_count(self):
        return len(self.content_plain())

    def content_html(self):
        return convert_commonmark_to_html(self.content)

And then to bring it all together, I validate that the content_plain_count is less than plain_text_limit during model validation, but only if the Note is published. If it’s a draft, I want to take advantage of the text field and allow overflow.

def clean(self):
    super().clean()
    self.validate_publishable()

def validate_publishable(self):
    if not self.published:
        return
    
    super().validate_publishable()

    if self.content_plain_count() > self.plain_text_limit:
        raise ValidationError("Plain text count of %s must be less than the limit of %s to publish." % (self.content_plain_count(), self.plain_text_limit))

Apparently, the model clean() method is the place to validate interdependent fields. In this case, the too-long content is only invalid if the Note is published. This gets called automatically when saving through the Admin, but apparently not if model.save() is called directly.

I have lots of currently-required fields on other content types that I would like to update to only being required if published. I’ll see how this approach works for the Notes.

And also show the character count in the Admin UI

This was perhaps the trickiest piece. I wanted to be able to get a live plain-text character count for my Markdown textarea. I figured if I could wrap the textarea with a custom element, I could take over the functionality with JavaScript, but I also wanted to count the same plain text output that the server would generate.

First I created a new View that would allow JavaScript to get a character count from the server. I wanted to use the same code on the client that I would be using to save on the server.

@method_decorator([staff_member_required, csrf_exempt], name='dispatch')
class CommonmarkConversion(View):
    def get(self, request, *args, **kwargs):
        id = request.GET.get("id")

        if id is None:
            return HttpResponse("id parameter is required", status=400)
        
        conversion = request.session.get(id)

        if conversion is None:
            return HttpResponse("id not found", status=404)
        
        return JsonResponse(conversion)

    def post(self, request, *args, **kwargs):
        body = json.loads(request.body)

        input = body.get("input")

        if input is None:
            return HttpResponse("input property is required", status=400)
        
        id = str(uuid.uuid4())

        request.session[id] = {
            "input": input,
            "html": convert_commonmark_to_html(input),
            "plain": convert_commonmark_to_plain_text(input)
        }

        return JsonResponse({
            "success": True,
            "id": id
        })

The method decorators require a logged in user with access to the admin to make the request, but not a csrf token. Because the text coming in could be lengthy, I didn’t want to deal with URL length limits, which meant I had to send the data as the body of a POST request. I found this nice little pattern online where the request isn’t “convert this text for me,” but “create a conversion of this text” (POST) and then “get that conversion“ (GET). So a POST request sends the text and the server responds with an id. Then a GET request sends the id and gets the converted text. The server stores the id and converted text in the session.

I’m using a UUID for the id, but also considered a MD5 hash of the input so that the same input would result in the same id. It also struck me that perhaps the conversion work should be on the GET side of the flow so that it only happens if requested.

Next I created a custom widget that extends the built in Textarea widget.

from django.forms.widgets import Textarea

class PlainTextCountTextarea(Textarea):
    template_name = "widgets/plaintextcounttextarea.html"

    def __init__(self, attrs = None, max=""):
        super().__init__(attrs)
        self.max=max

    def get_context(self, name, value, attrs):
        context = super().get_context(name,value,attrs)
        context.update({ "max": self.max })
        return context

    class Media:
        js = ["base/js/PlainTextCountTextarea.js"]

And a template that wraps the built-in template with a custom element

<plain-text-count conversion-endpoint="{% url 'feed:commonmarkconversion' %}" max="{{ max }}">
    {% include "django/forms/widgets/textarea.html" %}
</plain-text-count>

I pass the endpoint as an attribute to the custom element, and the widget also takes a `max` that gets added as an attribute.

That gets plumbed up in the NotesAdmin

content = CharField(widget=PlainTextCountTextarea(max=Note.plain_text_limit), help_text="Markdown supported.")

And then finally PlainTextCountTextarea.js creates the custom component. The key method here is updateCount:

updateCount = this.debounce(() => {
    let abortReason = "newer request";
    while (this.abortControllers.length > 0) {
        this.abortControllers.shift().abort(abortReason);
    }

    let abortController = new AbortController();
    this.abortControllers.push(abortController);

    fetch(this.conversionEndpoint, {
        method: "POST",
        body: JSON.stringify({ "input": this.textarea.value }),
        signal: abortController.signal
    }).then(response => {
        if (!response.ok) {
            throw new Error(`POST Response status: ${response.status}`);
        }

        content = response.json()
        return content
    }).then(content => {
        return fetch(`${this.conversionEndpoint}?id=${content['id']}`, {
            signal: abortController.signal
        });
    }).then(response => {
        if (!response.ok) {
            throw new Error(`GET Response status: ${response.status}`);
        }

        return response.json()
    }).then(content => {
        this.count = content['plain'].length;
        this.countContainer.innerText = this.count;
    }).catch(err => {
        // ignore our own aborted requests
        if (err.name === "AbortError" & err.abortReason === abortReason) {
            return;
        }
        if (err === abortReason) {
            return;
        }

        console.warn({err})
        this.error.innerText = err.message
    });
}, 300);

This method gets called on textarea keyup and on textarea change, so I wrapped it in a debounce method. Even with that, I didn’t want weird async stuff to bite me, so I pass an AbortSignal to each fetch request. Before starting a new update, I abort any existing fetches. It POSTs the textarea value to the conversion endpoint, gets the Id in the response and GETs the conversions. Then it counts the length of the plain conversion and updates the value below the textarea. The catch swallows any errors from aborted fetches. I had to check for a few things, because they kept coming through differently. Otherwise error messages are also output to the UI.

The js property on the Media class on the widget causes the JavaScript file to automatically load on pages that use this widget.

Here’s how the custom widget looks:

A screenshot of Django’s admin. A textarea labeled “Content” has some text in it and below the textarea is “Plain text count: 162 / 560”.