Publ v0.5.14 released!

Posted Tuesday, February 4 at 5:40 PM -0800 (4 years ago)

Today I released v0.5.14 of Publ, which has a bunch of improvements:

Fixed a bug in card retrieval when there’s no summary
Admin panel works again
Markdown entry headings now get individual permalinks (the presentation of which can be templated)
Markdown entry headings can be extracted into an outline to be used for a table of contents
Lots of performance improvements around ToC and footnote extraction, and template API functions in general

Entry headings

Because of the new entry headings, by default all headings will get an empty <a href> at the beginning; this is intended as a styling hook so that you can use stylesheet rules to add a visible anchor/link to them.

The format of headings is also templatizable, by passing a heading_template argument to the entry.body/more property-function things; the default value is "{link}</a>{text}" but you can do, for example, "{link}{text}</a>" if you want to put the entire heading into a link (although this means that having links within your headings becomes undefined), or "{text}" if you want the old behavior, or "{text}{link}#</a>" if you want the anchor marker to be part of the text data (rather than styled via CSS), or whatever. Part of why I opted for the current default is that it seemed to be maximally-useful while minimally-intrusive as far as changing existing layouts.

The "{link}" template fragment can also take some further configuration; if you just want to set its CSS class name, you can do that with the heading_link_class configuration, and you can also set any other arbitrary HTML attributes by passing a dict to heading_link_config. For example, if you want them to all have a title="permalink" you can pass the value {"title":"permalink"} for that configuration value.

Currently there is no way to have those vary across the links, however; a more robust configuration mechanism (that can perhaps take in functions or format strings or the like) is certainly possible but it felt out-of-scope for this feature.

Tables of Contents

I was originally just building the heading permalink generation as a standalone feature, but then I realized that while I was doing that I’d might as well provide tables of contents, too. My original plan for ToCs was to make use of Misaka’s built-in ToC formatter, but getting that to work alongside Publ seemed pretty challenging, especially since Publ has the multiple document section stuff that Misaka wasn’t intended to work with. (Incidentally, I went through similar things with the built-in footnotes stuff.)

A ToC is accessible simply by looking at entry.toc. While it takes all of the usual HTML formatting arguments, none of them really have any effect (aside from disabling smartquotes and enabling XHTML). It does add its own argument, max_depth, which chooses how many levels of the ToC to show. This is relative to the highest level in the entry, so you can continue to use whatever top heading level you prefer.

Performance improvements

So, footnotes had a bit of a performance impact in that rendering those out also requires rendering out the entire entry multiple times, which can add up a lot. This came down to part of how Publ allows you to use template functions as if they are properties, using a too-clever wrapper called CallableProxy. Put briefly, this is an object that wraps a function in such a way that if you use the function directly in a Jinja context, it gets treated as if you’re using the default return value instead. Unfortunately, there are various things at various layers of Flask that make it end up calling the function multiple times, which can be really slow — especially if the function, say, runs Markdown formatting on the entire entry.

A long time ago I had CallableProxy set up such that it would cache the return value of the default call, but this had other implications when I started supporting HTTPS and user login and so on. Depending on how objects got pooled and cached this could cause the wrong content to be displayed to the end user — definitely not a good thing! At the very least this would often cause bugs where outgoing links would flip-flop between being http:// or https:// or using the wrong hostname or the like, and in the worst case this could theoretically cause page content to display for someone who wasn’t authorized to see it, or vice-versa (although I don’t believe there’s an actual way of causing this). But in any case, it was bad.

So, what I ended up doing was instead of naïvely caching the default function return, I wrapped it behind @functools.lru_cache and made the various aspects that make these functions non-idempotent part of the cache key; for now this is just the request URL and the current user.

This cut down on a lot of chaff, but there was still a long ways to go!

Both tables of contents and footnotes have global numberings, meaning rendering entry.more gets affected by how many of those things live in entry.body. There were some pretty wonky ways I was trying to keep track of that stuff, but generally-speaking this meant that rendering entry.more also required rendering (and discarding) entry.body. This could also end up rendering a whole bunch of extra images, too, if the image configuration between body and more don’t match. There were also some attempts at caching the various fragments' buffers, but this got unwieldy and unpleasant.

So, what does it do instead now?

First, there’s a faster counting-only Markdown processor that will only count the number of footnotes and headings, which lets us do some cute optimizations especially on entry.more.

Next, any time a count of footnotes or headings comes about naturally based on some other bit of processing, that information gets cached for later.

This has cut down on the amount of rendering that has to take place. There’s still some redundancies (for example, it still has to render out all of the entry content even when it’s only trying to extract the footnotes or table of contents) but this at least cuts down on the amount of stuff that has to happen.

Combined with the CallableProxy optimization this means it’s also doing way less work when you’re simply checking to see if an entry has a TOC or footnotes, such as when doing:

{% if entry.toc %}
<nav id="toc"><h2>Table of Contents</h2>{{entry.toc}}</nav>
{% endif %}

This still unfortunately will be doing extraneous rendering — including image renditions — as will the similar entry.footnotes code, but still, a 3x improvement is a lot better, even if it’s not ideal.

An obvious next step would be to make a headings-only renderer for the table of contents. This won’t help with footnotes, unfortunately, and there’s no reasonable way to prevent footnotes from rendering images that are outside of footnotes (and it still needs to be able to render images for ones that are in footnotes), but still, a partial improvement is still better than no improvement.

So why is this still v0.5.x?

At some point I decided that my versioning scheme would be based on “milestones,” and for v0.6 my milestone is having automated testing and unit coverage set up. This seems to be kind of foolish; what I’m doing isn’t quite semantic versioning, which says that versioning should be based on major.minor.patch where major is for backwards-incompatible changes, minor is for added functionality, and patch is for bug fixes. But if I’d been releasing Publ with that scheme, we’d be on, like, v2.193.0 by now or something.

Fortunately, semver does have a provision for in-development software:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

so I think I’m still following in the spirit with semantic versioning for now. If I ever decide to release 1.0 I’ll have to re-evaluate my version numbering though!

Errata

There is an issue with watchdog v0.10.0 which causes issues with Pipenv or other lockfile-based deployment mechanisms. If you are developing on macOS and deploying on Linux, I highly recommend pinning the version with:

pipenv install watchdog==0.9.0
pipenv clean

See the FAQ for more information.