For a number of reasons, I have replaced the backing ORM. Previously I was using peewee, but now I’m using PonyORM. The primary reason for this is purely ideological; I do not want to use software which is maintained by someone with a track record of toxic behavior. peewee’s maintainer responds to issues and feature requests with shouting and dismissive snark; PonyORM’s maintainer responds with helpfulness and grace. I am a strong proponent of the latter.
PonyORM’s API is also significantly more Pythonic, and rather than abusing operator overloads for clever query building purposes, it abuses Python’s AST functionality to parse actual Python expressions into SQL queries. Seriously, look at this explanation of it and tell me that isn’t just amazing.
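To make the AST point concrete, here's a toy illustration using only the stdlib `ast` module (this is not Pony's actual machinery, which is considerably more involved): a filter like `e.foo > bar` inside a generator expression is fully recoverable as a structured tree that a library could walk and translate into SQL.

```python
import ast

# Parse a generator expression of the kind Pony-style queries use.
# (Entry, foo, and bar are just free names here; nothing is executed.)
tree = ast.parse("(e for e in Entry if e.foo > bar)", mode="eval")
gen = tree.body                  # the GeneratorExp node
cond = gen.generators[0].ifs[0]  # the filter clause: e.foo > bar

print(type(gen).__name__)        # GeneratorExp
print(ast.dump(cond))            # a Compare node over e.foo and bar
```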
There are a few downsides to Pony so far, though:
- While it’s possible to adapt arbitrary types into database fields, queries don’t actually work on them (so at least for Enums I have to convert at query time, which turns out not to be a huge deal)
- There’s no simple way to incrementally build a query with an OR branch in it (which I don’t actually use anywhere at present, but I did have to rework some query API stuff to support it)
- Not really a downside, but Pony treats `''` and `NULL` as equivalent, which has some fun implications for storing empty strings in a table
Of course, SQLite does this too, internally, and my existing code for that case wasn’t actually “correct” (it just happened to work with SQLite anyway). So moving to Pony meant I had to make this actually correct, which, on the plus side, means that Publ is more likely to work with MySQL or Postgres (which I haven’t tested yet).
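As a quick stdlib check of where the correctness issue lives: at the SQL query level an empty string and `NULL` compare differently, so code that conflates the two has to normalize on one representation when it writes.

```python
import sqlite3

# '' and NULL are distinct values at the SQL level: an equality
# comparison never matches NULL; only IS NULL does.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v TEXT)")
con.executemany("INSERT INTO t (v) VALUES (?)", [("",), (None,)])

empty = con.execute("SELECT COUNT(*) FROM t WHERE v = ''").fetchone()[0]
null = con.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]
print(empty, null)  # 1 1 -- the two rows are not interchangeable
```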
In addition to PonyORM I evaluated a few other options; my other front-runner was to simply store all of the data in in-memory tables and use `sorted(e for e in model.Entry if e.foo > bar)` or whatever. That was a gigantic pain to think about. Granted, a lot of what made it painful is stuff I had to do in order to support Pony as well (namely the switch from a query-building syntax to incremental list comprehensions), but the Pony approach also happens to be way more efficient, since it can use indexes, does all the filtering at once, and so on.
Anyway, I’m rambling here. How about we look at some quick benchmarks to see if this hurts performance! All of these timings come from building beesbuzz.biz, which is getting to be a reasonably large site at this point, running locally on my desktop.
For the index scan I ran a simple Python script that looks like:
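Something along these lines, where `scan_site` is a hypothetical stand-in for the actual Publ configuration and index-scan calls (the real script’s API calls aren’t reproduced here):

```python
import time

def scan_site():
    """Stand-in for the real work: set up the Publ configuration and
    scan the content index directly. (Hypothetical placeholder -- the
    actual setup calls aren't reproduced here.)"""
    pass

start = time.perf_counter()
scan_site()
elapsed = time.perf_counter() - start
print(f"index scan: {elapsed:.2f}s")
```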
which just sets up the configuration as appropriate, scans the index directly, and exits. For the spidering I ran it under gunicorn with `gunicorn main:app` and used the command:
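A typical wget invocation for spidering a locally-served site looks something like this (the port and flags here are my assumption, not necessarily the exact command used):

```shell
# Recursively fetch every page from the local gunicorn instance,
# discarding the downloaded files afterward; this exercises every route.
wget -r -nv --delete-after http://localhost:8000/
```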
To keep things as fair as I could I spidered the entire site once without checking the time (so that the image cache would be pre-populated, to eliminate its I/O overhead as a variable).
peewee:

Initial index scan:
Time to spider entire website:
Memory usage after spidering: around 78.6MB according to macOS Activity Monitor
PonyORM:

Initial index scan:
Website spider time:
Memory usage after spidering: 72.6MB
PonyORM takes a little less RAM, and it has faster writes. Its queries are also marginally faster, though not by enough to make a meaningful difference.
Anyway, I’m mostly just happy that this doesn’t significantly hurt performance. The fact that it improves the end product while supporting positive influences in the F/OSS community is a bonus!
The deployed site is still running Publ v0.2.3, but the first Pony-based release will come soon as v0.3.0.