Development should follow a problem-solution approach.
The plan is to continue evolving the library to support as many “feed reader application” use cases as possible, while still following the reader philosophy. Even if a specific feature is not a good fit for the library itself, it should be possible to find a more generic solution that allows building the feature on top.
Following is an unsorted, non-exhaustive list of known areas for improvement. I work on reader based on my current interests, in my spare time, but I will prioritize supporting contributors (discussions, reviews, and so on).
OPML support, #165
more feed interaction statistics, #254
searchable tag values, e.g. for comments
filter entries by entry tags
optimistic locking, #308
filter tags by prefix, #309
HTTP compliance, likely as plugins
add more fields to data objects
Internal API stabilization
arbitrary website scraping, #222
feed categories, likely as a plugin
Open issues and Design notes.
The Command-line interface is more or less stable,[*] although both the output and config loading need more polish and additional tests.
A full-blown terminal feed reader is not in scope, since I don’t need one, but I’m not opposed to the idea.
The Web application is “unsupported”, in that it’s not all that polished, and I don’t have time to do major improvements. But I am using it daily, and it will keep working until a better one exists.
Long term, I’d like to:
re-design it from scratch to improve usability
switch to htmx instead of using a home-grown solution
spin it off into a separate package/project
reader uses semantic versioning.
Breaking compatibility is done by incrementing the major version, announcing it in the Changelog, and raising deprecation warnings for at least one minor version before the new major version is released (if possible).
There may be minor exceptions to this, e.g. bug fixes and gross violations of specifications; they will be announced in the Changelog with a This is a minor compatibility break warning.
Schema migrations for the default storage must happen automatically. Migrations can be removed in new major versions, provided at least 3 months have passed since the migration was released.
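For illustration, a deprecation under this policy might look like the following (a toy class; the method names are invented, not reader’s actual API):

```python
import warnings

class Reader:
    """Toy stand-in; method names are only illustrative."""

    def get_tag(self, key):
        return {"title": "example"}.get(key)

    def old_get_metadata(self, key):
        # Deprecated alias: warn for at least one minor version
        # before removal in the next major version.
        warnings.warn(
            "old_get_metadata() is deprecated, use get_tag() instead; "
            "it will be removed in the next major version",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_tag(key)
```

The old name keeps working (and keeps its return value) for the whole deprecation window; only the warning changes.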
What is the public API
reader follows the PEP 8 definition of public interface.
The following are part of the public API:
Every interface documented in the API reference.
Any (documented) module, function, object, method, and attribute, defined in the reader package, that is accessible without passing through a name that starts with underscore.
The number and position of positional arguments.
The names of keyword arguments.
Argument types (argument types cannot become more strict).
Attribute types (attribute types cannot become less strict).
Undocumented type aliases (even if not private) are not part of the public API.
Other exceptions are possible; they will be marked aggressively as such.
The Twisted Compatibility Policy, which served as inspiration for this.
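As a toy illustration of the signature rules above (get_entries is a real reader method name, but these bodies and the v2 variant are invented for the example):

```python
# Version x: part of the public API.
def get_entries(feed=None, *, read=None):
    entries = [("one", False), ("two", True)]
    return [e for e in entries if read is None or e[1] == read]

# Version x+1: a compatible change. Adding a keyword-only argument
# with a default keeps the number and position of positional
# arguments and the names of existing keyword arguments intact;
# renaming `read` or making it positional would be a breaking change.
def get_entries_v2(feed=None, *, read=None, important=None):
    entries = [("one", False, True), ("two", True, False)]
    return [
        e for e in entries
        if (read is None or e[1] == read)
        and (important is None or e[2] == important)
    ]
```

Any call that worked against the old signature still works, with the same meaning, against the new one.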
Supported Python versions
The oldest Python version reader should support is:
the newest CPython available on the latest Ubuntu LTS (3 months after LTS release)
at least 1 stable PyPy version
This usually ends up being the last 3 stable CPython versions.
Dropping support for a Python version should be announced at least 1 release prior.
For convenience, reader only releases major and minor versions (bugfixes go in minor versions). Changes go only to the next release (no backports).
Making a release
scripts/release.py already does most of these.
Making a release (from version x to x + 1):
(release.py) bump version in
(release.py) update changelog with release version and date
(release.py) make sure tests pass / docs build
(release.py) clean up dist/:
rm -rf dist/
(release.py) build tarball and wheel:
python -m build
(release.py) push to GitHub
(release.py prompts) wait for GitHub Actions / Codecov / Read the Docs builds to pass
upload to test PyPI and check:
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
(release.py) upload to PyPI:
twine upload dist/*
(release.py) tag current commit with <major>.<minor> and <major>.x (e.g. when releasing 1.20: 1.20 and 1.x)
(release.py prompts) create release in GitHub
build docs from latest and enable the y docs version (should happen automatically after the first time)
(release.py) bump version to (y + 1).dev0, add a (y + 1) changelog section
(release.py prompts) trigger Read the Docs build for <major>.x (doesn’t happen automatically)
Following are various design notes that aren’t captured elsewhere (either in the code, or in the issue where a feature was initially developed).
Why use SQLite and not SQLAlchemy?
tl;dr: For “historical reasons”.
In the beginning:
I wanted to keep things as simple as possible, so I don’t get demotivated and stop working on it. I also wanted to try out a “problem-solution” approach.
I think by that time I was already a great SQLite fan, and knew that because of the relatively single-user nature of the thing I wouldn’t have to change databases because of concurrency issues.
The fact that I didn’t know exactly where and how I would deploy the web app (and that SQLite is in stdlib) kinda cemented that assumption.
Since then, I did come up with some of my own complexity: there’s a SQL query builder, a schema migration system, and there were some concurrency issues. SQLAlchemy would have likely helped with the first two, but not with the last one (not without dropping SQLite).
Note that it is possible to use a different storage implementation; all storage stuff happens through a DAO-style interface, and SQLAlchemy was the main real alternative I had in mind. The API is private at the moment (1.10), but if anyone wants to use it I can make it public.
It is unlikely I’ll write a SQLAlchemy storage myself, since I don’t need it (yet), and I think testing it with multiple databases would take quite some time.
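For illustration, such a DAO-style interface might look roughly like this (all names here are invented for the sketch, not the actual private API):

```python
import sqlite3
from typing import Iterable, Protocol


class FeedRecord:
    """Simplified stand-in for a feed data object."""

    def __init__(self, url, title=None):
        self.url = url
        self.title = title


class Storage(Protocol):
    """Hypothetical DAO-style storage interface.

    Reader code would only talk to this; any backend that
    implements it (SQLite, SQLAlchemy, ...) is interchangeable.
    """

    def add_feed(self, url: str) -> None: ...
    def get_feeds(self) -> Iterable[FeedRecord]: ...


class SQLiteStorage:
    """One possible implementation of the protocol above."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT)")

    def add_feed(self, url):
        with self.db:
            self.db.execute("INSERT INTO feeds (url) VALUES (?)", (url,))

    def get_feeds(self):
        rows = self.db.execute("SELECT url, title FROM feeds ORDER BY url")
        for url, title in rows:
            yield FeedRecord(url, title)
```

An SQLAlchemy-backed class satisfying the same protocol could then be swapped in without touching the rest of the library.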
Multiple storage implementations
Detailed requirements and API discussion: #168#issuecomment-642002049.
Minimal work needed to support alternate storages: #168#issuecomment-1383127564.
file:// handling, feed root, per-URL-prefix parsers (later retrievers, see below):
detailed requirements: #155#issuecomment-672324186
method for URL validation: #155#issuecomment-673694472, #155#issuecomment-946591071
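A generic sketch of per-URL-prefix dispatch (mounts, get_retriever, and the registry are invented names; reader’s actual mechanism differs in detail):

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URL prefixes to retrievers.
RETRIEVERS = {}


def mounts(prefix):
    """Register a retriever for URLs starting with prefix."""
    def decorator(fn):
        RETRIEVERS[prefix] = fn
        return fn
    return decorator


def get_retriever(url):
    # Longest matching prefix wins, so more specific
    # handlers shadow more generic ones.
    for prefix in sorted(RETRIEVERS, key=len, reverse=True):
        if url.startswith(prefix):
            return RETRIEVERS[prefix]
    raise ValueError(f"no retriever for {url!r}")


@mounts("http")
def http_retriever(url):
    # A real retriever would make an HTTP request here.
    return f"HTTP GET {url}"


@mounts("file://")
def file_retriever(url):
    # A real retriever would also validate the path
    # against a configured feed root.
    return f"open {urlparse(url).path}"
```

URL validation (the second bullet above) would naturally live next to the same prefix lookup.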
Requests session plugins:
why the Session wrapper exists: #155#issuecomment-668716387 and #155#issuecomment-669164351
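To show the general shape of such a wrapper (a sketch with invented names; the real wrapper delegates to a requests.Session instead of returning the request data):

```python
class SessionWrapper:
    """Hypothetical wrapper: plugins register hooks that can
    modify each request before it is sent."""

    def __init__(self):
        self.request_hooks = []

    def get(self, url, headers=None):
        headers = dict(headers or {})
        for hook in self.request_hooks:
            hook(url, headers)
        # A real implementation would call requests.Session.get()
        # here; returning the inputs keeps the sketch testable.
        return url, headers


def ua_plugin(session):
    """A plugin that sets a User-Agent header on every request."""
    def hook(url, headers):
        headers.setdefault("User-Agent", "my-reader-plugin/1.0")
    session.request_hooks.append(hook)
```

The wrapper gives plugins a single seam to hook into, instead of each plugin monkeypatching the session.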
Retriever / parser split:
Alternative feed parsers:
the logical pipeline of parsing a feed: #264#issuecomment-973190028
comparison between feedparser and Atoma: #264#issuecomment-981678120, #263
Some thoughts on implementing metrics: #68#issuecomment-450025175.
Survey of possible options: #123#issuecomment-582307504.
In 2021, I’ve written an entire series about it: https://death.andgravity.com/query-builder
Pagination for methods that return iterators
Why do it for the private implementation: #167#issuecomment-626753299 (also a comment in storage code).
Detailed requirements and API discussion for public pagination: #196#issuecomment-706038363.
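To illustrate the idea (a sketch, not reader’s actual code): the public method returns a plain iterator, but fetches data in fixed-size chunks keyed on the last item seen, so no database cursor has to stay open while user code runs:

```python
def paginated(get_chunk, chunk_size=2):
    """Yield items one by one, fetching fixed-size chunks underneath.

    get_chunk(chunk_size, last) returns up to chunk_size items
    after `last`; a short chunk means we've reached the end.
    """
    last = None
    while True:
        chunk = get_chunk(chunk_size, last)
        yield from chunk
        if len(chunk) < chunk_size:
            return
        last = chunk[-1]


# A toy "storage" backed by a list; a real one would run a keyset
# query (WHERE id > :last ORDER BY id LIMIT :chunk_size).
DATA = [1, 2, 3, 4, 5]


def get_chunk(chunk_size, last):
    start = 0 if last is None else DATA.index(last) + 1
    return DATA[start:start + chunk_size]
```

The caller still sees one continuous iterator; pagination stays an implementation detail until it is exposed publicly.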
From the initial issue:
detailed requirements and API discussion: #122#issuecomment-591302580
discussion of possible backend-independent search queries: #122#issuecomment-508938311
Enabling search by default, and alternative search APIs: #252.
reader types to Atom mapping
This whole issue: #153.
Sort by random
Some thoughts in the initial issue: #105.
Entry/feed “primary key” attribute naming
This whole issue: #159#issuecomment-612914956.
Change feed URL
From the initial issue:
use cases: #149#issuecomment-700066794
initial requirements: #149#issuecomment-700532183
Discussion about API/typing, and things we didn’t do: #239.
Some thoughts about adding a map argument: #152#issuecomment-606636200.
update_feeds() is like a pipeline: comment.
Data flow diagram for the update process, as of v1.13: #204#issuecomment-779709824.
use case: #204#issuecomment-779893386 and #204#issuecomment-780541740
return type: #204#issuecomment-780553373
Updating entries based on a hash of their content (regardless of
stable hashing of Python data objects: #179#issuecomment-796868555, the reader._hash_utils module, death and gravity article
ideas for how to deal with spurious hash changes: #225
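A minimal sketch of the general idea (canonical JSON plus a digest; reader’s actual logic lives in reader._hash_utils and differs in detail):

```python
import hashlib
import json


def content_hash(data, exclude=("updated",)):
    """Stable hash of a dict-like data object, ignoring some fields.

    Serializing to canonical JSON (sorted keys, fixed separators)
    makes the hash independent of key order; dropping None values
    and excluded fields (like `updated`) avoids spurious changes.
    """
    filtered = {
        k: v for k, v in data.items()
        if k not in exclude and v is not None
    }
    blob = json.dumps(filtered, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(blob.encode("utf-8")).hexdigest()
```

Two entries with the same content hash are considered unchanged, regardless of what the feed claims in updated.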
Decision to ignore feed.updated when updating feeds: #231.
Requirements, open questions, and how it interacts with
A summary of why it isn’t easy to do: #301#issuecomment-1442423151.
Detailed requirements and API discussion: #185#issuecomment-731743327.
Using None as a special argument value
This comment: #177#issuecomment-674786498.
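The pattern under discussion, as a generic sketch (set_tag and the returned strings are invented for illustration):

```python
_MISSING = object()  # module-level sentinel


def set_tag(key, value=_MISSING):
    """Distinguish "argument not passed" from "None passed explicitly".

    With a plain `value=None` default, callers could never use None
    as a real value; the sentinel keeps all three cases distinct.
    """
    if value is _MISSING:
        return f"{key}: use existing value"
    if value is None:
        return f"{key}: store null"
    return f"{key}: store {value!r}"
```

The sentinel never appears in documentation or type hints; it only exists so the function can tell the cases apart.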
Some initial thoughts on batch get methods (including API/typing) in #191 (closed with wontfix, for now).
Why I want to postpone batch update/set methods: #187#issuecomment-700740251.
tl;dr: Performance is likely a non-issue with SQLite, convenience can be added on top as a plugin.
See the 2.12 reader._app.ResourceTags class for an idea of how to represent a bunch of tags in a reserved-name-scheme-agnostic way (useful e.g. for when get_entries() should return tags x, y, z of each entry).
Using a single Reader object from multiple threads
Some thoughts on why it’s difficult to do: #206#issuecomment-751383418.
List of potential hooks (from mid-2018): #80.
Minimal plugin API (from 2021) – case study and built-in plugin naming scheme: #229#issuecomment-803870781.
We’ll add / document new (public) hooks as needed.
Requirements, thoughts about the naming scheme and prefixes unlikely to collide with user names: #186 (multiple comments).
Wrapping underlying storage exceptions
Which exception to wrap, and which not: #21#issuecomment-365442439.
Aware vs. naive, and what’s needed to go fully aware: #233#issuecomment-881618002.
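For context, the core of the migration is deciding what naive values mean; a minimal sketch that interprets naive datetimes as UTC:

```python
from datetime import datetime, timezone


def ensure_aware(dt):
    """Interpret naive datetimes as UTC; leave aware ones alone.

    Storing naive UTC values works until something needs to compare
    them with aware ones -- Python refuses to compare the two.
    """
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2021, 7, 1, 12, 0)
aware = ensure_aware(naive)
```

Going "fully aware" means doing this conversion once at the storage boundary, so the rest of the code only ever sees aware datetimes.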
Thoughts on dynamic lists of feeds: #165#issuecomment-905893420.
Using MinHash to speed up similarity checks (maybe): https://gist.github.com/lemon24/b9af5ade919713406bda9603847d32e5
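For reference, the technique works roughly like this (a generic sketch, not the code in the gist; the shingle size, number of hashes, and blake2b seeding are all illustrative choices):

```python
import hashlib


def shingles(text, n=3):
    """Character n-grams of a whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def minhash(items, num_hashes=32):
    """MinHash signature: for each seeded hash function, keep the
    minimum hash over the item set. The fraction of matching
    signature positions approximates the Jaccard similarity of the
    underlying sets, without comparing the full texts."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(
                    item.encode(), digest_size=8, salt=str(seed).encode()
                ).digest(),
                "big",
            )
            for item in items
        ))
    return sig


def estimated_similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Signatures are small and fixed-size, so near-duplicate checks can compare signatures instead of entry content.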
Some early thoughts: #192#issuecomment-700773138 (closed with wontfix, for now).