Internal API

This part of the documentation covers the internal interfaces of reader, which are useful for plugins, or if you want to use low-level functionality without using Reader itself.

Warning

As of version 3.24, the internal API is not part of the public API; it is not stable yet and might change without any notice.

Parser

Reader._parser

The Parser instance used by this reader.

reader._parser.default_parser(feed_root=None, session_timeout=(3.05, 60), _lazy=True)

Create a pre-configured Parser.

Parameters:
Returns:

The parser.

Return type:

Parser

class reader._parser.Parser

Retrieve and parse feeds by delegating to retrievers and parsers.

To retrieve and parse a single feed, you can call the parser object directly.

Reader only uses the following methods:

  • validate_url()

  • process_feed_for_update()

  • parallel()

  • process_entry_pairs()

To add retrievers and parsers:

  • mount_retriever()

  • mount_parser_by_mime_type()

  • mount_parser_by_url()

The rest of the methods are low-level methods.

session_factory

SessionFactory used to create Requests sessions for retrieving feeds.

Plugins may add request or response hooks to this.

parallel(feeds, map=map)

Retrieve and parse many feeds, possibly in parallel.

Yields the parsed feeds, as soon as they are ready.

Parameters:
  • feeds (iterable(FeedArgument)) – An iterable of feeds.

  • map (function) – A map()-like function; the results can be in any order.

Yields:

ParseResult – The result of retrieving and parsing a feed; the feed is the object passed in feeds.
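Because map only needs to behave like the built-in map(), a thread pool's Executor.map can be passed in to do the per-feed work concurrently. The sketch below is stdlib-only and hypothetical: FeedArg stands in for a FeedArgument, fake_parse stands in for the real retrieve-and-parse work, and parallel here is a toy with the same signature, so it runs without network access or reader installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple, Optional


class FeedArg(NamedTuple):
    """Hypothetical FeedArgument stand-in: anything with url and caching_info."""
    url: str
    caching_info: Optional[dict] = None


def fake_parse(feed):
    """Hypothetical per-feed work; sleeps to simulate network latency."""
    time.sleep(0.1)
    return feed.url, f"parsed {feed.url}"


def parallel(feeds, map=map):
    """Toy function with the documented signature: apply the work via map."""
    # the built-in map runs serially; Executor.map runs calls on worker threads
    yield from map(fake_parse, feeds)


feeds = [FeedArg(f"https://example.com/{i}.xml") for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.monotonic()
    results = dict(parallel(feeds, map=executor.map))
    elapsed = time.monotonic() - start

print(len(results), "feeds in", round(elapsed, 2), "seconds")
```

With a real Parser, the equivalent call would be parser.parallel(feeds, map=executor.map). Executor.map happens to keep input order, which is allowed but not required.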

__call__(url, caching_info=None)

Retrieve and parse one feed.

This is a convenience wrapper over parallel().

Parameters:
  • url (str) – The feed URL.

  • caching_info (JSONType or None) – caching_info from the last update.

Returns:

The parsed feed or None, if the feed didn’t change.

Return type:

ParsedFeed or None

Raises:

ParseError

retrieve_fn(feed)

retrieve() wrapper used by parallel().

Takes one argument and does not raise exceptions.
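The "one argument, no exceptions" contract is what makes the wrapper usable with map(): one failing feed must not abort the whole batch. A minimal sketch of the pattern, assuming a hypothetical retrieve() that may raise, and a stand-in Result pair loosely modeled on RetrieveResult (whose value is either the outcome or an exception):

```python
from typing import Any, NamedTuple


class ParseError(Exception):
    """Stand-in for reader's ParseError."""


class Result(NamedTuple):
    # loosely modeled on RetrieveResult: value is the outcome or the exception
    feed: str
    value: Any


def retrieve(url):
    # hypothetical retriever that fails for one specific host
    if "broken" in url:
        raise ParseError(url)
    return f"<feed from {url}>"


def retrieve_fn(feed):
    # takes exactly one argument, so it can be passed to map(),
    # and returns expected errors as values instead of raising them
    try:
        return Result(feed, retrieve(feed))
    except ParseError as e:
        return Result(feed, e)


urls = ["https://a.example/feed", "https://broken.example/feed"]
results = list(map(retrieve_fn, urls))
for result in results:
    status = "error" if isinstance(result.value, Exception) else "ok"
    print(result.feed, status)
```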

retrieve(url, caching_info=None)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • caching_info (JSONType or None) – caching_info from the last update.

Returns:

A context manager with the retrieved feed as target.

Return type:

contextmanager(RetrievedFeed or None)

Raises:

ParseError

parse_fn(result)

parse() wrapper used by parallel().

Takes one argument and does not raise exceptions.

parse(url, retrieved)

Parse a retrieved feed.

Parameters:
  • url (str) – The feed URL.

  • retrieved (RetrievedFeed) – The retrieved feed.

Returns:

The feed and entry data.

Return type:

ParsedFeed

Raises:

ParseError

get_parser(url, mime_type)

Select an appropriate parser for a feed.

Parsers registered by URL take precedence over those registered by MIME type.

If no MIME type is given, guess it from the URL using mimetypes.guess_type(). If the MIME type can’t be guessed, default to application/octet-stream.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the retrieved resource.

Returns:

The parser, and the (possibly guessed) MIME type.

Return type:

tuple(ParserType, str)

Raises:

ParseError – No parser matches.
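The selection rules above can be modeled with a small registry. This is a hypothetical, simplified sketch (exact-match MIME lookup, plain dicts, LookupError instead of ParseError), not reader's implementation; only mimetypes.guess_type() is the real standard-library call named in the text.

```python
import mimetypes


class ParserRegistry:
    """Hypothetical, simplified model of Parser.get_parser()."""

    def __init__(self):
        self.by_url = {}        # exact URL -> parser
        self.by_mime_type = {}  # MIME type -> parser

    def get_parser(self, url, mime_type=None):
        # if no MIME type is given, guess it from the URL
        if not mime_type:
            mime_type, _ = mimetypes.guess_type(url)
        # if it can't be guessed, default to application/octet-stream
        if not mime_type:
            mime_type = "application/octet-stream"

        # parsers registered by URL take precedence over those by MIME type
        if url in self.by_url:
            return self.by_url[url], mime_type
        if mime_type in self.by_mime_type:
            return self.by_mime_type[mime_type], mime_type
        # the real method raises ParseError; LookupError keeps this stdlib-only
        raise LookupError(f"no parser for {url!r} ({mime_type})")


registry = ParserRegistry()
registry.by_mime_type["application/rss+xml"] = "rss parser"
registry.by_url["https://example.com/special"] = "custom parser"

print(registry.get_parser("https://example.com/feed", "application/rss+xml"))
print(registry.get_parser("https://example.com/special", "application/rss+xml"))
try:
    registry.get_parser("https://example.com/data.unknownext")
except LookupError as e:
    print("error:", e)
```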

validate_url(url)

Check if url is valid without actually retrieving it.

Raises:

InvalidFeedURLError – If url is not valid.

mount_retriever(prefix, retriever)

Register a retriever to a URL prefix.

Retrievers are sorted in descending order by prefix length.

Parameters:
  • prefix (str) – A URL prefix.

  • retriever (RetrieverType) – The retriever.

get_retriever(url)

Get the retriever for a URL.

Parameters:

url (str) – The URL.

Returns:

The matching retriever.

Return type:

RetrieverType

Raises:

ParseError – No retriever matches the URL.
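Sorting by descending prefix length means the most specific prefix wins, similar to how requests orders the transport adapters registered with Session.mount(). A hypothetical stdlib-only model of mount_retriever() and get_retriever() (plain strings stand in for retrievers, LookupError for ParseError):

```python
class RetrieverRegistry:
    """Hypothetical model of mount_retriever() / get_retriever()."""

    def __init__(self):
        self.retrievers = {}  # prefix -> retriever, longest prefix first

    def mount_retriever(self, prefix, retriever):
        self.retrievers[prefix] = retriever
        # sort in descending order by prefix length, so the most specific wins
        self.retrievers = dict(
            sorted(self.retrievers.items(), key=lambda item: len(item[0]), reverse=True)
        )

    def get_retriever(self, url):
        for prefix, retriever in self.retrievers.items():
            if url.startswith(prefix):
                return retriever
        # the real method raises ParseError; LookupError keeps this stdlib-only
        raise LookupError(f"no retriever for {url!r}")


registry = RetrieverRegistry()
registry.mount_retriever("https://", "http retriever")
registry.mount_retriever("https://example.com/private/", "auth retriever")

print(registry.get_retriever("https://example.com/private/feed.xml"))
print(registry.get_retriever("https://example.com/public.xml"))
```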

mount_parser_by_mime_type(parser, accept=None)

Register a parser to one or more MIME types.

Parameters:
  • parser (ParserType) – The parser.

  • accept (str or None) – The content types the parser supports, as an HTTP Accept header. If not given, use the parser’s accept attribute, if it has one.

Raises:

TypeError – The parser does not have an accept attribute, and no accept was given.
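Since accept is an HTTP Accept value, registering a parser means splitting it into individual MIME types. The sketch below is hypothetical (a plain dict as the registry, and q=0 entries skipped because RFC 9110 defines q=0 as "not acceptable"); it mirrors only the documented fallback to the parser's accept attribute and the TypeError.

```python
def accept_to_mime_types(accept):
    """Split an HTTP Accept value into MIME types, dropping q=0 entries."""
    mime_types = []
    for part in accept.split(","):
        mime_type, *params = (p.strip() for p in part.split(";"))
        quality = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                quality = float(value)
        # q=0 means "not acceptable", so such types are not registered
        if mime_type and quality > 0:
            mime_types.append(mime_type)
    return mime_types


def mount_parser_by_mime_type(registry, parser, accept=None):
    """Hypothetical model of the documented method, using a plain dict."""
    if accept is None:
        # fall back to the parser's own accept attribute, if it has one
        accept = getattr(parser, "accept", None)
    if accept is None:
        raise TypeError("parser has no accept attribute, and no accept was given")
    for mime_type in accept_to_mime_types(accept):
        registry[mime_type] = parser


class AtomParser:
    # hypothetical parser advertising what it supports, as an Accept value
    accept = "application/atom+xml,application/xml;q=0.9,text/xml;q=0"

    def __call__(self, url, resource, headers):
        raise NotImplementedError


registry = {}
mount_parser_by_mime_type(registry, AtomParser())
print(sorted(registry))
```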

get_parser_by_mime_type(mime_type)

Get a parser for a MIME type.

Parameters:

mime_type (str) – The MIME type of the feed resource.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser matches the MIME type.

mount_parser_by_url(url, parser)

Register a parser to an exact URL.

Parameters:
  • url (str) – The URL.

  • parser (ParserType) – The parser.

get_parser_by_url(url)

Get a parser that was registered by URL.

Parameters:

url (str) – The URL.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser was registered for the URL.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever.

Delegates to process_feed_for_update() of the appropriate retriever.

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

process_entry_pairs(url, mime_type, pairs)

Process entry data before it is stored.

Delegates to process_entry_pairs() of the appropriate parser.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the feed.

  • pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.

Returns:

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.SessionFactory(...)

Manage the lifetime of a session.

To get a new session, call the factory directly.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks to be associated with new sessions.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks to be associated with new sessions.

__call__()

Create a new session.

Return type:

SessionWrapper

transient()

Return the current persistent() session, or a new one.

If a new session was created, it is closed once the context manager is exited.

Return type:

contextmanager(SessionWrapper)

persistent()

Register a persistent session with this factory.

While the context manager returned by this method is entered, all persistent() and transient() calls will return the same session. The session is closed once the outermost persistent() context manager is exited.

Plugins should use transient().

Reentrant, but NOT threadsafe.

Return type:

contextmanager(SessionWrapper)
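The interaction between persistent() and transient() can be modeled with a depth counter. The Session and SessionFactory classes below are hypothetical stand-ins that follow the rules stated above (reentrant, closed by the outermost persistent() exit, not thread-safe); they are not reader's code.

```python
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for SessionWrapper."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SessionFactory:
    """Hypothetical model of the persistent() / transient() rules."""

    def __init__(self):
        self.session = None  # the registered persistent session, if any
        self.depth = 0       # how many persistent() blocks are entered

    def __call__(self):
        return Session()

    @contextmanager
    def persistent(self):
        if self.session is None:
            self.session = self()
        self.depth += 1
        try:
            yield self.session
        finally:
            self.depth -= 1
            # only the outermost persistent() closes the session
            if self.depth == 0:
                self.session.close()
                self.session = None

    @contextmanager
    def transient(self):
        # reuse the persistent session, if there is one
        if self.session is not None:
            yield self.session
            return
        session = self()
        try:
            yield session
        finally:
            session.close()


factory = SessionFactory()

with factory.persistent() as outer:
    with factory.persistent() as inner:   # reentrant: same session
        same_inner = inner is outer
    with factory.transient() as session:  # transient() reuses it
        same_transient = session is outer
    still_open = not outer.closed         # the inner exit did not close it

with factory.transient() as temporary:    # no persistent session: a new one
    pass

print(same_inner, same_transient, still_open, outer.closed, temporary.closed)
```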

class reader._parser.requests.SessionWrapper(...)

Minimal wrapper over a requests.Session.

Only provides a limited get() method.

Can be used as a context manager (closes the session on exit).

session: requests.Session

The underlying requests.Session.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks.

get(url, headers=None, **kwargs)

Like Requests get(), but apply request_hooks and response_hooks.

Parameters:
Keyword Arguments:

**kwargs – Passed to send().

Return type:

requests.Response

caching_get(url, caching_info=None, headers=None, **kwargs)

Like get(), but set and return caching headers.

caching_get(url, old_caching_info) -> response, new_caching_info
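caching_info is opaque JSON as far as callers are concerned. One plausible shape, assumed here purely for illustration (the "etag" and "last-modified" keys are hypothetical, not reader's format), stores the ETag and Last-Modified response headers and turns them into If-None-Match and If-Modified-Since on the next request, so an unchanged feed can be answered with 304 Not Modified.

```python
def conditional_headers(caching_info):
    """Turn stored caching info into conditional request headers."""
    headers = {}
    if not caching_info:
        return headers
    if etag := caching_info.get("etag"):
        headers["If-None-Match"] = etag
    if last_modified := caching_info.get("last-modified"):
        headers["If-Modified-Since"] = last_modified
    return headers


def new_caching_info(response_headers):
    """Extract caching info to store for the next update (None if absent)."""
    # HTTP header names are case-insensitive
    lowered = {name.lower(): value for name, value in response_headers.items()}
    info = {key: lowered[key] for key in ("etag", "last-modified") if key in lowered}
    return info or None


old = {"etag": '"abc123"'}
print(conditional_headers(old))
print(new_caching_info({"ETag": '"def456"', "Content-Type": "application/rss+xml"}))
```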

Protocols

class reader._parser.FeedArgument(*args, **kwargs)

Any FeedForUpdate-like object.

property url: str

The feed URL.

property caching_info: JSONType | None

caching_info from the last update.

class reader._parser.RetrieverType(*args, **kwargs)

A callable that knows how to retrieve a feed.

__call__(url, caching_info, accept)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • caching_info (JSONType or None) – caching_info from the last update.

  • accept (str or None) – Content types to be retrieved, as an HTTP Accept header.

Returns:

A context manager that has as target either a RetrievedFeed wrapping the retrieved resource, or the bare resource.

Return type:

contextmanager(RetrievedFeed or None)

Raises:

validate_url(url)

Check if url is valid for this retriever.

Raises:

InvalidFeedURLError – If url is not valid.

exception reader._parser.RetrieveError(url, /, message='', http_info=None)

Bases: ParseError

An error occurred while retrieving the feed.

Can be used by retrievers to pass additional information to the parser.

exception reader._parser.NotModified(url, /, message='', http_info=None)

Bases: RetrieveError

Raised by retrievers to tell the parser that the feed was not modified.

class reader._parser.FeedForUpdateRetrieverType(*args, **kwargs)

Bases: RetrieverType[T_co], Protocol

A RetrieverType that can change update-relevant information.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever (RetrieverType.__call__()).

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

class reader._parser.ParserType(*args, **kwargs)

A callable that knows how to parse a retrieved feed.

__call__(url, resource, headers)

Parse a feed.

Parameters:
  • url (str) – The feed URL.

  • resource (T_cv) – The feed resource. Usually, a readable binary file.

  • headers (dict(str, str) or None) – The HTTP response headers associated with the resource.

Returns:

The feed and entry data.

Return type:

tuple(FeedData, collection(EntryData))

Raises:

ParseError

class reader._parser.AcceptParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that knows what content types it can handle.

property accept: str

The content types this parser supports, as an Accept HTTP header value.
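A parser is just a callable with the (url, resource, headers) signature; an accept attribute makes it an AcceptParserType. The sketch below is hypothetical: it parses a made-up JSON format, and FeedData / EntryData are local named tuples standing in for reader's classes (which have more fields). A real parser should raise ParseError on malformed input; here json errors simply propagate.

```python
import io
import json
from typing import NamedTuple, Optional


class FeedData(NamedTuple):   # stand-in for a subset of reader._types.FeedData
    url: str
    title: Optional[str] = None


class EntryData(NamedTuple):  # stand-in for a subset of reader._types.EntryData
    feed_url: str
    id: str
    title: Optional[str] = None


class SimpleJSONParser:
    """Hypothetical parser for {"title": ..., "items": [{"id": ..., "title": ...}]}."""

    # content types this parser supports, as an HTTP Accept header value
    accept = "application/json"

    def __call__(self, url, resource, headers):
        # resource is a readable binary file; headers may be None
        data = json.load(resource)
        feed = FeedData(url, data.get("title"))
        entries = [
            EntryData(url, item["id"], item.get("title"))
            for item in data.get("items", ())
        ]
        return feed, entries


parser = SimpleJSONParser()
resource = io.BytesIO(b'{"title": "Example", "items": [{"id": "1", "title": "Hello"}]}')
feed, entries = parser("https://example.com/feed.json", resource, None)
print(feed.title, [entry.title for entry in entries])
```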

class reader._parser.EntryPairsParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that can modify entry data before it is stored.

process_entry_pairs(url, pairs)

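Process entry data before it is stored.

Parameters:
  • url (str) – The feed URL.

  • pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.

Returns: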

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.RequestHook(*args, **kwargs)

Hook to modify a Request before it is sent.

__call__(session, request, **kwargs)

Modify a request before it is sent.

Parameters:
Keyword Arguments:

**kwargs – Will be passed to send().

Returns:

A (possibly modified) request to be sent. If None, send the initial request.

Return type:

requests.Request or None

class reader._parser.requests.ResponseHook(*args, **kwargs)

Hook to repeat a request depending on the Response.

__call__(session, response, request, **kwargs)

Repeat a request depending on the response.

Parameters:
Keyword Arguments:

**kwargs – Were passed to send().

Returns:

A (possibly new) request to be sent, or None, to return the current response.

Return type:

requests.Request or None
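For example, a response hook can repeat a request that failed with a temporary error by returning the same request. A minimal sketch: make_retry_hook and its retry policy are hypothetical, the attempt counter is shared state (not thread-safe), retries happen immediately without any delay, and SimpleNamespace objects stand in for real requests objects so it can be exercised offline.

```python
from types import SimpleNamespace


def make_retry_hook(retry_statuses=(429, 503), max_retries=2):
    """Return a hypothetical ResponseHook that repeats temporarily failed requests."""
    attempts = {}  # request URL -> retries so far (shared state, not thread-safe)

    def hook(session, response, request, **kwargs):
        if response.status_code not in retry_statuses:
            attempts.pop(request.url, None)
            return None  # keep the current response
        count = attempts.get(request.url, 0)
        if count >= max_retries:
            attempts.pop(request.url, None)
            return None  # give up; the error response is returned as-is
        attempts[request.url] = count + 1
        return request  # ask for the same request to be sent again

    return hook


# exercise the hook offline, with stand-ins for requests objects
hook = make_retry_hook(max_retries=2)
request = SimpleNamespace(url="https://example.com/feed.xml")
unavailable = SimpleNamespace(status_code=503)

decisions = [hook(None, unavailable, request) is request for _ in range(3)]
print(decisions)
```

With reader, such a hook would presumably be registered the same way as the request hook in the custom headers recipe, by appending it to reader._parser.session_factory.response_hooks.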

Data objects

class reader._parser.RetrieveResult(feed, value)

The result of retrieving a feed, regardless of the outcome.

feed: F

The feed (a FeedArgument, usually a FeedForUpdate).

value: ContextManager[RetrievedFeed[T], bool | None] | E

One of:

  • a context manager with the RetrievedFeed as target

  • an exception

class reader._parser.RetrievedFeed(resource, mime_type=None, caching_info=None, http_info=None, slow_to_read=False)

A (successfully) retrieved feed, plus metadata.

resource: T

The retrieved resource. Usually, a readable binary file. Passed to the parser.

mime_type: str | None = None

The MIME type of the resource. Used to select an appropriate parser.

caching_info: JSONType | None = None

Caching info passed back to the retriever on the next update. Usually, the ETag and Last-Modified headers.

http_info: HTTPInfo | None = None

Details about the HTTP response.

slow_to_read: bool = False

Allow Parser to read() the resource into a temporary file, and pass that to the parser (as an optimization). Implies the resource is a readable binary file.
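The idea behind slow_to_read can be illustrated with the standard library: copy the slow stream into a temporary file first, so the parser reads from local storage instead of waiting on the source. This spool_to_tempfile helper is a hypothetical sketch of the technique, not reader's code.

```python
import io
import shutil
import tempfile


def spool_to_tempfile(resource, chunk_size=64 * 1024):
    """Copy a readable binary stream into a temporary file.

    Returns the file positioned at the start, ready to be read by a parser.
    """
    file = tempfile.TemporaryFile()
    try:
        shutil.copyfileobj(resource, file, chunk_size)
        file.seek(0)
    except BaseException:
        # don't leak the temporary file if copying fails
        file.close()
        raise
    return file


slow = io.BytesIO(b"<rss></rss>")  # stand-in for a slow network stream
with spool_to_tempfile(slow) as file:
    data = file.read()
print(data)
```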

reader._parser.ParseResult

alias of ParseResultBase[FeedForUpdate, FeedData, EntryData, ParseError]

reader._parser.ParsedFeed

alias of ParsedFeedBase[FeedData, EntryData]

class reader._parser.HTTPInfo(status, headers)

Details about an HTTP response.

status: int

The HTTP status code.

headers: Mapping[str, str]

The HTTP response headers.

get_update_after(now)

Select the best “update after” date from available headers.

parse_date(name, now=None)

Parse an HTTP date header and return a timezone-aware datetime.

Return None if missing or if parsing fails.

If now is given and the Date header is set, make the returned value relative to now.
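A stdlib sketch of this behavior using email.utils.parsedate_to_datetime(). The function names are hypothetical, headers is a plain (case-sensitive) dict, and "relative to now" is interpreted here, as an assumption, as shifting the value by the difference between now and the server's Date header, which compensates for clock skew.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_date_value(value):
    """Parse one HTTP date into an aware datetime; None if missing or invalid."""
    if not value:
        return None
    try:
        rv = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError
        return None
    if rv.tzinfo is None:
        # "-0000" dates parse as naive datetimes; treat them as UTC
        rv = rv.replace(tzinfo=timezone.utc)
    return rv


def parse_date(headers, name, now=None):
    """Hypothetical model of HTTPInfo.parse_date() over a plain dict."""
    value = parse_date_value(headers.get(name))
    if value is None or now is None:
        return value
    date = parse_date_value(headers.get("Date"))
    if date is None:
        return value
    # assumed meaning of "relative to now": apply the server/client clock offset
    return value - date + now


headers = {
    "Date": "Wed, 21 Oct 2015 07:00:00 GMT",
    "Expires": "Wed, 21 Oct 2015 08:00:00 GMT",
}
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(parse_date(headers, "Expires"))       # the server's absolute time
print(parse_date(headers, "Expires", now))  # one hour after now
```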

property cache_control: RequestCacheControl | None

Parsed Cache-Control header, or None if missing.

Storage

reader storage is abstracted by two DAO protocols: StorageType, which provides the main storage, and SearchType, which provides search-related operations.

Currently, there’s only one supported implementation, based on SQLite.

That said, it is possible to use an alternate implementation by passing a StorageType instance via the _storage make_reader() argument:

reader = make_reader('unused', _storage=MyStorage(...))

The protocols are mostly stable, but some backwards-incompatible changes are expected in the future (known ones are marked below with Unstable). The long-term goal is for the storage API to become stable, but at least one other implementation needs to exist before that. (Working on one? Let me know!)

Unstable

Currently, search is tightly-bound to a storage implementation (see make_search()). While the change tracking API allows search implementations to keep in sync with text content changes, there is no convenient way for SearchType.search_entries() to filter/sort results without storage cooperation; StorageType will need additional capabilities to support this.

Reader._storage

The StorageType instance used by this reader.

Reader._search

The SearchType instance used by this reader.

class reader._types.StorageType

Storage DAO protocol.

For methods with Reader correspondents, see the Reader docstrings for detailed semantics.

Any method can raise StorageError.

The behaviors described in Lifecycle and Threading are implemented at the storage level; specifically:

  • The storage can be used directly, without __enter__()ing it. There is no guarantee close() will be called at the end.

  • The storage can be reused after __exit__() / close().

  • The storage can be used from multiple threads, either directly, or as a context manager. Closing the storage in one thread should not close it in another thread.
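One way a storage implementation can meet these requirements is to keep a separate connection per thread. A minimal sketch using threading.local and sqlite3; ThreadLocalStorage and its methods are hypothetical, not reader's storage. Note that ":memory:" gives every connection its own empty database; a real storage would use a file path.

```python
import sqlite3
import threading


class ThreadLocalStorage:
    """Hypothetical storage that keeps one SQLite connection per thread."""

    def __init__(self, path):
        self.path = path
        self.local = threading.local()

    def get_db(self):
        # usable without __enter__(): connect lazily, once per thread
        db = getattr(self.local, "db", None)
        if db is None:
            db = self.local.db = sqlite3.connect(self.path)
        return db

    def close(self):
        # closes only the calling thread's connection;
        # the storage stays reusable, since get_db() reconnects on demand
        db = getattr(self.local, "db", None)
        if db is not None:
            db.close()
            self.local.db = None

    def __enter__(self):
        self.get_db()
        return self

    def __exit__(self, *_):
        self.close()


storage = ThreadLocalStorage(":memory:")
worker_dbs = []


def worker():
    worker_dbs.append(storage.get_db())
    storage.close()  # closes only the worker thread's connection


main_db = storage.get_db()
thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(worker_dbs[0] is not main_db)            # separate connections
print(main_db.execute("select 1").fetchone())  # main connection still open
```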

Schema migrations are transparent to Reader. The current storage implementation does them at initialization, but others may require them to happen out-of-band with user intervention.

All datetime attributes of all parameters and return values are timezone-aware, with the timezone set to UTC.

Unstable

In the future, implementations will be required to accept datetimes with any timezone.

Methods, grouped by topic:

object lifecycle

__enter__() __exit__() close()

feeds

add_feed() delete_feed() change_feed_url() get_feeds() get_feed_counts() set_feed_user_title() set_feed_updates_enabled()

entries

add_entry() delete_entries() get_entries() get_entry_counts() set_entry_read() set_entry_important()

tags

get_tags() set_tag() delete_tag()

update

get_feeds_for_update() update_feed() set_feed_stale() get_entries_for_update() add_or_update_entries() get_entry_recent_sort() set_entry_recent_sort()

__enter__()

Called when Reader is used as a context manager.

__exit__(*_)

Called when Reader is used as a context manager.

close()

Called by Reader.close().

add_feed(url, /, added)

Called by Reader.add_feed().

Parameters:
Raises:

FeedExistsError

delete_feed(url, /)

Called by Reader.delete_feed().

Parameters:

url (str)

Raises:

FeedNotFoundError

change_feed_url(old, new, /)

Called by Reader.change_feed_url().

Parameters:
Raises:

FeedNotFoundError

get_feeds(filter, sort, limit, starting_after)

Called by Reader.get_feeds().

For tag filters, implementations should optimize the single-tag case such that listing by tag does not have to go through all the feeds.

Parameters:
Returns:

A lazy iterable.

Raises:

FeedNotFoundError – If starting_after does not exist.

Return type:

Iterable[Feed]

get_feed_counts(filter)

Called by Reader.get_feed_counts().

Parameters:

filter (FeedFilter)

Returns:

The counts.

Return type:

FeedCounts

set_feed_user_title(url, title, /)

Called by Reader.set_feed_user_title().

Parameters:
  • url (str)

  • title (str | None)

Raises:

FeedNotFoundError

set_feed_updates_enabled(url, enabled, /)

Called by Reader.enable_feed_updates() and Reader.disable_feed_updates().

Parameters:
Raises:

FeedNotFoundError

add_entry(intent, /)

Called by Reader.add_entry().

Parameters:

intent (EntryUpdateIntent)

Raises:

delete_entries(entries, /, *, added_by)

Called by Reader.delete_entry().

Also called by plugins like entry_dedupe.

Parameters:
Raises:

get_entries(filter, sort, limit, starting_after)

Called by Reader.get_entries().

For tag filters, implementations should optimize the single-tag case such that listing by tag does not have to go through all the entries.

Additionally, implementations may choose to not implement tag filters more complicated than flat OR ([['one', 'two', ...]]) or flat AND ([['one'], ['two'], ...]), and raise StorageError instead.

Parameters:
Returns:

A lazy iterable.

Raises:

EntryNotFoundError – If starting_after does not exist.

Return type:

Iterable[Entry]
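An implementation that supports only the simple cases can check the shape of the TagFilter (see Type aliases below) up front. A minimal sketch; the helper names are hypothetical and StorageError is a local stand-in for reader.exceptions.StorageError.

```python
class StorageError(Exception):
    """Stand-in for reader.exceptions.StorageError."""


def is_flat_or(tag_filter):
    # [[t1, t2, ...]]: a single group of alternatives
    return len(tag_filter) == 1


def is_flat_and(tag_filter):
    # [[t1], [t2], ...]: every group holds exactly one tag
    return all(len(group) == 1 for group in tag_filter)


def check_tag_filter(tag_filter):
    """Reject tag filters more complicated than flat OR / flat AND."""
    if not tag_filter:
        return  # no tag filtering at all
    if not (is_flat_or(tag_filter) or is_flat_and(tag_filter)):
        raise StorageError(f"unsupported tag filter: {tag_filter!r}")


check_tag_filter([[(False, "one"), (False, "two")]])   # flat OR: one OR two
check_tag_filter([[(False, "one")], [(True, "two")]])  # flat AND: one AND NOT two
try:
    check_tag_filter([[(False, "a"), (False, "b")], [(False, "c")]])
except StorageError as e:
    print("rejected:", e)
```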

get_entry_counts(now, filter)

Called by Reader.get_entry_counts().

Unstable

In order to expose better feed interaction statistics, this method will need to return more granular data.

Unstable

In order to support search_entry_counts() of search implementations that are not bound to a storage, this method will need to take an entries argument.

Parameters:
Returns:

The counts.

Return type:

EntryCounts

set_entry_read(entry, read, modified, /)

Called by Reader.set_entry_read().

Parameters:
Raises:

EntryNotFoundError

set_entry_important(entry, important, modified, /)

Called by Reader.set_entry_important().

Parameters:
Raises:

EntryNotFoundError

get_tags(resource_id, key=None, /)

Called by Reader.get_tags().

Also called by Reader.get_tag_keys().

Unstable

A dedicated get_tag_keys() method will be added in the future.

Unstable

Both this method and get_tag_keys() will allow filtering by prefix (include/exclude), case sensitive and insensitive; implementations should allow for this.

Parameters:
Returns:

A lazy iterable.

Return type:

Iterable[tuple[str, JSONType]]

set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, /) -> None
set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, value: JSONType, /) -> None

Called by Reader.set_tag().

Parameters:
Raises:

ResourceNotFoundError

delete_tag(resource_id, key, /)

Called by Reader.delete_tag().

Parameters:
Raises:

TagNotFoundError

get_feeds_for_update(filter)

Called by update logic.

Parameters:

filter (FeedFilter)

Returns:

A lazy iterable.

Return type:

Iterable[FeedForUpdate]

update_feed(intent, /)

Called by update logic.

Parameters:

intent (FeedUpdateIntent)

Raises:

FeedNotFoundError

set_feed_stale(url, stale, /)

Used by update logic tests.

Parameters:
Raises:

FeedNotFoundError

get_entries_for_update(entries, /)

Called by update logic.

Parameters:

entries (Iterable[tuple[str, str]])

Returns:

An iterable of entry or None (if an entry does not exist), matching the order of the input iterable.

Return type:

Iterable[EntryForUpdate | None]

add_or_update_entries(intents, /)

Called by update logic.

Parameters:

intents (Iterable[EntryUpdateIntent])

Raises:

FeedNotFoundError

get_entry_recent_sort(entry, /)

Get EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:

entry (tuple[str, str])

Returns:

The entry's recent_sort.

Raises:

EntryNotFoundError

Return type:

datetime

set_entry_recent_sort(entry, recent_sort, /)

Set EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:
Raises:

EntryNotFoundError

class reader._types.BoundSearchStorageType

Bases: StorageType, Protocol

A storage that can create a storage-bound search provider.

make_search()

Create a search provider.

Returns:

A search provider.

Return type:

SearchType

class reader._types.SearchType

Search DAO protocol.

Any method can raise SearchError.

There are two sets of methods that may be called at different times:

management methods

enable() disable() is_enabled() update()

read-only methods

search_entries() search_entry_counts()

Unstable

In the future, search may receive object lifecycle methods (context manager + close()), to support implementations that do not share state with the storage. If you need support for this, please open an issue.

enable()

Called by Reader.enable_search().

A no-op and reasonably fast if search is already enabled.

Checks whether all dependencies needed for update() are available, and raises SearchError if not.

Raises:

StorageError

disable()

Called by Reader.disable_search().

is_enabled()

Called by Reader.is_search_enabled().

Not called otherwise.

Returns:

Whether search is enabled or not.

Return type:

bool

update()

Called by Reader.update_search().

Should not enable search automatically (handled by Reader).

Raises:

search_entries(query, /, filter, sort, limit, starting_after)

Called by Reader.search_entries().

Parameters:
Returns:

A lazy iterable.

Raises:
Return type:

Iterable[EntrySearchResult]

search_entry_counts(query, /, now, filter)

Called by Reader.search_entry_counts().

Parameters:
Returns:

The counts.

Raises:
Return type:

EntrySearchCounts

Change tracking

class reader._types.ChangeTrackingStorageType

Bases: StorageType, Protocol

A storage that can track changes to the text content of resources.

property changes: ChangeTrackerType

The change tracker associated with this storage.

class reader._types.ChangeTrackerType

Storage API used to keep the full-text search index in sync.

Sync model

The sync model works as follows.

Each resource to be indexed has a sequence that changes every time its text content changes. The sequence can be a global counter, a random number, or a high-precision timestamp; the only requirement is that it won’t be used again (or that reuse is extremely unlikely).

Each sequence change gets recorded. Updates are recorded as pairs of DELETE + INSERT changes with the old / new sequences, respectively.

SearchType.update() gets changes and processes them. For INSERT, the resource is indexed only if the change sequence matches the current main storage sequence; otherwise, the change is ignored. For DELETE, the resource is deleted only if the change sequence matches the search index sequence. (This means that, during updates, multiple versions of a resource may appear in the index, with different sequences.) Processed changes are marked as done, regardless of the action taken. Pseudocode:

while changes := self.storage.changes.get():
    self._process_changes(changes)
    self.storage.changes.done(changes)

Enabling change tracking sets the sequence of all resources and adds matching INSERT changes to allow backfilling the search index. The sequence may be None when change tracking is disabled. There is no guarantee the sequence of a resource remains the same when change tracking is disabled and then enabled again.
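The sync model can be simulated in memory to see how stale changes get ignored. This is a hypothetical sketch, not reader's SQLite implementation: sequences are ints from a global counter (the real API uses bytes), resource ids are plain strings (the real API uses tuples), and the batch limit is tiny to force several get()/done() rounds.

```python
import itertools
from enum import Enum


class Action(Enum):
    INSERT = 1
    DELETE = 2


class Storage:
    """Hypothetical in-memory main storage that records text-content changes."""

    def __init__(self):
        self.sequences = itertools.count(1)  # a global counter, never reused
        self.resources = {}                  # resource id -> (text, sequence)
        self.changes = []                    # pending (action, sequence, resource id)

    def set_text(self, resource_id, text):
        if resource_id in self.resources:
            # an update is recorded as DELETE (old sequence) + INSERT (new sequence)
            _, old_sequence = self.resources[resource_id]
            self.changes.append((Action.DELETE, old_sequence, resource_id))
        sequence = next(self.sequences)
        self.resources[resource_id] = (text, sequence)
        self.changes.append((Action.INSERT, sequence, resource_id))

    def delete(self, resource_id):
        _, old_sequence = self.resources.pop(resource_id)
        self.changes.append((Action.DELETE, old_sequence, resource_id))

    def get_changes(self, limit=2):
        return self.changes[:limit]

    def done(self, changes):
        # unknown changes are ignored
        self.changes = [c for c in self.changes if c not in changes]


class Search:
    """Hypothetical search index kept in sync by processing the changes."""

    def __init__(self, storage):
        self.storage = storage
        self.index = {}  # (resource id, sequence) -> text; versions may coexist

    def update(self):
        while changes := self.storage.get_changes():
            for action, sequence, resource_id in changes:
                if action is Action.INSERT:
                    # index only if the sequence matches the main storage one
                    current = self.storage.resources.get(resource_id)
                    if current is not None and current[1] == sequence:
                        self.index[resource_id, sequence] = current[0]
                else:
                    # delete only the indexed version with the same sequence
                    self.index.pop((resource_id, sequence), None)
            # processed changes are marked as done, whatever the action taken
            self.storage.done(changes)


storage = Storage()
search = Search(storage)

storage.set_text("entry-1", "hello")
storage.set_text("entry-2", "world")
search.update()
after_first_update = dict(search.index)

storage.set_text("entry-1", "hello again")
storage.delete("entry-2")
search.update()

print(after_first_update)
print(search.index)
```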

See also

The model was validated using property-based testing in this gist.

API considerations

The entry sequence is exposed as Entry._sequence, and should change when the entry title, summary, or content change, or when its feed’s title or user_title change.

As of version 3.24, only entry changes are tracked, but the API is designed to support tracking feeds and tags in the future; search implementations should ignore changes to resources they do not support (but still mark them as done!).

Any method can raise StorageError.

enable()

Enable change tracking.

A no-op and reasonably fast if change tracking is already enabled.

disable()

Disable change tracking.

A no-op if change tracking is already disabled.

get(action=None, limit=None)

Return the next batch of changes, if any.

Parameters:
  • action (Action | None) – Only return changes of this type.

  • limit (int | None) – Return at most this many changes; may return fewer, depending on storage internal limits. If None, a reasonable limit (hundreds) should be used.

Returns:

A batch of changes.

Raises:

ChangeTrackingNotEnabledError

Return type:

list[Change]

done(changes)

Mark changes as done. Ignore unknown changes.

Parameters:

changes (list[Change])

Raises:

ChangeTrackingNotEnabledError

class reader._types.Change(action, sequence, resource_id, tag_key=None)

A change to be applied to the search index.

The change can be of an entry, a feed, or a resource tag.

action: Action

Action to take.

sequence: bytes

Resource/tag sequence.

resource_id: tuple[()] | tuple[str] | tuple[str, str]

Resource id.

tag_key: str | None = None

Tag key, if the change is about a tag.

class reader._types.Action(*values)

Action to take.

INSERT = 1

The resource needs to be added to the search index.

DELETE = 2

The resource needs to be deleted from the search index.

Entry._sequence: bytes | None = None

Change sequence.

May be None when change tracking is disabled.

Unstable

This field is part of the unstable change tracking API.

exception reader.exceptions.ChangeTrackingNotEnabledError(message='')

Bases: StorageError

A change tracking method was called when change tracking was not enabled.

Unstable

This exception is part of the unstable change tracking API.

Data objects

class reader._types.FeedData(url, updated=None, title=None, link=None, authors=(), subtitle=None, version=None)

Feed data that comes from the feed.

Attributes are a subset of those of Feed.

url: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
authors: Sequence[Author] = ()
subtitle: str | None = None
version: str | None = None

as_feed(**kwargs)

Convert this to a feed; kwargs override attributes.

Returns:

Feed.

Return type:

Feed

property resource_id: tuple[str]
property hash: bytes

class reader._types.EntryData(feed_url, id, updated=None, title=None, link=None, authors=(), published=None, summary=None, content=(), enclosures=(), source=None)

Entry data that comes from the feed.

Attributes are a subset of those of Entry.

feed_url: str
id: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
authors: Sequence[Author] = ()
published: datetime | None = None
summary: str | None = None
content: Sequence[Content] = ()
enclosures: Sequence[Enclosure] = ()
source: EntrySource | None = None

as_entry(**kwargs)

Convert this to an entry; kwargs override attributes.

Returns:

Entry.

Return type:

Entry

property resource_id: tuple[str, str]
property hash: bytes

class reader._types.FeedFilter(feed_url=None, tags=(), broken=None, updates_enabled=None, new=None, update_after=None)

Options for filtering the results of feed list operations.

See the Reader.get_feeds() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

tags: TagFilter

Alias for field number 1

broken: bool | None

Alias for field number 2

updates_enabled: bool | None

Alias for field number 3

new: bool | None

Alias for field number 4

update_after: datetime | None

Alias for field number 5

class reader._types.EntryFilter(feed_url=None, entry_id=None, read=None, important='any', has_enclosures=None, source=None, tags=(), feed_tags=())

Options for filtering the results of entry list operations.

See the Reader.get_entries() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

entry_id: str | None

Alias for field number 1

read: bool | None

Alias for field number 2

important: TristateFilter

Alias for field number 3

has_enclosures: bool | None

Alias for field number 4

source: str | None

Alias for field number 5

tags: TagFilter

Alias for field number 6

feed_tags: TagFilter

Alias for field number 7

class reader._types.FeedForUpdate(url, updated=None, caching_info=None, stale=False, last_updated=None, last_exception=False, hash=None)

Update-relevant information about an existing feed, from Storage.

url: str

The feed URL.

updated: datetime | None

The date the feed was last updated, according to the feed.

caching_info: JSONType | None

Caching info from the last update.

stale: bool

Whether the next update should update all entries, regardless of their hash or updated.

last_updated: datetime | None

The date the feed was last updated, according to reader; None if never.

last_exception: bool

Whether the feed had an exception at the last update.

hash: bytes | None

The hash of the corresponding FeedData.

class reader._types.EntryForUpdate(first_updated, first_updated_epoch, recent_sort, updated, hash, hash_changed)

Update-relevant information about an existing entry, from Storage.

first_updated: datetime

From the last EntryUpdateIntent.first_updated.

first_updated_epoch: datetime

From the last EntryUpdateIntent.first_updated_epoch.

recent_sort: datetime

From the last EntryUpdateIntent.recent_sort.

updated: datetime | None

The date the entry was last updated, according to the entry.

hash: bytes | None

The hash of the corresponding EntryData.

hash_changed: int | None

The number of updates due to a different hash since the last time updated changed.

class reader._types.FeedUpdateIntent(url, last_retrieved, update_after, value)

Data passed to Storage to record a feed update attempt, regardless of the outcome.

url: str

The feed URL.

last_retrieved: datetime

The time at the start of updating this feed.

update_after: datetime

The earliest time the feed will next be updated.

value: FeedToUpdate | None | ExceptionInfo

One of: feed data and metadata (the feed was updated); None (the feed is unchanged); or the cause of UpdateError, if one happened.

class reader._types.FeedToUpdate(feed, last_updated, caching_info=None)

Data passed to Storage when (successfully) updating a feed.

feed: FeedData

The feed data.

last_updated: datetime

The time at the start of updating this feed.

caching_info: JSONType | None

Caching info passed back to the retriever on the next update. See ParsedFeed.caching_info for details.

class reader._types.EntryUpdateIntent(entry, last_updated, first_updated, first_updated_epoch, recent_sort, feed_order=0, hash_changed=0, added_by='feed', original_feed_url=None)

Data passed to Storage when updating an entry.

entry: EntryData

The entry data.

last_updated: datetime

The time at the start of updating the feed (start of update_feed() in update_feed(), start of each feed update in update_feeds()).

first_updated: datetime

First last_updated (sets Entry.added). The value from EntryForUpdate if the entry already exists.

first_updated_epoch: datetime

The time at the start of updating this batch of feeds (start of update_feed() in update_feed(), start of update_feeds() in update_feeds()). The value from EntryForUpdate if the entry already exists.

recent_sort: datetime

Sort key for the get_entries() recent sort order. The value from EntryForUpdate if the entry already exists.

feed_order: int

The index of the entry in the feed (zero-based).

hash_changed: int | None

Same as EntryForUpdate.hash_changed.

added_by: Literal['feed', 'user']

Same as Entry.added_by.

original_feed_url: str | None

Same as Entry.original_feed_url. Usually does not need to be set.

Type aliases

reader._types.TagFilter

Like the tags argument of Reader.get_feeds(), except:

  • only the full multiple-tags-with-disjunction form is used

  • tags are represented as (is negated, tag name) tuples (the - prefix is stripped)

Assuming a tag_filter_argument() function that converts get_feeds() tags to TagFilter:

>>> tag_filter_argument(['one'])
[[(False, 'one')]]
>>> tag_filter_argument(['one', 'two'])
[[(False, 'one')], [(False, 'two')]]
>>> tag_filter_argument([['one', 'two']])
[[(False, 'one'), (False, 'two')]]
>>> tag_filter_argument(['one', '-two'])
[[(False, 'one')], [(True, 'two')]]
>>> tag_filter_argument(True)
[[True]]

alias of Sequence[Sequence[bool | tuple[bool, str]]]
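A minimal sketch of the tag_filter_argument() function assumed above, written to reproduce those examples. It is hypothetical, not reader's implementation; in particular, mapping tags=None to an empty filter (no filtering) is an assumption.

```python
from collections.abc import Sequence


def tag_filter_argument(tags):
    """Hypothetical converter from get_feeds()-style tags to TagFilter."""
    if tags is None:
        return []            # assumed: no tag filtering
    if isinstance(tags, bool):
        return [[tags]]      # True: has any tags, False: has no tags
    if isinstance(tags, str) or not isinstance(tags, Sequence):
        raise ValueError(f"invalid tags: {tags!r}")

    def convert(tag):
        if isinstance(tag, bool):
            return tag
        if isinstance(tag, str):
            # a leading "-" negates the tag; the prefix is stripped
            if tag.startswith("-"):
                return (True, tag[1:])
            return (False, tag)
        raise ValueError(f"invalid tag: {tag!r}")

    rv = []
    for group in tags:
        if isinstance(group, (str, bool)):
            rv.append([convert(group)])                  # a lone tag is ANDed
        else:
            rv.append([convert(tag) for tag in group])   # a nested list is ORed
    return rv


print(tag_filter_argument(["one", "-two"]))
```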

reader._types.TristateFilter

Like TristateFilterInput, but without bool/None aliases.

alias of Literal['istrue', 'isfalse', 'notset', 'nottrue', 'notfalse', 'isset', 'any']

Recipes

Adding custom headers when retrieving feeds

Example of adding custom request headers with SessionFactory.request_hooks:

$ python examples/custom_headers.py
updating...
server: Hello, world!
updated!

import http.server
import threading
from reader import make_reader

# start a background server that logs the received header

class Handler(http.server.BaseHTTPRequestHandler):
    def log_message(self, *_): pass
    def do_GET(self):
        print("server:", self.headers.get('my-header'))
        self.send_error(304)

server = http.server.HTTPServer(('localhost', 8080), Handler)
threading.Thread(target=server.handle_request).start()

# create a reader object

reader = make_reader(':memory:')
reader.add_feed('http://localhost:8080')

# set up a hook that adds the header to each request

def hook(session, request, **kwargs):
    request.headers.setdefault('my-header', 'Hello, world!')

reader._parser.session_factory.request_hooks.append(hook)

# updating the feed sends the modified request to the server

print("updating...")
reader.update_feeds()
print("updated!")

Parsing a feed retrieved with something other than reader

Example of using the reader internal API to parse a feed retrieved asynchronously with HTTPX:

$ python examples/parser_only.py
death and gravity
Has your password been pwned? Or, how I almost failed to search a 37 GB text file in under 1 millisecond (in Python)

import asyncio
import io
import httpx
from reader._parser import default_parser
from werkzeug.http import parse_options_header

url = "https://death.andgravity.com/_feed/index.xml"
meta_parser = default_parser()


async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

        # to select the parser, we need the MIME type of the response
        content_type = response.headers.get('content-type')
        if content_type:
            mime_type, _ = parse_options_header(content_type)
        else:
            mime_type = None

        # select the parser (raises ParseError if none found)
        parser, _ = meta_parser.get_parser(url, mime_type)

        # wrap the content in a readable binary file
        file = io.BytesIO(response.content)

        # parse the feed; not doing parser(url, file, response.headers) directly
        # because parsing is CPU-intensive and would block the event loop
        feed, entries = await asyncio.to_thread(parser, url, file, response.headers)

        print(feed.title)
        print(entries[0].title)


if __name__ == '__main__':
    asyncio.run(main())