Internal API

This part of the documentation covers the internal interfaces of reader, which are useful for plugins, or if you want to use low-level functionality without using Reader itself.

Warning

As of version 3.12, the internal API is not part of the public API; it is not stable yet and might change without any notice.

Parser

Reader._parser

The Parser instance used by this reader.

reader._parser.default_parser(feed_root=None, session_timeout=(3.05, 60), _lazy=True)

Create a pre-configured Parser.

Parameters:
Returns:

The parser.

Return type:

Parser

class reader._parser.Parser

Retrieve and parse feeds by delegating to retrievers and parsers.

To retrieve and parse a single feed, you can call the parser object directly.

Reader only uses the following methods:

To add retrievers and parsers:

The rest of the methods are low-level methods.

session_factory

SessionFactory used to create Requests sessions for retrieving feeds.

Plugins may add request or response hooks to this.
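A request hook is just a callable with the signature documented under RequestHook below; a minimal sketch (the hook name and header value are illustrative):

```python
# A minimal RequestHook: receives the session and the request about to be
# sent, and returns the (possibly modified) request, or None to send the
# original request unchanged.
def add_user_agent(session, request, **kwargs):
    request.headers.setdefault("User-Agent", "my-plugin/1.0")
    return request
```

With a Reader instance, such a hook could be registered via `reader._parser.session_factory.request_hooks.append(add_user_agent)` (assuming `request_hooks` is a mutable list, as in the current implementation).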

parallel(feeds, map=map, is_parallel=True)

Retrieve and parse many feeds, possibly in parallel.

Yields the parsed feeds, as soon as they are ready.

Parameters:
  • feeds (iterable(FeedArgument)) – An iterable of feeds.

  • map (function) – A map()-like function; the results can be in any order.

  • is_parallel (bool) – Whether map runs the tasks in parallel.

Yields:

tuple(FeedArgument, ParsedFeed or None or ParseError) – A (feed, result) pair, where result is either:

  • the parsed feed

  • None, if the feed didn’t change

  • an exception instance
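Consuming code therefore has to distinguish three cases per pair; a sketch of that dispatch (helper name is illustrative, the pairs stand in for `parallel()` output):

```python
# Split (feed, result) pairs from parallel() into three buckets:
# result is a parsed feed, None (feed didn't change), or an exception.
def split_results(pairs):
    updated, unchanged, failed = [], [], []
    for feed, result in pairs:
        if isinstance(result, Exception):
            failed.append((feed, result))
        elif result is None:
            unchanged.append(feed)
        else:
            updated.append((feed, result))
    return updated, unchanged, failed
```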

__call__(url, http_etag=None, http_last_modified=None)

Retrieve and parse one feed.

This is a convenience wrapper over parallel().

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

Returns:

The parsed feed or None, if the feed didn’t change.

Return type:

ParsedFeed or None

Raises:

ParseError

retrieve(url, http_etag=None, http_last_modified=None, is_parallel=False)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

  • is_parallel (bool) – Whether this was called from parallel() (writes the contents to a temporary file, if possible).

Returns:

A context manager whose target is either the result, or None if the feed didn’t change.

Return type:

contextmanager(RetrieveResult or None)

Raises:

ParseError

parse(url, result)

Parse a retrieved feed.

Parameters:
Returns:

The feed and entry data.

Return type:

ParsedFeed

Raises:

ParseError

get_parser(url, mime_type)

Select an appropriate parser for a feed.

Parsers registered by URL take precedence over those registered by MIME type.

If no MIME type is given, guess it from the URL using mimetypes.guess_type(). If the MIME type can’t be guessed, default to application/octet-stream.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the retrieved resource.

Returns:

The parser, and the (possibly guessed) MIME type.

Return type:

tuple(ParserType, str)

Raises:

ParseError – No parser matches.
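The MIME type fallback described above can be sketched with the standard library (helper name is illustrative; the real get_parser() also consults the parser registries):

```python
import mimetypes

# Guess the MIME type from the URL when none is given, defaulting to
# application/octet-stream when the guess fails.
def effective_mime_type(url, mime_type=None):
    if not mime_type:
        mime_type, _ = mimetypes.guess_type(url)
    return mime_type or "application/octet-stream"
```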

validate_url(url)

Check if url is valid without actually retrieving it.

Raises:

InvalidFeedURLError – If url is not valid.

mount_retriever(prefix, retriever)

Register a retriever to a URL prefix.

Retrievers are sorted in descending order by prefix length.

Parameters:
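The longest-prefix-wins dispatch described above can be sketched over a plain dict (names are illustrative, not reader's internals):

```python
# Try mounted prefixes longest-first, so the most specific one wins;
# an empty prefix acts as a catch-all fallback.
def match_by_prefix(mounts, url):
    for prefix in sorted(mounts, key=len, reverse=True):
        if url.startswith(prefix):
            return mounts[prefix]
    raise LookupError(f"no retriever for {url!r}")
```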
get_retriever(url)

Get the retriever for a URL.

Parameters:

url (str) – The URL.

Returns:

The matching retriever.

Return type:

RetrieverType

Raises:

ParseError – No retriever matches the URL.

mount_parser_by_mime_type(parser, http_accept=None)

Register a parser to one or more MIME types.

Parameters:
  • parser (ParserType) – The parser.

  • http_accept (str or None) – The content types the parser supports, as an Accept HTTP header value. If not given, use the parser’s http_accept attribute, if it has one.

Raises:

TypeError – The parser does not have an http_accept attribute, and no http_accept was given.
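Because ParserType is a structural protocol, any callable with the right signature qualifies; a minimal sketch for JSON Feed documents (illustrative only: a real parser returns FeedData and EntryData from reader._types, plain dicts stand in for them here):

```python
import json

class JSONFeedParser:
    # Used as the Accept header value when mounted by MIME type.
    http_accept = "application/feed+json"

    def __call__(self, url, resource, headers=None):
        # resource is a readable binary file; a real parser would build
        # FeedData / EntryData instead of dicts.
        data = json.load(resource)
        feed = {"url": url, "title": data.get("title")}
        entries = [
            {"id": item["id"], "title": item.get("title")}
            for item in data.get("items", [])
        ]
        return feed, entries
```

With a Parser instance, this could then be registered with `parser.mount_parser_by_mime_type(JSONFeedParser())`.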

get_parser_by_mime_type(mime_type)

Get a parser for a MIME type.

Parameters:

mime_type (str) – The MIME type of the feed resource.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser matches the MIME type.

mount_parser_by_url(url, parser)

Register a parser to an exact URL.

Parameters:
get_parser_by_url(url)

Get a parser that was registered by URL.

Parameters:

url (str) – The URL.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser was registered for the URL.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever.

Delegates to process_feed_for_update() of the appropriate retriever.

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

process_entry_pairs(url, mime_type, pairs)

Process entry data before being stored.

Delegates to process_entry_pairs() of the appropriate parser.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the feed.

  • pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.

Returns:

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.SessionFactory(...)

Manage the lifetime of a session.

To get a new session, call the factory directly.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks to be associated with new sessions.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks to be associated with new sessions.

__call__()

Create a new session.

Return type:

SessionWrapper

transient()

Return the current persistent() session, or a new one.

If a new session was created, it is closed once the context manager is exited.

Return type:

contextmanager(SessionWrapper)

persistent()

Register a persistent session with this factory.

While the context manager returned by this method is entered, all persistent() and transient() calls will return the same session. The session is closed once the outermost persistent() context manager is exited.

Plugins should use transient().

Reentrant, but NOT threadsafe.

Return type:

contextmanager(SessionWrapper)
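The lifetime rules above can be modeled with a toy factory (the method names mirror the real API, but the implementation is purely illustrative):

```python
from contextlib import contextmanager

class MiniSessionFactory:
    def __init__(self):
        self.session = None
        self._depth = 0

    def _new_session(self):
        return object()  # stands in for SessionWrapper

    @contextmanager
    def transient(self):
        # Reuse the persistent session if one is active.
        if self.session is not None:
            yield self.session
        else:
            session = self._new_session()
            try:
                yield session
            finally:
                pass  # a real implementation closes the session here

    @contextmanager
    def persistent(self):
        # Reentrant: only the outermost call creates/discards the session.
        if self._depth == 0:
            self.session = self._new_session()
        self._depth += 1
        try:
            yield self.session
        finally:
            self._depth -= 1
            if self._depth == 0:
                self.session = None  # closed on outermost exit
```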

class reader._parser.requests.SessionWrapper(...)

Minimal wrapper over a requests.Session.

Only provides a limited get() method.

Can be used as a context manager (closes the session on exit).

session: requests.Session

The underlying requests.Session.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks.

get(url, headers=None, **kwargs)

Like Requests get(), but apply request_hooks and response_hooks.

Parameters:
Keyword Arguments:

**kwargs – Passed to send().

Return type:

requests.Response

caching_get(url, etag=None, last_modified=None, headers=None, **kwargs)

Like get(), but set and return caching headers.

caching_get(url, etag, last_modified) -> response, etag, last_modified
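The request side of this is standard HTTP validation: the stored etag and last_modified become conditional headers, sketched here (helper name is illustrative):

```python
# Build the conditional-request headers for a cached resource:
# If-None-Match carries the ETag, If-Modified-Since the Last-Modified
# value; a 304 response then means "not modified".
def conditional_headers(etag=None, last_modified=None):
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers
```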

Protocols

class reader._parser.FeedArgument(*args, **kwargs)

Any FeedForUpdate-like object.

property url: str

The feed URL.

property http_etag: str | None

The HTTP ETag header from the last update.

property http_last_modified: str | None

The HTTP Last-Modified header from the last update.
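Since this is a structural protocol, any object exposing these attributes can be passed to Parser.parallel(); a dataclass sketch (the class name is illustrative):

```python
from __future__ import annotations
from dataclasses import dataclass

# Any object with url / http_etag / http_last_modified attributes
# satisfies the FeedArgument protocol.
@dataclass
class MinimalFeedArgument:
    url: str
    http_etag: str | None = None
    http_last_modified: str | None = None
```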

class reader._parser.RetrieverType(*args, **kwargs)

A callable that knows how to retrieve a feed.

slow_to_read: bool

Allow Parser to read() the result resource into a temporary file, and pass that to the parser (as an optimization). Implies the resource is a readable binary file.

__call__(url, http_etag, http_last_modified, http_accept)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

  • http_accept (str or None) – Content types to be retrieved, as an HTTP Accept header.

Returns:

A context manager whose target is either the result, or None if the feed didn’t change.

Return type:

contextmanager(RetrieveResult or None)

Raises:

ParseError

validate_url(url)

Check if url is valid for this retriever.

Raises:

InvalidFeedURLError – If url is not valid.

class reader._parser.FeedForUpdateRetrieverType(*args, **kwargs)

Bases: RetrieverType[T_co], Protocol

A RetrieverType that can change update-relevant information.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever (RetrieverType.__call__()).

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

class reader._parser.ParserType(*args, **kwargs)

A callable that knows how to parse a retrieved feed.

__call__(url, resource, headers)

Parse a feed.

Parameters:
  • resource (T_cv) – The feed resource. Usually, a readable binary file.

  • headers (dict(str, str) or None) – The HTTP response headers associated with the resource.

Returns:

The feed and entry data.

Return type:

tuple(FeedData, collection(EntryData))

Raises:

ParseError

class reader._parser.HTTPAcceptParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that knows what content it can handle.

property http_accept: str

The content types this parser supports, as an Accept HTTP header value.

class reader._parser.EntryPairsParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that can modify entry data before being stored.

process_entry_pairs(url, pairs)

Process entry data before being stored.

Parameters:
Returns:

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.RequestHook(*args, **kwargs)

Hook to modify a Request before it is sent.

__call__(session, request, **kwargs)

Modify a request before it is sent.

Parameters:
Keyword Arguments:

**kwargs – Will be passed to send().

Returns:

A (possibly modified) request to be sent. If None, the initial request is sent.

Return type:

requests.Request or None

class reader._parser.requests.ResponseHook(*args, **kwargs)

Hook to repeat a request depending on the Response.

__call__(session, response, request, **kwargs)

Repeat a request depending on the response.

Parameters:
Keyword Arguments:

**kwargs – Were passed to send().

Returns:

A (possibly new) request to be sent, or None, to return the current response.

Return type:

requests.Request or None

Data objects

class reader._parser.RetrieveResult(resource, mime_type=None, http_etag=None, http_last_modified=None, headers=None)

The result of retrieving a feed, plus metadata.

resource: T_co

The result of retrieving a feed. Usually, a readable binary file. Passed to the parser.

mime_type: str | None = None

The MIME type of the resource. Used to select an appropriate parser.

http_etag: str | None = None

The HTTP ETag header associated with the resource. Passed back to the retriever on the next update.

http_last_modified: str | None = None

The HTTP Last-Modified header associated with the resource. Passed back to the retriever on the next update.

headers: Mapping[str, str] | None = None

The HTTP response headers associated with the resource. Passed to the parser.

class reader._types.ParsedFeed(feed, entries, http_etag=None, http_last_modified=None, mime_type=None)

A parsed feed.

feed: FeedData

The feed.

entries: Iterable[EntryData]

Iterable of entries.

http_etag: str | None

The HTTP ETag header associated with the feed resource. Passed back to the retriever on the next update.

http_last_modified: str | None

The HTTP Last-Modified header associated with the feed resource. Passed back to the retriever on the next update.

mime_type: str | None

The MIME type of the feed resource. Used by process_entry_pairs() to select an appropriate parser.

class reader._types.FeedData(url, updated=None, title=None, link=None, author=None, subtitle=None, version=None)

Feed data that comes from the feed.

Attributes are a subset of those of Feed.

url: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
author: str | None = None
subtitle: str | None = None
version: str | None = None
as_feed(**kwargs)

Convert this to a feed; kwargs override attributes.

Returns:

Feed.

Return type:

Feed

property resource_id: tuple[str]
property hash: bytes
class reader._types.EntryData(feed_url, id, updated=None, title=None, link=None, author=None, published=None, summary=None, content=(), enclosures=())

Entry data that comes from the feed.

Attributes are a subset of those of Entry.

feed_url: str
id: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
author: str | None = None
published: datetime | None = None
summary: str | None = None
content: Sequence[Content] = ()
enclosures: Sequence[Enclosure] = ()
as_entry(**kwargs)

Convert this to an entry; kwargs override attributes.

Returns:

Entry.

Return type:

Entry

property resource_id: tuple[str, str]
property hash: bytes
class reader._types.FeedForUpdate(url, updated, http_etag, http_last_modified, stale, last_updated, last_exception, hash)

Update-relevant information about an existing feed, from Storage.

url: str

The feed URL.

updated: datetime | None

The date the feed was last updated, according to the feed.

http_etag: str | None

The HTTP ETag header from the last update.

http_last_modified: str | None

The HTTP Last-Modified header from the last update.

stale: bool

Whether the next update should update all entries, regardless of their hash or updated.

last_updated: datetime | None

The date the feed was last updated, according to reader; none if never.

last_exception: bool

Whether the feed had an exception at the last update.

hash: bytes | None

The hash of the corresponding FeedData.

class reader._types.EntryForUpdate(updated, published, hash, hash_changed)

Update-relevant information about an existing entry, from Storage.

updated: datetime | None

The date the entry was last updated, according to the entry.

published: datetime | None

The date the entry was published, according to the entry.

hash: bytes | None

The hash of the corresponding EntryData.

hash_changed: int | None

The number of updates due to a different hash since the last time updated changed.

Storage

reader storage is abstracted by two DAO protocols: StorageType, which provides the main storage, and SearchType, which provides search-related operations.

Currently, there’s only one supported implementation, based on SQLite.

That said, it is possible to use an alternate implementation by passing a StorageType instance via the _storage make_reader() argument:

reader = make_reader('unused', _storage=MyStorage(...))

The protocols are mostly stable, but some backwards-incompatible changes are expected in the future (known ones are marked below with Unstable). The long term goal is for the storage API to become stable, but at least one other implementation needs to exist before that. (Working on one? Let me know!)

Unstable

Currently, search is tightly bound to a storage implementation (see make_search()). While the change tracking API allows search implementations to keep in sync with text content changes, there is no convenient way for SearchType.search_entries() to filter/sort results without storage cooperation; StorageType will need additional capabilities to support this.

Reader._storage

The StorageType instance used by this reader.

Reader._search

The SearchType instance used by this reader.

class reader._types.StorageType

Storage DAO protocol.

For methods with Reader correspondents, see the Reader docstrings for detailed semantics.

Any method can raise StorageError.

The behaviors described in Lifecycle and Threading are implemented at the storage level; specifically:

  • The storage can be used directly, without __enter__()ing it. There is no guarantee close() will be called at the end.

  • The storage can be reused after __exit__() / close().

  • The storage can be used from multiple threads, either directly, or as a context manager. Closing the storage in one thread should not close it in another thread.

Schema migrations are transparent to Reader. The current storage implementation does them at initialization, but others may require them to happen out-of-band with user intervention.

All datetime attributes of all parameters and return values are timezone-aware, with the timezone set to UTC.

Unstable

In the future, implementations will be required to accept datetimes with any timezone.

Methods, grouped by topic:

object lifecycle

__enter__() __exit__() close()

feeds

add_feed() delete_feed() change_feed_url() get_feeds() get_feed_counts() set_feed_user_title() set_feed_updates_enabled()

entries

add_entry() delete_entries() get_entries() get_entry_counts() set_entry_read() set_entry_important()

tags

get_tags() set_tag() delete_tag()

update

get_feeds_for_update() update_feed() set_feed_stale() get_entries_for_update() add_or_update_entries() get_entry_recent_sort() set_entry_recent_sort()

__enter__()

Called when Reader is used as a context manager.

__exit__(*_)

Called when Reader is used as a context manager.

close()

Called by Reader.close().

add_feed(url, /, added)

Called by Reader.add_feed().

Parameters:
Raises:

FeedExistsError

delete_feed(url, /)

Called by Reader.delete_feed().

Parameters:

url (str) –

Raises:

FeedNotFoundError

change_feed_url(old, new, /)

Called by Reader.change_feed_url().

Parameters:
Raises:

FeedNotFoundError

get_feeds(filter, sort, limit, starting_after)

Called by Reader.get_feeds().

Parameters:
  • filter (FeedFilter) –

  • sort (Literal['title', 'added']) –

  • limit (int | None) –

  • starting_after (str | None) –

Returns:

A lazy iterable.

Raises:

FeedNotFoundError – If starting_after does not exist.

Return type:

Iterable[Feed]

get_feed_counts(filter)

Called by Reader.get_feed_counts().

Parameters:

filter (FeedFilter) –

Returns:

The counts.

Return type:

FeedCounts

set_feed_user_title(url, title, /)

Called by Reader.set_feed_user_title().

Parameters:
  • url (str) –

  • title (str | None) –

Raises:

FeedNotFoundError

set_feed_updates_enabled(url, enabled, /)

Called by Reader.enable_feed_updates() and Reader.disable_feed_updates().

Parameters:
Raises:

FeedNotFoundError

add_entry(intent, /)

Called by Reader.add_entry().

Parameters:

intent (EntryUpdateIntent) –

Raises:
delete_entries(entries, /, *, added_by)

Called by Reader.delete_entry().

Also called by plugins like entry_dedupe.

Parameters:
Raises:
get_entries(filter, sort, limit, starting_after)

Called by Reader.get_entries().

Parameters:
Returns:

A lazy iterable.

Raises:

EntryNotFoundError – If starting_after does not exist.

Return type:

Iterable[Entry]

get_entry_counts(now, filter)

Called by Reader.get_entry_counts().

Unstable

In order to expose better feed interaction statistics, this method will need to return more granular data.

Unstable

In order to support search_entry_counts() of search implementations that are not bound to a storage, this method will need to take an entries argument.

Parameters:
Returns:

The counts.

Return type:

EntryCounts

set_entry_read(entry, read, modified, /)

Called by Reader.set_entry_read().

Parameters:
Raises:

EntryNotFoundError

set_entry_important(entry, important, modified, /)

Called by Reader.set_entry_important().

Parameters:
Raises:

EntryNotFoundError

get_tags(resource_id, key=None, /)

Called by Reader.get_tags().

Also called by Reader.get_tag_keys().

Unstable

A dedicated get_tag_keys() method will be added in the future.

Unstable

Both this method and get_tag_keys() will allow filtering by prefix (include/exclude), case sensitive and insensitive; implementations should allow for this.

Parameters:
Returns:

A lazy iterable.

Return type:

Iterable[tuple[str, reader.types.JSONType]]

set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, /) -> None
set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, value: JSONType, /) -> None

Called by Reader.set_tag().

Parameters:
  • resource_id

  • key

  • value

Raises:

ResourceNotFoundError

delete_tag(resource_id, key, /)

Called by Reader.delete_tag().

Parameters:
Raises:

TagNotFoundError

get_feeds_for_update(filter)

Called by update logic.

Parameters:

filter (FeedFilter) –

Returns:

A lazy iterable.

Return type:

Iterable[FeedForUpdate]

update_feed(intent, /)

Called by update logic.

Parameters:

intent (FeedUpdateIntent) –

Raises:

FeedNotFoundError

set_feed_stale(url, stale, /)

Used by update logic tests.

Parameters:
Raises:

FeedNotFoundError

get_entries_for_update(entries, /)

Called by update logic.

Parameters:

entries (Iterable[tuple[str, str]]) –

Returns:

An iterable of entry or None (if an entry does not exist), matching the order of the input iterable.

Return type:

Iterable[EntryForUpdate | None]
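The order-preserving contract can be sketched over a plain dict standing in for the storage (names are illustrative):

```python
# For each (feed URL, entry id) key, yield the stored entry-for-update
# value, or None if the entry does not exist; output order matches
# input order exactly.
def entries_for_update(index, keys):
    return [index.get(key) for key in keys]
```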

add_or_update_entries(intents, /)

Called by update logic.

Parameters:

intents (Iterable[EntryUpdateIntent]) –

Raises:

FeedNotFoundError

get_entry_recent_sort(entry, /)

Get EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:

entry (tuple[str, str]) –

Returns:

entry recent_sort

Raises:

EntryNotFoundError

Return type:

datetime

set_entry_recent_sort(entry, recent_sort, /)

Set EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:
Raises:

EntryNotFoundError

class reader._types.BoundSearchStorageType

Bases: StorageType, Protocol

A storage that can create a storage-bound search provider.

Create a search provider.

Returns:

A search provider.

Return type:

SearchType

class reader._types.SearchType

Search DAO protocol.

Any method can raise SearchError.

There are two sets of methods that may be called at different times:

management methods

enable() disable() is_enabled() update()

read-only methods

search_entries() search_entry_counts()

Unstable

In the future, search may receive object lifecycle methods (context manager + close()), to support implementations that do not share state with the storage. If you need support for this, please open an issue.

enable()

Called by Reader.enable_search().

A no-op and reasonably fast if search is already enabled.

Checks if all dependencies needed for update() are available, raises SearchError if not.

Raises:

StorageError

disable()

Called by Reader.disable_search().

is_enabled()

Called by Reader.is_search_enabled().

Not called otherwise.

Returns:

Whether search is enabled or not.

Return type:

bool

update()

Called by Reader.update_search().

Should not enable search automatically (handled by Reader).

Raises:
search_entries(query, /, filter, sort, limit, starting_after)

Called by Reader.search_entries().

Parameters:
Returns:

A lazy iterable.

Raises:
Return type:

Iterable[EntrySearchResult]

search_entry_counts(query, /, now, filter)

Called by Reader.search_entry_counts().

Parameters:
Returns:

The counts.

Raises:
Return type:

EntrySearchCounts

class reader._types.ChangeTrackingStorageType

Bases: StorageType, Protocol

A storage that can track changes to the text content of resources.

property changes: ChangeTrackerType

The change tracker associated with this storage.

class reader._types.ChangeTrackerType

Storage API used to keep the full-text search index in sync.


The sync model works as follows.

Each resource to be indexed has a sequence that changes every time its text content changes. The sequence can be a global counter, a random number, or a high-precision timestamp; the only requirement is that it won’t be used again (or it’s extremely unlikely that will happen).

Each sequence change gets recorded. Updates are recorded as pairs of DELETE + INSERT changes with the old / new sequences, respectively.

SearchType.update() gets changes and processes them. For INSERT, the resource is indexed only if the change sequence matches the current main storage sequence; otherwise, the change is ignored. For DELETE, the resource is deleted only if the change sequence matches the search index sequence. (This means that, during updates, multiple versions of a resource may appear in the index, with different sequences.) Processed changes are marked as done, regardless of the action taken. Pseudocode:

while changes := self.storage.changes.get():
    self._process_changes(changes)
    self.storage.changes.done(changes)

Enabling change tracking sets the sequence of all resources and adds matching INSERT changes to allow backfilling the search index. The sequence may be None when change tracking is disabled. There is no guarantee the sequence of a resource remains the same when change tracking is disabled and then enabled again.

See also

The model was validated using property-based testing in this gist.


The entry sequence is exposed as Entry._sequence, and should change when the entry title, summary, or content change, or when its feed’s title or user_title change.

As of version 3.12, only entry changes are tracked, but the API supports tracking feeds and tags in the future; search implementations should ignore changes to resources they do not support (but still mark them as done!).

Any method can raise StorageError.

enable()

Enable change tracking.

A no-op and reasonably fast if change tracking is already enabled.

disable()

Disable change tracking.

A no-op if change tracking is already disabled.

get(action=None, limit=None)

Return the next batch of changes, if any.

Parameters:
  • action (Action | None) – Only return changes of this type.

  • limit (int | None) – Return at most this many changes; may return fewer, depending on storage internal limits. If None, a reasonable limit should be used (hundreds).

Returns:

A batch of changes.

Raises:

ChangeTrackingNotEnabledError

Return type:

list[Change]

done(changes)

Mark changes as done. Ignore unknown changes.

Parameters:

changes (list[Change]) –

Raises:
class reader._types.Change(action, sequence, resource_id, tag_key=None)

A change to be applied to the search index.

The change can be of an entry, a feed, or a resource tag.

action: Action

Action to take.

sequence: bytes

Resource/tag sequence.

resource_id: tuple[()] | tuple[str] | tuple[str, str]

Resource id.

tag_key: str | None = None

Tag key, if the change is about a tag.

class reader._types.Action(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Action to take.

INSERT = 1

The resource needs to be added to the search index.

DELETE = 2

The resource needs to be deleted from the search index.

Entry._sequence: bytes | None = None

Change sequence.

May be None when change tracking is disabled.

Unstable

This field is part of the unstable change tracking API.

exception reader.exceptions.ChangeTrackingNotEnabledError(message='')

Bases: StorageError

A change tracking method was called when change tracking was not enabled.

Unstable

This exception is part of the unstable change tracking API.

Data objects

class reader._types.FeedFilter(feed_url=None, tags=(), broken=None, updates_enabled=None, new=None)

Options for filtering the results of feed list operations.

See the Reader.get_feeds() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

tags: TagFilter

Alias for field number 1

broken: bool | None

Alias for field number 2

updates_enabled: bool | None

Alias for field number 3

new: bool | None

Alias for field number 4

class reader._types.EntryFilter(feed_url=None, entry_id=None, read=None, important='any', has_enclosures=None, tags=(), feed_tags=())

Options for filtering the results of entry list operations.

See the Reader.get_entries() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

entry_id: str | None

Alias for field number 1

read: bool | None

Alias for field number 2

important: TristateFilter

Alias for field number 3

has_enclosures: bool | None

Alias for field number 4

tags: TagFilter

Alias for field number 5

feed_tags: TagFilter

Alias for field number 6

class reader._types.FeedUpdateIntent(url, last_updated, feed=None, http_etag=None, http_last_modified=None, last_exception=None)

Data to be passed to Storage when updating a feed.

url: str

The feed URL.

last_updated: datetime | None

The time at the start of updating this feed.

feed: FeedData | None

The feed data, if any.

http_etag: str | None

The feed’s ETag header; see ParsedFeed.http_etag for details.

Unstable

http_etag and http_last_modified may be grouped in a single attribute in the future.

http_last_modified: str | None

The feed’s Last-Modified header; see ParsedFeed.http_last_modified for details.

last_exception: ExceptionInfo | None

Cause of UpdateError, if any; if set, everything else except url should be None.

class reader._types.EntryUpdateIntent(entry, last_updated, first_updated, first_updated_epoch, recent_sort, feed_order=0, hash_changed=0, added_by='feed')

Data to be passed to Storage when updating an entry.

entry: EntryData

The entry data.

last_updated: datetime

The time at the start of updating the feed (start of update_feed() in update_feed(), start of each feed update in update_feeds()).

first_updated: datetime | None

First last_updated (sets Entry.added). None if the entry already exists.

first_updated_epoch: datetime | None

The time at the start of updating this batch of feeds (start of update_feed() in update_feed(), start of update_feeds() in update_feeds()). None if the entry already exists.

recent_sort: datetime | None

Sort key for the get_entries() recent sort order.

feed_order: int

The index of the entry in the feed (zero-based).

hash_changed: int | None

Same as EntryForUpdate.hash_changed.

added_by: Literal['feed', 'user']

Same as Entry.added_by.

property new: bool

Whether the entry is new or not.

Type aliases

reader._types.TagFilter

Like the tags argument of Reader.get_feeds(), except:

  • only the full multiple-tags-with-disjunction form is used

  • tags are represented as (is negated, tag name) tuples (the - prefix is stripped)

Assuming a tag_filter_argument() function that converts get_feeds() tags to TagFilter:

>>> tag_filter_argument(['one'])
[[(False, 'one')]]
>>> tag_filter_argument(['one', 'two'])
[[(False, 'one')], [(False, 'two')]]
>>> tag_filter_argument([['one', 'two']])
[[(False, 'one'), (False, 'two')]]
>>> tag_filter_argument(['one', '-two'])
[[(False, 'one')], [(True, 'two')]]
>>> tag_filter_argument(True)
[[True]]

alias of Sequence[Sequence[Union[bool, tuple[bool, str]]]]
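One possible tag_filter_argument(), consistent with the examples above (hypothetical, like the function the examples assume; reader's actual converter may differ in details):

```python
def tag_filter_argument(tags):
    # True/False (match any/no tags) become a single one-element group.
    if isinstance(tags, bool):
        return [[tags]]
    result = []
    for group in tags:
        # A bare string is shorthand for a one-tag group.
        if isinstance(group, str):
            group = [group]
        # Each tag becomes an (is negated, tag name) tuple,
        # with the "-" prefix stripped.
        result.append([
            (tag.startswith("-"), tag[1:] if tag.startswith("-") else tag)
            for tag in group
        ])
    return result
```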

reader._types.TristateFilter

Like TristateFilterInput, but without bool/None aliases.

alias of Literal[‘istrue’, ‘isfalse’, ‘notset’, ‘nottrue’, ‘notfalse’, ‘isset’, ‘any’]

Recipes

Parsing a feed retrieved with something other than reader

Example of using the reader internal API to parse a feed retrieved asynchronously with HTTPX:

$ python examples/parser_only.py
death and gravity
Has your password been pwned? Or, how I almost failed to search a 37 GB text file in under 1 millisecond (in Python)
import asyncio
import io
import httpx
from reader._parser import default_parser
from werkzeug.http import parse_options_header

url = "https://death.andgravity.com/_feed/index.xml"
meta_parser = default_parser()


async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

        # to select the parser, we need the MIME type of the response
        content_type = response.headers.get('content-type')
        if content_type:
            mime_type, _ = parse_options_header(content_type)
        else:
            mime_type = None

        # select the parser (raises ParseError if none found)
        parser, _ = meta_parser.get_parser(url, mime_type)

        # wrap the content in a readable binary file
        file = io.BytesIO(response.content)

        # parse the feed; not doing parser(url, file, response.headers) directly
        # because parsing is CPU-intensive and would block the event loop
        feed, entries = await asyncio.to_thread(parser, url, file, response.headers)

        print(feed.title)
        print(entries[0].title)


if __name__ == '__main__':
    asyncio.run(main())