Internal API

This part of the documentation covers the internal interfaces of reader, which are useful for plugins, or if you want to use low-level functionality without using Reader itself.

Warning

As of version 3.12, the internal API is not part of the public API; it is not stable yet and might change without any notice.

Parser

Reader._parser

The Parser instance used by this reader.

reader._parser.default_parser(feed_root=None, session_timeout=(3.05, 60), _lazy=True)

Create a pre-configured Parser.

Parameters:
Returns:

The parser.

Return type:

Parser

class reader._parser.Parser

Retrieve and parse feeds by delegating to retrievers and parsers.

To retrieve and parse a single feed, you can call the parser object directly.

Reader only uses the following methods:

To add retrievers and parsers:

The rest of the methods are low-level methods.

session_factory

SessionFactory used to create Requests sessions for retrieving feeds.

Plugins may add request or response hooks to this.
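A request hook is just a callable with the signature documented under RequestHook below; a minimal sketch (the hook name and header value are illustrative):

```python
# A minimal RequestHook: receives the session and the request about to be
# sent, and returns the (possibly modified) request, or None to send the
# original request unchanged.
def add_user_agent(session, request, **kwargs):
    request.headers.setdefault("User-Agent", "my-plugin/1.0")
    return request
```

With a Reader instance, such a hook could be registered via `reader._parser.session_factory.request_hooks.append(add_user_agent)` (assuming `request_hooks` is a mutable list, as in the current implementation).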

parallel(feeds, map=map, is_parallel=True)

Retrieve and parse many feeds, possibly in parallel.

Yields the parsed feeds, as soon as they are ready.

Parameters:
  • feeds (iterable(FeedArgument)) – An iterable of feeds.

  • map (function) – A map()-like function; the results can be in any order.

  • is_parallel (bool) – Whether map runs the tasks in parallel.

Yields:

tuple(FeedArgument, ParsedFeed or None or ParseError) – A (feed, result) pair, where result is either:

  • the parsed feed

  • None, if the feed didn’t change

  • an exception instance
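Consuming code therefore has to distinguish three cases per pair; a sketch of that dispatch (helper name is illustrative, the pairs stand in for `parallel()` output):

```python
# Split (feed, result) pairs from parallel() into three buckets:
# result is a parsed feed, None (feed didn't change), or an exception.
def split_results(pairs):
    updated, unchanged, failed = [], [], []
    for feed, result in pairs:
        if isinstance(result, Exception):
            failed.append((feed, result))
        elif result is None:
            unchanged.append(feed)
        else:
            updated.append((feed, result))
    return updated, unchanged, failed
```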

__call__(url, http_etag=None, http_last_modified=None)

Retrieve and parse one feed.

This is a convenience wrapper over parallel().

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

Returns:

The parsed feed or None, if the feed didn’t change.

Return type:

ParsedFeed or None

Raises:

ParseError

retrieve(url, http_etag=None, http_last_modified=None, is_parallel=False)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

  • is_parallel (bool) – Whether this was called from parallel() (writes the contents to a temporary file, if possible).

Returns:

A context manager whose target is either the result, or None if the feed didn’t change.

Return type:

contextmanager(RetrieveResult or None)

Raises:

ParseError

parse(url, result)

Parse a retrieved feed.

Parameters:
Returns:

The feed and entry data.

Return type:

ParsedFeed

Raises:

ParseError

get_parser(url, mime_type)

Select an appropriate parser for a feed.

Parsers registered by URL take precedence over those registered by MIME type.

If no MIME type is given, guess it from the URL using mimetypes.guess_type(). If the MIME type can’t be guessed, default to application/octet-stream.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the retrieved resource.

Returns:

The parser, and the (possibly guessed) MIME type.

Return type:

tuple(ParserType, str)

Raises:

ParseError – No parser matches.
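The MIME type fallback described above can be sketched with the standard library (helper name is illustrative; the real get_parser() also consults the parser registries):

```python
import mimetypes

# Guess the MIME type from the URL when none is given, defaulting to
# application/octet-stream when the guess fails.
def effective_mime_type(url, mime_type=None):
    if not mime_type:
        mime_type, _ = mimetypes.guess_type(url)
    return mime_type or "application/octet-stream"
```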

validate_url(url)

Check if url is valid without actually retrieving it.

Raises:

InvalidFeedURLError – If url is not valid.

mount_retriever(prefix, retriever)

Register a retriever to a URL prefix.

Retrievers are sorted in descending order by prefix length.

Parameters:
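The longest-prefix-wins dispatch described above can be sketched over a plain dict (names are illustrative, not reader's internals):

```python
# Try mounted prefixes longest-first, so the most specific one wins;
# an empty prefix acts as a catch-all fallback.
def match_by_prefix(mounts, url):
    for prefix in sorted(mounts, key=len, reverse=True):
        if url.startswith(prefix):
            return mounts[prefix]
    raise LookupError(f"no retriever for {url!r}")
```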
get_retriever(url)

Get the retriever for a URL.

Parameters:

url (str) – The URL.

Returns:

The matching retriever.

Return type:

RetrieverType

Raises:

ParseError – No retriever matches the URL.

mount_parser_by_mime_type(parser, http_accept=None)

Register a parser to one or more MIME types.

Parameters:
  • parser (ParserType) – The parser.

  • http_accept (str or None) – The content types the parser supports, as an Accept HTTP header value. If not given, use the parser’s http_accept attribute, if it has one.

Raises:

TypeError – The parser does not have an http_accept attribute, and no http_accept was given.
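Because ParserType is a structural protocol, any callable with the right signature qualifies; a minimal sketch for JSON Feed documents (illustrative only: a real parser returns FeedData and EntryData from reader._types, plain dicts stand in for them here):

```python
import json

class JSONFeedParser:
    # Used as the Accept header value when mounted by MIME type.
    http_accept = "application/feed+json"

    def __call__(self, url, resource, headers=None):
        # resource is a readable binary file; a real parser would build
        # FeedData / EntryData instead of dicts.
        data = json.load(resource)
        feed = {"url": url, "title": data.get("title")}
        entries = [
            {"id": item["id"], "title": item.get("title")}
            for item in data.get("items", [])
        ]
        return feed, entries
```

With a Parser instance, this could then be registered with `parser.mount_parser_by_mime_type(JSONFeedParser())`.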

get_parser_by_mime_type(mime_type)

Get a parser for a MIME type.

Parameters:

mime_type (str) – The MIME type of the feed resource.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser matches the MIME type.

mount_parser_by_url(url, parser)

Register a parser to an exact URL.

Parameters:
get_parser_by_url(url)

Get a parser that was registered by URL.

Parameters:

url (str) – The URL.

Returns:

The parser.

Return type:

ParserType

Raises:

ParseError – No parser was registered for the URL.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever.

Delegates to process_feed_for_update() of the appropriate retriever.

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

process_entry_pairs(url, mime_type, pairs)

Process entry data before being stored.

Delegates to process_entry_pairs() of the appropriate parser.

Parameters:
  • url (str) – The feed URL.

  • mime_type (str or None) – The MIME type of the feed.

  • pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.

Returns:

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.SessionFactory(...)

Manage the lifetime of a session.

To get a new session, call the factory directly.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks to be associated with new sessions.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks to be associated with new sessions.

__call__()

Create a new session.

Return type:

SessionWrapper

transient()

Return the current persistent() session, or a new one.

If a new session was created, it is closed once the context manager is exited.

Return type:

contextmanager(SessionWrapper)

persistent()

Register a persistent session with this factory.

While the context manager returned by this method is entered, all persistent() and transient() calls will return the same session. The session is closed once the outermost persistent() context manager is exited.

Plugins should use transient().

Reentrant, but NOT threadsafe.

Return type:

contextmanager(SessionWrapper)
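The lifetime rules above can be modeled with a toy factory (the method names mirror the real API, but the implementation is purely illustrative):

```python
from contextlib import contextmanager

class MiniSessionFactory:
    def __init__(self):
        self.session = None
        self._depth = 0

    def _new_session(self):
        return object()  # stands in for SessionWrapper

    @contextmanager
    def transient(self):
        # Reuse the persistent session if one is active.
        if self.session is not None:
            yield self.session
        else:
            session = self._new_session()
            try:
                yield session
            finally:
                pass  # a real implementation closes the session here

    @contextmanager
    def persistent(self):
        # Reentrant: only the outermost call creates/discards the session.
        if self._depth == 0:
            self.session = self._new_session()
        self._depth += 1
        try:
            yield self.session
        finally:
            self._depth -= 1
            if self._depth == 0:
                self.session = None  # closed on outermost exit
```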

class reader._parser.requests.SessionWrapper(...)

Minimal wrapper over a requests.Session.

Only provides a limited get() method.

Can be used as a context manager (closes the session on exit).

session: requests.Session

The underlying requests.Session.

request_hooks: Sequence[RequestHook]

Sequence of RequestHooks.

response_hooks: Sequence[ResponseHook]

Sequence of ResponseHooks.

get(url, headers=None, **kwargs)

Like Requests get(), but apply request_hooks and response_hooks.

Parameters:
Keyword Arguments:

**kwargs – Passed to send().

Return type:

requests.Response

caching_get(url, etag=None, last_modified=None, headers=None, **kwargs)

Like get(), but set and return caching headers.

caching_get(url, etag, last_modified) -> response, etag, last_modified
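The request side of this is standard HTTP validation: the stored etag and last_modified become conditional headers, sketched here (helper name is illustrative):

```python
# Build the conditional-request headers for a cached resource:
# If-None-Match carries the ETag, If-Modified-Since the Last-Modified
# value; a 304 response then means "not modified".
def conditional_headers(etag=None, last_modified=None):
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers
```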

Protocols

class reader._parser.FeedArgument(*args, **kwargs)

Any FeedForUpdate-like object.

property url: str

The feed URL.

property http_etag: str | None

The HTTP ETag header from the last update.

property http_last_modified: str | None

The HTTP Last-Modified header from the last update.
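Since this is a structural protocol, any object exposing these attributes can be passed to Parser.parallel(); a dataclass sketch (the class name is illustrative):

```python
from __future__ import annotations
from dataclasses import dataclass

# Any object with url / http_etag / http_last_modified attributes
# satisfies the FeedArgument protocol.
@dataclass
class MinimalFeedArgument:
    url: str
    http_etag: str | None = None
    http_last_modified: str | None = None
```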

class reader._parser.RetrieverType(*args, **kwargs)

A callable that knows how to retrieve a feed.

slow_to_read: bool

Allow Parser to read() the result resource into a temporary file, and pass that to the parser (as an optimization). Implies the resource is a readable binary file.

__call__(url, http_etag, http_last_modified, http_accept)

Retrieve a feed.

Parameters:
  • url (str) – The feed URL.

  • http_etag (str or None) – The HTTP ETag header from the last update.

  • http_last_modified (str or None) – The HTTP Last-Modified header from the last update.

  • http_accept (str or None) – Content types to be retrieved, as an HTTP Accept header.

Returns:

A context manager whose target is either the result, or None if the feed didn’t change.

Return type:

contextmanager(RetrieveResult or None)

Raises:

ParseError

validate_url(url)

Check if url is valid for this retriever.

Raises:

InvalidFeedURLError – If url is not valid.

class reader._parser.FeedForUpdateRetrieverType(*args, **kwargs)

Bases: RetrieverType[T_co], Protocol

A RetrieverType that can change update-relevant information.

process_feed_for_update(feed)

Change update-relevant information about a feed before it is passed to the retriever (RetrieverType.__call__()).

Parameters:

feed (FeedForUpdate) – Feed information.

Returns:

The passed-in feed information, possibly modified.

Return type:

FeedForUpdate

class reader._parser.ParserType(*args, **kwargs)

A callable that knows how to parse a retrieved feed.

__call__(url, resource, headers)

Parse a feed.

Parameters:
  • resource (T_cv) – The feed resource. Usually, a readable binary file.

  • headers (dict(str, str) or None) – The HTTP response headers associated with the resource.

Returns:

The feed and entry data.

Return type:

tuple(FeedData, collection(EntryData))

Raises:

ParseError

class reader._parser.HTTPAcceptParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that knows what content it can handle.

property http_accept: str

The content types this parser supports, as an Accept HTTP header value.

class reader._parser.EntryPairsParserType(*args, **kwargs)

Bases: ParserType[T_cv], Protocol

A ParserType that can modify entry data before being stored.

process_entry_pairs(url, pairs)

Process entry data before being stored.

Parameters:
Returns:

(entry data, entry for update) pairs, possibly modified.

Return type:

iterable(tuple(EntryData, EntryForUpdate or None))

class reader._parser.requests.RequestHook(*args, **kwargs)

Hook to modify a Request before it is sent.

__call__(session, request, **kwargs)

Modify a request before it is sent.

Parameters:
Keyword Arguments:

**kwargs – Will be passed to send().

Returns:

A (possibly modified) request to be sent. If None, the initial request is sent.

Return type:

requests.Request or None

class reader._parser.requests.ResponseHook(*args, **kwargs)

Hook to repeat a request depending on the Response.

__call__(session, response, request, **kwargs)

Repeat a request depending on the response.

Parameters:
Keyword Arguments:

**kwargs – Were passed to send().

Returns:

A (possibly new) request to be sent, or None, to return the current response.

Return type:

requests.Request or None

Data objects

class reader._parser.RetrieveResult(resource, mime_type=None, http_etag=None, http_last_modified=None, headers=None)

The result of retrieving a feed, plus metadata.

resource: T_co

The result of retrieving a feed. Usually, a readable binary file. Passed to the parser.

mime_type: str | None = None

The MIME type of the resource. Used to select an appropriate parser.

http_etag: str | None = None

The HTTP ETag header associated with the resource. Passed back to the retriever on the next update.

http_last_modified: str | None = None

The HTTP Last-Modified header associated with the resource. Passed back to the retriever on the next update.

headers: Mapping[str, str] | None = None

The HTTP response headers associated with the resource. Passed to the parser.

class reader._types.ParsedFeed(feed, entries, http_etag=None, http_last_modified=None, mime_type=None)

A parsed feed.

feed: FeedData

The feed.

entries: Iterable[EntryData]

Iterable of entries.

http_etag: str | None

The HTTP ETag header associated with the feed resource. Passed back to the retriever on the next update.

http_last_modified: str | None

The HTTP Last-Modified header associated with the feed resource. Passed back to the retriever on the next update.

mime_type: str | None

The MIME type of the feed resource. Used by process_entry_pairs() to select an appropriate parser.

class reader._types.FeedData(url, updated=None, title=None, link=None, author=None, subtitle=None, version=None)

Feed data that comes from the feed.

Attributes are a subset of those of Feed.

url: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
author: str | None = None
subtitle: str | None = None
version: str | None = None
as_feed(**kwargs)

Convert this to a feed; kwargs override attributes.

Returns:

Feed.

Return type:

Feed

property resource_id: tuple[str]
property hash: bytes
class reader._types.EntryData(feed_url, id, updated=None, title=None, link=None, author=None, published=None, summary=None, content=(), enclosures=())

Entry data that comes from the feed.

Attributes are a subset of those of Entry.

feed_url: str
id: str
updated: datetime | None = None
title: str | None = None
link: str | None = None
author: str | None = None
published: datetime | None = None
summary: str | None = None
content: Sequence[Content] = ()
enclosures: Sequence[Enclosure] = ()
as_entry(**kwargs)

Convert this to an entry; kwargs override attributes.

Returns:

Entry.

Return type:

Entry

property resource_id: tuple[str, str]
property hash: bytes
class reader._types.FeedForUpdate(url, updated, http_etag, http_last_modified, stale, last_updated, last_exception, hash)

Update-relevant information about an existing feed, from Storage.

url: str

The feed URL.

updated: datetime | None

The date the feed was last updated, according to the feed.

http_etag: str | None

The HTTP ETag header from the last update.

http_last_modified: str | None

The HTTP Last-Modified header from the last update.

stale: bool

Whether the next update should update all entries, regardless of their hash or updated.

last_updated: datetime | None

The date the feed was last updated, according to reader; none if never.

last_exception: bool

Whether the feed had an exception at the last update.

hash: bytes | None

The hash of the corresponding FeedData.

class reader._types.EntryForUpdate(updated, published, hash, hash_changed)

Update-relevant information about an existing entry, from Storage.

updated: datetime | None

The date the entry was last updated, according to the entry.

published: datetime | None

The date the entry was published, according to the entry.

hash: bytes | None

The hash of the corresponding EntryData.

hash_changed: int | None

The number of updates due to a different hash since the last time updated changed.

Storage

reader storage is abstracted by two DAO protocols: StorageType, which provides the main storage, and SearchType, which provides search-related operations.

Currently, there’s only one supported implementation, based on SQLite.

That said, it is possible to use an alternate implementation by passing a StorageType instance via the _storage make_reader() argument:

reader = make_reader('unused', _storage=MyStorage(...))

The protocols are mostly stable, but some backwards-incompatible changes are expected in the future (known ones are marked below with Unstable). The long term goal is for the storage API to become stable, but at least one other implementation needs to exist before that. (Working on one? Let me know!)

Unstable

Currently, search is tightly bound to a storage implementation (see make_search()). While the change tracking API allows search implementations to keep in sync with text content changes, there is no convenient way for SearchType.search_entries() to filter/sort results without storage cooperation; StorageType will need additional capabilities to support this.

Reader._storage

The StorageType instance used by this reader.

Reader._search

The SearchType instance used by this reader.

class reader._types.StorageType

Storage DAO protocol.

For methods with Reader correspondents, see the Reader docstrings for detailed semantics.

Any method can raise StorageError.

The behaviors described in Lifecycle and Threading are implemented at the storage level; specifically:

  • The storage can be used directly, without __enter__()ing it. There is no guarantee close() will be called at the end.

  • The storage can be reused after __exit__() / close().

  • The storage can be used from multiple threads, either directly, or as a context manager. Closing the storage in one thread should not close it in another thread.

Schema migrations are transparent to Reader. The current storage implementation does them at initialization, but others may require them to happen out-of-band with user intervention.

All datetime attributes of all parameters and return values are timezone-aware, with the timezone set to UTC.

Unstable

In the future, implementations will be required to accept datetimes with any timezone.

Methods, grouped by topic:

object lifecycle

__enter__() __exit__() close()

feeds

add_feed() delete_feed() change_feed_url() get_feeds() get_feed_counts() set_feed_user_title() set_feed_updates_enabled()

entries

add_entry() delete_entries() get_entries() get_entry_counts() set_entry_read() set_entry_important()

tags

get_tags() set_tag() delete_tag()

update

get_feeds_for_update() update_feed() set_feed_stale() get_entries_for_update() add_or_update_entries() get_entry_recent_sort() set_entry_recent_sort()

__enter__()

Called when Reader is used as a context manager.

__exit__(*_)

Called when Reader is used as a context manager.

close()

Called by Reader.close().

add_feed(url, /, added)

Called by Reader.add_feed().

Parameters:
Raises:

FeedExistsError

delete_feed(url, /)

Called by Reader.delete_feed().

Parameters:

url (str) –

Raises:

FeedNotFoundError

change_feed_url(old, new, /)

Called by Reader.change_feed_url().

Parameters:
Raises:

FeedNotFoundError

get_feeds(filter, sort, limit, starting_after)

Called by Reader.get_feeds().

Parameters:
  • filter (FeedFilter) –

  • sort (Literal['title', 'added']) –

  • limit (int | None) –

  • starting_after (str | None) –

Returns:

A lazy iterable.

Raises:

FeedNotFoundError – If starting_after does not exist.

Return type:

Iterable[Feed]

get_feed_counts(filter)

Called by Reader.get_feed_counts().

Parameters:

filter (FeedFilter) –

Returns:

The counts.

Return type:

FeedCounts

set_feed_user_title(url, title, /)

Called by Reader.set_feed_user_title().

Parameters:
  • url (str) –

  • title (str | None) –

Raises:

FeedNotFoundError

set_feed_updates_enabled(url, enabled, /)

Called by Reader.enable_feed_updates() and Reader.disable_feed_updates().

Parameters:
Raises:

FeedNotFoundError

add_entry(intent, /)

Called by Reader.add_entry().

Parameters:

intent (EntryUpdateIntent) –

Raises:
delete_entries(entries, /, *, added_by)

Called by Reader.delete_entry().

Also called by plugins like entry_dedupe.

Parameters:
Raises:
get_entries(filter, sort, limit, starting_after)

Called by Reader.get_entries().

Parameters:
Returns:

A lazy iterable.

Raises:

EntryNotFoundError – If starting_after does not exist.

Return type:

Iterable[Entry]

get_entry_counts(now, filter)

Called by Reader.get_entry_counts().

Unstable

In order to expose better feed interaction statistics, this method will need to return more granular data.

Unstable

In order to support search_entry_counts() of search implementations that are not bound to a storage, this method will need to take an entries argument.

Parameters:
Returns:

The counts.

Return type:

EntryCounts

set_entry_read(entry, read, modified, /)

Called by Reader.set_entry_read().

Parameters:
Raises:

EntryNotFoundError

set_entry_important(entry, important, modified, /)

Called by Reader.set_entry_important().

Parameters:
Raises:

EntryNotFoundError

get_tags(resource_id, key=None, /)

Called by Reader.get_tags().

Also called by Reader.get_tag_keys().

Unstable

A dedicated get_tag_keys() method will be added in the future.

Unstable

Both this method and get_tag_keys() will allow filtering by prefix (include/exclude), case sensitive and insensitive; implementations should allow for this.

Parameters:
Returns:

A lazy iterable.

Return type:

Iterable[tuple[str, reader.types.JSONType]]

set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, /) -> None
set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, value: JSONType, /) -> None

Called by Reader.set_tag().

Parameters:
  • resource_id

  • key

  • value

Raises:

ResourceNotFoundError

delete_tag(resource_id, key, /)

Called by Reader.delete_tag().

Parameters:
Raises:

TagNotFoundError

get_feeds_for_update(filter)

Called by update logic.

Parameters:

filter (FeedFilter) –

Returns:

A lazy iterable.

Return type:

Iterable[FeedForUpdate]

update_feed(intent, /)

Called by update logic.

Parameters:

intent (FeedUpdateIntent) –

Raises:

FeedNotFoundError

set_feed_stale(url, stale, /)

Used by update logic tests.

Parameters:
Raises:

FeedNotFoundError

get_entries_for_update(entries, /)

Called by update logic.

Parameters:

entries (Iterable[tuple[str, str]]) –

Returns:

An iterable of entry or None (if an entry does not exist), matching the order of the input iterable.

Return type:

Iterable[EntryForUpdate | None]
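The order-preserving contract can be sketched over a plain dict standing in for the storage (names are illustrative):

```python
# For each (feed URL, entry id) key, yield the stored entry-for-update
# value, or None if the entry does not exist; output order matches
# input order exactly.
def entries_for_update(index, keys):
    return [index.get(key) for key in keys]
```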

add_or_update_entries(intents, /)

Called by update logic.

Parameters:

intents (Iterable[EntryUpdateIntent]) –

Raises:

FeedNotFoundError

get_entry_recent_sort(entry, /)

Get EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:

entry (tuple[str, str]) –

Returns:

entry recent_sort

Raises:

EntryNotFoundError

Return type:

datetime

set_entry_recent_sort(entry, recent_sort, /)

Set EntryUpdateIntent.recent_sort.

Used by plugins like entry_dedupe.

Parameters:
Raises:

EntryNotFoundError

class reader._types.BoundSearchStorageType

Bases: StorageType, Protocol

A storage that can create a storage-bound search provider.

Create a search provider.

Returns:

A search provider.

Return type:

SearchType

class reader._types.SearchType

Search DAO protocol.

Any method can raise SearchError.

There are two sets of methods that may be called at different times:

management methods

enable() disable() is_enabled() update()

read-only methods

search_entries() search_entry_counts()

Unstable

In the future, search may receive object lifecycle methods (context manager + close()), to support implementations that do not share state with the storage. If you need support for this, please open an issue.

enable()

Called by Reader.enable_search().

A no-op and reasonably fast if search is already enabled.

Checks if all dependencies needed for update() are available, raises SearchError if not.

Raises:

StorageError

disable()

Called by Reader.disable_search().

is_enabled()

Called by Reader.is_search_enabled().

Not called otherwise.

Returns:

Whether search is enabled or not.

Return type:

bool

update()

Called by Reader.update_search().

Should not enable search automatically (handled by Reader).

Raises:
search_entries(query, /, filter, sort, limit, starting_after)

Called by Reader.search_entries().

Parameters:
Returns:

A lazy iterable.

Raises:
Return type:

Iterable[EntrySearchResult]

search_entry_counts(query, /, now, filter)

Called by Reader.search_entry_counts().

Parameters:
Returns:

The counts.

Raises:
Return type:

EntrySearchCounts

class reader._types.ChangeTrackingStorageType

Bases: StorageType, Protocol

A storage that can track changes to the text content of resources.

property changes: ChangeTrackerType

The change tracker associated with this storage.

class reader._types.ChangeTrackerType

Storage API used to keep the full-text search index in sync.


The sync model works as follows.

Each resource to be indexed has a sequence that changes every time its text content changes. The sequence can be a global counter, a random number, or a high-precision timestamp; the only requirement is that it won’t be used again (or it’s extremely unlikely that will happen).

Each sequence change gets recorded. Updates are recorded as pairs of DELETE + INSERT changes with the old / new sequences, respectively.

SearchType.update() gets changes and processes them. For INSERT, the resource is indexed only if the change sequence matches the current main storage sequence; otherwise, the change is ignored. For DELETE, the resource is deleted only if the change sequence matches the search index sequence. (This means that, during updates, multiple versions of a resource may appear in the index, with different sequences.) Processed changes are marked as done, regardless of the action taken. Pseudocode:

while changes := self.storage.changes.get():
    self._process_changes(changes)
    self.storage.changes.done(changes)

Enabling change tracking sets the sequence of all resources and adds matching INSERT changes to allow backfilling the search index. The sequence may be None when change tracking is disabled. There is no guarantee the sequence of a resource remains the same when change tracking is disabled and then enabled again.

See also

The model was validated using property-based testing in this gist.


The entry sequence is exposed as Entry._sequence, and should change when the entry title, summary, or content change, or when its feed’s title or user_title change.

As of version 3.12, only entry changes are tracked, but the API supports tracking feeds and tags in the future; search implementations should ignore changes to resources they do not support (but still mark them as done!).

Any method can raise StorageError.

enable()

Enable change tracking.

A no-op and reasonably fast if change tracking is already enabled.

disable()

Disable change tracking.

A no-op if change tracking is already disabled.

get(action=None, limit=None)

Return the next batch of changes, if any.

Parameters:
  • action (Action | None) – Only return changes of this type.

  • limit (int | None) – Return at most this many changes; may return fewer, depending on storage internal limits. If None, a reasonable limit should be used (hundreds).

Returns:

A batch of changes.

Raises:

ChangeTrackingNotEnabledError

Return type:

list[Change]

done(changes)

Mark changes as done. Ignore unknown changes.

Parameters:

changes (list[Change]) –

Raises:
class reader._types.Change(action, sequence, resource_id, tag_key=None)

A change to be applied to the search index.

The change can be of an entry, a feed, or a resource tag.

action: Action

Action to take.

sequence: bytes

Resource/tag sequence.

resource_id: tuple[()] | tuple[str] | tuple[str, str]

Resource id.

tag_key: str | None = None

Tag key, if the change is about a tag.

class reader._types.Action(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Action to take.

INSERT = 1

The resource needs to be added to the search index.

DELETE = 2

The resource needs to be deleted from the search index.

Entry._sequence: bytes | None = None

Change sequence.

May be None when change tracking is disabled.

Unstable

This field is part of the unstable change tracking API.

exception reader.exceptions.ChangeTrackingNotEnabledError(message='')

Bases: StorageError

A change tracking method was called when change tracking was not enabled.

Unstable

This exception is part of the unstable change tracking API.

Data objects

class reader._types.FeedFilter(feed_url=None, tags=(), broken=None, updates_enabled=None, new=None)

Options for filtering the results of feed list operations.

See the Reader.get_feeds() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

tags: TagFilter

Alias for field number 1

broken: bool | None

Alias for field number 2

updates_enabled: bool | None

Alias for field number 3

new: bool | None

Alias for field number 4

class reader._types.EntryFilter(feed_url=None, entry_id=None, read=None, important='any', has_enclosures=None, tags=(), feed_tags=())

Options for filtering the results of entry list operations.

See the Reader.get_entries() docstring for detailed semantics.

feed_url: str | None

Alias for field number 0

entry_id: str | None

Alias for field number 1

read: bool | None

Alias for field number 2

important: TristateFilter

Alias for field number 3

has_enclosures: bool | None

Alias for field number 4

tags: TagFilter

Alias for field number 5

feed_tags: TagFilter

Alias for field number 6

class reader._types.FeedUpdateIntent(url, last_updated, feed=None, http_etag=None, http_last_modified=None, last_exception=None)

Data to be passed to Storage when updating a feed.

url: str

The feed URL.

last_updated: datetime | None

The time at the start of updating this feed.

feed: FeedData | None

The feed data, if any.

http_etag: str | None

The feed’s ETag header; see ParsedFeed.http_etag for details.

Unstable

http_etag and http_last_modified may be grouped in a single attribute in the future.

http_last_modified: str | None

The feed’s Last-Modified header; see ParsedFeed.http_last_modified for details.

last_exception: ExceptionInfo | None

Cause of UpdateError, if any; if set, everything else except url should be None.

class reader._types.EntryUpdateIntent(entry, last_updated, first_updated, first_updated_epoch, recent_sort, feed_order=0, hash_changed=0, added_by='feed')

Data to be passed to Storage when updating an entry.

entry: EntryData

The entry data.

last_updated: datetime

The time at the start of updating the feed (start of update_feed() in update_feed(), start of each feed update in update_feeds()).

first_updated: datetime | None

First last_updated (sets Entry.added). None if the entry already exists.

first_updated_epoch: datetime | None

The time at the start of updating this batch of feeds (start of update_feed() in update_feed(), start of update_feeds() in update_feeds()). None if the entry already exists.

recent_sort: datetime | None

Sort key for the get_entries() recent sort order.

feed_order: int

The index of the entry in the feed (zero-based).

hash_changed: int | None

Same as EntryForUpdate.hash_changed.

added_by: Literal['feed', 'user']

Same as Entry.added_by.

property new: bool

Whether the entry is new or not.

Type aliases

reader._types.TagFilter

Like the tags argument of Reader.get_feeds(), except:

  • only the full multiple-tags-with-disjunction form is used

  • tags are represented as (is negated, tag name) tuples (the - prefix is stripped)

Assuming a tag_filter_argument() function that converts get_feeds() tags to TagFilter:

>>> tag_filter_argument(['one'])
[[(False, 'one')]]
>>> tag_filter_argument(['one', 'two'])
[[(False, 'one')], [(False, 'two')]]
>>> tag_filter_argument([['one', 'two']])
[[(False, 'one'), (False, 'two')]]
>>> tag_filter_argument(['one', '-two'])
[[(False, 'one')], [(True, 'two')]]
>>> tag_filter_argument(True)
[[True]]

alias of Sequence[Sequence[Union[bool, tuple[bool, str]]]]
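One possible tag_filter_argument(), consistent with the examples above (hypothetical, like the function the examples assume; reader's actual converter may differ in details):

```python
def tag_filter_argument(tags):
    # True/False (match any/no tags) become a single one-element group.
    if isinstance(tags, bool):
        return [[tags]]
    result = []
    for group in tags:
        # A bare string is shorthand for a one-tag group.
        if isinstance(group, str):
            group = [group]
        # Each tag becomes an (is negated, tag name) tuple,
        # with the "-" prefix stripped.
        result.append([
            (tag.startswith("-"), tag[1:] if tag.startswith("-") else tag)
            for tag in group
        ])
    return result
```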

reader._types.TristateFilter

Like TristateFilterInput, but without bool/None aliases.

alias of Literal[‘istrue’, ‘isfalse’, ‘notset’, ‘nottrue’, ‘notfalse’, ‘isset’, ‘any’]

Recipes

Parsing a feed retrieved with something other than reader

Example of using the reader internal API to parse a feed retrieved asynchronously with HTTPX:

$ python examples/parser_only.py
death and gravity
Has your password been pwned? Or, how I almost failed to search a 37 GB text file in under 1 millisecond (in Python)
import asyncio
import io
import httpx
from reader._parser import default_parser
from werkzeug.http import parse_options_header

url = "https://death.andgravity.com/_feed/index.xml"
meta_parser = default_parser()


async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

        # to select the parser, we need the MIME type of the response
        content_type = response.headers.get('content-type')
        if content_type:
            mime_type, _ = parse_options_header(content_type)
        else:
            mime_type = None

        # select the parser (raises ParseError if none found)
        parser, _ = meta_parser.get_parser(url, mime_type)

        # wrap the content in a readable binary file
        file = io.BytesIO(response.content)

        # parse the feed; not doing parser(url, file, response.headers) directly
        # because parsing is CPU-intensive and would block the event loop
        feed, entries = await asyncio.to_thread(parser, url, file, response.headers)

        print(feed.title)
        print(entries[0].title)


if __name__ == '__main__':
    asyncio.run(main())