Internal API
This part of the documentation covers the internal interfaces of reader,
which are useful for plugins,
or if you want to use low-level functionality
without using Reader itself.
Warning
As of version 3.24, the internal API is not part of the public API; it is not stable yet and might change without any notice.
Parser
- reader._parser.default_parser(feed_root=None, session_timeout=(3.05, 60), _lazy=True)
Create a pre-configured Parser.
- Parameters:
feed_root (str or None) – See make_reader() for details.
session_timeout (float or tuple(float, float) or None) – See make_reader() for details.
- Returns:
The parser.
- Return type:
- class reader._parser.Parser
Retrieve and parse feeds by delegating to retrievers and parsers.
To retrieve and parse a single feed, you can call the parser object directly.
Reader only uses the following methods:
To add retrievers and parsers:
The rest of the methods are low-level methods.
- session_factory
SessionFactory used to create Requests sessions for retrieving feeds.
Plugins may add request or response hooks to this.
- parallel(feeds, map=<class 'map'>)
Retrieve and parse many feeds, possibly in parallel.
Yields the parsed feeds, as soon as they are ready.
- Parameters:
feeds (iterable(FeedArgument)) – An iterable of feeds.
map (function) – A map()-like function; the results can be in any order.
- Yields:
ParseResult – The result of retrieving and parsing a feed; the feed is the object passed in feeds.
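To illustrate the map argument, here is a minimal sketch of a map()-like function that runs calls on a thread pool and yields results as they complete, in no particular order. The name unordered_thread_map and the max_workers default are assumptions made for this example; they are not part of reader.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def unordered_thread_map(fn, iterable, max_workers=4):
    """map()-like function: yield fn(item) for each item, in completion order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fn, item) for item in iterable]
        for future in as_completed(futures):
            # result() re-raises any exception raised by fn
            yield future.result()


# the output order may differ from the input order, so sort before printing
print(sorted(unordered_thread_map(len, ["a", "bb", "ccc"])))
```

Assuming a Parser instance named parser, a function like this could be passed as parser.parallel(feeds, unordered_thread_map); whether it suits a given workload depends on how slow retrieval is compared to parsing.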
- __call__(url, caching_info=None)
Retrieve and parse one feed.
This is a convenience wrapper over parallel().
- Parameters:
url (str) – The feed URL.
caching_info (JSONType or None) – caching_info from the last update.
- Returns:
The parsed feed or None, if the feed didn’t change.
- Return type:
ParsedFeed or None
- Raises:
- retrieve_fn(feed)
retrieve() wrapper used by parallel().
Takes one argument and does not raise exceptions.
- retrieve(url, caching_info=None)
Retrieve a feed.
- Parameters:
url (str) – The feed URL.
caching_info (JSONType or None) – caching_info from the last update.
- Returns:
A context manager with the retrieved feed as target.
- Return type:
contextmanager(RetrieveResult or None)
- Raises:
- parse_fn(result)
parse() wrapper used by parallel().
Takes one argument and does not raise exceptions.
- parse(url, retrieved)
Parse a retrieved feed.
- Parameters:
url (str) – The feed URL.
retrieved (RetrievedFeed) – The retrieved feed.
- Returns:
The feed and entry data.
- Return type:
- Raises:
- get_parser(url, mime_type)
Select an appropriate parser for a feed.
Parsers registered by URL take precedence over those registered by MIME type.
If no MIME type is given, guess it from the URL using mimetypes.guess_type(). If the MIME type can’t be guessed, default to application/octet-stream.
- Parameters:
- Returns:
The parser, and the (possibly guessed) MIME type.
- Return type:
- Raises:
ParseError – No parser matches.
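The selection rules above can be sketched with plain dictionaries. This is a simplified illustration, not the library's implementation: guess_mime_type() and select_parser() are hypothetical helpers, MIME types are matched exactly (no Accept-style wildcards or quality values), and LookupError stands in for ParseError.

```python
import mimetypes


def guess_mime_type(url, mime_type=None):
    """Return mime_type if given; otherwise guess it from the URL."""
    if mime_type:
        return mime_type
    guessed, _encoding = mimetypes.guess_type(url)
    # fall back to a generic binary type if nothing could be guessed
    return guessed or "application/octet-stream"


def select_parser(url, mime_type, parsers_by_url, parsers_by_mime_type):
    """Return (parser, MIME type); parsers registered by URL win."""
    mime_type = guess_mime_type(url, mime_type)
    if url in parsers_by_url:
        return parsers_by_url[url], mime_type
    try:
        return parsers_by_mime_type[mime_type], mime_type
    except KeyError:
        # stand-in for ParseError in this sketch
        raise LookupError(f"no parser for MIME type: {mime_type}") from None
```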
- validate_url(url)
Check if url is valid without actually retrieving it.
- Raises:
InvalidFeedURLError – If url is not valid.
- mount_retriever(prefix, retriever)
Register a retriever to a URL prefix.
Retrievers are sorted in descending order by prefix length.
- Parameters:
prefix (str) – A URL prefix.
retriever (RetrieverType) – The retriever.
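Sorting by descending prefix length means the most specific prefix is tried first. A minimal sketch of that idea follows; the RetrieverRegistry class is hypothetical, matching is a plain case-sensitive startswith(), and LookupError stands in for ParseError.

```python
class RetrieverRegistry:
    """Toy registry that resolves a URL to the longest matching prefix."""

    def __init__(self):
        self._retrievers = []  # list of (prefix, retriever) pairs

    def mount(self, prefix, retriever):
        self._retrievers.append((prefix, retriever))
        # longest prefixes first, so the most specific match wins
        self._retrievers.sort(key=lambda pair: len(pair[0]), reverse=True)

    def get(self, url):
        for prefix, retriever in self._retrievers:
            if url.startswith(prefix):
                return retriever
        # stand-in for ParseError in this sketch
        raise LookupError(f"no retriever for URL: {url}")


registry = RetrieverRegistry()
registry.mount("http://", "generic HTTP retriever")
registry.mount("http://example.com/", "example.com retriever")
print(registry.get("http://example.com/feed.xml"))
```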
- get_retriever(url)
Get the retriever for a URL.
- Parameters:
url (str) – The URL.
- Returns:
The matching retriever.
- Return type:
- Raises:
ParseError – No retriever matches the URL.
- mount_parser_by_mime_type(parser, accept=None)
Register a parser to one or more MIME types.
- Parameters:
parser (ParserType) – The parser.
accept (str or None) – The content types the parser supports, as an HTTP Accept header. If not given, use the parser’s accept attribute, if it has one.
- Raises:
TypeError – The parser does not have an accept attribute, and no accept was given.
- get_parser_by_mime_type(mime_type)
Get a parser for a MIME type.
- Parameters:
mime_type (str) – The MIME type of the feed resource.
- Returns:
The parser.
- Return type:
- Raises:
ParseError – No parser matches the MIME type.
- mount_parser_by_url(url, parser)
Register a parser to an exact URL.
- Parameters:
url (str) – A URL.
parser (ParserType) – The parser.
- get_parser_by_url(url)
Get a parser that was registered by URL.
- Parameters:
url (str) – The URL.
- Returns:
The parser.
- Return type:
- Raises:
ParseError – No parser was registered for the URL.
- process_feed_for_update(feed)
Change update-relevant information about a feed before it is passed to the retriever.
Delegates to process_feed_for_update() of the appropriate retriever.
- Parameters:
feed (FeedForUpdate) – Feed information.
- Returns:
The passed-in feed information, possibly modified.
- Return type:
- process_entry_pairs(url, mime_type, pairs)
Process entry data before being stored.
Delegates to process_entry_pairs() of the appropriate parser.
- Parameters:
url (str) – The feed URL.
mime_type (str or None) – The MIME type of the feed.
pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.
- Returns:
(entry data, entry for update) pairs, possibly modified.
- Return type:
iterable(tuple(EntryData, EntryForUpdate or None))
- class reader._parser.requests.SessionFactory(...)
Manage the lifetime of a session.
To get a new session, call the factory directly.
- request_hooks: Sequence[RequestHook]
Sequence of RequestHooks to be associated with new sessions.
- response_hooks: Sequence[ResponseHook]
Sequence of ResponseHooks to be associated with new sessions.
- __call__()
Create a new session.
- Return type:
- transient()
Return the current persistent() session, or a new one.
If a new session was created, it is closed once the context manager is exited.
- Return type:
contextmanager(SessionWrapper)
- persistent()
Register a persistent session with this factory.
While the context manager returned by this method is entered, all persistent() and transient() calls will return the same session. The session is closed once the outermost persistent() context manager is exited.
Plugins should use transient().
Reentrant, but NOT threadsafe.
- Return type:
contextmanager(SessionWrapper)
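The interplay between persistent() and transient() can be modeled in a few lines. The sketch below only illustrates the documented semantics; FakeSession and SessionFactorySketch are hypothetical stand-ins, and like the real factory, the sketch is reentrant but not threadsafe.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a session; only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SessionFactorySketch:
    def __init__(self):
        self._session = None  # the current persistent session, if any
        self._depth = 0  # how many persistent() contexts are entered

    def __call__(self):
        return FakeSession()

    @contextmanager
    def persistent(self):
        if self._session is None:
            self._session = self()
        self._depth += 1
        try:
            yield self._session
        finally:
            self._depth -= 1
            # only the outermost persistent() closes the session
            if self._depth == 0:
                self._session.close()
                self._session = None

    @contextmanager
    def transient(self):
        if self._session is not None:
            # reuse the persistent session; its owner will close it
            yield self._session
            return
        session = self()
        try:
            yield session
        finally:
            session.close()
```

With this model, a plugin that calls transient() shares the session when an update is in progress inside persistent(), and otherwise gets a short-lived session of its own.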
- class reader._parser.requests.SessionWrapper(...)
Minimal wrapper over a requests.Session.
Only provides a limited get() method.
Can be used as a context manager (closes the session on exit).
- session: requests.Session
The underlying
requests.Session.
- request_hooks: Sequence[RequestHook]
Sequence of
RequestHooks.
- response_hooks: Sequence[ResponseHook]
Sequence of
ResponseHooks.
- get(url, headers=None, **kwargs)
Like Requests get(), but apply request_hooks and response_hooks.
Protocols
- class reader._parser.FeedArgument(*args, **kwargs)
Any FeedForUpdate-like object.
- property caching_info: JSONType | None
caching_info from the last update.
- class reader._parser.RetrieverType(*args, **kwargs)
A callable that knows how to retrieve a feed.
- __call__(url, caching_info, accept)
Retrieve a feed.
- Parameters:
url (str) – The feed URL.
caching_info (JSONType or None) – caching_info from the last update.
accept (str or None) – Content types to be retrieved, as an HTTP Accept header.
- Returns:
A context manager that has as target either a RetrievedFeed wrapping the retrieved resource, or the bare resource.
- Return type:
contextmanager(RetrievedFeed or None)
- Raises:
RetrieveError – To pass additional information to the parser.
NotModified – To tell the parser that the feed was not modified.
- validate_url(url)
Check if url is valid for this retriever.
- Raises:
InvalidFeedURLError – If url is not valid.
- exception reader._parser.RetrieveError(url, /, message='', http_info=None)
Bases: ParseError
An error occurred while retrieving the feed.
Can be used by retrievers to pass additional information to the parser.
- exception reader._parser.NotModified(url, /, message='', http_info=None)
Bases: RetrieveError
Raised by retrievers to tell the parser that the feed was not modified.
- class reader._parser.FeedForUpdateRetrieverType(*args, **kwargs)
Bases: RetrieverType[T_co], Protocol
A RetrieverType that can change update-relevant information.
- process_feed_for_update(feed)
Change update-relevant information about a feed before it is passed to the retriever (RetrieverType.__call__()).
- Parameters:
feed (FeedForUpdate) – Feed information.
- Returns:
The passed-in feed information, possibly modified.
- Return type:
- class reader._parser.ParserType(*args, **kwargs)
A callable that knows how to parse a retrieved feed.
- __call__(url, resource, headers)
Parse a feed.
- class reader._parser.AcceptParserType(*args, **kwargs)
Bases: ParserType[T_cv], Protocol
A ParserType that knows what content types it can handle.
- class reader._parser.EntryPairsParserType(*args, **kwargs)
Bases: ParserType[T_cv], Protocol
A ParserType that can modify entry data before being stored.
- process_entry_pairs(url, pairs)
Process entry data before being stored.
- Parameters:
url (str) – The feed URL.
pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.
- Returns:
(entry data, entry for update) pairs, possibly modified.
- Return type:
iterable(tuple(EntryData, EntryForUpdate or None))
- class reader._parser.requests.RequestHook(*args, **kwargs)
Hook to modify a Request before it is sent.
- __call__(session, request, **kwargs)
Modify a request before it is sent.
- Parameters:
session (requests.Session) – The session that will send the request.
request (requests.Request) – The request to be sent.
- Keyword Arguments:
**kwargs – Will be passed to send().
- Returns:
A (possibly modified) request to be sent. If None, send the initial request.
- Return type:
requests.Request or None
- class reader._parser.requests.ResponseHook(*args, **kwargs)
Hook to repeat a request depending on the Response.
- __call__(session, response, request, **kwargs)
Repeat a request depending on the response.
- Parameters:
session (requests.Session) – The session that sent the request.
request (requests.Request) – The sent request.
response (requests.Response) – The received response.
- Keyword Arguments:
**kwargs – Were passed to send().
- Returns:
A (possibly new) request to be sent, or None, to return the current response.
- Return type:
requests.Request or None
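As an illustration of the response hook contract, here is a sketch of a hook that asks for a request to be repeated when the server answers with a given status code. RetryOnStatus is a hypothetical class, not part of reader; the per-URL retry counter is an assumption added to keep the number of repeats bounded, and SimpleNamespace objects stand in for the Requests request and response so the example runs on its own.

```python
from types import SimpleNamespace


class RetryOnStatus:
    """Response hook sketch: repeat the request on selected status codes."""

    def __init__(self, statuses=(503,), max_retries=1):
        self.statuses = frozenset(statuses)
        self.max_retries = max_retries
        self._retries = {}  # request URL -> retries so far

    def __call__(self, session, response, request, **kwargs):
        if response.status_code not in self.statuses:
            return None  # keep the current response
        count = self._retries.get(request.url, 0)
        if count >= self.max_retries:
            return None  # give up; keep the current response
        self._retries[request.url] = count + 1
        return request  # send the same request again


hook = RetryOnStatus()
request = SimpleNamespace(url="http://example.com/feed.xml")
unavailable = SimpleNamespace(status_code=503)
print(hook(None, unavailable, request) is request)
```

Following the pattern used for request hooks in the recipes, such a hook would presumably be registered with reader._parser.session_factory.response_hooks.append(hook).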
Data objects
- class reader._parser.RetrieveResult(feed, value)
The result of retrieving a feed, regardless of the outcome.
- feed: F
The feed (a FeedArgument, usually a FeedForUpdate).
- value: ContextManager[RetrievedFeed[T], bool | None] | E
One of:
a context manager with the RetrievedFeed as target
an exception
- class reader._parser.RetrievedFeed(resource, mime_type=None, caching_info=None, http_info=None, slow_to_read=False)
A (successfully) retrieved feed, plus metadata.
- resource: T
The retrieved resource. Usually, a readable binary file. Passed to the parser.
- reader._parser.ParseResult
alias of ParseResultBase[FeedForUpdate, FeedData, EntryData, ParseError]
- class reader._parser.HTTPInfo(status, headers)
Details about an HTTP response.
- get_update_after(now)
Select the best “update after” date from available headers.
- parse_date(name, now=None)
Parse an HTTP date header and return a timezone-aware datetime.
Return None if missing or if parsing fails.
If now is given and the Date header is set, make the returned value relative to now.
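The behavior described for parse_date() can be approximated with the standard library. The sketch below is an illustration under stated assumptions, not the library's code: parse_http_date() is a hypothetical function, headers is a plain case-sensitive dict, and "relative to now" is interpreted as shifting the value by the difference between now and the server's Date header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(headers, name, now=None):
    """Parse an HTTP date header; return an aware datetime or None."""
    value = headers.get(name)
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # the exception type depends on the Python version
        return None
    if parsed.tzinfo is None:
        # a "-0000" offset yields a naive datetime; treat it as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    if now is not None and name != "Date":
        server_now = parse_http_date(headers, "Date")
        if server_now is not None:
            # assumption: compensate for server/client clock differences
            parsed = now + (parsed - server_now)
    return parsed


headers = {
    "Date": "Wed, 21 Oct 2015 07:28:00 GMT",
    "Expires": "Wed, 21 Oct 2015 08:28:00 GMT",
}
print(parse_http_date(headers, "Expires"))
```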
Storage
reader storage is abstracted by two DAO protocols:
StorageType, which provides the main storage,
and SearchType, which provides search-related operations.
Currently, there’s only one supported implementation, based on SQLite.
That said, it is possible to use an alternate implementation
by passing a StorageType instance
via the _storage make_reader() argument:
reader = make_reader('unused', _storage=MyStorage(...))
The protocols are mostly stable, but some backwards-incompatible changes are expected in the future (known ones are marked below with Unstable). The long term goal is for the storage API to become stable, but at least one other implementation needs to exist before that. (Working on one? Let me know!)
Unstable
Currently, search is tightly-bound to a storage implementation
(see make_search()).
While the change tracking API allows
search implementations to keep in sync with text content changes,
there is no convenient way for SearchType.search_entries()
to filter/sort results without storage cooperation;
StorageType will need
additional capabilities to support this.
- Reader._storage
The StorageType instance used by this reader.
- Reader._search
The SearchType instance used by this reader.
- class reader._types.StorageType
Storage DAO protocol.
For methods with Reader correspondents, see the Reader docstrings for detailed semantics.
Any method can raise StorageError.
The behaviors described in Lifecycle and Threading are implemented at the storage level; specifically:
The storage can be used directly, without __enter__()ing it. There is no guarantee close() will be called at the end.
The storage can be reused after __exit__() / close().
The storage can be used from multiple threads, either directly, or as a context manager. Closing the storage in one thread should not close it in another thread.
Schema migrations are transparent to Reader. The current storage implementation does them at initialization, but others may require them to happen out-of-band with user intervention.
All datetime attributes of all parameters and return values are timezone-aware, with the timezone set to utc.
Unstable
In the future, implementations will be required to accept datetimes with any timezone.
Methods, grouped by topic:
- object lifecycle
- feeds
add_feed() delete_feed() change_feed_url() get_feeds() get_feed_counts() set_feed_user_title() set_feed_updates_enabled()
- entries
add_entry() delete_entries() get_entries() get_entry_counts() set_entry_read() set_entry_important()
- tags
- update
get_feeds_for_update() update_feed() set_feed_stale() get_entries_for_update() add_or_update_entries() get_entry_recent_sort() set_entry_recent_sort()
- close()
Called by
Reader.close().
- add_feed(url, /, added)
Called by Reader.add_feed().
- Parameters:
url (str)
added (datetime) – Feed.added
- Raises:
- delete_feed(url, /)
Called by Reader.delete_feed().
- Parameters:
url (str)
- Raises:
- change_feed_url(old, new, /)
Called by Reader.change_feed_url().
- Parameters:
- Raises:
- get_feeds(filter, sort, limit, starting_after)
Called by Reader.get_feeds().
For tag filters, implementations should optimize the single-tag case such that listing by tag does not have to go through all the feeds.
- Parameters:
filter (FeedFilter)
sort (FeedSort)
limit (int | None)
starting_after (str | None)
- Returns:
A lazy iterable.
- Raises:
FeedNotFoundError – If starting_after does not exist.
- Return type:
- get_feed_counts(filter)
Called by Reader.get_feed_counts().
- Parameters:
filter (FeedFilter)
- Returns:
The counts.
- Return type:
- set_feed_user_title(url, title, /)
Called by Reader.set_feed_user_title().
- Parameters:
- Raises:
- set_feed_updates_enabled(url, enabled, /)
Called by Reader.enable_feed_updates() and Reader.disable_feed_updates().
- Parameters:
- Raises:
- add_entry(intent, /)
Called by Reader.add_entry().
- Parameters:
intent (EntryUpdateIntent)
- Raises:
- delete_entries(entries, /, *, added_by)
Called by Reader.delete_entry().
Also called by plugins like entry_dedupe.
- Parameters:
- Raises:
EntryNotFoundError – An entry does not exist.
EntryError – An entry added_by is different from the given one.
- get_entries(filter, sort, limit, starting_after)
Called by Reader.get_entries().
For tag filters, implementations should optimize the single-tag case such that listing by tag does not have to go through all the entries.
Additionally, implementations may choose to not implement tag filters more complicated than flat OR ([['one', 'two', ...]]) or flat AND ([['one'], ['two'], ...]), and raise StorageError instead.
- Parameters:
- Returns:
A lazy iterable.
- Raises:
EntryNotFoundError – If starting_after does not exist.
- Return type:
- get_entry_counts(now, filter)
Called by Reader.get_entry_counts().
Unstable
In order to expose better feed interaction statistics, this method will need to return more granular data.
Unstable
In order to support search_entry_counts() of search implementations that are not bound to a storage, this method will need to take an entries argument.
- Parameters:
filter (EntryFilter)
- Returns:
The counts.
- Return type:
- set_entry_read(entry, read, modified, /)
Called by
Reader.set_entry_read().
- set_entry_important(entry, important, modified, /)
Called by
Reader.set_entry_important().
- get_tags(resource_id, key=None, /)
Called by Reader.get_tags().
Also called by Reader.get_tag_keys().
Unstable
A dedicated get_tag_keys() method will be added in the future.
Unstable
Both this method and get_tag_keys() will allow filtering by prefix (include/exclude), case sensitive and insensitive; implementations should allow for this.
- set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, /) None
- set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, value: JSONType, /) None
Called by
Reader.set_tag().
- delete_tag(resource_id, key, /)
Called by
Reader.delete_tag().
- get_feeds_for_update(filter)
Called by update logic.
- Parameters:
filter (FeedFilter)
- Returns:
A lazy iterable.
- Return type:
- update_feed(intent, /)
Called by update logic.
- Parameters:
intent (FeedUpdateIntent)
- Raises:
- set_feed_stale(url, stale, /)
Used by update logic tests.
- Parameters:
url (str)
stale (bool) –
FeedForUpdate.stale
- Raises:
- get_entries_for_update(entries, /)
Called by update logic.
- add_or_update_entries(intents, /)
Called by update logic.
- Parameters:
intents (Iterable[EntryUpdateIntent])
- Raises:
- get_entry_recent_sort(entry, /)
Get EntryUpdateIntent.recent_sort.
Used by plugins like entry_dedupe.
- Parameters:
- Returns:
entry recent_sort
- Raises:
- Return type:
- set_entry_recent_sort(entry, recent_sort, /)
Set EntryUpdateIntent.recent_sort.
Used by plugins like entry_dedupe.
- Parameters:
- Raises:
- class reader._types.BoundSearchStorageType
Bases: StorageType, Protocol
A storage that can create a storage-bound search provider.
- make_search()
Create a search provider.
- Returns:
A search provider.
- Return type:
- class reader._types.SearchType
Search DAO protocol.
Any method can raise SearchError.
There are two sets of methods that may be called at different times:
- management methods
- read-only methods
Unstable
In the future, search may receive object lifecycle methods (context manager + close()), to support implementations that do not share state with the storage. If you need support for this, please open an issue.
- enable()
Called by Reader.enable_search().
A no-op and reasonably fast if search is already enabled.
Checks if all dependencies needed for update() are available, raises SearchError if not.
- Raises:
- disable()
Called by
Reader.disable_search().
- is_enabled()
Called by Reader.is_search_enabled().
Not called otherwise.
- Returns:
Whether search is enabled or not.
- Return type:
- update()
Called by Reader.update_search().
Should not enable search automatically (handled by Reader).
- Raises:
- search_entries(query, /, filter, sort, limit, starting_after)
Called by Reader.search_entries().
- Parameters:
query (str)
filter (EntryFilter)
sort (EntrySearchSort)
limit (int | None)
- Returns:
A lazy iterable.
- Raises:
EntryNotFoundError – If starting_after does not exist.
- Return type:
- search_entry_counts(query, /, now, filter)
Called by Reader.search_entry_counts().
- Parameters:
query (str)
filter (EntryFilter)
- Returns:
The counts.
- Raises:
- Return type:
Change tracking
- class reader._types.ChangeTrackingStorageType
Bases: StorageType, Protocol
A storage that can track changes to the text content of resources.
- property changes: ChangeTrackerType
The change tracker associated with this storage.
- class reader._types.ChangeTrackerType
Storage API used to keep the full-text search index in sync.
Sync model
The sync model works as follows.
Each resource to be indexed has a sequence that changes every time its text content changes. The sequence can be a global counter, a random number, or a high-precision timestamp; the only requirement is that it won’t be used again (or it’s extremely unlikely that will happen).
Each sequence change gets recorded. Updates are recorded as pairs of DELETE + INSERT changes with the old / new sequences, respectively.
SearchType.update() gets changes and processes them. For INSERT, the resource is indexed only if the change sequence matches the current main storage sequence; otherwise, the change is ignored. For DELETE, the resource is deleted only if the change sequence matches the search index sequence. (This means that, during updates, multiple versions of a resource may appear in the index, with different sequences.) Processed changes are marked as done, regardless of the action taken. Pseudocode:
while changes := self.storage.changes.get():
    self._process_changes(changes)
    self.storage.changes.done(changes)
Enabling change tracking sets the sequence of all resources and adds matching INSERT changes to allow backfilling the search index. The sequence may be None when change tracking is disabled. There is no guarantee the sequence of a resource remains the same when change tracking is disabled and then enabled again.
See also
The model was validated using property-based testing in this gist.
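The INSERT / DELETE rules of the sync model can be sketched with in-memory dictionaries. This is a toy model under stated assumptions, not a search implementation: process_changes() is a hypothetical function, changes are plain (action, sequence, resource_id) tuples, current_sequences plays the role of the main storage, and index maps (resource_id, sequence) to the indexed text.

```python
INSERT, DELETE = 1, 2  # same values as the Action enum


def process_changes(changes, current_sequences, index, get_text):
    """Apply a batch of changes to a toy search index.

    current_sequences maps resource_id -> current main storage sequence;
    index maps (resource_id, sequence) -> indexed text.
    """
    for action, sequence, resource_id in changes:
        if action == INSERT:
            # index only if the change matches the current storage sequence
            if current_sequences.get(resource_id) == sequence:
                index[resource_id, sequence] = get_text(resource_id)
        elif action == DELETE:
            # delete only the indexed version with the matching sequence
            index.pop((resource_id, sequence), None)


# an entry indexed at sequence b'1', then updated twice before the next sync
index = {("entry", b"1"): "first version"}
current_sequences = {"entry": b"3"}
changes = [
    (DELETE, b"1", "entry"),
    (INSERT, b"2", "entry"),  # stale by now, ignored
    (DELETE, b"2", "entry"),  # never indexed, nothing to delete
    (INSERT, b"3", "entry"),
]
process_changes(changes, current_sequences, index, lambda _: "third version")
print(index)
```

In a real implementation, every change in the batch would then be marked as done, including the ones that were ignored.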
API considerations
The entry sequence is exposed as Entry._sequence, and should change when the entry title, summary, or content change, or when its feed’s title or user_title change.
As of version 3.24, only entry changes are tracked, but the API supports tracking feeds and tags in the future; search implementations should ignore changes to resources they do not support (but still mark them as done!).
Any method can raise StorageError.
- enable()
Enable change tracking.
A no-op and reasonably fast if change tracking is already enabled.
- disable()
Disable change tracking.
A no-op if change tracking is already disabled.
- get(action=None, limit=None)
Return the next batch of changes, if any.
- Parameters:
- Returns:
A batch of changes.
- Raises:
- Return type:
- done(changes)
Mark changes as done. Ignore unknown changes.
- Parameters:
- Raises:
ValueError – If more changes than get() returns are passed; done(get()) should always work.
- class reader._types.Change(action, sequence, resource_id, tag_key=None)
A change to be applied to the search index.
The change can be of an entry, a feed, or a resource tag.
- class reader._types.Action(*values)
Action to take.
- INSERT = 1
The resource needs to be added to the search index.
- DELETE = 2
The resource needs to be deleted from the search index.
- Entry._sequence: bytes | None = None
Change sequence.
May be None when change tracking is disabled.
Unstable
This field is part of the unstable change tracking API.
- exception reader.exceptions.ChangeTrackingNotEnabledError(message='')
Bases: StorageError
A change tracking method was called when change tracking was not enabled.
Unstable
This exception is part of the unstable change tracking API.
Data objects
- class reader._types.FeedData(url, updated=None, title=None, link=None, authors=(), subtitle=None, version=None)
Feed data that comes from the feed.
Attributes are a subset of those of Feed.
- as_feed(**kwargs)
Convert this to a feed; kwargs override attributes.
- class reader._types.EntryData(feed_url, id, updated=None, title=None, link=None, authors=(), published=None, summary=None, content=(), enclosures=(), source=None)
Entry data that comes from the feed.
Attributes are a subset of those of Entry.
- source: EntrySource | None = None
- as_entry(**kwargs)
Convert this to an entry; kwargs override attributes.
- class reader._types.FeedFilter(feed_url=None, tags=(), broken=None, updates_enabled=None, new=None, update_after=None)
Options for filtering the results of feed list operations.
See the Reader.get_feeds() docstring for detailed semantics.
- class reader._types.EntryFilter(feed_url=None, entry_id=None, read=None, important='any', has_enclosures=None, source=None, tags=(), feed_tags=())
Options for filtering the results of entry list operations.
See the Reader.get_entries() docstring for detailed semantics.
- important: TristateFilter
Alias for field number 3
- class reader._types.FeedForUpdate(url, updated=None, caching_info=None, stale=False, last_updated=None, last_exception=False, hash=None)
Update-relevant information about an existing feed, from Storage.
- stale: bool
Whether the next update should update all entries, regardless of their hash or updated.
- class reader._types.EntryForUpdate(first_updated, first_updated_epoch, recent_sort, updated, hash, hash_changed)
Update-relevant information about an existing entry, from Storage.
- first_updated: datetime
From the last
EntryUpdateIntent.first_updated.
- first_updated_epoch: datetime
From the last
EntryUpdateIntent.first_updated_epoch.
- recent_sort: datetime
From the last
EntryUpdateIntent.recent_sort.
- class reader._types.FeedUpdateIntent(url, last_retrieved, update_after, value)
Data passed to Storage to record a feed update attempt, regardless of the outcome.
- value: FeedToUpdate | None | ExceptionInfo
One of:
feed data and metadata (the feed was updated)
None (the feed is unchanged)
the cause of UpdateError, if one happened
- class reader._types.FeedToUpdate(feed, last_updated, caching_info=None)
Data passed to Storage when (successfully) updating a feed.
- last_updated: datetime
The time at the start of updating this feed.
- class reader._types.EntryUpdateIntent(entry, last_updated, first_updated, first_updated_epoch, recent_sort, feed_order=0, hash_changed=0, added_by='feed', original_feed_url=None)
Data passed to Storage when updating an entry.
- last_updated: datetime
The time at the start of updating the feed (start of update_feed() in update_feed(), start of each feed update in update_feeds()).
- first_updated: datetime
First last_updated (sets Entry.added). The value from EntryForUpdate if the entry already exists.
- first_updated_epoch: datetime
The time at the start of updating this batch of feeds (start of update_feed() in update_feed(), start of update_feeds() in update_feeds()). The value from EntryForUpdate if the entry already exists.
- recent_sort: datetime
Sort key for the get_entries() recent sort order. The value from EntryForUpdate if the entry already exists.
- hash_changed: int | None
Same as
EntryForUpdate.hash_changed.
- added_by: Literal['feed', 'user']
Same as
Entry.added_by.
- original_feed_url: str | None
Same as
Entry.original_feed_url. Usually does not need to be set.
Type aliases
- reader._types.TagFilter
Like the tags argument of Reader.get_feeds(), except:
only the full multiple-tags-with-disjunction form is used
tags are represented as (is negated, tag name) tuples (the - prefix is stripped)
Assuming a tag_filter_argument() function that converts get_feeds() tags to TagFilter:
>>> tag_filter_argument(['one'])
[[(False, 'one')]]
>>> tag_filter_argument(['one', 'two'])
[[(False, 'one')], [(False, 'two')]]
>>> tag_filter_argument([['one', 'two']])
[[(False, 'one'), (False, 'two')]]
>>> tag_filter_argument(['one', '-two'])
[[(False, 'one')], [(True, 'two')]]
>>> tag_filter_argument(True)
[[True]]
- reader._types.TristateFilter
Like TristateFilterInput, but without bool/None aliases.
alias of Literal['istrue', 'isfalse', 'notset', 'nottrue', 'notfalse', 'isset', 'any']
Recipes
Adding custom headers when retrieving feeds
Example of adding custom request headers with SessionFactory.request_hooks:
$ python examples/custom_headers.py
updating...
server: Hello, world!
updated!
import http.server
import threading
from reader import make_reader

# start a background server that logs the received header
class Handler(http.server.BaseHTTPRequestHandler):
    def log_message(self, *_): pass
    def do_GET(self):
        print("server:", self.headers.get('my-header'))
        self.send_error(304)

server = http.server.HTTPServer(('localhost', 8080), Handler)
threading.Thread(target=server.handle_request).start()

# create a reader object
reader = make_reader(':memory:')
reader.add_feed('http://localhost:8080')

# set up a hook that adds the header to each request
def hook(session, request, **kwargs):
    request.headers.setdefault('my-header', 'Hello, world!')

reader._parser.session_factory.request_hooks.append(hook)

# updating the feed sends the modified request to the server
print("updating...")
reader.update_feeds()
print("updated!")
Parsing a feed retrieved with something other than reader
Example of using the reader internal API to parse a feed retrieved asynchronously with HTTPX:
$ python examples/parser_only.py
death and gravity
Has your password been pwned? Or, how I almost failed to search a 37 GB text file in under 1 millisecond (in Python)
import asyncio
import io

import httpx
from reader._parser import default_parser
from werkzeug.http import parse_options_header

url = "https://death.andgravity.com/_feed/index.xml"
meta_parser = default_parser()

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

    # to select the parser, we need the MIME type of the response
    content_type = response.headers.get('content-type')
    if content_type:
        mime_type, _ = parse_options_header(content_type)
    else:
        mime_type = None

    # select the parser (raises ParseError if none found)
    parser, _ = meta_parser.get_parser(url, mime_type)

    # wrap the content in a readable binary file
    file = io.BytesIO(response.content)

    # parse the feed; not doing parser(url, file, response.headers) directly
    # because parsing is CPU-intensive and would block the event loop
    feed, entries = await asyncio.to_thread(parser, url, file, response.headers)

    print(feed.title)
    print(entries[0].title)

if __name__ == '__main__':
    asyncio.run(main())