Internal API
This part of the documentation covers the internal interfaces of reader,
which are useful for plugins,
or if you want to use low-level functionality
without using Reader
itself.
Warning
As of version 3.13, the internal API is not part of the public API; it is not stable yet and might change without any notice.
Parser
- reader._parser.default_parser(feed_root=None, session_timeout=(3.05, 60), _lazy=True)
Create a pre-configured Parser.
- Parameters:
feed_root (str or None) – See make_reader() for details.
session_timeout (float or tuple(float, float) or None) – See make_reader() for details.
- Returns:
The parser.
- Return type:
- class reader._parser.Parser
Retrieve and parse feeds by delegating to retrievers and parsers.
To retrieve and parse a single feed, you can call the parser object directly.
Reader only uses the following methods:
To add retrievers and parsers:
The rest of the methods are low-level methods.
- session_factory
SessionFactory used to create Requests sessions for retrieving feeds.
Plugins may add request or response hooks to this.
- parallel(feeds, map=<class 'map'>, is_parallel=True)
Retrieve and parse many feeds, possibly in parallel.
Yields the parsed feeds, as soon as they are ready.
- Parameters:
feeds (iterable(FeedArgument)) – An iterable of feeds.
map (function) – A map()-like function; the results can be in any order.
is_parallel (bool) – Whether map runs the tasks in parallel.
- Yields:
tuple(FeedArgument, ParsedFeed or None or ParseError) – A (feed, result) pair, where result is either:
the parsed feed
None, if the feed didn’t change
an exception instance
- __call__(url, http_etag=None, http_last_modified=None)
Retrieve and parse one feed.
This is a convenience wrapper over parallel().
- Parameters:
- Returns:
The parsed feed or None, if the feed didn’t change.
- Return type:
ParsedFeed or None
- Raises:
- retrieve(url, http_etag=None, http_last_modified=None, is_parallel=False)
Retrieve a feed.
- Parameters:
url (str) – The feed URL.
http_etag (str or None) – The HTTP ETag header from the last update.
http_last_modified (str or None) – The HTTP Last-Modified header from the last update.
is_parallel (bool) – Whether this was called from parallel() (writes the contents to a temporary file, if possible).
- Returns:
A context manager that has as target either the result or None, if the feed didn’t change.
- Return type:
contextmanager(RetrieveResult or None)
- Raises:
- parse(url, result)
Parse a retrieved feed.
- Parameters:
url (str) – The feed URL.
result (RetrieveResult) – A retrieve result.
- Returns:
The feed and entry data.
- Return type:
- Raises:
- get_parser(url, mime_type)
Select an appropriate parser for a feed.
Parsers registered by URL take precedence over those registered by MIME type.
If no MIME type is given, guess it from the URL using mimetypes.guess_type(). If the MIME type can’t be guessed, default to application/octet-stream.
- Parameters:
- Returns:
The parser, and the (possibly guessed) MIME type.
- Return type:
- Raises:
ParseError – No parser matches.
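The guessing fallback described above can be sketched with the standard library; guess_feed_mime_type() is a hypothetical helper for illustration, not reader’s actual code:

```python
import mimetypes

def guess_feed_mime_type(url):
    """Guess the MIME type from the URL; fall back to
    application/octet-stream when it can't be guessed."""
    mime_type, _ = mimetypes.guess_type(url)
    return mime_type or "application/octet-stream"

guess_feed_mime_type("feed.json")                # 'application/json'
guess_feed_mime_type("https://example.com/feed") # 'application/octet-stream'
```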
- validate_url(url)
Check if url is valid without actually retrieving it.
- Raises:
InvalidFeedURLError – If url is not valid.
- mount_retriever(prefix, retriever)
Register a retriever to a URL prefix.
Retrievers are sorted in descending order by prefix length.
- Parameters:
prefix (str) – A URL prefix.
retriever (RetrieverType) – The retriever.
- get_retriever(url)
Get the retriever for a URL.
- Parameters:
url (str) – The URL.
- Returns:
The matching retriever.
- Return type:
- Raises:
ParseError – No retriever matches the URL.
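The "sorted in descending order by prefix length" lookup described under mount_retriever() can be modeled with a toy function (a sketch for illustration; reader’s actual implementation may differ):

```python
def get_retriever(retrievers, url):
    """Return the retriever whose prefix is the longest match for url."""
    # try longer (more specific) prefixes first
    for prefix in sorted(retrievers, key=len, reverse=True):
        if url.startswith(prefix):
            return retrievers[prefix]
    raise LookupError(f"no retriever for {url!r}")

# hypothetical registrations; values stand in for retriever objects
retrievers = {
    "http://": "http_retriever",
    "https://": "http_retriever",
    "https://special.example/": "special_retriever",
}
```

With these registrations, "https://special.example/feed.xml" matches the more specific prefix even though "https://" also matches.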
- mount_parser_by_mime_type(parser, http_accept=None)
Register a parser to one or more MIME types.
- Parameters:
parser (ParserType) – The parser.
http_accept (str or None) – The content types the parser supports, as an Accept HTTP header value. If not given, use the parser’s http_accept attribute, if it has one.
- Raises:
TypeError – The parser does not have an http_accept attribute, and no http_accept was given.
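For example, a hypothetical JSON Feed parser could declare the MIME types it supports through an http_accept attribute, so it does not need to be passed to mount_parser_by_mime_type() explicitly (JSONFeedParser is made up for illustration):

```python
class JSONFeedParser:
    """Hypothetical parser registered by MIME type."""

    # Accept header value listing the supported content types
    http_accept = "application/feed+json,application/json;q=0.9"

    def __call__(self, url, resource, headers=None):
        # parse the readable binary resource here and return
        # (feed data, entries); omitted in this sketch
        raise NotImplementedError

# with a Parser instance available:
# parser.mount_parser_by_mime_type(JSONFeedParser())
```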
- get_parser_by_mime_type(mime_type)
Get a parser for a MIME type.
- Parameters:
mime_type (str) – The MIME type of the feed resource.
- Returns:
The parser.
- Return type:
- Raises:
ParseError – No parser matches the MIME type.
- mount_parser_by_url(url, parser)
Register a parser to an exact URL.
- Parameters:
url (str) – A URL.
parser (ParserType) – The parser.
- get_parser_by_url(url)
Get a parser that was registered by URL.
- Parameters:
url (str) – The URL.
- Returns:
The parser.
- Return type:
- Raises:
ParseError – No parser was registered for the URL.
- process_feed_for_update(feed)
Change update-relevant information about a feed before it is passed to the retriever.
Delegates to process_feed_for_update() of the appropriate retriever.
- Parameters:
feed (FeedForUpdate) – Feed information.
- Returns:
The passed-in feed information, possibly modified.
- Return type:
- process_entry_pairs(url, mime_type, pairs)
Process entry data before being stored.
Delegates to process_entry_pairs() of the appropriate parser.
- Parameters:
url (str) – The feed URL.
mime_type (str or None) – The MIME type of the feed.
pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.
- Returns:
(entry data, entry for update) pairs, possibly modified.
- Return type:
iterable(tuple(EntryData, EntryForUpdate or None))
- class reader._parser.requests.SessionFactory(...)
Manage the lifetime of a session.
To get a new session, call the factory directly.
- request_hooks: Sequence[RequestHook]
Sequence of RequestHooks to be associated with new sessions.
- response_hooks: Sequence[ResponseHook]
Sequence of ResponseHooks to be associated with new sessions.
- __call__()
Create a new session.
- Return type:
- transient()
Return the current persistent() session, or a new one.
If a new session was created, it is closed once the context manager is exited.
- Return type:
contextmanager(SessionWrapper)
- persistent()
Register a persistent session with this factory.
While the context manager returned by this method is entered, all persistent() and transient() calls will return the same session. The session is closed once the outermost persistent() context manager is exited.
Plugins should use transient().
Reentrant, but NOT threadsafe.
- Return type:
contextmanager(SessionWrapper)
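The persistent()/transient() semantics above can be modeled with a toy factory (a sketch for illustration only, not reader’s implementation; object() stands in for a SessionWrapper):

```python
from contextlib import contextmanager

class ToySessionFactory:
    """Toy model of the persistent()/transient() semantics."""

    def __init__(self):
        self._session = None

    def __call__(self):
        return object()  # stands in for a new SessionWrapper

    @contextmanager
    def persistent(self):
        if self._session is not None:
            # reentrant: nested calls reuse the outer session
            yield self._session
            return
        self._session = self()
        try:
            yield self._session
        finally:
            # "closed" once the outermost persistent() exits
            self._session = None

    @contextmanager
    def transient(self):
        if self._session is not None:
            # reuse the current persistent session
            yield self._session
        else:
            # new session, "closed" on exit
            yield self()
```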
- class reader._parser.requests.SessionWrapper(...)
Minimal wrapper over a requests.Session.
Only provides a limited get() method.
Can be used as a context manager (closes the session on exit).
- session: requests.Session
The underlying requests.Session.
- request_hooks: Sequence[RequestHook]
Sequence of RequestHooks.
- response_hooks: Sequence[ResponseHook]
Sequence of ResponseHooks.
- get(url, headers=None, **kwargs)
Like Requests get(), but apply request_hooks and response_hooks.
Protocols
- class reader._parser.FeedArgument(*args, **kwargs)
Any FeedForUpdate-like object.
- class reader._parser.RetrieverType(*args, **kwargs)
A callable that knows how to retrieve a feed.
- slow_to_read: bool
Allow Parser to read() the result resource into a temporary file, and pass that to the parser (as an optimization). Implies the resource is a readable binary file.
- __call__(url, http_etag, http_last_modified, http_accept)
Retrieve a feed.
- Parameters:
- Returns:
A context manager that has as target either the result or None, if the feed didn’t change.
- Return type:
contextmanager(RetrieveResult or None)
- Raises:
- validate_url(url)
Check if url is valid for this retriever.
- Raises:
InvalidFeedURLError – If url is not valid.
- class reader._parser.FeedForUpdateRetrieverType(*args, **kwargs)
Bases: RetrieverType[T_co], Protocol
A RetrieverType that can change update-relevant information.
- process_feed_for_update(feed)
Change update-relevant information about a feed before it is passed to the retriever (RetrieverType.__call__()).
- Parameters:
feed (FeedForUpdate) – Feed information.
- Returns:
The passed-in feed information, possibly modified.
- Return type:
- class reader._parser.ParserType(*args, **kwargs)
A callable that knows how to parse a retrieved feed.
- __call__(url, resource, headers)
Parse a feed.
- class reader._parser.HTTPAcceptParserType(*args, **kwargs)
Bases: ParserType[T_cv], Protocol
A ParserType that knows what content it can handle.
- class reader._parser.EntryPairsParserType(*args, **kwargs)
Bases: ParserType[T_cv], Protocol
A ParserType that can modify entry data before being stored.
- process_entry_pairs(url, pairs)
Process entry data before being stored.
- Parameters:
url (str) – The feed URL.
pairs (iterable(tuple(EntryData, EntryForUpdate or None))) – (entry data, entry for update) pairs.
- Returns:
(entry data, entry for update) pairs, possibly modified.
- Return type:
iterable(tuple(EntryData, EntryForUpdate or None))
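As a sketch, here is a hypothetical parser that strips whitespace from entry titles before they are stored; it assumes entry data objects are dataclasses (as EntryData is), so dataclasses.replace() applies:

```python
import dataclasses

class TitleStrippingParser:
    """Hypothetical EntryPairsParserType that normalizes entry titles."""

    def __call__(self, url, resource, headers=None):
        # normal parsing omitted in this sketch
        raise NotImplementedError

    def process_entry_pairs(self, url, pairs):
        for new, old in pairs:
            if new.title:
                # return a copy with the title stripped of whitespace
                new = dataclasses.replace(new, title=new.title.strip())
            yield new, old
```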
- class reader._parser.requests.RequestHook(*args, **kwargs)
Hook to modify a Request before it is sent.
- __call__(session, request, **kwargs)
Modify a request before it is sent.
- Parameters:
session (requests.Session) – The session that will send the request.
request (requests.Request) – The request to be sent.
- Keyword Arguments:
**kwargs – Will be passed to send().
- Returns:
A (possibly modified) request to be sent. If None, send the initial request.
- Return type:
requests.Request or None
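A minimal sketch of a request hook, assuming the request object exposes a headers mapping the way requests.Request does (AuthRequestHook is hypothetical):

```python
from types import SimpleNamespace

class AuthRequestHook:
    """Hypothetical RequestHook that adds an Authorization header
    to every outgoing request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, session, request, **kwargs):
        request.headers['Authorization'] = f'Bearer {self.token}'
        return request  # returning None would send the original request
```

A plugin could register it with parser.session_factory.request_hooks.append(AuthRequestHook(token)).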
- class reader._parser.requests.ResponseHook(*args, **kwargs)
Hook to repeat a request depending on the Response.
- __call__(session, response, request, **kwargs)
Repeat a request depending on the response.
- Parameters:
session (requests.Session) – The session that sent the request.
request (requests.Request) – The sent request.
response (requests.Response) – The received response.
- Keyword Arguments:
**kwargs – Were passed to send().
- Returns:
A (possibly new) request to be sent, or None, to return the current response.
- Return type:
requests.Request or None
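A minimal sketch of a response hook that retries a request once on HTTP 401 (RetryOnceOn401Hook and its X-Retried marker header are made up for illustration):

```python
from types import SimpleNamespace

class RetryOnceOn401Hook:
    """Hypothetical ResponseHook: re-send the request once on 401."""

    def __call__(self, session, response, request, **kwargs):
        if response.status_code == 401 and 'X-Retried' not in request.headers:
            request.headers['X-Retried'] = '1'  # avoid retrying forever
            return request   # re-send this request
        return None          # keep the current response
```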
Data objects
- class reader._parser.RetrieveResult(resource, mime_type=None, http_etag=None, http_last_modified=None, headers=None)
The result of retrieving a feed, plus metadata.
- resource: T_co
The result of retrieving a feed. Usually, a readable binary file. Passed to the parser.
- http_etag: str | None = None
The HTTP ETag header associated with the resource. Passed back to the retriever on the next update.
- class reader._types.ParsedFeed(feed, entries, http_etag=None, http_last_modified=None, mime_type=None)
A parsed feed.
- http_etag: str | None
The HTTP ETag header associated with the feed resource. Passed back to the retriever on the next update.
- http_last_modified: str | None
The HTTP Last-Modified header associated with the feed resource. Passed back to the retriever on the next update.
- mime_type: str | None
The MIME type of the feed resource. Used by process_entry_pairs() to select an appropriate parser.
- class reader._types.FeedData(url, updated=None, title=None, link=None, author=None, subtitle=None, version=None)
Feed data that comes from the feed.
Attributes are a subset of those of Feed.
- as_feed(**kwargs)
Convert this to a feed; kwargs override attributes.
- class reader._types.EntryData(feed_url, id, updated=None, title=None, link=None, author=None, published=None, summary=None, content=(), enclosures=())
Entry data that comes from the feed.
Attributes are a subset of those of Entry.
- as_entry(**kwargs)
Convert this to an entry; kwargs override attributes.
- class reader._types.FeedForUpdate(url, updated, http_etag, http_last_modified, stale, last_updated, last_exception, hash)
Update-relevant information about an existing feed, from Storage.
- stale: bool
Whether the next update should update all entries, regardless of their hash or updated.
- class reader._types.EntryForUpdate(updated, published, hash, hash_changed)
Update-relevant information about an existing entry, from Storage.
Storage
reader storage is abstracted by two DAO protocols: StorageType, which provides the main storage, and SearchType, which provides search-related operations.
Currently, there’s only one supported implementation, based on SQLite.
That said, it is possible to use an alternate implementation by passing a StorageType instance via the _storage make_reader() argument:
reader = make_reader('unused', _storage=MyStorage(...))
The protocols are mostly stable, but some backwards-incompatible changes are expected in the future (known ones are marked below with Unstable). The long-term goal is for the storage API to become stable, but at least one other implementation needs to exist before that. (Working on one? Let me know!)
Unstable
Currently, search is tightly bound to a storage implementation (see make_search()).
While the change tracking API allows search implementations to keep in sync with text content changes, there is no convenient way for SearchType.search_entries() to filter/sort results without storage cooperation; StorageType will need additional capabilities to support this.
- Reader._storage
The StorageType instance used by this reader.
- Reader._search
The SearchType instance used by this reader.
- class reader._types.StorageType
Storage DAO protocol.
For methods with Reader correspondents, see the Reader docstrings for detailed semantics.
Any method can raise StorageError.
The behaviors described in Lifecycle and Threading are implemented at the storage level; specifically:
The storage can be used directly, without __enter__()ing it. There is no guarantee close() will be called at the end.
The storage can be reused after __exit__() / close().
The storage can be used from multiple threads, either directly, or as a context manager. Closing the storage in one thread should not close it in another thread.
Schema migrations are transparent to Reader. The current storage implementation does them at initialization, but others may require them to happen out-of-band with user intervention.
All datetime attributes of all parameters and return values are timezone-aware, with the timezone set to utc.
Unstable
In the future, implementations will be required to accept datetimes with any timezone.
Methods, grouped by topic:
- object lifecycle
- feeds
add_feed()
delete_feed()
change_feed_url()
get_feeds()
get_feed_counts()
set_feed_user_title()
set_feed_updates_enabled()
- entries
add_entry()
delete_entries()
get_entries()
get_entry_counts()
set_entry_read()
set_entry_important()
- tags
- update
get_feeds_for_update()
update_feed()
set_feed_stale()
get_entries_for_update()
add_or_update_entries()
get_entry_recent_sort()
set_entry_recent_sort()
- close()
Called by Reader.close().
- add_feed(url, /, added)
Called by Reader.add_feed().
- Parameters:
url (str) –
added (datetime) – Feed.added
- Raises:
- delete_feed(url, /)
Called by Reader.delete_feed().
- Parameters:
url (str) –
- Raises:
- change_feed_url(old, new, /)
Called by Reader.change_feed_url().
- Parameters:
- Raises:
- get_feeds(filter, sort, limit, starting_after)
Called by Reader.get_feeds().
- Parameters:
filter (FeedFilter) –
sort (Literal['title', 'added']) –
limit (int | None) –
starting_after (str | None) –
- Returns:
A lazy iterable.
- Raises:
FeedNotFoundError – If starting_after does not exist.
- Return type:
- get_feed_counts(filter)
Called by Reader.get_feed_counts().
- Parameters:
filter (FeedFilter) –
- Returns:
The counts.
- Return type:
- set_feed_user_title(url, title, /)
Called by Reader.set_feed_user_title().
- Parameters:
- Raises:
- set_feed_updates_enabled(url, enabled, /)
Called by Reader.enable_feed_updates() and Reader.disable_feed_updates().
- Parameters:
- Raises:
- add_entry(intent, /)
Called by Reader.add_entry().
- Parameters:
intent (EntryUpdateIntent) –
- Raises:
- delete_entries(entries, /, *, added_by)
Called by Reader.delete_entry().
Also called by plugins like entry_dedupe.
- Parameters:
- Raises:
EntryNotFoundError – An entry does not exist.
EntryError – An entry added_by is different from the given one.
- get_entries(filter, sort, limit, starting_after)
Called by Reader.get_entries().
- Parameters:
- Returns:
A lazy iterable.
- Raises:
EntryNotFoundError – If starting_after does not exist.
- Return type:
- get_entry_counts(now, filter)
Called by Reader.get_entry_counts().
Unstable
In order to expose better feed interaction statistics, this method will need to return more granular data.
Unstable
In order to support search_entry_counts() of search implementations that are not bound to a storage, this method will need to take an entries argument.
- Parameters:
filter (EntryFilter) –
- Returns:
The counts.
- Return type:
- set_entry_read(entry, read, modified, /)
Called by Reader.set_entry_read().
- set_entry_important(entry, important, modified, /)
Called by Reader.set_entry_important().
- get_tags(resource_id, key=None, /)
Called by Reader.get_tags().
Also called by Reader.get_tag_keys().
Unstable
A dedicated get_tag_keys() method will be added in the future.
Unstable
Both this method and get_tag_keys() will allow filtering by prefix (include/exclude), case sensitive and insensitive; implementations should allow for this.
- set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, /) → None
- set_tag(resource_id: tuple[()] | tuple[str] | tuple[str, str], key: str, value: JSONType, /) → None
Called by Reader.set_tag().
- Parameters:
resource_id –
key –
value –
- Raises:
- delete_tag(resource_id, key, /)
Called by Reader.delete_tag().
- get_feeds_for_update(filter)
Called by update logic.
- Parameters:
filter (FeedFilter) –
- Returns:
A lazy iterable.
- Return type:
- update_feed(intent, /)
Called by update logic.
- Parameters:
intent (FeedUpdateIntent) –
- Raises:
- set_feed_stale(url, stale, /)
Used by update logic tests.
- Parameters:
url (str) –
stale (bool) – FeedForUpdate.stale
- Raises:
- get_entries_for_update(entries, /)
Called by update logic.
- add_or_update_entries(intents, /)
Called by update logic.
- Parameters:
intents (Iterable[EntryUpdateIntent]) –
- Raises:
- get_entry_recent_sort(entry, /)
Get EntryUpdateIntent.recent_sort.
Used by plugins like entry_dedupe.
- Parameters:
- Returns:
entry recent_sort
- Raises:
- Return type:
- set_entry_recent_sort(entry, recent_sort, /)
Set EntryUpdateIntent.recent_sort.
Used by plugins like entry_dedupe.
- Parameters:
- Raises:
- class reader._types.BoundSearchStorageType
Bases: StorageType, Protocol
A storage that can create a storage-bound search provider.
- make_search()
Create a search provider.
- Returns:
A search provider.
- Return type:
- class reader._types.SearchType
Search DAO protocol.
Any method can raise SearchError.
There are two sets of methods that may be called at different times:
- management methods
- read-only methods
Unstable
In the future, search may receive object lifecycle methods (context manager + close()), to support implementations that do not share state with the storage. If you need support for this, please open an issue.
- enable()
Called by Reader.enable_search().
A no-op and reasonably fast if search is already enabled.
Checks if all dependencies needed for update() are available, raises SearchError if not.
- Raises:
- disable()
Called by Reader.disable_search().
- is_enabled()
Called by Reader.is_search_enabled().
Not called otherwise.
- Returns:
Whether search is enabled or not.
- Return type:
- update()
Called by Reader.update_search().
Should not enable search automatically (handled by Reader).
- Raises:
- search_entries(query, /, filter, sort, limit, starting_after)
Called by Reader.search_entries().
- Parameters:
- Returns:
A lazy iterable.
- Raises:
EntryNotFoundError – If starting_after does not exist.
- Return type:
- search_entry_counts(query, /, now, filter)
Called by Reader.search_entry_counts().
- Parameters:
query (str) –
filter (EntryFilter) –
- Returns:
The counts.
- Raises:
- Return type:
- class reader._types.ChangeTrackingStorageType
Bases: StorageType, Protocol
A storage that can track changes to the text content of resources.
- property changes: ChangeTrackerType
The change tracker associated with this storage.
- class reader._types.ChangeTrackerType
Storage API used to keep the full-text search index in sync.
The sync model works as follows.
Each resource to be indexed has a sequence that changes every time its text content changes. The sequence can be a global counter, a random number, or a high-precision timestamp; the only requirement is that it won’t be used again (or it’s extremely unlikely that will happen).
Each sequence change gets recorded. Updates are recorded as pairs of DELETE + INSERT changes with the old / new sequences, respectively.
SearchType.update() gets changes and processes them. For INSERT, the resource is indexed only if the change sequence matches the current main storage sequence; otherwise, the change is ignored. For DELETE, the resource is deleted only if the change sequence matches the search index sequence. (This means that, during updates, multiple versions of a resource may appear in the index, with different sequences.) Processed changes are marked as done, regardless of the action taken. Pseudocode:
while changes := self.storage.changes.get():
    self._process_changes(changes)
    self.storage.changes.done(changes)
Enabling change tracking sets the sequence of all resources and adds matching INSERT changes to allow backfilling the search index. The sequence may be None when change tracking is disabled. There is no guarantee the sequence of a resource remains the same when change tracking is disabled and then enabled again.
See also
The model was validated using property-based testing in this gist.
The entry sequence is exposed as Entry._sequence, and should change when the entry title, summary, or content change, or when its feed’s title or user_title change.
As of version 3.13, only entry changes are tracked, but the API supports tracking feeds and tags in the future; search implementations should ignore changes to resources they do not support (but still mark them as done!).
Any method can raise StorageError.
- enable()
Enable change tracking.
A no-op and reasonably fast if change tracking is already enabled.
- disable()
Disable change tracking.
A no-op if change tracking is already disabled.
- get(action=None, limit=None)
Return the next batch of changes, if any.
- Parameters:
- Returns:
A batch of changes.
- Raises:
- Return type:
- done(changes)
Mark changes as done. Ignore unknown changes.
- Parameters:
- Raises:
ValueError – If more changes than get() returns are passed; done(get()) should always work.
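The get()/done() loop from the pseudocode above can be exercised with a fake tracker (FakeChanges is hypothetical and stands in for storage.changes):

```python
class FakeChanges:
    """Toy change tracker: returns pending changes in small batches."""

    def __init__(self, changes):
        self._pending = list(changes)
        self.done_count = 0

    def get(self, action=None, limit=None):
        batch, self._pending = self._pending[:2], self._pending[2:]
        return batch

    def done(self, changes):
        self.done_count += len(changes)

changes = FakeChanges(['change-1', 'change-2', 'change-3'])
processed = []
while batch := changes.get():
    processed.extend(batch)  # a real implementation indexes/deletes here
    changes.done(batch)      # mark as done regardless of the action taken
```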
- class reader._types.Change(action, sequence, resource_id, tag_key=None)
A change to be applied to the search index.
The change can be of an entry, a feed, or a resource tag.
- class reader._types.Action(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Action to take.
- INSERT = 1
The resource needs to be added to the search index.
- DELETE = 2
The resource needs to be deleted from the search index.
- Entry._sequence: bytes | None = None
Change sequence.
May be None when change tracking is disabled.
Unstable
This field is part of the unstable change tracking API.
- exception reader.exceptions.ChangeTrackingNotEnabledError(message='')
Bases: StorageError
A change tracking method was called when change tracking was not enabled.
Unstable
This exception is part of the unstable change tracking API.
Data objects
- class reader._types.FeedFilter(feed_url=None, tags=(), broken=None, updates_enabled=None, new=None)
Options for filtering the results of feed list operations.
See the Reader.get_feeds() docstring for detailed semantics.
- tags: TagFilter
Alias for field number 1
- class reader._types.EntryFilter(feed_url=None, entry_id=None, read=None, important='any', has_enclosures=None, tags=(), feed_tags=())
Options for filtering the results of entry list operations.
See the Reader.get_entries() docstring for detailed semantics.
- important: TristateFilter
Alias for field number 3
- tags: TagFilter
Alias for field number 5
- feed_tags: TagFilter
Alias for field number 6
- class reader._types.FeedUpdateIntent(url, last_updated, feed=None, http_etag=None, http_last_modified=None, last_exception=None)
Data to be passed to Storage when updating a feed.
- http_etag: str | None
The feed’s ETag header; see ParsedFeed.http_etag for details.
Unstable
http_etag and http_last_modified may be grouped in a single attribute in the future.
- http_last_modified: str | None
The feed’s Last-Modified header; see ParsedFeed.http_last_modified for details.
- last_exception: ExceptionInfo | None
Cause of UpdateError, if any; if set, everything else except url should be None.
- class reader._types.EntryUpdateIntent(entry, last_updated, first_updated, first_updated_epoch, recent_sort, feed_order=0, hash_changed=0, added_by='feed')
Data to be passed to Storage when updating a feed.
- last_updated: datetime
The time at the start of updating the feed (start of update_feed() in update_feed(), start of each feed update in update_feeds()).
- first_updated: datetime | None
First last_updated (sets Entry.added). None if the entry already exists.
- first_updated_epoch: datetime | None
The time at the start of updating this batch of feeds (start of update_feed() in update_feed(), start of update_feeds() in update_feeds()). None if the entry already exists.
- recent_sort: datetime | None
Sort key for the get_entries() recent sort order.
- hash_changed: int | None
Same as EntryForUpdate.hash_changed.
- added_by: Literal['feed', 'user']
Same as Entry.added_by.
Type aliases
- reader._types.TagFilter
Like the tags argument of Reader.get_feeds(), except:
only the full multiple-tags-with-disjunction form is used
tags are represented as (is negated, tag name) tuples (the - prefix is stripped)
Assuming a tag_filter_argument() function that converts get_feeds() tags to TagFilter:
>>> tag_filter_argument(['one'])
[[(False, 'one')]]
>>> tag_filter_argument(['one', 'two'])
[[(False, 'one')], [(False, 'two')]]
>>> tag_filter_argument([['one', 'two']])
[[(False, 'one'), (False, 'two')]]
>>> tag_filter_argument(['one', '-two'])
[[(False, 'one')], [(True, 'two')]]
>>> tag_filter_argument(True)
[[True]]
- reader._types.TristateFilter
Like TristateFilterInput, but without bool/None aliases.
alias of Literal['istrue', 'isfalse', 'notset', 'nottrue', 'notfalse', 'isset', 'any']
Recipes
Parsing a feed retrieved with something other than reader
Example of using the reader internal API to parse a feed retrieved asynchronously with HTTPX:
$ python examples/parser_only.py
death and gravity
Has your password been pwned? Or, how I almost failed to search a 37 GB text file in under 1 millisecond (in Python)
import asyncio
import io
import httpx
from reader._parser import default_parser
from werkzeug.http import parse_options_header
url = "https://death.andgravity.com/_feed/index.xml"
meta_parser = default_parser()
async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

    # to select the parser, we need the MIME type of the response
    content_type = response.headers.get('content-type')
    if content_type:
        mime_type, _ = parse_options_header(content_type)
    else:
        mime_type = None

    # select the parser (raises ParseError if none found)
    parser, _ = meta_parser.get_parser(url, mime_type)

    # wrap the content in a readable binary file
    file = io.BytesIO(response.content)

    # parse the feed; not doing parser(url, file, response.headers) directly
    # because parsing is CPU-intensive and would block the event loop
    feed, entries = await asyncio.to_thread(parser, url, file, response.headers)

    print(feed.title)
    print(entries[0].title)

if __name__ == '__main__':
    asyncio.run(main())