Intro
zyte-parsers provides functions that extract specific data from HTML elements. The input element can be an instance of either parsel.selector.Selector or lxml.html.HtmlElement. Some functions can also take a string with text (e.g. extracted from HTML or JSON) as input.
- zyte_parsers.SelectorOrElement
alias of Union[Selector, HtmlElement, HtmlComment]
Parsers
Brand
- zyte_parsers.extract_brand_name(node: Selector | HtmlElement | HtmlComment, search_depth: int = 0) -> str | None
Extract a brand name from a node that contains it.
It tries the element text as well as the alt and title attributes of images.
- Parameters:
node – Node including the brand name.
search_depth – Max depth for searching images.
- Returns:
The brand name or None.
GTIN
- zyte_parsers.extract_gtin(node: Selector | HtmlElement | HtmlComment | str) -> Gtin | None
Extract a GTIN (Global Trade Item Number) from a node or a string that contains its text.
It detects the GTIN type and returns it together with the cleaned GTIN value. The following types are supported: isbn10, isbn13, issn, ismn, upc, gtin8, gtin13, gtin14.
- Parameters:
node – A node or a string that includes the GTIN text.
- Returns:
A GTIN item.
Price
- zyte_parsers.extract_price(node: Selector | HtmlElement | HtmlComment | str, *, currency_hint: Selector | HtmlElement | HtmlComment | str | None = None) -> Price
Extract a price value from a node or a string that contains it.
- Parameters:
node – A node or a string that includes the price text.
currency_hint – A string or a node that can contain currency. It will be passed as a hint to price-parser. If currency is present in the price string, it could be preferred over the value extracted from currency_hint.
- Returns:
The price value as a price_parser.Price object.
Ratings and review count
- class zyte_parsers.AggregateRating(bestRating: float | None = None, ratingValue: float | None = None)
- zyte_parsers.extract_rating(node: Selector | HtmlElement | HtmlComment) -> AggregateRating
Extract rating data from a node.
- Parameters:
node – Node that includes the rating data.
- Returns:
AggregateRating item.
- zyte_parsers.extract_rating_stars(node: Selector | HtmlElement | HtmlComment) -> float | None
Extract a rating value from a node containing rating stars.
- Parameters:
node – Node that includes the rating stars.
- Returns:
Rating value as a float or None.
- zyte_parsers.extract_review_count(node: Selector | HtmlElement | HtmlComment) -> int | None
Extract review count from a node containing it.
- Parameters:
node – Node that includes the review count.
- Returns:
Review count as an int or None.