Intro

zyte-parsers provides functions that extract specific data from HTML elements. The input element can be an instance of either parsel.selector.Selector or lxml.html.HtmlElement. Some functions can also take a string with text (e.g. extracted from HTML or JSON) as input.

zyte_parsers.SelectorOrElement

alias of Union[Selector, HtmlElement, HtmlComment]

Parsers

Brand

zyte_parsers.extract_brand_name(node: Selector | HtmlElement | HtmlComment, search_depth: int = 0) → str | None

Extract a brand name from a node that contains it.

It checks the element text as well as the alt and title attributes of images.

Parameters:
  • node – Node including the brand name.

  • search_depth – Maximum depth to search for images.

Returns:

The brand name or None.
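The lookup order described above can be illustrated with a small standard-library sketch. It is not the zyte-parsers implementation: it uses xml.etree.ElementTree instead of parsel or lxml, the helper names brand_from_element and _images_within are hypothetical, and it assumes that search_depth counts how many levels below the node are searched for img elements (0 meaning only the node itself).

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional


def _images_within(node: ET.Element, depth: int) -> Iterator[ET.Element]:
    """Yield <img> elements at most `depth` levels below `node` (assumed semantics)."""
    if node.tag == "img":
        yield node
    if depth > 0:
        for child in node:
            yield from _images_within(child, depth - 1)


def brand_from_element(node: ET.Element, search_depth: int = 0) -> Optional[str]:
    """Hypothetical sketch: prefer element text, fall back to image alt/title."""
    # Collapse all text inside the node into single-spaced words.
    text = " ".join("".join(node.itertext()).split())
    if text:
        return text
    # No text: look at img alt and title attributes within the allowed depth.
    for img in _images_within(node, search_depth):
        for attr in ("alt", "title"):
            value = (img.get(attr) or "").strip()
            if value:
                return value
    return None
```

With a logo nested two levels deep, such as <a><div><img alt="Acme"/></div></a>, the sketch only finds the brand when search_depth is at least 2.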

GTIN

class zyte_parsers.Gtin(type: str, value: str)
type: str
value: str
zyte_parsers.extract_gtin(node: Selector | HtmlElement | HtmlComment | str) → Gtin | None

Extract a GTIN (Global Trade Item Number) from a node or a string that contains its text.

It detects the GTIN type and returns it together with the cleaned GTIN value. The following types are supported: isbn10, isbn13, issn, ismn, upc, gtin8, gtin13, gtin14.

Parameters:

node – A node or a string that includes the GTIN text.

Returns:

A Gtin item, or None.
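To make the idea of type detection more concrete, here is a simplified standard-library sketch that cleans a string, validates the check digit and picks a type from the length and prefix. It is not the library's algorithm: classify_gtin and its helpers are hypothetical, Gtin is redefined locally with the same two fields, and ISSN detection (which uses a mod-11 check on 8 characters) is deliberately left out. The checks used are the standard GS1 mod-10 check digit and the ISBN-10 mod-11 check.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gtin:
    # Same fields as zyte_parsers.Gtin, redefined so the sketch is standalone.
    type: str
    value: str


def _gs1_valid(digits: str) -> bool:
    """GS1 mod-10 check: weights 3, 1, 3, ... starting next to the check digit."""
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def _isbn10_valid(value: str) -> bool:
    """ISBN-10 check: weighted sum (weights 10..1) divisible by 11; 'X' means 10."""
    total = sum((10 if ch == "X" else int(ch)) * (10 - i) for i, ch in enumerate(value))
    return total % 11 == 0


def classify_gtin(text: str) -> Optional[Gtin]:
    # Cleaning step (an assumption): keep only digits and the ISBN-10 "X".
    value = re.sub(r"[^0-9X]", "", text.upper())
    if re.fullmatch(r"\d{9}[0-9X]", value):
        return Gtin("isbn10", value) if _isbn10_valid(value) else None
    if not value.isdigit() or len(value) not in (8, 12, 13, 14):
        return None
    if not _gs1_valid(value):
        return None
    if len(value) == 8:
        return Gtin("gtin8", value)  # ISSN is not handled in this sketch
    if len(value) == 12:
        return Gtin("upc", value)
    if len(value) == 14:
        return Gtin("gtin14", value)
    # 13 digits: the 979-0 prefix is reserved for ISMN, 978/979 for ISBN-13.
    if value.startswith("9790"):
        return Gtin("ismn", value)
    if value.startswith(("978", "979")):
        return Gtin("isbn13", value)
    return Gtin("gtin13", value)
```

For example, classify_gtin("ISBN 978-0-306-40615-7") strips the prefix and hyphens and reports an isbn13 value of 9780306406157.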

Price

zyte_parsers.extract_price(node: Selector | HtmlElement | HtmlComment | str, *, currency_hint: Selector | HtmlElement | HtmlComment | str | None = None) → Price

Extract a price value from a node or a string that contains it.

Parameters:
  • node – A node or a string that includes the price text.

  • currency_hint – A string or a node that may contain the currency. It is passed as a hint to price-parser. If a currency is present in the price string, it may be preferred over the one extracted from currency_hint.

Returns:

The price value as a price_parser.Price object.
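The precedence between a currency in the price text and currency_hint can be shown with a toy standard-library parser. This is only an illustration of that rule and not how price-parser works: parse_price, SimplePrice and the tiny KNOWN_CURRENCIES table are hypothetical, and the decimal-separator heuristic (a comma followed by one or two trailing digits is treated as the decimal point) is an assumption that real price text often breaks.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import NamedTuple, Optional

# Hypothetical, deliberately tiny currency table used only for this sketch.
KNOWN_CURRENCIES = ("€", "$", "£", "EUR", "USD", "GBP")


class SimplePrice(NamedTuple):
    amount: Optional[Decimal]
    currency: Optional[str]


def find_currency(text: Optional[str]) -> Optional[str]:
    """Return the first known currency marker found in text, if any."""
    for marker in KNOWN_CURRENCIES:
        if text and marker in text:
            return marker
    return None


def parse_price(text: str, currency_hint: Optional[str] = None) -> SimplePrice:
    # A currency found in the price text wins; the hint is only a fallback.
    currency = find_currency(text) or find_currency(currency_hint)
    amount = None
    match = re.search(r"\d[\d.,]*", text)
    if match:
        number = match.group().rstrip(".,")
        if re.search(r",\d{1,2}$", number):
            # "1.234,56": comma is the decimal separator, dots group thousands.
            number = number.replace(".", "").replace(",", ".")
        else:
            # "1,299.99": commas group thousands.
            number = number.replace(",", "")
        try:
            amount = Decimal(number)
        except InvalidOperation:
            amount = None
    return SimplePrice(amount, currency)
```

Here parse_price("£5", currency_hint="EUR") keeps "£" from the text, while parse_price("22,90", currency_hint="EUR") falls back to the hint.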

Ratings and review count

class zyte_parsers.AggregateRating(bestRating: float | None = None, ratingValue: float | None = None)
bestRating: float | None
ratingValue: float | None
zyte_parsers.extract_rating(node: Selector | HtmlElement | HtmlComment) → AggregateRating

Extract rating data from a node.

Parameters:

node – Node that includes the rating data.

Returns:

An AggregateRating item.
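As a rough illustration of how the two AggregateRating fields relate to typical rating text, the sketch below fills ratingValue and bestRating from strings such as "4.5 out of 5". It is a hypothetical stand-in (parse_rating_text is not part of zyte-parsers, and AggregateRating is redefined locally with the same fields); it assumes the text has already been taken from the node and only recognizes the "/", "of" and "out of" patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregateRating:
    # Same fields as zyte_parsers.AggregateRating, redefined for a standalone sketch.
    bestRating: Optional[float] = None
    ratingValue: Optional[float] = None


_NUMBER = r"\d+(?:[.,]\d+)?"


def _to_float(s: str) -> float:
    # Accept both "4.5" and "4,5".
    return float(s.replace(",", "."))


def parse_rating_text(text: str) -> AggregateRating:
    # "4.5 out of 5", "4,5/5", "3 of 5": both the value and the scale are present.
    m = re.search(rf"({_NUMBER})\s*(?:/|out of|of)\s*({_NUMBER})", text)
    if m:
        return AggregateRating(bestRating=_to_float(m.group(2)), ratingValue=_to_float(m.group(1)))
    # A bare number: the value is known but the scale is not.
    m = re.search(_NUMBER, text)
    if m:
        return AggregateRating(ratingValue=_to_float(m.group()))
    return AggregateRating()
```

Leaving bestRating as None when no scale is visible avoids guessing that every site rates out of 5.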

zyte_parsers.extract_rating_stars(node: Selector | HtmlElement | HtmlComment) → float | None

Extract a rating value from a node containing rating stars.

Parameters:

node – Node that includes the rating stars.

Returns:

Rating value as a float or None.
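One common way to turn star icons into a number is to add up full and half stars. The standard-library sketch below does that; it is an assumption about typical markup, not the logic of extract_rating_stars. The function rating_from_stars and the class names star-full, star-half and star-empty are hypothetical, and the markup must be well-formed XML for xml.etree.ElementTree to parse it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical class names; real sites use many different conventions.
STAR_VALUES = {"star-full": 1.0, "star-half": 0.5, "star-empty": 0.0}


def rating_from_stars(html: str) -> Optional[float]:
    """Sum the values of star icons found anywhere under the root element."""
    root = ET.fromstring(html)
    total, found = 0.0, False
    for el in root.iter():
        for cls in (el.get("class") or "").split():
            if cls in STAR_VALUES:
                total += STAR_VALUES[cls]
                found = True
    # None means no star icons at all, as opposed to 0.0 for all-empty stars.
    return total if found else None
```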

zyte_parsers.extract_review_count(node: Selector | HtmlElement | HtmlComment) → int | None

Extract a review count from a node that contains it.

Parameters:

node – Node that includes the review count.

Returns:

Review count as an int or None.
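Review counts often appear with thousands separators or in parentheses. The hypothetical parse_review_count below sketches how such text could be turned into an int with the standard library; it is not the zyte-parsers implementation, and it assumes the text already comes from a node holding only the count (in "4.5 stars, 120 reviews" it would pick up the 4).

```python
import re
from typing import Optional


def parse_review_count(text: str) -> Optional[int]:
    # Either digits grouped in thousands by ",", "." or a space, or a plain run of digits.
    m = re.search(r"\d{1,3}(?:[,. ]\d{3})+|\d+", text)
    if m is None:
        return None
    # Drop the separators and convert what is left.
    return int(re.sub(r"\D", "", m.group()))
```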