Extractor

An LLM-powered data extractor for one-time data extraction, unstructured data, and classic code-based scrapers.

extract

Extract structured data from a URL:

Endpoint: POST /v1/extractor/extract (alias: POST /v1/extract)

curl https://api.parsera.org/v1/extractor/extract \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "prompt": "Extract news metadata",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ],
    "proxy_country": "Germany"
}'

Parameters:

A minimal payload requires url and at least one of prompt or attributes.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | - | URL of the webpage to extract data from |
| prompt | string | "" | Prompt for initial scraping |
| attributes | array | [] | A list of attribute objects with name and description fields to extract from the webpage. Also, you can specify Output Types |
| mode | string | standard | Mode of the extractor: standard or precision |
| proxy_country | string | UnitedStates | Proxy country, see Proxy Countries |
| cookies | array | Empty | Cookies to use during extraction, see Cookies |
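
The minimal-payload rule and the curl call above can be sketched in Python using only the standard library. The helper names `build_extract_payload` and `build_extract_request` are hypothetical, not part of any Parsera client, and the request is only built here, not sent:

```python
import json
import urllib.request

API_URL = "https://api.parsera.org/v1/extractor/extract"


def build_extract_payload(url, prompt="", attributes=None, **options):
    """Return a payload dict, enforcing: url plus prompt and/or attributes."""
    attributes = list(attributes or [])
    if not url:
        raise ValueError("url is required")
    if not prompt and not attributes:
        raise ValueError("provide a prompt, attributes, or both")
    payload = {"url": url}
    if prompt:
        payload["prompt"] = prompt
    if attributes:
        payload["attributes"] = attributes
    # Optional parameters (mode, proxy_country, cookies) pass through as given.
    payload.update(options)
    return payload


def build_extract_request(api_key, payload):
    """Build, but do not send, the POST request (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


payload = build_extract_payload(
    "https://news.ycombinator.com/",
    attributes=[
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
    proxy_country="Germany",
)
request = build_extract_request("<YOUR_API_KEY>", payload)
# Send with urllib.request.urlopen(request) once a real API key is in place.
```

Validating locally before sending catches a payload that has neither prompt nor attributes without spending an API call.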

It's recommended to set the proxy_country parameter to a specific country since a page could be unavailable from some locations.

If some data is missing, you can retry with precision mode. See Precision Mode for details.
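
One way to automate that retry is sketched below. It assumes a `send` callable that posts the payload and returns a list of row dicts. That response shape is an assumption, since this page does not describe the response body. `extract_with_precision_fallback` and `fake_send` are hypothetical names used only for illustration:

```python
def is_incomplete(rows, required):
    """True if there are no rows, or any required field is missing or None."""
    return not rows or any(
        row.get(name) is None for row in rows for name in required
    )


def extract_with_precision_fallback(send, payload, required):
    """Call send(payload); retry once with mode=precision if data is missing."""
    rows = send(payload)
    if payload.get("mode") != "precision" and is_incomplete(rows, required):
        rows = send({**payload, "mode": "precision"})
    return rows


# Stand-in for a real HTTP call, so the sketch runs without network access.
calls = []


def fake_send(payload):
    calls.append(payload.get("mode", "standard"))
    if payload.get("mode") == "precision":
        return [{"title": "Example", "points": 42}]
    return [{"title": "Example", "points": None}]


rows = extract_with_precision_fallback(
    fake_send,
    {"url": "https://news.ycombinator.com/", "prompt": "Extract news metadata"},
    required=["title", "points"],
)
```

In this sketch the first call runs in the default standard mode, and the precision retry happens only when a required field comes back empty.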

parse

Parse data from HTML or text content you already have, instead of fetching from a URL:

Endpoint: POST /v1/extractor/parse (alias: POST /v1/parse)

curl https://api.parsera.org/v1/extractor/parse \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "content": "<HTML_OR_TEXT_HERE>",
    "prompt": "Extract news metadata",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ]
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| content | string | - | Raw HTML or text content to extract data from |
| prompt | string | "" | Prompt for initial scraping |
| attributes | array | - | A list of attribute objects with name and description fields to extract from the webpage. Also, you can specify Output Types |
| mode | string | standard | Mode of the extractor: standard or precision |
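
Raw HTML usually contains quotes and newlines, so pasting it into a JSON body by hand is error-prone. The sketch below lets `json.dumps` do the escaping. `build_parse_request` is a hypothetical helper, and the HTML snippet is made up for illustration:

```python
import json
import urllib.request

PARSE_URL = "https://api.parsera.org/v1/extractor/parse"


def build_parse_request(api_key, content, prompt="", attributes=None, mode=None):
    """Build, but do not send, a POST request for the parse endpoint."""
    payload = {"content": content}
    if prompt:
        payload["prompt"] = prompt
    if attributes:
        payload["attributes"] = attributes
    if mode:
        payload["mode"] = mode
    # json.dumps escapes the quotes and newlines inside the HTML string.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


# Illustrative HTML only; in practice this would come from a file or a crawler.
html = '<tr><td><a href="https://example.com">An "example" title</a></td></tr>\n'
request = build_parse_request(
    "<YOUR_API_KEY>",
    html,
    attributes=[{"name": "title", "description": "News title"}],
)
# Send with urllib.request.urlopen(request) once a real API key is in place.
```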

extract_markdown

Get clean markdown from a URL:

Endpoint: POST /v1/extractor/extract_markdown (alias: POST /v1/extract_markdown)

curl https://api.parsera.org/v1/extractor/extract_markdown \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "proxy_country": "UnitedStates"
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | - | URL of the webpage to extract data from |
| proxy_country | string | UnitedStates | Proxy country, see Proxy Countries |
| cookies | array | Empty | Cookies to use during extraction, see Cookies |

Manage Scrapers

The Extractor API also lets you create and manage classic code-based scrapers. See Manage Scrapers for the full CRUD reference (create, generate, get, delete).

More Features