Parsera

Extractor

LLM-powered extractor for one-time data extraction, unstructured data, and extractor scrapers.

extract

Extract structured data from a URL:

Endpoint: POST /v1/extractor/extract (alias: POST /v1/extract)

curl https://api.parsera.org/v1/extractor/extract \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "prompt": "Extract news metadata",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ],
    "proxy_country": "Germany"
}'

Parameters:

The minimal payload requires url and either prompt or attributes.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | - | URL of the webpage to extract data from |
| prompt | string | "" | Prompt for initial scraping |
| attributes | array | [] | A list of attribute objects with name and description fields to extract from the webpage. Also, you can specify Output Types |
| mode | string | standard | Mode of the extractor: standard |
| proxy_country | string | UnitedStates | Proxy country, see Proxy Countries |
| cookies | array | Empty | Cookies to use during extraction, see Cookies |

It's recommended to set the proxy_country parameter to a specific country since a page could be unavailable from some locations.
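The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library, not an official client: the helper names `build_extract_request` and `send` are hypothetical, the check mirrors the minimal-payload rule stated above, and the response is decoded as JSON on the assumption that the API returns a JSON body.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.parsera.org/v1"


def build_extract_request(api_key, url, prompt="", attributes=None,
                          proxy_country="UnitedStates"):
    """Build (but do not send) a POST request for /extractor/extract.

    Hypothetical helper for illustration; not part of any Parsera SDK.
    """
    attributes = attributes or []
    # The minimal payload needs `url` plus either `prompt` or `attributes`.
    if not url:
        raise ValueError("url is required")
    if not prompt and not attributes:
        raise ValueError("provide a prompt, attributes, or both")
    payload = {
        "url": url,
        "prompt": prompt,
        "attributes": attributes,
        "proxy_country": proxy_country,
    }
    return urllib.request.Request(
        f"{API_BASE}/extractor/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request; assumes the response body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_extract_request(
    api_key="<YOUR_API_KEY>",
    url="https://news.ycombinator.com/",
    prompt="Extract news metadata",
    attributes=[
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
    proxy_country="Germany",
)
# Call send(request) with a real API key to perform the extraction.
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call is made.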

parse

Parse data from HTML or text content you already have, instead of fetching from a URL:

Endpoint: POST /v1/extractor/parse (alias: POST /v1/parse)

curl https://api.parsera.org/v1/extractor/parse \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "content": "<HTML_OR_TEXT_HERE>",
    "prompt": "Extract news metadata",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ]
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| content | string | - | Raw HTML or text content to extract data from |
| prompt | string | "" | Prompt for initial scraping |
| attributes | array | - | A list of attribute objects with name and description fields to extract from the content. You can also specify Output Types |
| mode | string | standard | Mode of the extractor: standard |
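Because content is a JSON string, raw HTML must be escaped before it goes into the request body, which is awkward to do by hand in a shell command. The sketch below shows one way to let a JSON serializer handle this, using only the Python standard library. The helper name `build_parse_request` is hypothetical, and treating content as required is an assumption based on the "-" default in the table.

```python
import json
import urllib.request


def build_parse_request(api_key, content, prompt="", attributes=None):
    """Build (but do not send) a POST request for /v1/extractor/parse.

    Hypothetical helper for illustration; not part of any Parsera SDK.
    """
    # Assumption: content has no default, so an empty value is rejected here.
    if not content:
        raise ValueError("content is required")
    payload = {
        "content": content,  # raw HTML or text; json.dumps escapes quotes and newlines
        "prompt": prompt,
        "attributes": attributes or [],
    }
    return urllib.request.Request(
        "https://api.parsera.org/v1/extractor/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


# Sample markup containing double quotes and a trailing newline.
html = '<tr class="athing"><td class="title">Example "quoted" story</td></tr>\n'
request = build_parse_request(
    api_key="<YOUR_API_KEY>",
    content=html,
    prompt="Extract news metadata",
    attributes=[
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
)
# The body is valid JSON even though the HTML contains quotes and a newline.
print(request.data.decode("utf-8"))
```

The request can then be sent with `urllib.request.urlopen(request)` once a real API key is supplied.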

extract_markdown

Get clean markdown from a URL:

Endpoint: POST /v1/extractor/extract_markdown (alias: POST /v1/extract_markdown)

curl https://api.parsera.org/v1/extractor/extract_markdown \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "proxy_country": "UnitedStates"
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | - | URL of the webpage to extract data from |
| proxy_country | string | UnitedStates | Proxy country, see Proxy Countries |
| cookies | array | Empty | Cookies to use during extraction, see Cookies |
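The extract, parse, and extract_markdown endpoints take the same headers and differ only in path and payload, so a single request builder can cover all three. The following is a rough sketch under that reading: `ENDPOINTS` and `build_request` are hypothetical names, and the aliases are assumed to be served from the same host as the canonical paths.

```python
import json
import urllib.request

BASE_URL = "https://api.parsera.org"

# Canonical path and documented alias for each endpoint.
ENDPOINTS = {
    "extract": ("/v1/extractor/extract", "/v1/extract"),
    "parse": ("/v1/extractor/parse", "/v1/parse"),
    "extract_markdown": ("/v1/extractor/extract_markdown", "/v1/extract_markdown"),
}


def build_request(endpoint, api_key, payload, use_alias=False):
    """Build (but do not send) a POST request for one of the endpoints above."""
    canonical, alias = ENDPOINTS[endpoint]
    path = alias if use_alias else canonical
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


markdown_request = build_request(
    "extract_markdown",
    "<YOUR_API_KEY>",
    {"url": "https://news.ycombinator.com/", "proxy_country": "UnitedStates"},
)
print(markdown_request.full_url)
# prints https://api.parsera.org/v1/extractor/extract_markdown
```

Passing `use_alias=True` targets the shorter path instead, for example `/v1/extract_markdown`.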

Manage Scrapers

The Extractor API also lets you create and manage extractor scrapers. See Manage Scrapers for the full CRUD reference (create, generate, get, delete).

More Features