
Scrapers

The Scrapers API is the unified execution layer for running both extractor and agentic scrapers.

To create and manage scrapers, see:

Run Asynchronous

This is the recommended way to run scrapers. The request returns immediately with a run_id. You can use it to poll for the results, or you can have the results POSTed to a callback_url when the run completes.

This is the required method for agentic scrapers, and the recommended method for extractor scrapers when processing multiple URLs or long-running tasks.

Endpoint: POST /v1/scrapers/run_async

curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/",
    "callback_url": "https://your-server.com/webhook"
}'

Run on multiple URLs (extractor scrapers only):

curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'

Multi-URL fan-out (one run_id aggregating results across N URLs) is supported only for extractor scrapers. Agentic scrapers run one URL per request — to scrape multiple URLs, call the endpoint once per URL.
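The rule above can be sketched as a small planning helper. It is a minimal Python sketch that only builds request bodies and sends nothing. It assumes you already know each scraper's type, for example from GET /v1/scrapers. The name plan_run_async_payloads is hypothetical. Splitting more than 100 URLs into several extractor runs is a client-side choice made here, not documented API behavior.

```python
from typing import Any

MAX_URLS_PER_RUN = 100  # documented limit on the url array for extractor scrapers


def plan_run_async_payloads(template_id: str, urls: list[str],
                            scraper_type: str) -> list[dict[str, Any]]:
    """Build run_async request bodies for a list of URLs (hypothetical helper).

    Extractor scrapers: one body per batch of up to 100 URLs (one run_id each).
    Agentic scrapers: one body per URL, since they accept a single URL only.
    """
    if scraper_type == "extractor":
        return [
            {"template_id": template_id, "url": urls[i:i + MAX_URLS_PER_RUN]}
            for i in range(0, len(urls), MAX_URLS_PER_RUN)
        ]
    if scraper_type == "agentic":
        return [{"template_id": template_id, "url": u} for u in urls]
    raise ValueError(f"unknown scraper type: {scraper_type!r}")


payloads = plan_run_async_payloads(
    "abc123", [f"https://example.com/p{i}" for i in range(250)], "extractor")
print([len(p["url"]) for p in payloads])  # [100, 100, 50]
```

Each resulting body is then sent as a separate POST /v1/scrapers/run_async request.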

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL to scrape. Extractor scrapers accept an array of up to 100 URLs and fan out to one child run per URL. Agentic scrapers accept a single URL only. Falls back to the scraper's default URL if omitted. |
| parameters | object | null | Custom parameters for agentic scrapers that define a parameters schema. See Custom Parameters. |
| max_pages | integer | null | Maximum pages to extract (agentic scrapers only) |
| callback_url | string | null | URL to POST results to when the run completes |
| proxy_country | string | null | Proxy country; see Proxy Countries. Ignored for agentic scrapers. |
| cookies | array | null | Cookies to use; see Cookies. Ignored for agentic scrapers. |
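The same request can be prepared from Python with the standard library. In this minimal sketch, optional fields left as None are dropped, so the API applies the defaults listed above. The request is only built, not sent. The helper name build_run_async_request is hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

API_BASE = "https://api.parsera.org/v1"


def build_run_async_request(api_key: str, template_id: str, url: Any = None,
                            callback_url: Optional[str] = None,
                            **optional: Any) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/scrapers/run_async request.

    Extra keyword arguments (parameters, max_pages, proxy_country, cookies)
    are passed through unchanged; any field whose value is None is omitted.
    """
    body = {"template_id": template_id, "url": url,
            "callback_url": callback_url, **optional}
    body = {k: v for k, v in body.items() if v is not None}
    return urllib.request.Request(
        f"{API_BASE}/scrapers/run_async",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


req = build_run_async_request(
    "<YOUR_API_KEY>", "abc123",
    url="https://news.ycombinator.com/",
    callback_url="https://your-server.com/webhook",
)
# To send it: urllib.request.urlopen(req), then json.load() the 202 response body.
```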

Response (202 Accepted):

{
    "run_id": "run-abc123",
    "status": "queued"
}

Poll Run Status

Poll the status of an asynchronous run and retrieve results when complete:

Endpoint: GET /v1/scrapers/run_async/{run_id}

curl https://api.parsera.org/v1/scrapers/run_async/run-abc123 \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response (in progress):

{
    "run_id": "run-abc123",
    "status": "running",
    "completed": 2,
    "total": 5
}

Response (completed):

{
    "run_id": "run-abc123",
    "status": "completed",
    "completed": 5,
    "total": 5,
    "credits_charged": 25,
    "data": {
        "https://example.com/page1": [
            {"title": "Example", "price": "$10"}
        ],
        "https://example.com/page2": [
            {"title": "Another", "price": "$20"}
        ]
    }
}
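Because data maps each scraped URL to a list of extracted items, you may want to flatten it into rows before storing it. This minimal sketch assumes data has the shape shown in the completed response above. The helper name flatten_results and the source_url column are hypothetical. If an item already contains a source_url field, that item value wins.

```python
from typing import Any


def flatten_results(run: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the per-URL "data" mapping into rows tagged with their source URL.

    Returns an empty list when the run has no data yet (e.g. still running).
    """
    rows = []
    for source_url, items in (run.get("data") or {}).items():
        for item in items:
            rows.append({"source_url": source_url, **item})
    return rows


completed = {
    "run_id": "run-abc123",
    "status": "completed",
    "completed": 5,
    "total": 5,
    "credits_charged": 25,
    "data": {
        "https://example.com/page1": [{"title": "Example", "price": "$10"}],
        "https://example.com/page2": [{"title": "Another", "price": "$20"}],
    },
}
for row in flatten_results(completed):
    print(row)
```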

Run statuses:

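| Status | Description |
|---|---|
| queued | Run has been accepted and is waiting to start (the status returned in the initial 202 response) |
| running | Run is in progress |
| completed | All URLs processed successfully |
| completed_partial | Some URLs completed, others failed |
| failed | Run failed |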
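The statuses above can drive a simple polling loop: keep polling while the run is queued or running, and stop once it is completed, completed_partial, or failed. This is a minimal sketch. fetch_status stands for any function you write that performs GET /v1/scrapers/run_async/{run_id} and returns the decoded JSON. The 5-second interval and 10-minute timeout are illustrative assumptions, not documented limits. If you set a callback_url, you may not need to poll at all.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "completed_partial", "failed"}


def wait_for_run(fetch_status: Callable[[str], dict[str, Any]], run_id: str,
                 interval_s: float = 5.0, timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll a run until it reaches a terminal status, or raise TimeoutError."""
    waited = 0.0
    while True:
        run = fetch_status(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= timeout_s:
            raise TimeoutError(
                f"run {run_id} still {run.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage with canned responses in place of real HTTP calls:
responses = iter([
    {"run_id": "run-abc123", "status": "queued"},
    {"run_id": "run-abc123", "status": "running", "completed": 2, "total": 5},
    {"run_id": "run-abc123", "status": "completed", "completed": 5, "total": 5},
])
final = wait_for_run(lambda _id: next(responses), "run-abc123", sleep=lambda s: None)
print(final["status"])  # completed
```

A completed_partial result still carries the URLs that succeeded, so treat it as usable output rather than as an error.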

Run Synchronous

Run a scraper and wait for results in a single request. Best suited for quick, single-URL runs.

If the run does not complete within 5 minutes, the endpoint returns 202 Accepted with a run_id and status: "running". You can then poll for the results via GET /v1/scrapers/run_async/{run_id}. For agentic scrapers or long-running tasks, use run_async directly instead.

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'
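Because a synchronous call can fall back to 202 Accepted, client code should handle both outcomes. In this minimal sketch, the helper name handle_sync_response is hypothetical. The sketch also assumes that a run finishing within the request returns HTTP 200, because the success status code and body shape are not specified here. On 202, it hands back the run_id so you can switch to polling.

```python
from typing import Any


def handle_sync_response(status_code: int, body: dict[str, Any]) -> tuple[str, Any]:
    """Classify a POST /v1/scrapers/run response.

    ("done", body)   -> assumed 200: the run finished within the request
    ("poll", run_id) -> 202 Accepted: poll GET /v1/scrapers/run_async/{run_id}
    """
    if status_code == 202:
        return "poll", body["run_id"]
    if status_code == 200:  # assumption: success code for a finished sync run
        return "done", body
    raise RuntimeError(f"unexpected status {status_code}: {body}")


print(handle_sync_response(202, {"run_id": "run-abc123", "status": "running"}))
# ('poll', 'run-abc123')
```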

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL to scrape. Extractor scrapers accept an array of up to 100 URLs. Agentic scrapers accept a single URL only. Falls back to the scraper's default URL if omitted. |
| parameters | object | null | Custom parameters for agentic scrapers that define a parameters schema. See Custom Parameters. |
| proxy_country | string | null | Proxy country; see Proxy Countries. Ignored for agentic scrapers. |
| cookies | array | null | Cookies to use; see Cookies. Ignored for agentic scrapers. |

Custom Parameters

Agentic scrapers built with the Agent API can declare a parameters schema — a set of named inputs the agent expects at run time. This lets a single scraper handle dynamic inputs like a booking number, carrier code, search query, or target URL, instead of being bound to one hard-coded value.

When an agent build produces a parameters schema, it appears in the parameters_schema field of the scraper's detail response — see Get Scraper Details. Each entry has a name, type, description, and optional default. Fields without a default are required.
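To see which inputs a scraper expects, split its parameters_schema into required names and optional defaults. This minimal sketch uses a hypothetical schema built from the booking-number and carrier-code examples above. It assumes that an entry "without a default" is one with no default key at all. The helper name split_schema is also hypothetical.

```python
from typing import Any

# Hypothetical parameters_schema, shaped as described above.
SCHEMA = [
    {"name": "booking_number", "type": "string", "description": "Booking to track"},
    {"name": "carrier_code", "type": "string", "description": "Shipping carrier"},
    {"name": "include_events", "type": "boolean",
     "description": "Also return tracking events", "default": False},
]


def split_schema(schema: list[dict[str, Any]]) -> tuple[list[str], dict[str, Any]]:
    """Return (required field names, {optional field name: default})."""
    required = [f["name"] for f in schema if "default" not in f]
    defaults = {f["name"]: f["default"] for f in schema if "default" in f}
    return required, defaults


required, defaults = split_schema(SCHEMA)
print(required)  # ['booking_number', 'carrier_code']
print(defaults)  # {'include_events': False}
```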

Pass values via the parameters object on POST /v1/scrapers/run or POST /v1/scrapers/run_async:

curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "parameters": {
        "url": "https://shop.example.com/p/1",
        "query": "example value"
    }
}'

Notes:

  • parameters is only used for scrapers that define a parameters_schema. Extractor scrapers and agentic scrapers without a schema ignore it.
  • If the schema exists and parameters is missing, the request fails with 400 Bad Request and "parameters is required for this scraper."
  • If any required fields are missing, the request fails with 400 Bad Request and "Missing required parameters: <names>".
  • Each agentic run handles one URL per request. To scrape multiple URLs, call the endpoint once per URL. Multi-URL fan-out via url: [...] is supported only by extractor scrapers.
  • The top-level url field is still accepted for backward compatibility: when the scraper declares a URL-typed parameter, a top-level url is automatically promoted into parameters.
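The validation rules in these notes can be checked on the client before a request is sent. This minimal sketch mirrors the two documented 400 messages. It assumes the missing names are joined with ", " and treats an empty schema as no schema. It does not model the promotion of a top-level url into parameters. The helper name check_parameters is hypothetical, and the server remains the authority.

```python
from typing import Any, Optional


def check_parameters(schema: Optional[list[dict[str, Any]]],
                     parameters: Optional[dict[str, Any]]) -> Optional[str]:
    """Return the error message the API would likely give, or None if it looks valid."""
    if not schema:
        return None  # no parameters_schema: the API ignores `parameters`
    if parameters is None:
        return "parameters is required for this scraper."
    missing = [f["name"] for f in schema
               if "default" not in f and f["name"] not in parameters]
    if missing:
        return "Missing required parameters: " + ", ".join(missing)
    return None


schema = [
    {"name": "booking_number", "type": "string", "description": "Booking to track"},
    {"name": "carrier_code", "type": "string", "description": "Shipping carrier"},
    {"name": "include_events", "type": "boolean",
     "description": "Also return tracking events", "default": False},
]
print(check_parameters(schema, {"booking_number": "B123"}))
# Missing required parameters: carrier_code
```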

List Scrapers

List all scrapers for your account, including both extractor and agentic types:

Endpoint: GET /v1/scrapers

curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response:

[
    {
        "id": "abc123",
        "name": "hackernews",
        "type": "extractor",
        "status": "ready",
        "url": "https://news.ycombinator.com/",
        "created_at": "2025-01-15T10:30:00Z"
    },
    {
        "id": "def456",
        "name": "linkedin-jobs",
        "type": "agentic",
        "status": "ready",
        "url": "https://linkedin.com/jobs",
        "created_at": "2025-01-20T14:00:00Z"
    }
]

Response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Scraper ID |
| name | string | Scraper name |
| type | string | extractor or agentic |
| status | string | ready, generating, failed, or null |
| url | string | Primary URL the scraper was built for |
| created_at | string | ISO timestamp of creation |
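Since the multi-URL and parameters rules above depend on the scraper type, it can help to group the listing by type. This minimal sketch keeps only scrapers whose status is ready, so generating, failed, or null entries are skipped. The sample input is the example response above, and the helper name ready_ids_by_type is hypothetical.

```python
from typing import Any


def ready_ids_by_type(scrapers: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group the IDs of ready scrapers by their type."""
    grouped: dict[str, list[str]] = {"extractor": [], "agentic": []}
    for s in scrapers:
        if s.get("status") == "ready":
            grouped.setdefault(s["type"], []).append(s["id"])
    return grouped


listing = [
    {"id": "abc123", "name": "hackernews", "type": "extractor", "status": "ready",
     "url": "https://news.ycombinator.com/", "created_at": "2025-01-15T10:30:00Z"},
    {"id": "def456", "name": "linkedin-jobs", "type": "agentic", "status": "ready",
     "url": "https://linkedin.com/jobs", "created_at": "2025-01-20T14:00:00Z"},
]
print(ready_ids_by_type(listing))  # {'extractor': ['abc123'], 'agentic': ['def456']}
```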

More Features