Parsera

Scrapers

The Scrapers API is the unified execution layer for running both classic (extractor) and agentic scrapers.

To create and manage scrapers, see the documentation for classic (extractor) and agentic scrapers.

Run Asynchronous

This is the recommended way to run scrapers. The call returns immediately with a run_id, which you can use to poll for results. Alternatively, the results can be POSTed to a callback URL when the run completes.

This method is required for agentic scrapers. For classic scrapers, it is recommended when processing multiple URLs or running long tasks.

Endpoint: POST /v1/scrapers/run_async

curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/",
    "callback_url": "https://your-server.com/webhook"
}'
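When callback_url is set, the results are POSTed to that URL once the run completes. Below is a minimal receiver sketch that uses only Python's standard library (http.server). This page does not specify the callback body format, so the sketch stores whatever JSON arrives. The names received, record_callback, CallbackHandler and serve are illustrative, and the port is an arbitrary assumption.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative in-memory store; a real service would persist callbacks.
received = []
_lock = threading.Lock()


def record_callback(raw: bytes) -> int:
    """Store one callback body and return the HTTP status code to answer with.

    The payload shape is an assumption: any valid JSON is accepted as-is.
    """
    try:
        body = json.loads(raw)
    except ValueError:  # covers invalid JSON, empty bodies and undecodable bytes
        return 400
    with _lock:
        received.append(body)
    return 200


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = record_callback(self.rfile.read(length))
        # Reply quickly; do any heavy processing outside the request handler.
        self.send_response(status)
        self.end_headers()


def serve(port=8000):
    """Block forever, accepting callbacks on all interfaces (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Run serve() on a host that is reachable from the internet, preferably behind HTTPS. Then pass its public address (for example https://your-server.com/webhook) as callback_url.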

Run on multiple URLs:

curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'
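You can make the same call from Python using only the standard library. This is a sketch rather than an official client. build_run_async_request and submit_run are illustrative names, and only the endpoint, headers, and body fields come from the curl examples above. As in those examples, url can be a single string or a list.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"


def build_run_async_request(api_key, template_id, url=None, **options):
    """Build (but do not send) a POST /v1/scrapers/run_async request.

    Extra keyword arguments (e.g. callback_url, max_pages) are passed through;
    any that are None are left out so the documented defaults apply.
    """
    payload = {"template_id": template_id}
    if url is not None:
        payload["url"] = url  # single URL string or list of URLs
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{API_BASE}/scrapers/run_async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def submit_run(api_key, template_id, url=None, **options):
    """Send the request and return the decoded JSON body (expected to hold run_id)."""
    req = build_run_async_request(api_key, template_id, url, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```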

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100). Falls back to the scraper's default URL if omitted |
| max_pages | integer | null | Maximum pages to extract (agentic scrapers only) |
| callback_url | string | null | URL to POST results to when the run completes |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |
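Because url accepts at most 100 URLs per run, a longer list has to be split across several runs. The minimal sketch below assumes you submit one run per batch. chunk_payloads is a hypothetical helper. It omits any parameter left as None so that the defaults in the table apply.

```python
MAX_URLS_PER_RUN = 100  # documented limit for the url parameter


def chunk_payloads(template_id, urls, batch_size=MAX_URLS_PER_RUN, **options):
    """Split `urls` into run_async payloads of at most `batch_size` URLs each.

    Options whose value is None are dropped so the server-side defaults apply.
    """
    if not 1 <= batch_size <= MAX_URLS_PER_RUN:
        raise ValueError(f"batch_size must be between 1 and {MAX_URLS_PER_RUN}")
    extras = {k: v for k, v in options.items() if v is not None}
    payloads = []
    for start in range(0, len(urls), batch_size):
        payload = {"template_id": template_id, "url": list(urls[start:start + batch_size])}
        payload.update(extras)  # same options (callback_url, cookies, ...) for every batch
        payloads.append(payload)
    return payloads
```

Each resulting payload is a complete request body for POST /v1/scrapers/run_async and yields its own run_id.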

Response (202 Accepted):

{
    "run_id": "run-abc123",
    "status": "queued"
}

Poll Run Status

Poll the status of an asynchronous run and retrieve results when complete:

Endpoint: GET /v1/scrapers/run_async/{run_id}

curl https://api.parsera.org/v1/scrapers/run_async/run-abc123 \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response (in progress):

{
    "run_id": "run-abc123",
    "status": "running",
    "completed": 2,
    "total": 5
}

Response (completed):

{
    "run_id": "run-abc123",
    "status": "completed",
    "completed": 5,
    "total": 5,
    "credits_charged": 25,
    "data": {
        "https://example.com/page1": [
            {"title": "Example", "price": "$10"}
        ],
        "https://example.com/page2": [
            {"title": "Another", "price": "$20"}
        ]
    }
}
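In the completed response, data maps each source URL to the list of records extracted from it. Flattening this into a single list of rows tagged with their source URL can simplify CSV export or loading into a DataFrame. The small sketch below shows one way to do it. The source_url key is an arbitrary choice and is not part of the API.

```python
def flatten_results(run):
    """Turn {"data": {url: [record, ...]}} into a flat list of records.

    Each record is copied and tagged with a hypothetical "source_url" key
    (overwriting a field of that name if the record already has one).
    A response without "data" (e.g. a run still in progress) yields [].
    """
    rows = []
    for source_url, records in (run.get("data") or {}).items():
        for record in records or []:
            rows.append({**record, "source_url": source_url})
    return rows
```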

Run statuses:

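| Status | Description |
| --- | --- |
| queued | Run has been accepted and has not started yet |
| running | Run is in progress |
| completed | All URLs processed successfully |
| completed_partial | Some URLs completed, others failed |
| failed | Run failed |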
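A polling loop can stop once the run reaches a terminal status: completed, completed_partial, or failed. Any other status, such as running or the queued status returned by the initial 202 response, means the loop should wait and try again. In the sketch below, the HTTP call is injectable. fetch_status is any function that returns the decoded JSON from GET /v1/scrapers/run_async/{run_id}, so the loop works with urllib, requests, or a test stub. The interval and timeout defaults are arbitrary assumptions, not documented limits.

```python
import time

# Statuses after which the run will not change any more.
TERMINAL_STATUSES = {"completed", "completed_partial", "failed"}


def wait_for_run(fetch_status, run_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the run reaches a terminal status and return the last response.

    fetch_status(run_id) must return the decoded JSON body of
    GET /v1/scrapers/run_async/{run_id}. Raises TimeoutError if the run has
    not finished after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        run = fetch_status(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.get('status')!r} after {timeout}s")
        if "total" in run:
            # Progress fields are reported while the run is in progress.
            print(f"{run_id}: {run.get('completed', 0)}/{run['total']} URLs done")
        sleep(interval)
```

Even when the loop returns completed_partial, check which URLs actually appear in data before using the results.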

Run Synchronous

Run a scraper and wait for results in a single request. Best suited for quick, single-URL runs.

If the run doesn't complete within 5 minutes, the response is 202 Accepted with a run_id and status: "running". You can then poll for results via GET /v1/scrapers/run_async/{run_id}. For agentic scrapers or long-running tasks, use run_async directly instead.

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'
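A synchronous call can fall back to 202 Accepted when the run exceeds 5 minutes, so client code should branch on the status code. The sketch below uses the standard library. run_sync and interpret_sync_response are illustrative names. The sketch assumes that a finished synchronous run returns a JSON body; that response is not shown on this page, so the code passes it through unchanged. On 202, it uses the run_id field described above.

```python
import json
import urllib.request

RUN_URL = "https://api.parsera.org/v1/scrapers/run"


def interpret_sync_response(status_code, body):
    """Return ("done", body) for a finished run, or ("poll", run_id) on 202."""
    if status_code == 202:
        # The run exceeded the synchronous window; continue via
        # GET /v1/scrapers/run_async/{run_id}.
        return "poll", body["run_id"]
    return "done", body


def run_sync(api_key, template_id, url=None):
    payload = {"template_id": template_id}
    if url is not None:
        payload["url"] = url
    req = urllib.request.Request(
        RUN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    # Client timeout slightly above the 5-minute server-side window (arbitrary margin).
    # urlopen raises HTTPError for 4xx/5xx; 200 and 202 are returned normally.
    with urllib.request.urlopen(req, timeout=330) as resp:
        return interpret_sync_response(resp.status, json.loads(resp.read()))
```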

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100). Falls back to the scraper's default URL if omitted |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |

List Scrapers

List all scrapers for your account, including both classic and agentic types:

Endpoint: GET /v1/scrapers

curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response:

[
    {
        "id": "abc123",
        "name": "hackernews",
        "type": "extractor",
        "status": "ready",
        "url": "https://news.ycombinator.com/",
        "created_at": "2025-01-15T10:30:00Z"
    },
    {
        "id": "def456",
        "name": "linkedin-jobs",
        "type": "agentic",
        "status": "ready",
        "url": "https://linkedin.com/jobs",
        "created_at": "2025-01-20T14:00:00Z"
    }
]

Response fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Scraper ID |
| name | string | Scraper name |
| type | string | extractor (classic) or agentic |
| status | string | ready, generating, failed, or null |
| url | string | Primary URL the scraper was built for |
| created_at | string | ISO 8601 timestamp of creation |
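You can use the list to look up the template_id to pass to the run endpoints. The sketch below assumes you identify scrapers by name. Names are not documented as unique, so it picks the most recently created ready match. find_ready_scraper is a hypothetical helper. created_at ends in "Z", which datetime.fromisoformat accepts only from Python 3.11 onward, so the code normalizes it first.

```python
from datetime import datetime


def _created(scraper):
    # "2025-01-15T10:30:00Z" -> aware datetime; replace "Z" with an explicit
    # offset so this also parses on Python versions older than 3.11.
    return datetime.fromisoformat(scraper["created_at"].replace("Z", "+00:00"))


def find_ready_scraper(scrapers, name, scraper_type=None):
    """Return the newest scraper with this name whose status is "ready", or None.

    scraper_type can be "extractor" or "agentic" to narrow the search.
    """
    matches = [
        s for s in scrapers
        if s.get("name") == name
        and s.get("status") == "ready"
        and (scraper_type is None or s.get("type") == scraper_type)
    ]
    return max(matches, key=_created, default=None)
```

The returned scraper's id is the value to send as template_id.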

More Features