Scrapers
The Scrapers API is the unified execution layer for running both extractor and agentic scrapers.
To create and manage scrapers, see:
- Extractor API — extract from specific URLs;
- Agent API — navigate and interact before extraction.
Run Asynchronous
The recommended way to run scrapers. The request returns immediately with a run_id; poll that ID for results, or set callback_url to have them POSTed to your server when the run completes.
This is the required method for agentic scrapers, and the recommended method for extractor scrapers when processing multiple URLs or long-running tasks.
Endpoint: POST /v1/scrapers/run_async
curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"template_id": "abc123",
"url": "https://news.ycombinator.com/",
"callback_url": "https://your-server.com/webhook"
}'
Run on multiple URLs (extractor scrapers only):
curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"template_id": "abc123",
"url": [
"https://news.ycombinator.com/front?day=2024-09-11",
"https://news.ycombinator.com/front?day=2024-09-12",
"https://news.ycombinator.com/front?day=2024-09-13"
]
}'
Multi-URL fan-out (one run_id aggregating results across N URLs) is supported only for extractor scrapers. Agentic scrapers run one URL per request; to scrape multiple URLs, call the endpoint once per URL.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL to scrape. Extractor scrapers accept an array of up to 100 URLs and fan out to one child run per URL. Agentic scrapers accept a single URL only. Falls back to the scraper's default URL if omitted |
| parameters | object | null | Custom parameters for agentic scrapers that define a parameters schema. See Custom Parameters |
| max_pages | integer | null | Maximum pages to extract (agentic scrapers only) |
| callback_url | string | null | URL to POST results to when the run completes |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |
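The URL rules in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of any official client: the function name `build_run_payload` and its `scraper_type` argument are assumptions made for illustration. Only the field names and the 100-URL limit come from the table above.

```python
from typing import Optional, Union

MAX_URLS = 100  # array limit for extractor scrapers, per the parameter table


def build_run_payload(
    template_id: str,
    scraper_type: str,  # "extractor" or "agentic" (hypothetical argument)
    url: Union[str, list, None] = None,
    callback_url: Optional[str] = None,
) -> dict:
    """Assemble a JSON body for POST /v1/scrapers/run_async."""
    if isinstance(url, list):
        if scraper_type != "extractor":
            raise ValueError("agentic scrapers accept a single URL only")
        if len(url) > MAX_URLS:
            raise ValueError(f"at most {MAX_URLS} URLs per request")

    payload: dict = {"template_id": template_id}
    if url is not None:
        # When url is omitted, the scraper's default URL is used.
        payload["url"] = url
    if callback_url is not None:
        payload["callback_url"] = callback_url
    return payload
```

Serializing the result with `json.dumps` gives a body equivalent to the `--data` argument in the curl examples.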
Response (202 Accepted):
{
"run_id": "run-abc123",
"status": "queued"
}
Poll Run Status
Poll the status of an asynchronous run and retrieve results when complete:
Endpoint: GET /v1/scrapers/run_async/{run_id}
curl https://api.parsera.org/v1/scrapers/run_async/run-abc123 \
--header 'X-API-KEY: <YOUR_API_KEY>'
Response (in progress):
{
"run_id": "run-abc123",
"status": "running",
"completed": 2,
"total": 5
}
Response (completed):
{
"run_id": "run-abc123",
"status": "completed",
"completed": 5,
"total": 5,
"credits_charged": 25,
"data": {
"https://example.com/page1": [
{"title": "Example", "price": "$10"}
],
"https://example.com/page2": [
{"title": "Another", "price": "$20"}
]
}
}
Run statuses:
| Status | Description |
|---|---|
| running | Run is in progress |
| completed | All URLs processed successfully |
| completed_partial | Some URLs completed, others failed |
| failed | Run failed |
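A polling loop over these statuses might look like the sketch below. It treats completed, completed_partial, and failed as terminal, and anything else (including the queued status returned by the 202 response) as still in progress. The function names, the 5-second interval, and the 10-minute timeout are assumptions chosen for illustration, not documented recommendations.

```python
import json
import time
import urllib.request
from typing import Callable

# Statuses after which the run no longer changes, per the table above.
TERMINAL_STATUSES = {"completed", "completed_partial", "failed"}


def fetch_status_http(run_id: str, api_key: str) -> dict:
    """GET /v1/scrapers/run_async/{run_id} and decode the JSON body."""
    request = urllib.request.Request(
        f"https://api.parsera.org/v1/scrapers/run_async/{run_id}",
        headers={"X-API-KEY": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(
    fetch_status: Callable[[str], dict],
    run_id: str,
    interval_s: float = 5.0,   # assumed polling interval
    timeout_s: float = 600.0,  # assumed overall deadline
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the run reaches a terminal status or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(run_id)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} is still {body.get('status')!r}")
        sleep(interval_s)
```

Passing the fetch function as an argument keeps the loop independent of the HTTP library; in practice it could be called as `wait_for_run(lambda rid: fetch_status_http(rid, api_key), run_id)`.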
Run Synchronous
Run a scraper and wait for results in a single request. Best suited for quick, single-URL runs.
If the run doesn't complete within 5 minutes, the response will be 202 Accepted with a run_id and status: "running"; you can then poll via GET /v1/scrapers/run_async/{run_id}. For agentic scrapers or long-running tasks, use run_async directly instead.
Endpoint: POST /v1/scrapers/run
curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"template_id": "abc123",
"url": "https://news.ycombinator.com/front?day=2024-09-11"
}'
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL to scrape. Extractor scrapers accept an array of up to 100 URLs. Agentic scrapers accept a single URL only. Falls back to the scraper's default URL if omitted |
| parameters | object | null | Custom parameters for agentic scrapers that define a parameters schema. See Custom Parameters |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |
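Because a synchronous call can end in a 202 Accepted handoff, client code needs to branch on the HTTP status. The sketch below shows one way to do that. It assumes a finished run returns 200 with the result body, which this page does not state explicitly, and `interpret_sync_response` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def interpret_sync_response(
    status_code: int, body: dict
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (result, run_id_to_poll); exactly one of the two is set."""
    if status_code == 202:
        # The run outlived the synchronous window; continue by polling
        # GET /v1/scrapers/run_async/{run_id} with this ID.
        return None, body["run_id"]
    if status_code == 200:
        # Assumption: 200 carries the finished result.
        return body, None
    raise RuntimeError(f"unexpected status {status_code}: {body}")
```

A caller would use the first element when it is set, and otherwise hand the returned run_id to its polling routine.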
Custom Parameters
Agentic scrapers built with the Agent API can declare a parameters schema — a set of named inputs the agent expects at run time. This lets a single scraper handle dynamic inputs like a booking number, carrier code, search query, or target URL, instead of being bound to one hard-coded value.
When an agent build produces a parameters schema, it appears in the parameters_schema field of the scraper's detail response — see Get Scraper Details. Each entry has a name, type, description, and optional default. Fields without a default are required.
Pass values via the parameters object on POST /v1/scrapers/run or POST /v1/scrapers/run_async:
curl https://api.parsera.org/v1/scrapers/run_async \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"template_id": "abc123",
"parameters": {
"url": "https://shop.example.com/p/1",
"query": "example value"
}
}'
Notes:
- parameters is only used for scrapers that define a parameters_schema. Extractor scrapers and agentic scrapers without a schema ignore it.
- If the schema exists and parameters is missing, the request fails with 400 Bad Request and "parameters is required for this scraper."
- If any required fields are missing, the request fails with 400 Bad Request and "Missing required parameters: <names>".
- Each agentic run handles one URL per request. To scrape multiple URLs, call the endpoint once per URL. Multi-URL fan-out via url: [...] is supported only by extractor scrapers.
- The top-level url field is still accepted for back-compat: when the scraper declares a URL-typed parameter, a top-level url is automatically promoted into parameters.
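The required-field rule can be mirrored on the client to catch a 400 before the request is sent. The sketch below assumes parameters_schema is a list of objects with name, type, description, and an optional default key, and that "no default" means the key is absent. The exact JSON shape is not shown on this page, so treat both points as assumptions. `missing_required` is a hypothetical helper name.

```python
from typing import Optional


def missing_required(schema: list, parameters: Optional[dict]) -> list:
    """Names of schema fields that have no default and no supplied value."""
    supplied = parameters or {}
    return [
        field["name"]
        for field in schema
        if "default" not in field and field["name"] not in supplied
    ]


# Illustrative schema only; the field names and type strings are made up.
example_schema = [
    {"name": "url", "type": "string", "description": "Product page"},
    {"name": "query", "type": "string", "description": "Search text", "default": ""},
]

missing = missing_required(example_schema, {"query": "example value"})
if missing:
    print("Missing required parameters: " + ", ".join(missing))
```

This is only a convenience check; the server remains the authority on which parameters are required.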
List Scrapers
List all scrapers for your account, including both extractor and agentic types:
Endpoint: GET /v1/scrapers
curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'
Response:
[
{
"id": "abc123",
"name": "hackernews",
"type": "extractor",
"status": "ready",
"url": "https://news.ycombinator.com/",
"created_at": "2025-01-15T10:30:00Z"
},
{
"id": "def456",
"name": "linkedin-jobs",
"type": "agentic",
"status": "ready",
"url": "https://linkedin.com/jobs",
"created_at": "2025-01-20T14:00:00Z"
}
]
Response fields:
| Field | Type | Description |
|---|---|---|
| id | string | Scraper ID |
| name | string | Scraper name |
| type | string | extractor or agentic |
| status | string | ready, generating, failed, or null |
| url | string | Primary URL the scraper was built for |
| created_at | string | ISO timestamp of creation |
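Since the list mixes scraper types and build states, a caller will often filter it before choosing a template_id to run. The sketch below is a minimal example over the decoded JSON array, using only the fields in the table above. `ready_scraper_ids` is a hypothetical helper name.

```python
from typing import Optional


def ready_scraper_ids(scrapers: list, scraper_type: Optional[str] = None) -> list:
    """IDs of scrapers whose status is "ready", optionally limited to one type."""
    return [
        scraper["id"]
        for scraper in scrapers
        if scraper.get("status") == "ready"
        and (scraper_type is None or scraper.get("type") == scraper_type)
    ]
```

For the sample response above, filtering on "agentic" would select only def456, which is the kind of scraper that must be run through run_async.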
More Features
- Extractor API — Create and manage extractor scrapers
- Agent API — Create and manage AI-powered agentic scrapers
- Specify Output Types — Define data types for extracted fields
- Setting Proxy — Access content from different geographic locations
- Setting Cookies — Handle authentication and session cookies
