# Scrapers
The Scrapers API is the unified execution layer for running both classic (extractor) and agentic scrapers.
To create and manage scrapers, see:
- Extractor API — classic code-based scrapers
- Agent API — AI-powered agentic scrapers
## Run Asynchronous
The recommended way to run scrapers. The request returns immediately with a `run_id` that you can poll for results or receive via callback.
This is the required method for agentic scrapers, and the recommended method for classic scrapers when processing multiple URLs or long-running tasks.
Endpoint: POST /v1/scrapers/run_async
```shell
curl https://api.parsera.org/v1/scrapers/run_async \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/",
    "callback_url": "https://your-server.com/webhook"
  }'
```

Run on multiple URLs:
```shell
curl https://api.parsera.org/v1/scrapers/run_async \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "template_id": "abc123",
    "url": [
      "https://news.ycombinator.com/front?day=2024-09-11",
      "https://news.ycombinator.com/front?day=2024-09-12",
      "https://news.ycombinator.com/front?day=2024-09-13"
    ]
  }'
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100). Falls back to the scraper's default URL if omitted |
| max_pages | integer | null | Maximum pages to extract (agentic scrapers only) |
| callback_url | string | null | URL to POST results to when the run completes |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |
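As a sketch of assembling the request body described above, a small helper can enforce the 100-URL cap and omit unset parameters so server-side defaults apply. The helper name `build_run_async_payload` is illustrative, not part of the API:

```python
# Sketch: build a run_async request body; helper name is illustrative.
def build_run_async_payload(template_id, url=None, max_pages=None,
                            callback_url=None, proxy_country=None, cookies=None):
    if isinstance(url, list) and len(url) > 100:
        raise ValueError("url accepts at most 100 URLs per run")
    payload = {"template_id": template_id}
    # Include only the optional parameters that were actually set,
    # so omitted ones fall back to their documented defaults.
    for key, value in [("url", url), ("max_pages", max_pages),
                       ("callback_url", callback_url),
                       ("proxy_country", proxy_country), ("cookies", cookies)]:
        if value is not None:
            payload[key] = value
    return payload

payload = build_run_async_payload(
    "abc123",
    url=["https://news.ycombinator.com/front?day=2024-09-11",
         "https://news.ycombinator.com/front?day=2024-09-12"],
    callback_url="https://your-server.com/webhook",
)
```

The resulting dict is what you would send as the JSON body of `POST /v1/scrapers/run_async`.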
Response (202 Accepted):
```json
{
  "run_id": "run-abc123",
  "status": "queued"
}
```

### Poll Run Status
Poll the status of an asynchronous run and retrieve results when complete:
Endpoint: GET /v1/scrapers/run_async/{run_id}
```shell
curl https://api.parsera.org/v1/scrapers/run_async/run-abc123 \
  --header 'X-API-KEY: <YOUR_API_KEY>'
```

Response (in progress):
```json
{
  "run_id": "run-abc123",
  "status": "running",
  "completed": 2,
  "total": 5
}
```

Response (completed):
```json
{
  "run_id": "run-abc123",
  "status": "completed",
  "completed": 5,
  "total": 5,
  "credits_charged": 25,
  "data": {
    "https://example.com/page1": [
      {"title": "Example", "price": "$10"}
    ],
    "https://example.com/page2": [
      {"title": "Another", "price": "$20"}
    ]
  }
}
```

Run statuses:
| Status | Description |
|---|---|
| running | Run is in progress |
| completed | All URLs processed successfully |
| completed_partial | Some URLs completed, others failed |
| failed | Run failed |
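Putting the polling flow together, one way to wait on a run is a loop that stops on any terminal status. This is a sketch: the HTTP call is abstracted behind a `fetch_status` callable (in real use it would issue the GET request shown above), so only the status logic is fixed:

```python
import time

# Terminal statuses from the table above; anything else means keep polling.
TERMINAL_STATUSES = {"completed", "completed_partial", "failed"}

def wait_for_run(run_id, fetch_status, interval=5.0, max_polls=60):
    """Poll GET /v1/scrapers/run_async/{run_id} until a terminal status.

    fetch_status is any callable returning the parsed JSON status body.
    """
    for _ in range(max_polls):
        body = fetch_status(run_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")
```

If you supplied a `callback_url`, you can skip polling entirely and wait for the POSTed results instead.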
## Run Synchronous
Run a scraper and wait for results in a single request. Best suited for quick, single-URL runs.
If the run doesn't complete within 5 minutes, the response will be `202 Accepted` with a `run_id` and `status: "running"`; you can then poll via `GET /v1/scrapers/run_async/{run_id}`. For agentic scrapers or long-running tasks, use `run_async` directly instead.
Endpoint: POST /v1/scrapers/run
```shell
curl https://api.parsera.org/v1/scrapers/run \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
  }'
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100). Falls back to the scraper's default URL if omitted |
| proxy_country | string | null | Proxy country, see Proxy Countries. Ignored for agentic scrapers |
| cookies | array | null | Cookies to use, see Cookies. Ignored for agentic scrapers |
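Because the synchronous endpoint can fall back to async after 5 minutes, callers should branch on the status code. A sketch of that handling, with plain dicts standing in for the HTTP layer:

```python
def handle_run_response(status_code, body):
    """Distinguish a finished synchronous run from the 202 async fallback."""
    if status_code == 200:
        # Run finished within the window; extracted data is in the body.
        return {"done": True, "data": body.get("data")}
    if status_code == 202:
        # Run exceeded 5 minutes; poll run_async with this run_id.
        return {"done": False, "run_id": body["run_id"]}
    raise RuntimeError(f"unexpected status {status_code}: {body}")
```

The exact shape of the 200 body follows the completed-run response shown in the async section above.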
## List Scrapers
List all scrapers for your account, including both classic and agentic types:
Endpoint: GET /v1/scrapers
```shell
curl https://api.parsera.org/v1/scrapers \
  --header 'X-API-KEY: <YOUR_API_KEY>'
```

Response:
```json
[
  {
    "id": "abc123",
    "name": "hackernews",
    "type": "extractor",
    "status": "ready",
    "url": "https://news.ycombinator.com/",
    "created_at": "2025-01-15T10:30:00Z"
  },
  {
    "id": "def456",
    "name": "linkedin-jobs",
    "type": "agentic",
    "status": "ready",
    "url": "https://linkedin.com/jobs",
    "created_at": "2025-01-20T14:00:00Z"
  }
]
```

Response fields:
| Field | Type | Description |
|---|---|---|
| id | string | Scraper ID |
| name | string | Scraper name |
| type | string | extractor (classic) or agentic |
| status | string | ready, generating, failed, or null |
| url | string | Primary URL the scraper was built for |
| created_at | string | ISO timestamp of creation |
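As a small illustration of consuming the listing above, you could partition scrapers by type client-side. The sample data mirrors the example response:

```python
def scrapers_by_type(scrapers):
    """Group a GET /v1/scrapers response body by the type field."""
    grouped = {}
    for scraper in scrapers:
        grouped.setdefault(scraper["type"], []).append(scraper["id"])
    return grouped

# Trimmed-down stand-in for the example response above.
listing = [
    {"id": "abc123", "name": "hackernews", "type": "extractor"},
    {"id": "def456", "name": "linkedin-jobs", "type": "agentic"},
]
```

This is useful when deciding which run method to use, since agentic scrapers require `run_async`.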
## More Features
- Extractor API — Create and manage classic code-based scrapers
- Agent API — Create and manage AI-powered agentic scrapers
- Specify Output Types — Define data types for extracted fields
- Setting Proxy — Access content from different geographic locations
- Setting Cookies — Handle authentication and session cookies
