Scraper Code
Scrapers are reusable custom extractors that you can generate once and run on multiple URLs. This API is ideal for scenarios where you need to extract the same type of data from multiple pages or websites.
How It Works
- Create a new empty scraper to get a scraper ID
- Generate a scraper by defining the data fields you want to extract
- Run the scraper on specific URLs whenever you need
- List all your available scrapers
- Reuse the same scraper across different URLs
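As a quick illustration of this flow, here is a minimal Python sketch (using the requests library) that chains the three calls documented below. The API key placeholder and the sample URLs are assumptions you should replace with your own values:

```python
import requests

API_KEY = "<YOUR_API_KEY>"  # replace with your Parsera API key
BASE = "https://api.parsera.org/v1/scrapers"
HEADERS = {"X-API-KEY": API_KEY, "Content-Type": "application/json"}

# 1. Create a new empty scraper and keep its ID.
scraper_id = requests.post(f"{BASE}/new", headers=HEADERS).json()["scraper_id"]

# 2. Generate the scraper from a sample URL plus the attributes to extract.
requests.post(
    f"{BASE}/generate",
    headers=HEADERS,
    json={
        "scraper_id": scraper_id,
        "url": "https://news.ycombinator.com/",
        "attributes": [
            {"name": "title", "description": "News title"},
            {"name": "points", "description": "Number of points"},
        ],
    },
)

# 3. Run the generated scraper on any structurally similar URL.
result = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={
        "scraper_id": scraper_id,
        "url": "https://news.ycombinator.com/front?day=2024-09-11",
    },
)
print(result.json())
```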
List Scrapers
List all available scrapers for your account:
Endpoint: GET /v1/scrapers
curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'
Response:
Returns a list of scraper objects with id and name fields:
[
{
"id": "scraper-id-123",
"name": "hackernews"
},
{
"id": "scraper-id-456",
"name": "linkedin-jobs"
}
]
Create New Scraper
Create a new empty scraper for your account. This endpoint returns a scraper_id that you can use with the /v1/scrapers/generate endpoint to generate scraping code.
Endpoint: POST /v1/scrapers/new
curl https://api.parsera.org/v1/scrapers/new \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request POST
Response:
Returns a scraper ID that can be used to generate the scraper:
{
"scraper_id": "hackernews"
}
Note: After creating a new scraper, use the returned scraper_id with the /v1/scrapers/generate endpoint to generate the scraping code.
Generate Code
Generate a new scraper by providing a scraper ID, sample URL or content, and the attributes you want to extract:
Endpoint: POST /v1/scrapers/generate
curl https://api.parsera.org/v1/scrapers/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": "https://news.ycombinator.com/",
"attributes": [
{
"name": "title",
"description": "News title"
},
{
"name": "points",
"description": "Number of points"
}
]
}'
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| scraper_id | string | - | Unique identifier for your scraper |
| url | string | null | Sample URL to generate the scraper from |
| content | string | null | Raw HTML or text content to generate the scraper from (alternative to url) |
| prompt | string | "" | Additional prompt for scraper generation |
| attributes | array | [] | A list of attribute objects with name and description fields to extract. You can also specify Output Types |
| proxy_country | string | null | Proxy country for the sample request, see Proxy Countries |
| cookies | array | null | Cookies to use during generation, see Cookies |
Note: You must provide either url or content, but not both.
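For example, if you already hold the page HTML from a previous crawl, you can pass it through content instead of url. A minimal Python sketch, assuming the same attribute format as the curl example above and a placeholder HTML string:

```python
import requests

HEADERS = {"X-API-KEY": "<YOUR_API_KEY>", "Content-Type": "application/json"}

# Raw HTML captured elsewhere (placeholder); passed via "content" instead of "url".
sample_html = "<html><body><span class='titleline'>Example post</span></body></html>"

resp = requests.post(
    "https://api.parsera.org/v1/scrapers/generate",
    headers=HEADERS,
    json={
        "scraper_id": "hackernews",
        "content": sample_html,  # mutually exclusive with "url"
        "attributes": [{"name": "title", "description": "News title"}],
    },
)
print(resp.status_code, resp.json())
```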
Run Code
Run an existing scraper on one or multiple URLs:
Endpoint: POST /v1/scrapers/run
curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": "https://news.ycombinator.com/front?day=2024-09-11"
}'
Run on multiple URLs:
You can also run a scraper on multiple URLs at once (up to 100 URLs):
curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": [
"https://news.ycombinator.com/front?day=2024-09-11",
"https://news.ycombinator.com/front?day=2024-09-12",
"https://news.ycombinator.com/front?day=2024-09-13"
]
}'
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| scraper_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| proxy_country | string | null | Proxy country, see Proxy Countries |
| cookies | array | null | Cookies to use during extraction, see Cookies |
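Because a single run call is capped at 100 URLs, longer lists need to be split client-side. A minimal Python sketch of that batching, which simply collects the raw JSON payload of each response (the exact result schema is not specified here):

```python
import requests

HEADERS = {"X-API-KEY": "<YOUR_API_KEY>", "Content-Type": "application/json"}
RUN_URL = "https://api.parsera.org/v1/scrapers/run"

def run_in_batches(scraper_id, urls, batch_size=100):
    """Run a scraper over an arbitrarily long URL list, at most 100 URLs per request."""
    payloads = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        resp = requests.post(
            RUN_URL,
            headers=HEADERS,
            json={"scraper_id": scraper_id, "url": batch},
        )
        resp.raise_for_status()
        payloads.append(resp.json())
    return payloads

urls = [f"https://news.ycombinator.com/front?day=2024-09-{day:02d}" for day in range(1, 31)]
results = run_in_batches("hackernews", urls)
```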
Delete Scraper
Delete an existing scraper by its ID:
Endpoint: DELETE /v1/scrapers/{scraper_id}
curl https://api.parsera.org/v1/scrapers/hackernews \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request DELETE
Parameters:
| Parameter | Type | Description |
|---|---|---|
| scraper_id | string | ID of the scraper to delete (path parameter) |
Response:
Returns a confirmation message when the scraper is deleted:
{
"message": "Scraper deleted successfully."
}
Note: Old scrapers (prefixed with scraper:) cannot be deleted via this API. Only scrapers created through the /v1/scrapers/new endpoint can be deleted.
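To clean up an account, the list and delete endpoints can be combined. A hedged Python sketch, assuming the scraper: prefix mentioned above appears on the id field; note that this removes every non-legacy scraper, so use it with care:

```python
import requests

HEADERS = {"X-API-KEY": "<YOUR_API_KEY>"}
BASE = "https://api.parsera.org/v1/scrapers"

# Careful: this deletes every scraper on the account that is not a legacy one.
for scraper in requests.get(BASE, headers=HEADERS).json():
    if scraper["id"].startswith("scraper:"):
        continue  # legacy scrapers (scraper: prefix) cannot be deleted via this API
    resp = requests.delete(f"{BASE}/{scraper['id']}", headers=HEADERS)
    print(scraper["id"], resp.json().get("message"))
```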
Migration from Agents API
If you're using the older Agents API (agents.parsera.org), please refer to the Agents API (Deprecated) documentation for migration guidance.
More Features
Enhance your scraper with additional features:
- Specify Output Types - Define data types for extracted fields
- Setting Proxy - Access content from different geographic locations
- Setting Cookies - Handle authentication and session cookies