Scrapers

Scrapers are reusable extractors that you can create, manage, and run on multiple URLs. Use this API when you need to extract the same type of data from many pages or websites.

How It Works

1. Create a new empty scraper to get a scraper ID
2. Generate code for your scraper (see Code Generation)
3. Run the scraper on specific URLs whenever you need
4. List all your available scrapers
5. Delete scrapers you no longer need

List Scrapers

List all available scrapers for your account:

Endpoint: GET /v1/scrapers

curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response:

Returns a list of scraper objects with id and name fields:

[
    {
        "id": "scraper-id-123",
        "name": "hackernews"
    },
    {
        "id": "scraper-id-456",
        "name": "linkedin-jobs"
    }
]
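
The same call can be made from Python. What follows is a minimal sketch using only the standard library, not an official client: the helper names `build_list_request`, `list_scrapers`, and `index_by_name` are hypothetical, and the response is assumed to have exactly the shape shown above.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"


def build_list_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/scrapers request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/scrapers",
        headers={"X-API-KEY": api_key},
        method="GET",
    )


def index_by_name(scrapers: list[dict]) -> dict[str, str]:
    """Map each scraper name to its ID (assumes the id/name fields shown above)."""
    return {item["name"]: item["id"] for item in scrapers}


def list_scrapers(api_key: str) -> list[dict]:
    """Send the request and decode the JSON list (performs a network call)."""
    with urllib.request.urlopen(build_list_request(api_key)) as response:
        return json.load(response)
```

Calling `index_by_name(list_scrapers("<YOUR_API_KEY>"))` would then give a name-to-ID lookup, which can be convenient when you remember a scraper by name but an endpoint expects its ID.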

Create New Scraper

Create a new empty scraper for your account. This endpoint returns a scraper_id that you can use with the /v1/scrapers/generate endpoint to generate scraping code.

Endpoint: POST /v1/scrapers/new

curl https://api.parsera.org/v1/scrapers/new \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request POST

Response:

Returns a scraper ID that you can use to generate code for the scraper:

{
    "scraper_id": "hackernews"
}

Note: After creating a new scraper, use the returned scraper_id to generate code for your scraper. See the Code Mode documentation for details.
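
As a rough Python equivalent of the curl call above, the sketch below builds the bodiless POST request and reads `scraper_id` from the decoded response. The function names are hypothetical, and the only response field assumed is the `scraper_id` key shown in the example.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"


def build_create_request(api_key: str) -> urllib.request.Request:
    """Build the POST /v1/scrapers/new request (no body) without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/scrapers/new",
        headers={"X-API-KEY": api_key},
        method="POST",
    )


def extract_scraper_id(response_body: dict) -> str:
    """Return scraper_id from a decoded response, failing loudly if it is absent."""
    scraper_id = response_body.get("scraper_id")
    if not scraper_id:
        raise ValueError(f"response has no scraper_id: {response_body!r}")
    return scraper_id


def create_scraper(api_key: str) -> str:
    """Send the request and return the new scraper ID (performs a network call)."""
    with urllib.request.urlopen(build_create_request(api_key)) as response:
        return extract_scraper_id(json.load(response))
```

Keeping request construction separate from sending makes it possible to inspect or log the request before any network traffic happens.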

Run Scraper

Run an existing scraper on one or more URLs:

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Run on multiple URLs:

To run a scraper on several URLs in a single request, pass an array of up to 100 URLs as url:

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'

Parameters:

Parameter	Type	Default	Description
`scraper_id`	`string`	-	ID of the scraper to run
`url`	`string` or `array`	`null`	URL or list of URLs to scrape (max 100 URLs)
`proxy_country`	`string`	`null`	Proxy country, see Proxy Countries
`cookies`	`array`	`null`	Cookies to use during extraction, see Cookies
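
Because a single request accepts at most 100 URLs, longer lists have to be split on the client side. The sketch below shows one way to do that in Python with the standard library. All helper names are hypothetical, and since this section does not describe the response body of the run endpoint, `run_scraper` simply returns the decoded JSON unchanged.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"
MAX_URLS_PER_REQUEST = 100  # limit stated in the parameters table


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> list[list[str]]:
    """Split a URL list into consecutive batches of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_run_payload(scraper_id, urls, proxy_country=None, cookies=None) -> dict:
    """Build the JSON body; optional parameters are left out when not set."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_URLS_PER_REQUEST} URLs per request")
    # A single URL is sent as a string, several URLs as an array.
    payload = {"scraper_id": scraper_id, "url": urls[0] if len(urls) == 1 else list(urls)}
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    if cookies is not None:
        payload["cookies"] = cookies
    return payload


def build_run_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST /v1/scrapers/run request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/scrapers/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def run_scraper(api_key: str, scraper_id: str, urls: list[str]) -> list:
    """Run the scraper batch by batch (network calls); returns one decoded body per batch."""
    results = []
    for batch in chunk_urls(urls):
        request = build_run_request(api_key, build_run_payload(scraper_id, batch))
        with urllib.request.urlopen(request) as response:
            results.append(json.load(response))
    return results
```

With this layout, a list of 250 URLs is sent as three requests of 100, 100, and 50 URLs. Retries and rate limiting are deliberately left out of the sketch.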

Delete Scraper

Delete an existing scraper by its ID:

Endpoint: DELETE /v1/scrapers/{scraper_id}

curl https://api.parsera.org/v1/scrapers/hackernews \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request DELETE

Parameters:

Parameter	Type	Description
`scraper_id`	`string`	ID of the scraper to delete (path parameter)

Response:

Returns a confirmation message when the scraper is deleted:

{
    "message": "Scraper deleted successfully."
}

Note: Old scrapers (prefixed with scraper:) cannot be deleted via this API. Only scrapers created through the /v1/scrapers/new endpoint can be deleted.
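
A client can reject such scrapers before making the call. The Python sketch below is one way to do it, under two assumptions that this page does not confirm: that the `scraper:` prefix appears in the scraper ID itself, and that IDs should be percent-encoded when placed in the path. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.parsera.org/v1"
LEGACY_PREFIX = "scraper:"  # assumption: the prefix is part of the scraper ID


def build_delete_request(api_key: str, scraper_id: str) -> urllib.request.Request:
    """Build the DELETE /v1/scrapers/{scraper_id} request without sending it."""
    if scraper_id.startswith(LEGACY_PREFIX):
        raise ValueError(f"{scraper_id!r} looks like an old scraper and cannot be deleted via this API")
    # Percent-encode the ID so unusual characters cannot alter the path.
    path_id = urllib.parse.quote(scraper_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/scrapers/{path_id}",
        headers={"X-API-KEY": api_key},
        method="DELETE",
    )


def delete_scraper(api_key: str, scraper_id: str) -> dict:
    """Send the request and return the decoded response (performs a network call)."""
    with urllib.request.urlopen(build_delete_request(api_key, scraper_id)) as response:
        return json.load(response)
```

Failing early with a `ValueError` avoids a round trip for a request that the note above says cannot succeed.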

Code Mode

To generate scraping code and run it in code mode, see the Code Mode documentation.

Migration from Agents API

If you're using the older Agents API (agents.parsera.org), please refer to the Agents API (Deprecated) documentation for migration guidance.

More Features

Enhance your scraper with additional features:

Code Mode - Generate and run custom scraper code
Specify Output Types - Define data types for extracted fields
Setting Proxy - Access content from different geographic locations
Setting Cookies - Handle authentication and session cookies