Scrapers

Scrapers are reusable extractors that you can create, manage, and run on multiple URLs. This API is ideal for scenarios where you need to extract the same type of data from multiple pages or websites.

How It Works

  1. Create a new empty scraper to get a scraper ID
  2. Generate code for your scraper (see Code Mode)
  3. Run the scraper on specific URLs whenever you need
  4. List all your available scrapers
  5. Delete scrapers you no longer need
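The lifecycle above can be sketched as a thin wrapper around the endpoints documented below. This is an illustrative sketch, not an official client: only the endpoint paths and headers come from this page, and the `build_request` helper is a name made up for the example.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"

def build_request(method, path, api_key, payload=None):
    """Build an authenticated request for a Parsera scrapers endpoint.

    Illustrative helper: only the endpoints and headers are taken
    from the documentation; nothing is sent until urlopen() is called.
    """
    data = None
    headers = {"X-API-KEY": api_key}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{API_BASE}{path}", data=data, headers=headers, method=method
    )

# Endpoint mapping for the workflow steps:
#   create -> POST   /v1/scrapers/new
#   run    -> POST   /v1/scrapers/run
#   list   -> GET    /v1/scrapers
#   delete -> DELETE /v1/scrapers/{scraper_id}
req = build_request("POST", "/scrapers/new", "<YOUR_API_KEY>")
```

Each request would then be executed with `urllib.request.urlopen(req)` (or any HTTP client of your choice).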

List Scrapers

List all available scrapers for your account:

Endpoint: GET /v1/scrapers

curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response:

Returns a list of scraper objects with id and name fields:

[
    {
        "id": "scraper-id-123",
        "name": "hackernews"
    },
    {
        "id": "scraper-id-456",
        "name": "linkedin-jobs"
    }
]
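A response like the one above can be turned into a name-to-ID lookup for later run or delete calls. A minimal sketch, assuming the response body has already been read into a string:

```python
import json

# Example response body from GET /v1/scrapers (copied from the docs).
response_body = """[
    {"id": "scraper-id-123", "name": "hackernews"},
    {"id": "scraper-id-456", "name": "linkedin-jobs"}
]"""

# Map scraper names to their IDs.
scrapers = {s["name"]: s["id"] for s in json.loads(response_body)}
print(scrapers["hackernews"])  # scraper-id-123
```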

Create New Scraper

Create a new empty scraper for your account. This endpoint returns a scraper_id that you can use with the /v1/scrapers/generate endpoint to generate scraping code.

Endpoint: POST /v1/scrapers/new

curl https://api.parsera.org/v1/scrapers/new \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request POST

Response:

Returns a scraper_id that you can use to generate code for the scraper:

{
    "scraper_id": "hackernews"
}

Note: After creating a new scraper, use the returned scraper_id to generate code for your scraper. See the Code Mode documentation for details.

Run Scraper

Run an existing scraper on one or multiple URLs:

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Run on multiple URLs:

You can also run a scraper on multiple URLs at once (up to 100 URLs):

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'
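Because a single run accepts at most 100 URLs, a larger list has to be split into batches on the client side. A sketch of one way to do this; the `batch_urls` helper is not part of the API:

```python
def batch_urls(urls, max_per_run=100):
    """Split a URL list into chunks that fit the per-run limit."""
    return [urls[i:i + max_per_run] for i in range(0, len(urls), max_per_run)]

urls = [
    f"https://news.ycombinator.com/front?day=2024-09-{d:02d}"
    for d in range(1, 31)
]
# Each batch would be sent as the "url" array of one /v1/scrapers/run request.
batches = batch_urls(urls, max_per_run=10)
```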

Parameters:

| Parameter     | Type            | Default | Description                                   |
|---------------|-----------------|---------|-----------------------------------------------|
| scraper_id    | string          | -       | ID of the scraper to run                      |
| url           | string or array | null    | URL or list of URLs to scrape (max 100 URLs)  |
| proxy_country | string          | null    | Proxy country, see Proxy Countries            |
| cookies       | array           | null    | Cookies to use during extraction, see Cookies |
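Since the optional parameters default to null, a request body can simply omit the fields that are unset. A minimal sketch; the `build_run_payload` helper is illustrative, not part of the API:

```python
def build_run_payload(scraper_id, url, proxy_country=None, cookies=None):
    """Build the JSON body for POST /v1/scrapers/run.

    Optional fields default to null server-side, so they are
    only included when explicitly set.
    """
    payload = {"scraper_id": scraper_id, "url": url}
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    if cookies is not None:
        payload["cookies"] = cookies
    return payload
```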

Delete Scraper

Delete an existing scraper by its ID:

Endpoint: DELETE /v1/scrapers/{scraper_id}

curl https://api.parsera.org/v1/scrapers/hackernews \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request DELETE

Parameters:

| Parameter  | Type   | Description                                  |
|------------|--------|----------------------------------------------|
| scraper_id | string | ID of the scraper to delete (path parameter) |
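Because scraper_id is interpolated into the URL path rather than a JSON body, it is safest to URL-encode it when building the request. A sketch; the `delete_url` helper is illustrative:

```python
from urllib.parse import quote

def delete_url(scraper_id):
    """Build the DELETE /v1/scrapers/{scraper_id} URL, encoding the path segment."""
    return f"https://api.parsera.org/v1/scrapers/{quote(scraper_id, safe='')}"
```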

Response:

Returns a confirmation message on successful deletion:

{
    "message": "Scraper deleted successfully."
}

Note: Old scrapers (prefixed with scraper:) cannot be deleted via this API. Only scrapers created through the /v1/scrapers/new endpoint can be deleted.

Code Mode

To generate scraping code and run it in code mode, see the Code Mode documentation.

Migration from Agents API

If you're using the older Agents API (agents.parsera.org), please refer to the Agents API (Deprecated) documentation for migration guidance.

More Features

Enhance your scraper with additional features:
