Scraper Code

Scrapers are reusable custom extractors that you can generate once and run on multiple URLs. This API is ideal for scenarios where you need to extract the same type of data from multiple pages or websites.

How It Works

  1. Create a new empty scraper to get a scraper ID
  2. Generate a scraper by defining the data fields you want to extract
  3. Run the scraper on specific URLs whenever you need
  4. List all your available scrapers
  5. Reuse the same scraper across different URLs
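If you prefer calling the API from code, the workflow above can be sketched as a thin Python client using only the standard library. The `ParseraScrapers` class and its method names are illustrative, not part of the API; the endpoint paths come from the sections below.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"

class ParseraScrapers:
    """Minimal illustrative client; class and method names are ours, not the API's."""

    def __init__(self, api_key):
        self.api_key = api_key

    def _call(self, method, path, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        headers = {"X-API-KEY": self.api_key}
        if data is not None:
            headers["Content-Type"] = "application/json"
        req = urllib.request.Request(API_BASE + path, data=data,
                                     headers=headers, method=method)
        with urllib.request.urlopen(req) as resp:  # performs the HTTP call
            return json.load(resp)

    def create(self):
        # Step 1: POST /v1/scrapers/new -> {"scraper_id": ...}
        return self._call("POST", "/scrapers/new")["scraper_id"]

    def generate(self, scraper_id, url, attributes):
        # Step 2: teach the scraper which fields to extract from a sample URL
        return self._call("POST", "/scrapers/generate",
                          {"scraper_id": scraper_id, "url": url,
                           "attributes": attributes})

    def run(self, scraper_id, url):
        # Steps 3 and 5: reuse the same scraper on any URL (or list of URLs)
        return self._call("POST", "/scrapers/run",
                          {"scraper_id": scraper_id, "url": url})
```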

List Scrapers

List all available scrapers for your account:

Endpoint: GET /v1/scrapers

curl https://api.parsera.org/v1/scrapers \
--header 'X-API-KEY: <YOUR_API_KEY>'

Response:

Returns a list of scraper objects with id and name fields:

[
    {
        "id": "scraper-id-123",
        "name": "hackernews"
    },
    {
        "id": "scraper-id-456",
        "name": "linkedin-jobs"
    }
]

Create New Scraper

Create a new empty scraper for your account. This endpoint returns a scraper_id that you can use with the /v1/scrapers/generate endpoint to generate scraping code.

Endpoint: POST /v1/scrapers/new

curl https://api.parsera.org/v1/scrapers/new \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request POST

Response:

Returns a scraper ID that can be used to generate the scraper:

{
    "scraper_id": "hackernews"
}

Note: After creating a new scraper, use the returned scraper_id with the /v1/scrapers/generate endpoint to generate the scraping code.
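A minimal Python equivalent, assuming only the standard library; the function names are illustrative. The second helper extracts the `scraper_id` that the generate endpoint expects.

```python
import json
import urllib.request

def create_scraper_request(api_key):
    """Build the POST /v1/scrapers/new request; the request body is empty."""
    return urllib.request.Request(
        "https://api.parsera.org/v1/scrapers/new",
        headers={"X-API-KEY": api_key},
        method="POST",
    )

def scraper_id_from(response_body):
    """Pull the scraper_id out of the response, for use with /v1/scrapers/generate."""
    return json.loads(response_body)["scraper_id"]
```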

Generate Code

Generate a new scraper by providing a scraper ID, sample URL or content, and the attributes you want to extract:

Endpoint: POST /v1/scrapers/generate

curl https://api.parsera.org/v1/scrapers/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ]
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scraper_id | string | - | Unique identifier for your scraper |
| url | string | null | Sample URL to generate the scraper from |
| content | string | null | Raw HTML or text content to generate the scraper from (alternative to url) |
| prompt | string | "" | Additional prompt for scraper generation |
| attributes | array | [] | A list of attribute objects with name and description fields to extract. You can also specify Output Types |
| proxy_country | string | null | Proxy country for the sample request, see Proxy Countries |
| cookies | array | null | Cookies to use during generation, see Cookies |

Note: You must provide either url or content, but not both.
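Because exactly one of url and content must be set, it is worth validating the payload client-side before sending. The sketch below is a hypothetical helper, not official client code; it builds the request body as a dictionary ready for JSON serialization.

```python
def build_generate_payload(scraper_id, attributes, url=None, content=None,
                           prompt="", proxy_country=None, cookies=None):
    """Assemble the /v1/scrapers/generate body, enforcing the url/content rule."""
    # The API requires exactly one of url or content.
    if (url is None) == (content is None):
        raise ValueError("Provide exactly one of 'url' or 'content'")
    payload = {"scraper_id": scraper_id, "attributes": attributes}
    if url is not None:
        payload["url"] = url
    else:
        payload["content"] = content
    # Optional parameters are only included when set.
    if prompt:
        payload["prompt"] = prompt
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    if cookies is not None:
        payload["cookies"] = cookies
    return payload
```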

Run Code

Run an existing scraper on one or multiple URLs:

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Run on multiple URLs:

You can also run a scraper on multiple URLs at once (up to 100 URLs):

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'
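If you have more than 100 URLs, you need to split them into batches client-side before calling the run endpoint. A small Python sketch (helper names are ours):

```python
def batch_urls(urls, batch_size=100):
    """Split a URL list into batches; the API accepts up to 100 URLs per call."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

def build_run_payload(scraper_id, urls):
    """Body for POST /v1/scrapers/run; 'url' may be a string or a list of URLs."""
    return {"scraper_id": scraper_id, "url": urls}
```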

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scraper_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| proxy_country | string | null | Proxy country, see Proxy Countries |
| cookies | array | null | Cookies to use during extraction, see Cookies |

Delete Scraper

Delete an existing scraper by its ID:

Endpoint: DELETE /v1/scrapers/{scraper_id}

curl https://api.parsera.org/v1/scrapers/hackernews \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--request DELETE

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scraper_id | string | ID of the scraper to delete (path parameter) |

Response:

Returns a success message on successful deletion:

{
    "message": "Scraper deleted successfully."
}

Note: Old scrapers (prefixed with scraper:) cannot be deleted via this API. Only scrapers created through the /v1/scrapers/new endpoint can be deleted.

Migration from Agents API

If you're using the older Agents API (agents.parsera.org), please refer to the Agents API (Deprecated) documentation for migration guidance.

More Features

Enhance your scraper with additional features such as Proxy Countries and Cookies, covered in their own sections of the documentation.
