Code Mode

Generate custom Python scraping code for your scrapers and run it in code mode. Because the extraction logic is ordinary Python that you can review and modify, this approach gives you the most control over the scraping process.

How It Works

  1. Create a scraper using the Scrapers API
  2. Generate Python code by defining the data fields you want to extract
  3. Switch the scraper to "code" mode in the UI
  4. Run the generated code on specific URLs

Note: To run scraper code, you must first switch the scraper to code mode in the UI. Code mode executes custom Python extraction logic instead of the standard LLM-based extraction.

Generate Code

Generate Python scraping code for a scraper by providing a scraper ID, sample URL or content, and the attributes you want to extract:

Endpoint: POST /v1/scrapers/generate

curl https://api.parsera.org/v1/scrapers/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ]
}'
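
For readers calling the API from Python rather than the shell, the curl command above can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper name `build_generate_request` is hypothetical, and the endpoint, headers, and body fields are copied from the curl example. The request is only built here; the sending step is left as a comment.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.parsera.org/v1"


def build_generate_request(api_key, scraper_id, url, attributes):
    """Build (but do not send) a POST /v1/scrapers/generate request.

    Hypothetical helper for illustration; it mirrors the curl example.
    """
    payload = {
        "scraper_id": scraper_id,
        "url": url,
        "attributes": attributes,
    }
    return urllib.request.Request(
        f"{API_BASE}/scrapers/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


request = build_generate_request(
    api_key="<YOUR_API_KEY>",
    scraper_id="hackernews",
    url="https://news.ycombinator.com/",
    attributes=[
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
)

# To actually send it (requires a valid API key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["message"])
```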

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scraper_id | string | - | Unique identifier for your scraper |
| url | string | null | Sample URL to generate the scraper from |
| content | string | null | Raw HTML or text content to generate the scraper from (alternative to url) |
| prompt | string | "" | Additional prompt for scraper generation |
| attributes | array | [] | A list of attribute objects, each with name and description fields, to extract. You can also specify Output Types |
| proxy_country | string | null | Proxy country for the sample request, see Proxy Countries |
| cookies | array | null | Cookies to use during generation, see Cookies |

Note: You must provide either url or content, but not both.
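
The "either url or content, but not both" rule can be checked on the client before a request is sent. The sketch below assumes a hypothetical helper, `build_generate_payload`, which is not part of any Parsera library. It only assembles the JSON body from the documented parameters and raises an error when the rule is violated.

```python
def build_generate_payload(scraper_id, attributes, url=None, content=None, prompt=""):
    """Return the JSON body for POST /v1/scrapers/generate.

    Hypothetical helper: raises ValueError unless exactly one of
    url / content is given, and unless every attribute has both a
    name and a description.
    """
    # True when both are missing or both are present: either case is invalid.
    if (url is None) == (content is None):
        raise ValueError("Provide either url or content, but not both")

    for attribute in attributes:
        if "name" not in attribute or "description" not in attribute:
            raise ValueError(f"Attribute needs name and description: {attribute!r}")

    payload = {"scraper_id": scraper_id, "attributes": attributes}
    if url is not None:
        payload["url"] = url
    else:
        payload["content"] = content
    if prompt:
        # The documented default is an empty string, so only send it when set.
        payload["prompt"] = prompt
    return payload


payload = build_generate_payload(
    "hackernews",
    [{"name": "title", "description": "News title"}],
    url="https://news.ycombinator.com/",
)
```

Checking locally avoids a round trip for a request that would not meet the documented requirement.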

Response:

Returns a success message when code generation is complete:

{
    "message": "Code generated successfully for scraper hackernews"
}

After generating the code, remember to switch your scraper to "code" mode in the UI before running it.

Run Code

Run a scraper in code mode on one or more URLs:

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Run on multiple URLs:

You can run code on multiple URLs at once (up to 100 URLs):

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "scraper_id": "hackernews",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scraper_id | string | - | ID of the scraper to run (must be in code mode) |
| url | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| proxy_countrystring | string | null | Proxy country, see Proxy Countries |
| cookies | array | null | Cookies to use during extraction, see Cookies |
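
Because a single run request accepts at most 100 URLs, longer lists have to be split on the client. The sketch below shows one way to do that. The helper `build_run_payloads` and the `example.com` URLs are placeholders for illustration; only the `scraper_id` and `url` fields and the 100-URL limit come from this page.

```python
MAX_URLS_PER_RUN = 100  # limit stated for the url parameter


def build_run_payloads(scraper_id, urls, batch_size=MAX_URLS_PER_RUN):
    """Split urls into request bodies for POST /v1/scrapers/run.

    Hypothetical helper: each returned dict holds at most batch_size URLs.
    """
    if not 1 <= batch_size <= MAX_URLS_PER_RUN:
        raise ValueError(f"batch_size must be between 1 and {MAX_URLS_PER_RUN}")

    payloads = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        payloads.append({"scraper_id": scraper_id, "url": batch})
    return payloads


# Placeholder URLs, used only to show the batching behaviour.
urls = [f"https://example.com/page/{n}" for n in range(250)]
payloads = build_run_payloads("hackernews", urls)
batch_sizes = [len(p["url"]) for p in payloads]
```

Each payload can then be sent as its own request, for example one after another, which keeps every call within the documented limit.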

Important: The scraper must be in "code" mode for this to work. You can switch modes in the Parsera UI.

Benefits of Code Mode

Running scrapers in code mode offers several advantages:

  • Performance - Faster execution with optimized Python code
  • Consistency - Deterministic results across multiple runs
  • Cost-effective - Lower computational costs compared to LLM-based extraction
  • Customization - Generated code can be reviewed and modified if needed