Parsera

Code Mode

Generate custom Python scraping code for your classic scrapers and run them in code mode. This approach provides maximum flexibility and control over the scraping process.

How It Works

  1. Create a classic scraper using POST /v1/extractor/new, or omit template_id in the generate request to have a new scraper created automatically
  2. Generate Python code by providing a sample URL and the data fields you want to extract
  3. Run the scraper on specific URLs via POST /v1/scrapers/run

Generate Code

Generate Python scraping code by providing a sample URL or content and the attributes you want to extract. Code generation is asynchronous — it returns immediately and runs in the background.

Endpoint: POST /v1/extractor/generate

curl https://api.parsera.org/v1/extractor/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "attributes": [
        {
            "name": "title",
            "description": "News title"
        },
        {
            "name": "points",
            "description": "Number of points"
        }
    ]
}'
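The same request can be assembled from Python using only the standard library. The sketch below is one possible way to do it, not an official client: the helper name build_generate_request is hypothetical, while the endpoint, headers, and body fields are copied from the curl example above. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"  # base URL taken from the curl example


def build_generate_request(api_key, url, attributes):
    """Build (but do not send) a POST /v1/extractor/generate request.

    Hypothetical helper for illustration; not part of any Parsera SDK.
    """
    body = {"url": url, "attributes": attributes}
    return urllib.request.Request(
        f"{API_BASE}/extractor/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


request = build_generate_request(
    "<YOUR_API_KEY>",
    "https://news.ycombinator.com/",
    [
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.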

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | null | Scraper ID. If omitted, a new scraper is created automatically |
| url | string | null | Sample URL to generate the scraper from |
| content | string | null | Raw HTML or text content to generate the scraper from (alternative to url) |
| prompt | string | "" | Additional prompt for scraper generation |
| attributes | array | [] | A list of attribute objects with name and description fields to extract. You can also specify Output Types |
| proxy_country | string | null | Proxy country for the sample request, see Proxy Countries |
| cookies | array | null | Cookies to use during generation, see Cookies |

Note: You must provide either url or content, and either attributes or prompt.
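The rule in this note can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: build_generate_payload is a hypothetical helper, and it only enforces that at least one field from each pair is present. Whether the API rejects a request that supplies both url and content is not stated here, so the sketch does not forbid it.

```python
def build_generate_payload(url=None, content=None, attributes=None,
                           prompt="", template_id=None):
    """Return a request body for POST /v1/extractor/generate.

    Hypothetical helper: raises ValueError when the documented
    "url or content" / "attributes or prompt" rule is not met.
    """
    if not url and not content:
        raise ValueError("provide either url or content")
    if not attributes and not prompt:
        raise ValueError("provide either attributes or prompt")

    # Include only the fields that were actually set, so the
    # server-side defaults apply to everything else.
    candidates = {
        "template_id": template_id,
        "url": url,
        "content": content,
        "prompt": prompt,
        "attributes": attributes,
    }
    return {key: value for key, value in candidates.items() if value}


payload = build_generate_payload(
    url="https://news.ycombinator.com/",
    attributes=[{"name": "title", "description": "News title"}],
)
```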

Response (202 Accepted):

{
    "template_id": "abc123",
    "status": "generating"
}

To check generation progress, poll the scraper details:

curl https://api.parsera.org/v1/extractor/abc123 \
--header 'X-API-KEY: <YOUR_API_KEY>'

The status field will be generating, ready, or failed.
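Polling can be wrapped in a small loop that stops on ready or failed. This is a sketch, not an official client: fetch_status and wait_until_ready are hypothetical names, the 2-second interval and 120-second timeout are arbitrary choices, and the code assumes the details response is JSON with a top-level status field.

```python
import json
import time
import urllib.request


def fetch_status(template_id, api_key):
    """GET /v1/extractor/{template_id} and return its status string.

    Assumption: the response body is JSON with a top-level "status" key.
    """
    req = urllib.request.Request(
        f"https://api.parsera.org/v1/extractor/{template_id}",
        headers={"X-API-KEY": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]


def wait_until_ready(get_status, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Call get_status() until it returns "ready".

    Raises RuntimeError on "failed" and TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError("code generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("code generation did not finish in time")
        sleep(interval)  # still "generating": wait before the next poll
```

A call might look like wait_until_ready(lambda: fetch_status("abc123", "<YOUR_API_KEY>")). Passing the status getter as a callable keeps the loop independent of the HTTP layer.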

Run Code

Run a scraper in code mode on one or more URLs:

Endpoint: POST /v1/scrapers/run

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Run on multiple URLs:

curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "template_id": "abc123",
    "url": [
        "https://news.ycombinator.com/front?day=2024-09-11",
        "https://news.ycombinator.com/front?day=2024-09-12",
        "https://news.ycombinator.com/front?day=2024-09-13"
    ]
}'

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| template_id | string | - | ID of the scraper to run |
| url | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| proxy_country | string | null | Proxy country, see Proxy Countries |
| cookies | array | null | Cookies to use during extraction, see Cookies |
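Because url accepts at most 100 URLs per request, a longer list has to be split across several requests. The sketch below shows one way to prepare the request bodies; batch_run_payloads is a hypothetical helper, and only the 100-URL limit and the template_id and url fields come from the parameters above.

```python
def batch_run_payloads(template_id, urls, max_urls=100):
    """Split urls into request bodies for POST /v1/scrapers/run.

    Hypothetical helper: each body carries at most max_urls URLs,
    matching the documented limit of 100 URLs per request.
    """
    if max_urls < 1:
        raise ValueError("max_urls must be at least 1")
    return [
        {"template_id": template_id, "url": urls[start:start + max_urls]}
        for start in range(0, len(urls), max_urls)
    ]


# Example: 250 placeholder URLs produce three request bodies.
example_urls = [f"https://example.com/page/{n}" for n in range(250)]
payloads = batch_run_payloads("abc123", example_urls)
```

Each resulting body can then be sent as its own request.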

For long-running jobs, use the async endpoint POST /v1/scrapers/run_async instead. See Scrapers API for details.

Benefits of Code Mode

  • Performance - Faster execution with optimized Python code
  • Consistency - Deterministic results across multiple runs
  • Cost-effective - Lower computational costs compared to LLM-based extraction
  • Customization - Generated code can be reviewed and modified if needed