Code Mode
Generate custom Python scraping code for your scrapers and run them in code mode. This approach provides maximum flexibility and control over the scraping process.
How It Works
- Create a scraper using the Scrapers API
- Generate Python code by defining the data fields you want to extract
- Switch the scraper to "code" mode in the UI
- Run the generated code on specific URLs
Note: To run scraper code, first switch the scraper to code mode in the UI. Code mode executes the generated Python extraction logic instead of the standard LLM-based extraction.
Generate Code
Generate Python scraping code for a scraper by providing a scraper ID, sample URL or content, and the attributes you want to extract:
Endpoint: POST /v1/scrapers/generate
curl https://api.parsera.org/v1/scrapers/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": "https://news.ycombinator.com/",
"attributes": [
{
"name": "title",
"description": "News title"
},
{
"name": "points",
"description": "Number of points"
}
]
}'
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| scraper_id | string | - | Unique identifier for your scraper |
| url | string | null | Sample URL to generate the scraper from |
| content | string | null | Raw HTML or text content to generate the scraper from (alternative to url) |
| prompt | string | "" | Additional prompt for scraper generation |
| attributes | array | [] | A list of attribute objects with name and description fields to extract. You can also specify Output Types |
| proxy_country | string | null | Proxy country for the sample request, see Proxy Countries |
| cookies | array | null | Cookies to use during generation, see Cookies |
Note: You must provide either url or content, but not both.
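The "either url or content, but not both" rule can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official client: `build_generate_payload` is a hypothetical helper name, and treating both `name` and `description` as required on every attribute is an assumption based on the parameter table above.

```python
from typing import Optional


def build_generate_payload(
    scraper_id: str,
    attributes: list[dict],
    url: Optional[str] = None,
    content: Optional[str] = None,
    prompt: str = "",
) -> dict:
    """Build the JSON body for POST /v1/scrapers/generate (hypothetical helper)."""
    # Exactly one of `url` or `content` must be set.
    if (url is None) == (content is None):
        raise ValueError("Provide either url or content, but not both")

    # Assumption: every attribute carries both a name and a description.
    for attr in attributes:
        if "name" not in attr or "description" not in attr:
            raise ValueError(f"Attribute missing name or description: {attr!r}")

    payload = {"scraper_id": scraper_id, "attributes": attributes}
    if url is not None:
        payload["url"] = url
    else:
        payload["content"] = content
    if prompt:
        payload["prompt"] = prompt
    return payload


payload = build_generate_payload(
    "hackernews",
    [
        {"name": "title", "description": "News title"},
        {"name": "points", "description": "Number of points"},
    ],
    url="https://news.ycombinator.com/",
)
```

Passing both `url` and `content`, or neither, raises `ValueError` locally instead of spending an API call on a request that the rule above disallows.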
Response:
Returns a success message when code generation is complete:
{
"message": "Code generated successfully for scraper hackernews"
}
After generating the code, remember to switch your scraper to "code" mode in the UI before running it.
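For callers who prefer Python over curl, the generate request can be sketched with the standard library alone. The helper names `build_request` and `send` and the 120-second timeout are illustrative assumptions; only the base URL, the endpoint path, the two headers, and the `message` response field come from the examples above.

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare a JSON POST request with the same headers as the curl examples."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body.

    The timeout value is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(
    "/scrapers/generate",
    {
        "scraper_id": "hackernews",
        "url": "https://news.ycombinator.com/",
        "attributes": [
            {"name": "title", "description": "News title"},
            {"name": "points", "description": "Number of points"},
        ],
    },
    api_key="<YOUR_API_KEY>",
)

# Uncomment to perform the network call with a real API key:
# result = send(request)
# print(result["message"])
```

Building the request separately from sending it keeps the payload and headers inspectable without network access.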
Run Code
Run a scraper in code mode on one or multiple URLs:
Endpoint: POST /v1/scrapers/run
curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": "https://news.ycombinator.com/front?day=2024-09-11"
}'
Run on multiple URLs:
You can run code on multiple URLs at once (up to 100 URLs):
curl https://api.parsera.org/v1/scrapers/run \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
"scraper_id": "hackernews",
"url": [
"https://news.ycombinator.com/front?day=2024-09-11",
"https://news.ycombinator.com/front?day=2024-09-12",
"https://news.ycombinator.com/front?day=2024-09-13"
]
}'
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| scraper_id | string | - | ID of the scraper to run (must be in code mode) |
| url | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| proxy_country | string | null | Proxy country, see Proxy Countries |
| cookies | array | null | Cookies to use during extraction, see Cookies |
Important: The scraper must be in "code" mode for this to work. You can switch modes in the Parsera UI.
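Because a single run accepts at most 100 URLs, a longer list has to be split across several requests. The sketch below shows one way to do that; `build_run_payloads` is a hypothetical helper, the `example.com` URLs are placeholders, and how the per-batch results should be combined is left open because the response format of the run endpoint is not described here.

```python
from typing import Iterator

MAX_URLS_PER_RUN = 100  # limit stated in the parameters table


def build_run_payloads(scraper_id: str, urls: list[str]) -> Iterator[dict]:
    """Yield one POST /v1/scrapers/run body per batch of up to 100 URLs."""
    for start in range(0, len(urls), MAX_URLS_PER_RUN):
        batch = urls[start : start + MAX_URLS_PER_RUN]
        # `url` accepts a string or an array; a list is used for every batch.
        yield {"scraper_id": scraper_id, "url": batch}


# Placeholder URLs for illustration only.
urls = [f"https://example.com/page/{n}" for n in range(250)]
payloads = list(build_run_payloads("hackernews", urls))
print([len(p["url"]) for p in payloads])
```

Each payload can then be posted to /v1/scrapers/run with the same headers as the curl examples.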
Benefits of Code Mode
Running scrapers in code mode offers several advantages:
- Performance - Faster execution with optimized Python code
- Consistency - Deterministic results across multiple runs
- Cost-effective - Lower computational costs compared to LLM-based extraction
- Customization - Generated code can be reviewed and modified if needed
Related Documentation
- Scrapers API - Create, list, and delete scrapers
- Output Types - Define data types for extracted fields
- Proxy Countries - Access content from different geographic locations
- Cookies - Handle authentication and session cookies