# Code Mode
Generate custom Python scraping code for your classic scrapers and run them in code mode. This approach provides maximum flexibility and control over the scraping process.
## How It Works
- Create a classic scraper using `POST /v1/extractor/new` (or let it be created automatically)
- Generate Python code by providing a sample URL and the data fields you want to extract
- Run the scraper on specific URLs via `POST /v1/scrapers/run`
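The steps above can be sketched as a small Python client. This is a hypothetical sketch using only the standard library; the endpoints match this page, but the helper names and the `api_key` placeholder are illustrative:

```python
import json
import urllib.request

API_BASE = "https://api.parsera.org/v1"

def build_generate_payload(url, attributes):
    """Body for POST /v1/extractor/generate: a sample URL plus the
    attribute name/description pairs to extract."""
    return {"url": url, "attributes": attributes}

def post_json(endpoint, api_key, payload):
    """Send a JSON POST with the API key header and decode the response."""
    req = urllib.request.Request(
        API_BASE + endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def generate_scraper(api_key, sample_url, attributes):
    """Steps 1-2: create a scraper (auto-created when no template_id is
    passed) and start asynchronous Python code generation."""
    body = post_json("/extractor/generate", api_key,
                     build_generate_payload(sample_url, attributes))
    return body["template_id"]  # generation continues in the background

def run_scraper(api_key, template_id, url):
    """Step 3: run the generated code on one URL or a list of URLs."""
    return post_json("/scrapers/run", api_key,
                     {"template_id": template_id, "url": url})
```

`generate_scraper` returns immediately with a `template_id`; the scraper is only usable once generation finishes (see polling below in this page).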
## Generate Code
Generate Python scraping code by providing a sample URL or content and the attributes you want to extract. Code generation is asynchronous — it returns immediately and runs in the background.
Endpoint: `POST /v1/extractor/generate`
```bash
curl https://api.parsera.org/v1/extractor/generate \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "url": "https://news.ycombinator.com/",
    "attributes": [
      {
        "name": "title",
        "description": "News title"
      },
      {
        "name": "points",
        "description": "Number of points"
      }
    ]
  }'
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `template_id` | string | null | Scraper ID. If omitted, a new scraper is created automatically |
| `url` | string | null | Sample URL to generate the scraper from |
| `content` | string | null | Raw HTML or text content to generate the scraper from (alternative to `url`) |
| `prompt` | string | `""` | Additional prompt for scraper generation |
| `attributes` | array | `[]` | A list of attribute objects with `name` and `description` fields to extract. You can also specify Output Types |
| `proxy_country` | string | null | Proxy country for the sample request, see Proxy Countries |
| `cookies` | array | null | Cookies to use during generation, see Cookies |

Note: You must provide either `url` or `content`, and either `attributes` or `prompt`.
Response (202 Accepted):
```json
{
  "template_id": "abc123",
  "status": "generating"
}
```

To check generation progress, poll the scraper details:

```bash
curl https://api.parsera.org/v1/extractor/abc123 \
  --header 'X-API-KEY: <YOUR_API_KEY>'
```

The `status` field will be `generating`, `ready`, or `failed`.
## Run Code
Run a scraper in code mode on one or multiple URLs:
Endpoint: `POST /v1/scrapers/run`
```bash
curl https://api.parsera.org/v1/scrapers/run \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "template_id": "abc123",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
  }'
```

Run on multiple URLs:
```bash
curl https://api.parsera.org/v1/scrapers/run \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <YOUR_API_KEY>' \
  --data '{
    "template_id": "abc123",
    "url": [
      "https://news.ycombinator.com/front?day=2024-09-11",
      "https://news.ycombinator.com/front?day=2024-09-12",
      "https://news.ycombinator.com/front?day=2024-09-13"
    ]
  }'
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `template_id` | string | - | ID of the scraper to run |
| `url` | string or array | null | URL or list of URLs to scrape (max 100 URLs) |
| `proxy_country` | string | null | Proxy country, see Proxy Countries |
| `cookies` | array | null | Cookies to use during extraction, see Cookies |
For long-running jobs, use the async endpoint POST /v1/scrapers/run_async instead. See Scrapers API for details.
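Because `url` accepts at most 100 URLs per call, a larger crawl has to be split client-side into one `/v1/scrapers/run` request per batch. A small helper sketch (the batching is purely client-side; the API itself imposes only the 100-URL cap):

```python
def chunk_urls(urls, batch_size=100):
    """Split a URL list into batches no larger than the 100-URL cap
    on a single /v1/scrapers/run call."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each batch can then be passed as the `url` array of a separate run request.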
## Benefits of Code Mode
- Performance - Faster execution with optimized Python code
- Consistency - Deterministic results across multiple runs
- Cost-effective - Lower computational costs compared to LLM-based extraction
- Customization - Generated code can be reviewed and modified if needed
## Related Documentation
- Extractor API - Create and manage classic scrapers
- Scrapers API - Run scrapers synchronously or asynchronously
- Output Types - Define data types for extracted fields
