API Reference
Batch Scrape API
Scrape multiple URLs in parallel with a single API call
The Batch Scrape API allows you to scrape multiple URLs in a single request. It processes URLs in parallel with shared scrape options and supports webhook notifications.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/batch/scrape | Start a batch scrape job |
| GET | /api/v1/batch/scrape/:id | Get batch status and results |
| GET | /api/v1/batch/scrape | List batch history |
Start Batch Scrape
POST /api/v1/batch/scrape
Request Body
| Parameter | Type | Description |
|---|---|---|
| urls | string[] | Array of URLs to scrape (1-100) |
| scrapeOptions | object | Shared options for all URLs (see Scrape API) |
| webhookUrl | string | Webhook URL for completion notification |
Example Request
curl -X POST https://server.anyhunt.app/api/v1/batch/scrape \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "scrapeOptions": {
      "formats": ["markdown", "links"],
      "onlyMainContent": true
    },
    "webhookUrl": "https://your-app.com/webhooks/batch"
  }'
Response
{
  "id": "batch_abc123",
  "status": "PENDING",
  "totalUrls": 3,
  "completedUrls": 0,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z"
}
Get Batch Status
GET /api/v1/batch/scrape/:id
Response (In Progress)
{
  "id": "batch_abc123",
  "status": "PROCESSING",
  "totalUrls": 3,
  "completedUrls": 2,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z"
}
Response (Completed)
{
  "id": "batch_abc123",
  "status": "COMPLETED",
  "totalUrls": 3,
  "completedUrls": 3,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z",
  "completedAt": "2024-01-15T10:30:15.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 1\n\nContent...",
        "links": ["https://example.com/other"]
      }
    },
    {
      "url": "https://example.com/page-2",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 2\n\nContent...",
        "links": []
      }
    },
    {
      "url": "https://example.com/page-3",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 3\n\nContent...",
        "links": []
      }
    }
  ]
}
Status Values:
| Status | Description |
|---|---|
| PENDING | Job is queued |
| PROCESSING | Batch is in progress |
| COMPLETED | All URLs processed |
| FAILED | Batch failed with an error |
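The status values above suggest a simple polling rule: keep waiting while the batch is PENDING or PROCESSING, and stop on COMPLETED or FAILED. A minimal sketch, where `get_status` stands in for any function wrapping `GET /api/v1/batch/scrape/:id` (the helper names are illustrative, not part of the API):

```python
# Poll until the batch reaches a terminal status. PENDING and PROCESSING
# are the non-terminal states from the table above.
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def wait_for_batch(get_status, batch_id, interval=2.0, timeout=300.0):
    """Poll get_status(batch_id) until the batch reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = get_status(batch_id)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} not finished after {timeout}s")
```

Checking for both non-terminal states matters: a loop that only continues on PROCESSING exits prematurely if the first poll still sees the job queued as PENDING.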
Item Status Values:
| Status | Description |
|---|---|
| PENDING | URL not yet processed |
| COMPLETED | URL scraped successfully |
| FAILED | URL scrape failed |
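Because items succeed or fail independently, it is useful to split a finished batch by per-item status before processing results. A sketch against the "Response (Completed)" payload shape shown above (`partition_results` is an illustrative helper, not part of the API):

```python
# Split a batch status payload into successfully scraped and failed items.
# The payload shape mirrors the "Response (Completed)" example above.

def partition_results(batch):
    """Return (succeeded, failed) lists of items from a batch payload."""
    items = batch.get("data", [])
    succeeded = [item for item in items if item["status"] == "COMPLETED"]
    failed = [item for item in items if item["status"] == "FAILED"]
    return succeeded, failed

batch = {
    "id": "batch_abc123",
    "status": "COMPLETED",
    "data": [
        {"url": "https://example.com/page-1", "status": "COMPLETED",
         "result": {"markdown": "# Page 1", "links": []}},
        {"url": "https://example.com/page-2", "status": "FAILED"},
    ],
}
succeeded, failed = partition_results(batch)
```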
List Batch History
GET /api/v1/batch/scrape
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | number | 20 | Max results (1-100) |
| offset | number | 0 | Skip results for pagination |
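The limit and offset parameters support simple pagination through the full history. A sketch of the paging loop, assuming a short (or empty) page signals the last one; `fetch_page` is any callable wrapping the GET request with `limit`/`offset` query parameters, injected here so the paging logic stands alone:

```python
# Page through batch history using limit/offset until a short page
# signals the end. fetch_page is an illustrative injection point, not
# part of the API itself.

def all_batches(fetch_page, page_size=20):
    """Yield every batch record across pages."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size
```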
Response
[
  {
    "id": "batch_abc123",
    "status": "COMPLETED",
    "totalUrls": 3,
    "completedUrls": 3,
    "failedUrls": 0,
    "createdAt": "2024-01-15T10:30:00.000Z"
  }
]
Webhook Payload
When a batch job completes:
{
  "event": "batch.completed",
  "data": {
    "id": "batch_abc123",
    "status": "COMPLETED",
    "totalUrls": 3,
    "completedUrls": 3,
    "failedUrls": 0
  },
  "timestamp": "2024-01-15T10:30:15.000Z"
}
Code Examples
Node.js
// Start batch scrape
const response = await fetch('https://server.anyhunt.app/api/v1/batch/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ah_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: [
      'https://example.com/page-1',
      'https://example.com/page-2',
      'https://example.com/page-3',
    ],
    scrapeOptions: {
      formats: ['markdown'],
    },
  }),
});
const data = await response.json();
console.log('Batch job ID:', data.id);
// Poll for results
const checkStatus = async (id) => {
  const res = await fetch(`https://server.anyhunt.app/api/v1/batch/scrape/${id}`, {
    headers: { 'Authorization': 'Bearer ah_your_api_key' },
  });
  return res.json();
};
// Wait for completion
let status = await checkStatus(data.id);
while (status.status === 'PENDING' || status.status === 'PROCESSING') {
  await new Promise(r => setTimeout(r, 2000));
  status = await checkStatus(data.id);
}
console.log('Results:', status.data);
Python
import requests
import time
# Start batch scrape
response = requests.post(
    'https://server.anyhunt.app/api/v1/batch/scrape',
    headers={
        'Authorization': 'Bearer ah_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'urls': [
            'https://example.com/page-1',
            'https://example.com/page-2',
            'https://example.com/page-3',
        ],
        'scrapeOptions': {
            'formats': ['markdown'],
        },
    },
)
batch_id = response.json()['id']
print(f'Batch job ID: {batch_id}')
# Poll for results
while True:
    status = requests.get(
        f'https://server.anyhunt.app/api/v1/batch/scrape/{batch_id}',
        headers={'Authorization': 'Bearer ah_your_api_key'},
    ).json()
    if status['status'] not in ('PENDING', 'PROCESSING'):
        break
    time.sleep(2)
print('Results:', status['data'])
Best Practices
- Use webhooks - For batches with more than 10 URLs, use webhooks instead of polling
- Group similar pages - URLs with similar structure will process more efficiently
- Monitor failures - Check the failedUrls count and the status of individual items
- Limit batch size - Start with smaller batches (10-20) to estimate processing time
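For the webhook-based flow recommended above, your endpoint receives the batch.completed payload shown earlier. A minimal receiver sketch using only the Python standard library; the path, port, and helper names are illustrative:

```python
# Minimal webhook receiver sketch for the batch.completed payload
# documented above, using only the standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(payload):
    """One-line summary for a batch.completed payload, or None otherwise."""
    if payload.get("event") != "batch.completed":
        return None
    d = payload["data"]
    return (f"batch {d['id']}: {d['completedUrls']}/{d['totalUrls']} done, "
            f"{d['failedUrls']} failed")

class BatchWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        summary = summarize(payload)
        if summary:
            print(summary)
        self.send_response(204)  # acknowledge quickly; do heavy work elsewhere
        self.end_headers()

# To run: HTTPServer(("", 8000), BatchWebhook).serve_forever()
```

Responding immediately with a 2xx and deferring any heavy processing keeps the notification from timing out and being retried.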