Anyhunt
API Reference

Batch Scrape API

Scrape multiple URLs in parallel with a single API call

Batch Scrape API

The Batch Scrape API allows you to scrape multiple URLs in a single request. It processes URLs in parallel with shared scrape options and supports webhook notifications.

Endpoints

MethodPathDescription
POST/api/v1/batch/scrapeStart a batch scrape job
GET/api/v1/batch/scrape/:idGet batch status and results
GET/api/v1/batch/scrapeList batch history

Start Batch Scrape

POST /api/v1/batch/scrape

Request Body

ParameterTypeDescription
urlsstring[]Array of URLs to scrape (1-100)
scrapeOptionsobjectShared options for all URLs (see Scrape API)
webhookUrlstringWebhook URL for completion notification

Example Request

curl -X POST https://server.anyhunt.app/api/v1/batch/scrape \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "scrapeOptions": {
      "formats": ["markdown", "links"],
      "onlyMainContent": true
    },
    "webhookUrl": "https://your-app.com/webhooks/batch"
  }'

Response

{
  "id": "batch_abc123",
  "status": "PENDING",
  "totalUrls": 3,
  "completedUrls": 0,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z"
}

Get Batch Status

GET /api/v1/batch/scrape/:id

Response (In Progress)

{
  "id": "batch_abc123",
  "status": "PROCESSING",
  "totalUrls": 3,
  "completedUrls": 2,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z"
}

Response (Completed)

{
  "id": "batch_abc123",
  "status": "COMPLETED",
  "totalUrls": 3,
  "completedUrls": 3,
  "failedUrls": 0,
  "createdAt": "2024-01-15T10:30:00.000Z",
  "completedAt": "2024-01-15T10:30:15.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 1\n\nContent...",
        "links": ["https://example.com/other"]
      }
    },
    {
      "url": "https://example.com/page-2",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 2\n\nContent...",
        "links": []
      }
    },
    {
      "url": "https://example.com/page-3",
      "status": "COMPLETED",
      "result": {
        "markdown": "# Page 3\n\nContent...",
        "links": []
      }
    }
  ]
}

Status Values:

StatusDescription
PENDINGJob is queued
PROCESSINGBatch is in progress
COMPLETEDAll URLs processed
FAILEDBatch failed with error

Item Status Values:

StatusDescription
PENDINGURL not yet processed
COMPLETEDURL scraped successfully
FAILEDURL scrape failed

List Batch History

GET /api/v1/batch/scrape

Query Parameters

ParameterTypeDefaultDescription
limitnumber20Max results (1-100)
offsetnumber0Skip results for pagination

Response

[
  {
    "id": "batch_abc123",
    "status": "COMPLETED",
    "totalUrls": 3,
    "completedUrls": 3,
    "failedUrls": 0,
    "createdAt": "2024-01-15T10:30:00.000Z"
  }
]

Webhook Payload

When a batch job completes:

{
  "event": "batch.completed",
  "data": {
    "id": "batch_abc123",
    "status": "COMPLETED",
    "totalUrls": 3,
    "completedUrls": 3,
    "failedUrls": 0
  },
  "timestamp": "2024-01-15T10:30:15.000Z"
}

Code Examples

Node.js

// Start batch scrape
const response = await fetch('https://server.anyhunt.app/api/v1/batch/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ah_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: [
      'https://example.com/page-1',
      'https://example.com/page-2',
      'https://example.com/page-3',
    ],
    scrapeOptions: {
      formats: ['markdown'],
    },
  }),
});

const data = await response.json();
console.log('Batch job ID:', data.id);

// Poll for results
const checkStatus = async (id) => {
  const res = await fetch(`https://server.anyhunt.app/api/v1/batch/scrape/${id}`, {
    headers: { 'Authorization': 'Bearer ah_your_api_key' },
  });
  return res.json();
};

// Wait for completion
let status = await checkStatus(data.id);
while (status.status === 'PROCESSING') {
  await new Promise(r => setTimeout(r, 2000));
  status = await checkStatus(data.id);
}

console.log('Results:', status.data);

Python

import requests
import time

# Start batch scrape
response = requests.post(
    'https://server.anyhunt.app/api/v1/batch/scrape',
    headers={
        'Authorization': 'Bearer ah_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'urls': [
            'https://example.com/page-1',
            'https://example.com/page-2',
            'https://example.com/page-3',
        ],
        'scrapeOptions': {
            'formats': ['markdown'],
        },
    },
)

batch_id = response.json()['id']
print(f'Batch job ID: {batch_id}')

# Poll for results
while True:
    status = requests.get(
        f'https://server.anyhunt.app/api/v1/batch/scrape/{batch_id}',
        headers={'Authorization': 'Bearer ah_your_api_key'},
    ).json()

    if status['status'] != 'PROCESSING':
        break
    time.sleep(2)

print('Results:', status['data'])

Best Practices

  1. Use webhooks - For batches with more than 10 URLs, use webhooks instead of polling
  2. Group similar pages - URLs with similar structure will process more efficiently
  3. Monitor failures - Check failedUrls count and individual item status
  4. Limit batch size - Start with smaller batches (10-20) to estimate processing time