Anyhunt API Reference

Scrape API

Extract content from any webpage in multiple formats

The Scrape API is the primary endpoint for extracting content from web pages. It supports multiple output formats including Markdown, HTML, links, and screenshots.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/scrape | Create a scrape job |
| GET | /api/v1/scrape/:id | Get job status and result |
| GET | /api/v1/scrape | List scrape history |

Create Scrape Job

POST /api/v1/scrape

Request Body

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | string | The URL to scrape (must be valid HTTP/HTTPS) |

Output Format Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| formats | string[] | ["markdown"] | Output formats: markdown, html, rawHtml, links, screenshot |
| onlyMainContent | boolean | true | Extract only the main content (uses the Readability algorithm) |
| includeTags | string[] | - | CSS selectors to include |
| excludeTags | string[] | - | CSS selectors to exclude (also hides elements in screenshots) |

Page Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| viewport | object | - | Custom viewport {width, height} |
| viewport.width | number | 1280 | Viewport width (100-3840) |
| viewport.height | number | 800 | Viewport height (100-2160) |
| mobile | boolean | false | Use a mobile viewport and user agent |
| device | string | - | Device preset: desktop, tablet, mobile |
| darkMode | boolean | false | Enable dark mode |
| headers | object | - | Custom HTTP headers |

Timing Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| waitFor | number \| string | - | Wait time in ms, or a CSS selector to wait for |
| timeout | number | 30000 | Page timeout in milliseconds |
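
For example, to give a client-rendered app time to mount before extraction, waitFor can take a selector (the selector and timeout values here are illustrative):

```json
{
  "url": "https://example.com/app",
  "formats": ["markdown"],
  "waitFor": "#app-root",
  "timeout": 45000
}
```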

Screenshot Options

These options apply only when formats includes screenshot:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| screenshotOptions.fullPage | boolean | false | Capture full page height |
| screenshotOptions.format | string | "png" | Image format: png, jpeg, webp |
| screenshotOptions.quality | number | 80 | Image quality (1-100) |
| screenshotOptions.clip | string | - | CSS selector for an element screenshot |
| screenshotOptions.response | string | "url" | Response type: url or base64 |
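
For instance, to capture a single element and receive it inline as base64 rather than a hosted URL (the selector and values here are illustrative):

```json
{
  "url": "https://example.com/pricing",
  "formats": ["screenshot"],
  "screenshotOptions": {
    "clip": "#pricing-table",
    "format": "jpeg",
    "quality": 90,
    "response": "base64"
  }
}
```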

Page Actions

Execute interactions before scraping:

| Parameter | Type | Description |
| --- | --- | --- |
| actions | Action[] | Array of actions to execute |

Action Types:

| Type | Parameters | Description |
| --- | --- | --- |
| wait | milliseconds | Wait for the specified time |
| click | selector | Click an element |
| type | selector, text | Type text into an input |
| press | key | Press a keyboard key |
| scroll | direction (up/down), amount | Scroll the page |
| screenshot | - | Take an intermediate screenshot |

Example Request

```bash
curl -X POST https://server.anyhunt.app/api/v1/scrape \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "formats": ["markdown", "screenshot"],
    "onlyMainContent": true,
    "viewport": {
      "width": 1920,
      "height": 1080
    },
    "screenshotOptions": {
      "fullPage": true,
      "format": "webp",
      "quality": 85
    }
  }'
```

Response

The API returns a job ID. Poll GET /api/v1/scrape/:id to get results.

```json
{
  "id": "scrape_abc123",
  "status": "PENDING"
}
```

Cache Hit Response:

If the same URL was recently scraped, the API returns cached results immediately:

```json
{
  "id": "scrape_abc123",
  "url": "https://example.com/article",
  "fromCache": true,
  "markdown": "# Article Title\n\nArticle content...",
  "screenshot": {
    "url": "https://cdn.anyhunt.app/scraper/scrape_abc123.webp",
    "width": 1920,
    "height": 3500,
    "format": "webp",
    "fileSize": 245000,
    "expiresAt": "2024-02-15T10:30:00.000Z"
  },
  "metadata": {
    "title": "Article Title",
    "description": "Article description"
  }
}
```

Get Scrape Job

GET /api/v1/scrape/:id

Retrieve the status and result of a specific scrape job.

Response

```json
{
  "id": "scrape_abc123",
  "url": "https://example.com/article",
  "status": "COMPLETED",
  "fromCache": false,
  "markdown": "# Article Title\n\nContent...",
  "metadata": {
    "title": "Article Title",
    "description": "Article description"
  },
  "screenshot": {
    "url": "https://cdn.anyhunt.app/scraper/scrape_abc123.webp",
    "width": 1920,
    "height": 3500,
    "format": "webp",
    "fileSize": 245000
  },
  "timings": {
    "queueWaitMs": 50,
    "fetchMs": 1200,
    "renderMs": 500,
    "transformMs": 100,
    "screenshotMs": 800,
    "totalMs": 2650
  }
}
```

Status Values:

| Status | Description |
| --- | --- |
| PENDING | Job is queued |
| PROCESSING | Job is being processed |
| COMPLETED | Job completed successfully |
| FAILED | Job failed with an error |

List Scrape History

GET /api/v1/scrape

List your recent scrape jobs.

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | number | 20 | Max results (1-100) |
| offset | number | 0 | Skip results for pagination |
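
To walk the full history, advance offset by limit until a page comes back short. A minimal sketch: the paging loop is generic, and the request is factored into a helper so that only the endpoint, header, and parameters from this reference are assumed:

```python
def iter_history(fetch_page, limit=50):
    """Yield every job in scrape history, paging with limit/offset.

    fetch_page(limit, offset) should perform one GET /api/v1/scrape
    call and return the resulting list of jobs.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # short page: no more results
            break
        offset += limit

def fetch_page(limit, offset):
    """One GET /api/v1/scrape call, as in the examples above."""
    import requests  # third-party; same client as the Python example
    res = requests.get(
        "https://server.anyhunt.app/api/v1/scrape",
        headers={"Authorization": "Bearer ah_your_api_key"},
        params={"limit": limit, "offset": offset},
    )
    res.raise_for_status()
    return res.json()
```

Injecting fetch_page also makes the paging loop easy to exercise without hitting the network.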

Response

```json
[
  {
    "id": "scrape_abc123",
    "url": "https://example.com",
    "status": "COMPLETED",
    "fromCache": false,
    "createdAt": "2024-01-15T10:30:00.000Z",
    "completedAt": "2024-01-15T10:30:02.650Z"
  }
]
```

Code Examples

Node.js

```javascript
// Start scrape job
const response = await fetch('https://server.anyhunt.app/api/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ah_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com',
    formats: ['markdown', 'links'],
    onlyMainContent: true,
  }),
});

const data = await response.json();

// If cache hit, data is already available
if (data.fromCache) {
  console.log(data.markdown);
} else {
  // Poll for results
  const result = await pollForResult(data.id);
  console.log(result.markdown);
}

async function pollForResult(id) {
  while (true) {
    const res = await fetch(`https://server.anyhunt.app/api/v1/scrape/${id}`, {
      headers: { 'Authorization': 'Bearer ah_your_api_key' },
    });
    const data = await res.json();
    if (data.status === 'COMPLETED') return data;
    if (data.status === 'FAILED') throw new Error(data.error?.message);
    await new Promise(r => setTimeout(r, 1000)); // Wait 1s
  }
}
```

Python

```python
import requests
import time

# Start scrape job
response = requests.post(
    'https://server.anyhunt.app/api/v1/scrape',
    headers={
        'Authorization': 'Bearer ah_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com',
        'formats': ['markdown', 'links'],
        'onlyMainContent': True,
    },
)

data = response.json()

# If cache hit, data is already available
if data.get('fromCache'):
    print(data['markdown'])
else:
    # Poll for results
    while True:
        res = requests.get(
            f"https://server.anyhunt.app/api/v1/scrape/{data['id']}",
            headers={'Authorization': 'Bearer ah_your_api_key'}
        )
        result = res.json()
        if result['status'] == 'COMPLETED':
            print(result['markdown'])
            break
        elif result['status'] == 'FAILED':
            raise Exception(result.get('error', {}).get('message'))
        time.sleep(1)
```

With Page Actions

```bash
curl -X POST https://server.anyhunt.app/api/v1/scrape \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "actions": [
      {"type": "wait", "milliseconds": 1000},
      {"type": "click", "selector": "#load-more"},
      {"type": "scroll", "direction": "down"},
      {"type": "wait", "milliseconds": 500}
    ]
  }'
```

Error Codes

| Code | Status | Description |
| --- | --- | --- |
| INVALID_URL | 400 | URL format is invalid or blocked |
| URL_NOT_ALLOWED | 400 | URL blocked by SSRF protection |
| PAGE_TIMEOUT | 504 | Page took too long to load |
| SELECTOR_NOT_FOUND | 400 | CSS selector not found on the page |
| BROWSER_ERROR | 500 | Browser crashed or errored |
| NETWORK_ERROR | 500 | Network request failed |
| RATE_LIMITED | 429 | Too many requests |
| QUOTA_EXCEEDED | 429 | Monthly quota exhausted |
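
RATE_LIMITED responses are transient and worth retrying with backoff. A sketch of a retrying client; the backoff schedule and helper names are my own, not part of the API:

```python
import time

def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..., capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(max_retries)]

def scrape_with_retry(body, max_retries=5):
    """POST /api/v1/scrape, sleeping and retrying on HTTP 429."""
    import requests  # third-party; same client as the Python example
    for delay in backoff_delays(max_retries):
        res = requests.post(
            "https://server.anyhunt.app/api/v1/scrape",
            headers={
                "Authorization": "Bearer ah_your_api_key",
                "Content-Type": "application/json",
            },
            json=body,
        )
        if res.status_code != 429:
            res.raise_for_status()  # surface other 4xx/5xx errors
            return res.json()
        time.sleep(delay)  # 429: back off before the next attempt
    raise RuntimeError("still rate limited after retries")
```

Note that QUOTA_EXCEEDED also returns 429; in production you may want to inspect the error code in the response body before retrying, since an exhausted monthly quota will not recover within a backoff window.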

Caching

Responses are cached for 1 hour by default. Cache hits are indicated by fromCache: true in the response and don't count against your quota.

The cache key is computed as SHA256(url + options), so identical requests (same URL and same options) return cached results.
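
The exact option canonicalization isn't specified here; as an illustration of the idea, hashing the URL together with a key-sorted JSON encoding of the options yields a key that is stable regardless of option ordering (the serialization details below are assumptions, not the server's actual scheme):

```python
import hashlib
import json

def cache_key(url, options):
    """Illustrative SHA256(url + options) cache key.

    sort_keys makes the key independent of how option fields are ordered,
    so equivalent requests hash to the same value.
    """
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((url + canonical).encode("utf-8")).hexdigest()
```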