Anyhunt
API Reference

Extract API

AI-powered structured data extraction using an LLM

The Extract API uses a large language model (LLM) to extract structured data from web pages. Define a JSON Schema, and the model extracts matching data from each URL you submit.

Endpoints

Method  Path             Description
POST    /api/v1/extract  Extract structured data

Extract Data

POST /api/v1/extract

Request Body

Parameter     Type      Description
urls          string[]  URLs to extract from (1-20)
prompt        string    Extraction instructions (max 5000 chars)
schema        object    JSON Schema for output structure
systemPrompt  string    Custom system prompt (max 2000 chars)
model         string    LLM model to use (optional)
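
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: check_extract_request and the constant names are hypothetical, and only the limits the table states are enforced. Whether the server applies further rules is not covered here.

```python
from typing import Any

MAX_URLS = 20                   # documented: 1-20 URLs per request
MAX_PROMPT_CHARS = 5000         # documented limit for prompt
MAX_SYSTEM_PROMPT_CHARS = 2000  # documented limit for systemPrompt


def check_extract_request(body: dict[str, Any]) -> list[str]:
    """Return the problems found in an Extract request body (hypothetical helper)."""
    problems: list[str] = []

    urls = body.get("urls")
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        problems.append("urls must be a list of strings")
    elif not 1 <= len(urls) <= MAX_URLS:
        problems.append(f"urls must contain 1-{MAX_URLS} entries, got {len(urls)}")

    prompt = body.get("prompt")
    if isinstance(prompt, str) and len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    system_prompt = body.get("systemPrompt")
    if isinstance(system_prompt, str) and len(system_prompt) > MAX_SYSTEM_PROMPT_CHARS:
        problems.append(f"systemPrompt exceeds {MAX_SYSTEM_PROMPT_CHARS} characters")

    schema = body.get("schema")
    if schema is not None and not isinstance(schema, dict):
        problems.append("schema must be a JSON object")

    return problems
```

An empty list means the body passes these client-side checks; the server remains the authority on what it accepts.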

Example Request

curl -X POST https://server.anyhunt.app/api/v1/extract \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/product/123"],
    "prompt": "Extract the product information from this page",
    "schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "description": {"type": "string"},
        "inStock": {"type": "boolean"}
      },
      "required": ["name", "price"]
    }
  }'

Response

{
  "results": [
    {
      "url": "https://example.com/product/123",
      "data": {
        "name": "Wireless Headphones Pro",
        "price": 199.99,
        "currency": "USD",
        "description": "Premium wireless headphones with noise cancellation",
        "inStock": true
      }
    }
  ]
}

Error Response

If extraction fails for a URL, its entry in results contains an error field instead of data:

{
  "results": [
    {
      "url": "https://example.com/product/123",
      "error": "Failed to extract data: page content is empty"
    }
  ]
}
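
Because each entry in results carries either data or error, a client will usually want to separate the two. The sketch below assumes the response shapes shown above; split_results is a hypothetical helper name, not something the API provides.

```python
from typing import Any


def split_results(response_body: dict[str, Any]) -> tuple[dict[str, Any], dict[str, str]]:
    """Separate extracted data from per-URL errors, both keyed by URL."""
    extracted: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for result in response_body.get("results", []):
        if "error" in result:
            failed[result["url"]] = result["error"]
        else:
            extracted[result["url"]] = result.get("data")
    return extracted, failed
```

Keying by URL makes it straightforward to retry only the URLs that failed.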

JSON Schema

Define the structure of the data you want to extract using JSON Schema.

Supported Types

Type     Description     Example
string   Text values     "name": {"type": "string"}
number   Numeric values  "price": {"type": "number"}
boolean  True/false      "inStock": {"type": "boolean"}
array    List of values  "tags": {"type": "array", "items": {"type": "string"}}
object   Nested object   "author": {"type": "object", "properties": {...}}
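
To illustrate how the five types map onto returned values, here is a small checker written against only the keywords used in this document (type, properties, items, required). It is a sketch under that assumption, not a full JSON Schema validator, and matches_schema is a hypothetical name. It treats a null value as a mismatch; whether the API ever returns null for a field it could not find is not stated here.

```python
from typing import Any


def matches_schema(value: Any, schema: dict[str, Any]) -> bool:
    """Check a value against the five listed types (partial JSON Schema only)."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "array":
        items = schema.get("items", {})
        return isinstance(value, list) and all(matches_schema(v, items) for v in value)
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(name not in value for name in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(matches_schema(value[k], props[k]) for k in props if k in value)
    return True  # no "type" or an unlisted type: nothing to check in this sketch
```

Running a check like this on each result's data catches cases where the model returned a price as a string or omitted a required field.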

Example Schemas

Product Extraction

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "price": {"type": "number"},
    "description": {"type": "string"},
    "specifications": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "key": {"type": "string"},
          "value": {"type": "string"}
        }
      }
    }
  },
  "required": ["name", "price"]
}

Article Metadata

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "publishedDate": {"type": "string"},
    "tags": {
      "type": "array",
      "items": {"type": "string"}
    },
    "summary": {"type": "string"}
  },
  "required": ["title"]
}

Company Information

{
  "type": "object",
  "properties": {
    "companyName": {"type": "string"},
    "founded": {"type": "number"},
    "headquarters": {"type": "string"},
    "employees": {"type": "number"},
    "products": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}

Code Examples

Node.js

const response = await fetch('https://server.anyhunt.app/api/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ah_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: ['https://example.com/product/123'],
    prompt: 'Extract product details',
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        price: { type: 'number' },
      },
      required: ['name', 'price'],
    },
  }),
});

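const data = await response.json();
// Each result carries either `data` or `error`, so check before reading.
for (const result of data.results) {
  if (result.error) {
    console.error(`${result.url}: ${result.error}`);
  } else {
    console.log(result.data);
  }
}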

Python

import requests

response = requests.post(
    'https://server.anyhunt.app/api/v1/extract',
    headers={
        'Authorization': 'Bearer ah_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'urls': ['https://example.com/product/123'],
        'prompt': 'Extract product details',
        'schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'price': {'type': 'number'},
            },
            'required': ['name', 'price'],
        },
    },
)

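results = response.json()['results']
# Each result carries either 'data' or 'error', so check before reading.
for result in results:
    if 'error' in result:
        print(f"{result['url']}: {result['error']}")
    else:
        print(result['data'])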

Best Practices

  1. Keep schemas simple - Start with a few fields and add more as needed
  2. Use clear prompts - Describe what you want to extract in plain language
  3. Mark required fields - Use the required array to ensure essential data is extracted
  4. Test with a single URL first - Validate your schema before batch extraction
  5. Handle errors gracefully - Check for the error field in each result
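
For batch extraction, a longer URL list has to be split so that each request stays within the 1-20 URL limit. A minimal sketch of that splitting, assuming nothing beyond the documented limit (batch_urls and MAX_URLS_PER_REQUEST are hypothetical names):

```python
from typing import Iterator

MAX_URLS_PER_REQUEST = 20  # documented limit on the urls array


def batch_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each small enough for one request."""
    if not 1 <= size <= MAX_URLS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_URLS_PER_REQUEST}")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

To follow practice 4, send urls[:1] on its own first, inspect the result, and only then iterate over batch_urls(urls[1:]).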

Prompt Tips

A good prompt tells the LLM exactly what to extract and how to resolve ambiguity:

{
  "prompt": "Extract the main product information. Focus on the product name, current price (not original price), and whether it's currently in stock."
}

Bad prompt (too vague):

{
  "prompt": "Get the data"
}

Pricing

Extract API calls consume quota based on:

  • Number of URLs processed
  • Page content size
  • Schema complexity

Each successful extraction counts as 1 API call against your quota.