Extract API

The Extract API uses a large language model (LLM) to extract structured data from web pages. You define a JSON Schema, and the model returns data that matches it.

Endpoints

Method	Path	Description
POST	`/api/v1/extract`	Extract structured data

Extract Data

POST /api/v1/extract

Request Body

Parameter	Type	Description
`urls`	string[]	URLs to extract from (1-20)
`prompt`	string	Extraction instructions (max 5000 chars)
`schema`	object	JSON Schema for output structure
`systemPrompt`	string	Custom system prompt (max 2000 chars)
`model`	string	LLM to use (optional)
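
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_extract_payload` is a hypothetical helper, and because the table only marks `model` as optional, treating `prompt`, `schema`, and `systemPrompt` as optional here is an assumption.

```python
from typing import Any, Optional

# Limits taken from the parameter table above.
MAX_URLS = 20
MAX_PROMPT_CHARS = 5000
MAX_SYSTEM_PROMPT_CHARS = 2000


def build_extract_payload(
    urls: list[str],
    prompt: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
    system_prompt: Optional[str] = None,
    model: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body, rejecting values outside the documented limits."""
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"urls must contain 1-{MAX_URLS} entries, got {len(urls)}")
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if system_prompt is not None and len(system_prompt) > MAX_SYSTEM_PROMPT_CHARS:
        raise ValueError(f"systemPrompt exceeds {MAX_SYSTEM_PROMPT_CHARS} characters")

    payload: dict[str, Any] = {"urls": list(urls)}
    # Assumption: fields other than `urls` may be omitted, so only send those provided.
    optional = {
        "prompt": prompt,
        "schema": schema,
        "systemPrompt": system_prompt,
        "model": model,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_extract_payload(
    ["https://example.com/product/123"],
    prompt="Extract the product information from this page",
    schema={"type": "object", "properties": {"name": {"type": "string"}}},
)
print(sorted(payload))
```

The returned dictionary can be passed as the JSON body of the request; the server remains the authority on what it accepts.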

Example Request

curl -X POST https://server.anyhunt.app/api/v1/extract \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/product/123"],
    "prompt": "Extract the product information from this page",
    "schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "description": {"type": "string"},
        "inStock": {"type": "boolean"}
      },
      "required": ["name", "price"]
    }
  }'

Response

{
  "results": [
    {
      "url": "https://example.com/product/123",
      "data": {
        "name": "Wireless Headphones Pro",
        "price": 199.99,
        "currency": "USD",
        "description": "Premium wireless headphones with noise cancellation",
        "inStock": true
      }
    }
  ]
}

Error Response

If extraction fails for a URL:

{
  "results": [
    {
      "url": "https://example.com/product/123",
      "error": "Failed to extract data: page content is empty"
    }
  ]
}
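
Because errors are reported per result, a client usually needs to separate the URLs that succeeded from those that failed. A minimal sketch follows; `split_results` is a hypothetical helper, the second URL is made up for illustration, and it assumes a single response may contain both successful and failed results.

```python
from typing import Any


def split_results(body: dict[str, Any]) -> tuple[dict[str, Any], dict[str, str]]:
    """Separate per-URL results into extracted data and error messages."""
    extracted: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for item in body.get("results", []):
        if "error" in item:
            failed[item["url"]] = item["error"]
        else:
            extracted[item["url"]] = item.get("data")
    return extracted, failed


# Sample body shaped like the responses above (mixing both kinds is an assumption).
body = {
    "results": [
        {
            "url": "https://example.com/product/123",
            "data": {"name": "Wireless Headphones Pro", "price": 199.99},
        },
        {
            "url": "https://example.com/product/456",
            "error": "Failed to extract data: page content is empty",
        },
    ]
}

extracted, failed = split_results(body)
for url, message in failed.items():
    print(f"{url}: {message}")
```

Keeping failures keyed by URL makes it straightforward to retry only the pages that did not extract.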

JSON Schema

Use JSON Schema to define the structure of the data you want to extract.

Supported Types

Type	Description	Example
`string`	Text values	`"name": {"type": "string"}`
`number`	Numeric values	`"price": {"type": "number"}`
`boolean`	True/false	`"inStock": {"type": "boolean"}`
`array`	List of values	`"tags": {"type": "array", "items": {"type": "string"}}`
`object`	Nested object	`"author": {"type": "object", "properties": {...}}`

Example Schemas

Product Extraction

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "price": {"type": "number"},
    "description": {"type": "string"},
    "specifications": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "key": {"type": "string"},
          "value": {"type": "string"}
        }
      }
    }
  },
  "required": ["name", "price"]
}

Article Metadata

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "publishedDate": {"type": "string"},
    "tags": {
      "type": "array",
      "items": {"type": "string"}
    },
    "summary": {"type": "string"}
  },
  "required": ["title"]
}
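
LLM output may not always match the schema, so it can be worth checking extracted data locally before using it. The sketch below is a deliberately small checker covering only the five types listed above plus `required`, `properties`, and `items`; it is not a full JSON Schema validator, and `find_problems` is a hypothetical name.

```python
from typing import Any

# Maps the JSON Schema type names from the Supported Types table to Python checks.
# bool is excluded from "number" because Python treats bool as a subclass of int.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def find_problems(data: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable mismatches between data and a schema subset."""
    expected = schema.get("type")
    check = _TYPE_CHECKS.get(expected)
    if check is not None and not check(data):
        return [f"{path}: expected {expected}, got {type(data).__name__}"]

    problems: list[str] = []
    if expected == "object":
        for name in schema.get("required", []):
            if name not in data:
                problems.append(f"{path}.{name}: required field is missing")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in data:
                problems.extend(find_problems(data[name], sub_schema, f"{path}.{name}"))
    elif expected == "array" and "items" in schema:
        for index, element in enumerate(data):
            problems.extend(find_problems(element, schema["items"], f"{path}[{index}]"))
    return problems


article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title"],
}

# Made-up extraction result with a missing title and a non-string tag.
for problem in find_problems({"author": "A. Writer", "tags": ["ai", 7]}, article_schema):
    print(problem)
```

For production use, a complete validator library is likely a better fit; this version only illustrates how the types and the `required` array map onto checks.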

Code Examples

Node.js

const response = await fetch('https://server.anyhunt.app/api/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ah_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    urls: ['https://example.com/product/123'],
    prompt: 'Extract product details',
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        price: { type: 'number' },
      },
      required: ['name', 'price'],
    },
  }),
});

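if (!response.ok) {
  throw new Error(`Extract request failed with status ${response.status}`);
}

const data = await response.json();
// Each result carries either `data` or `error`, so check before reading.
const first = data.results[0];
console.log(first.error ?? first.data);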

Python

import requests

response = requests.post(
    'https://server.anyhunt.app/api/v1/extract',
    headers={
        'Authorization': 'Bearer ah_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'urls': ['https://example.com/product/123'],
        'prompt': 'Extract product details',
        'schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'price': {'type': 'number'},
            },
            'required': ['name', 'price'],
        },
    },
)

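response.raise_for_status()  # fail fast on HTTP-level errors

# Each result carries either `data` or `error`, so check before reading.
first = response.json()['results'][0]
print(first.get('error') or first['data'])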

Best Practices

Keep schemas simple - Start with a few fields and add more as needed
Use clear prompts - Describe what you want to extract in plain language
Mark required fields - Use the `required` array to ensure essential data is extracted
Test with a single URL first - Validate your schema before batch extraction
Handle errors gracefully - Check for the `error` field in each result
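
The "single URL first, then batch" advice can be combined with the 20-URL request limit. A minimal sketch, assuming a larger URL list is simply split across several requests; `batch_urls` is a hypothetical helper and the URLs are placeholders.

```python
from typing import Iterator

MAX_URLS_PER_REQUEST = 20  # documented upper bound for `urls`


def batch_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each small enough for one request."""
    if not 1 <= size <= MAX_URLS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_URLS_PER_REQUEST}")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/product/{i}" for i in range(1, 46)]

# Send one URL on its own to validate the schema, then batch the remainder.
trial, remainder = urls[:1], urls[1:]
batches = list(batch_urls(remainder))
print(len(trial), [len(batch) for batch in batches])
```

Each batch would then be sent as the `urls` value of its own request, with the `error` field checked on every result that comes back.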

Prompt Tips

A good prompt tells the LLM specifically what to extract:

{
  "prompt": "Extract the main product information. Focus on the product name, current price (not original price), and whether it's currently in stock."
}

Bad prompt (too vague):

{
  "prompt": "Get the data"
}

Pricing

Extract API calls consume quota based on:

Number of URLs processed
Page content size
Schema complexity

Each successful extraction counts as 1 API call against your quota.
