Extract API
AI-powered structured data extraction using an LLM
The Extract API uses a large language model (LLM) to extract structured data from web pages. Define a JSON Schema, and the model extracts the matching data from each page.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/extract | Extract structured data |
Extract Data
POST /api/v1/extract
Request Body
| Parameter | Type | Description |
|---|---|---|
| urls | string[] | URLs to extract from (1-20) |
| prompt | string | Extraction instructions (max 5000 chars) |
| schema | object | JSON Schema for output structure |
| systemPrompt | string | Custom system prompt (max 2000 chars) |
| model | string | LLM model to use (optional) |
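The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_extract_payload` and `chunk_urls` are hypothetical helper names, and treating `urls`, `prompt`, and `schema` as mandatory while `systemPrompt` and `model` are optional is an assumption based on the example requests, since the table marks only `model` as optional.

```python
from typing import Any, Optional

# Limits taken from the request body table.
MAX_URLS_PER_REQUEST = 20
MAX_PROMPT_CHARS = 5000
MAX_SYSTEM_PROMPT_CHARS = 2000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> list[list[str]]:
    """Split a long URL list into batches that each fit in one request."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_extract_payload(
    urls: list[str],
    prompt: str,
    schema: dict[str, Any],
    system_prompt: Optional[str] = None,  # assumed optional
    model: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body, raising ValueError if a documented limit is exceeded."""
    if not 1 <= len(urls) <= MAX_URLS_PER_REQUEST:
        raise ValueError(
            f"urls must contain 1-{MAX_URLS_PER_REQUEST} entries, got {len(urls)}"
        )
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if system_prompt is not None and len(system_prompt) > MAX_SYSTEM_PROMPT_CHARS:
        raise ValueError(f"systemPrompt exceeds {MAX_SYSTEM_PROMPT_CHARS} characters")

    payload: dict[str, Any] = {"urls": list(urls), "prompt": prompt, "schema": schema}
    # Optional fields are omitted entirely when not provided.
    if system_prompt is not None:
        payload["systemPrompt"] = system_prompt
    if model is not None:
        payload["model"] = model
    return payload
```

A list longer than 20 URLs can be passed through `chunk_urls` first, with one payload built per batch.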
Example Request
curl -X POST https://server.anyhunt.app/api/v1/extract \
-H "Authorization: Bearer ah_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/product/123"],
"prompt": "Extract the product information from this page",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"currency": {"type": "string"},
"description": {"type": "string"},
"inStock": {"type": "boolean"}
},
"required": ["name", "price"]
}
}'
Response
{
"results": [
{
"url": "https://example.com/product/123",
"data": {
"name": "Wireless Headphones Pro",
"price": 199.99,
"currency": "USD",
"description": "Premium wireless headphones with noise cancellation",
"inStock": true
}
}
]
}
Error Response
If extraction fails for a URL:
{
"results": [
{
"url": "https://example.com/product/123",
"error": "Failed to extract data: page content is empty"
}
]
}
JSON Schema
Define the structure of data you want to extract using JSON Schema:
Supported Types
| Type | Description | Example |
|---|---|---|
| string | Text values | "name": {"type": "string"} |
| number | Numeric values | "price": {"type": "number"} |
| boolean | True/false | "inStock": {"type": "boolean"} |
| array | List of values | "tags": {"type": "array", "items": {"type": "string"}} |
| object | Nested object | "author": {"type": "object", "properties": {...}} |
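Because the output comes from an LLM, it can be worth checking returned data against the schema on the client. The following is a minimal sketch covering only the five types in the table above plus `required`, `properties`, and `items`; `matches_type` and `find_problems` are hypothetical helper names, and whether the API already validates its output is not stated here. For full JSON Schema validation, a dedicated library such as `jsonschema` is a better fit.

```python
from typing import Any


def matches_type(value: Any, type_name: str) -> bool:
    """Check one value against one of the five supported type names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    raise ValueError(f"unsupported type: {type_name}")


def find_problems(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of mismatches between a value and a schema (empty if none)."""
    type_name = schema.get("type")
    if type_name is None:
        return []  # no type constraint to check
    if not matches_type(value, type_name):
        return [f"{path}: expected {type_name}, got {type(value).__name__}"]

    problems: list[str] = []
    if type_name == "object":
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: required field is missing")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(find_problems(value[key], sub_schema, f"{path}.{key}"))
    elif type_name == "array" and "items" in schema:
        for index, item in enumerate(value):
            problems.extend(find_problems(item, schema["items"], f"{path}[{index}]"))
    return problems


product_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}
# A price returned as text instead of a number is reported as a mismatch.
print(find_problems({"name": "Wireless Headphones Pro", "price": "199.99"}, product_schema))
# → ['$.price: expected number, got str']
```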
Example Schemas
Product Extraction
{
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"description": {"type": "string"},
"specifications": {
"type": "array",
"items": {
"type": "object",
"properties": {
"key": {"type": "string"},
"value": {"type": "string"}
}
}
}
},
"required": ["name", "price"]
}
Article Metadata
{
"type": "object",
"properties": {
"title": {"type": "string"},
"author": {"type": "string"},
"publishedDate": {"type": "string"},
"tags": {
"type": "array",
"items": {"type": "string"}
},
"summary": {"type": "string"}
},
"required": ["title"]
}
Company Information
{
"type": "object",
"properties": {
"companyName": {"type": "string"},
"founded": {"type": "number"},
"headquarters": {"type": "string"},
"employees": {"type": "number"},
"products": {
"type": "array",
"items": {"type": "string"}
}
}
}
Code Examples
Node.js
const response = await fetch('https://server.anyhunt.app/api/v1/extract', {
method: 'POST',
headers: {
'Authorization': 'Bearer ah_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
urls: ['https://example.com/product/123'],
prompt: 'Extract product details',
schema: {
type: 'object',
properties: {
name: { type: 'string' },
price: { type: 'number' },
},
required: ['name', 'price'],
},
}),
});
const data = await response.json();
console.log(data.results[0].data);
Python
import requests
response = requests.post(
'https://server.anyhunt.app/api/v1/extract',
headers={
'Authorization': 'Bearer ah_your_api_key',
'Content-Type': 'application/json',
},
json={
'urls': ['https://example.com/product/123'],
'prompt': 'Extract product details',
'schema': {
'type': 'object',
'properties': {
'name': {'type': 'string'},
'price': {'type': 'number'},
},
'required': ['name', 'price'],
},
},
)
results = response.json()['results']
print(results[0]['data'])
Best Practices
- Keep schemas simple - Start with a few fields and add more as needed
- Use clear prompts - Describe what you want to extract in plain language
- Mark required fields - Use the `required` array to ensure essential data is extracted
- Test with a single URL first - Validate your schema before batch extraction
- Handle errors gracefully - Check for the `error` field in each result
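The last point can be sketched as a small helper that separates successful results from failed ones. This is an illustration rather than an official client: `split_results` is a hypothetical name, the second URL in the sample is made up, and the sketch assumes that one response can contain both successful and failed entries.

```python
from typing import Any


def split_results(response_body: dict[str, Any]) -> tuple[dict[str, Any], dict[str, str]]:
    """Separate per-URL results into extracted data and error messages, keyed by URL."""
    extracted: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for result in response_body.get("results", []):
        url = result["url"]
        if "error" in result:
            failed[url] = result["error"]
        else:
            extracted[url] = result.get("data")
    return extracted, failed


# Sample body combining the two documented result shapes.
body = {
    "results": [
        {
            "url": "https://example.com/product/123",
            "data": {"name": "Wireless Headphones Pro", "price": 199.99},
        },
        {
            "url": "https://example.com/product/456",  # hypothetical second URL
            "error": "Failed to extract data: page content is empty",
        },
    ]
}

extracted, failed = split_results(body)
for url, message in failed.items():
    print(f"{url}: {message}")
```

Keying both dictionaries by URL makes it straightforward to retry only the failed URLs in a follow-up request.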
Prompt Tips
Good prompts help the LLM understand what to extract:
{
"prompt": "Extract the main product information. Focus on the product name, current price (not original price), and whether it's currently in stock."
}
Bad prompt (too vague):
{
"prompt": "Get the data"
}
Pricing
Extract API calls consume quota based on:
- Number of URLs processed
- Page content size
- Schema complexity
Each successful extraction counts as 1 API call against your quota.