Anyhunt

Quickstart

Create an account and make your first API request

This guide walks you through setting up Anyhunt and scraping your first web page in minutes.

Step 1: Create an Account

  1. Go to console.anyhunt.app
  2. Sign up with your email or GitHub account
  3. Verify your email address

Step 2: Get an API Key

  1. Navigate to API Keys in the sidebar
  2. Click Create API Key
  3. Give your key a name (e.g. "Development")
  4. Copy your API key - you will only see it once!

Keep your API key safe. Never expose it in client-side code or public repositories.
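One common way to follow this advice is to load the key from an environment variable at runtime rather than hardcoding it. A minimal Python sketch (the variable name ANYHUNT_API_KEY is a convention assumed here, not mandated by the API):

```python
import os

def get_api_key() -> str:
    """Read the Anyhunt API key from the environment instead of source code."""
    key = os.environ.get("ANYHUNT_API_KEY")
    if not key:
        raise RuntimeError("ANYHUNT_API_KEY is not set")
    return key
```

Keys kept out of source files cannot leak through commits or client bundles.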

Step 3: Make Your First Request

Scrape a web page with curl or your favorite HTTP client:

curl -X POST https://server.anyhunt.app/api/v1/scrape \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "screenshot"],
    "onlyMainContent": true
  }'

Response

{
  "id": "scrape_abc123",
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "screenshot": {
    "url": "https://cdn.anyhunt.app/scraper/scrape_abc123.png",
    "width": 1280,
    "height": 800,
    "format": "png"
  },
  "metadata": {
    "title": "Example Domain",
    "description": "Example Domain for illustrative examples"
  }
}

Step 4: Use the Results

The response contains the extracted content in each format you requested:

  • markdown - Clean, readable text extracted from the page
  • screenshot - CDN URL where the hosted screenshot lives
  • html - Sanitized HTML content
  • links - All links discovered on the page
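Because only the formats you requested appear in the response, it is safest to check for each field before using it. A small sketch that works against the sample response above:

```python
def summarize_scrape(result: dict) -> dict:
    """Collect the interesting fields, tolerating formats that were not requested."""
    summary = {
        "id": result.get("id"),
        "title": result.get("metadata", {}).get("title"),
    }
    if "markdown" in result:
        summary["markdown_chars"] = len(result["markdown"])
    if "links" in result:
        summary["link_count"] = len(result["links"])
    if "screenshot" in result:
        summary["screenshot_url"] = result["screenshot"]["url"]
    return summary
```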

Next Steps

Code Examples

Node.js

const response = await fetch('https://server.anyhunt.app/api/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ANYHUNT_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com',
    formats: ['markdown', 'links'],
    onlyMainContent: true,
  }),
});

const data = await response.json();
console.log(data.markdown);
console.log(data.links);

Python

import os

import requests

API_KEY = os.environ['ANYHUNT_API_KEY']  # read the key from the environment

response = requests.post(
    'https://server.anyhunt.app/api/v1/scrape',
    headers={
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json',
    },
    json={
        'url': 'https://example.com',
        'formats': ['markdown', 'links'],
        'onlyMainContent': True,
    }
)

data = response.json()
print(data['markdown'])
print(data['links'])

Go

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    apiKey := os.Getenv("ANYHUNT_API_KEY") // read the key from the environment

    payload := map[string]interface{}{
        "url":             "https://example.com",
        "formats":         []string{"markdown", "links"},
        "onlyMainContent": true,
    }

    body, err := json.Marshal(payload)
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST", "https://server.anyhunt.app/api/v1/scrape", bytes.NewBuffer(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    result, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(result))
}

Common Use Cases

Web Scraping

Extract any web page's content as clean markdown:

{
  "url": "https://blog.example.com/article",
  "formats": ["markdown"],
  "onlyMainContent": true
}

Screenshot Capture

Capture full-page screenshots for visual testing or archiving:

{
  "url": "https://example.com",
  "formats": ["screenshot"],
  "screenshotOptions": {
    "fullPage": true,
    "format": "webp",
    "quality": 90
  }
}
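In Python, the same payload can be built programmatically, and validating the options before sending catches mistakes early. A sketch with basic sanity checks (the accepted format values are an assumption based on the png and webp examples in this guide; check the API reference for the full list):

```python
def build_screenshot_payload(url: str, full_page: bool = True,
                             fmt: str = "webp", quality: int = 90) -> dict:
    """Build a /scrape payload requesting a screenshot, with basic sanity checks."""
    # png and webp are the formats shown in this guide.
    if fmt not in ("png", "webp"):
        raise ValueError(f"unsupported screenshot format: {fmt}")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    return {
        "url": url,
        "formats": ["screenshot"],
        "screenshotOptions": {
            "fullPage": full_page,
            "format": fmt,
            "quality": quality,
        },
    }
```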

Link Discovery

Discover every link on a page for SEO analysis or crawling:

{
  "url": "https://example.com",
  "formats": ["links"]
}
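For SEO analysis, a typical first step is splitting the discovered links into internal and external sets. A sketch assuming the links format returns a flat list of absolute URLs:

```python
from urllib.parse import urlparse

def split_links(page_url: str, links: list[str]) -> tuple[set[str], set[str]]:
    """Split absolute link URLs into internal and external sets by hostname."""
    site = urlparse(page_url).hostname
    internal, external = set(), set()
    for link in links:
        host = urlparse(link).hostname
        (internal if host == site else external).add(link)
    return internal, external
```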

Multi-Page Crawling

Crawl an entire site to extract content from multiple pages:

curl -X POST https://server.anyhunt.app/api/v1/crawl \
  -H "Authorization: Bearer ah_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxDepth": 2,
    "limit": 50
  }'
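The headers and body for this crawl request can be assembled in Python as well; a small sketch that mirrors the curl example (the validation bounds are an assumption, not documented limits):

```python
import os

def crawl_request(url: str, max_depth: int = 2, limit: int = 50) -> tuple[dict, dict]:
    """Build headers and JSON body for POST /api/v1/crawl, mirroring the curl example."""
    if max_depth < 1 or limit < 1:
        raise ValueError("maxDepth and limit must be positive")
    headers = {
        "Authorization": f"Bearer {os.environ['ANYHUNT_API_KEY']}",
        "Content-Type": "application/json",
    }
    body = {"url": url, "maxDepth": max_depth, "limit": limit}
    return headers, body
```

Pass the returned headers and body to requests.post with the /api/v1/crawl URL, as in the Python scrape example above.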

See the Crawl API documentation for details.