Error handling
Understand error codes and implement retry strategies
Learn how to handle Anyhunt API errors and implement robust retry strategies.
Error response format
All API errors follow RFC 7807 (Problem Details):
{
  "type": "https://anyhunt.app/errors/ERROR_CODE",
  "title": "Error Title",
  "status": 400,
  "detail": "Human-readable error message",
  "code": "ERROR_CODE",
  "requestId": "req_123",
  "details": {}
}
Common error codes
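Client errors (4xx)
| Error code | HTTP status | Description |
|---|---|---|
| INVALID_URL | 400 | Malformed URL or unsupported protocol |
| URL_NOT_ALLOWED | 400 | URL blocked by SSRF protection (local addresses, private IPs) |
| INVALID_PARAMETER | 400 | Request parameter validation failed |
| SELECTOR_NOT_FOUND | 400 | CSS selector not found on the page |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| FORBIDDEN | 403 | API key lacks the required permissions |
| NOT_FOUND | 404 | Resource (job, scrape) not found |
| RATE_LIMITED | 429 | Too many requests; slow down |
| QUOTA_EXCEEDED | 429 | Monthly quota exhausted |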
Server errors (5xx)
| Error code | HTTP status | Description |
|---|---|---|
| PAGE_TIMEOUT | 504 | Page load timed out |
| BROWSER_ERROR | 500 | Browser crashed or failed |
| NETWORK_ERROR | 500 | Network request failed |
| INTERNAL_ERROR | 500 | Internal server error |
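The two tables above can be folded into a small lookup that decides whether a failed request is worth retrying. The following is a minimal sketch, not part of the Anyhunt API: `RETRYABLE_CODES` and `isRetryable` are hypothetical names, and the policy (retry RATE_LIMITED and 5xx errors, never retry other 4xx errors) is an assumption taken from the retry logic shown elsewhere on this page.

```javascript
// Hypothetical helper: decide whether an error response is worth retrying.
// The codes come from the tables above; the policy itself is an assumption.
const RETRYABLE_CODES = new Set([
  'RATE_LIMITED',   // 429: retry after waiting
  'PAGE_TIMEOUT',   // 504
  'BROWSER_ERROR',  // 500
  'NETWORK_ERROR',  // 500
  'INTERNAL_ERROR', // 500
]);

function isRetryable(status, code) {
  // Known transient codes are retryable; any other 5xx is treated as transient too.
  return RETRYABLE_CODES.has(code) || (status >= 500 && status < 600);
}

console.log(isRetryable(429, 'RATE_LIMITED'));   // true
console.log(isRetryable(429, 'QUOTA_EXCEEDED')); // false
```

Checking the `code` field rather than the status alone matters for 429 responses, because RATE_LIMITED and QUOTA_EXCEEDED share the same status but call for different handling.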
Handling specific errors
Rate limiting
When a request is rate limited, the response includes retry information:
{
  "type": "https://anyhunt.app/errors/RATE_LIMITED",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Too many requests",
  "code": "RATE_LIMITED",
  "details": {
    "retryAfter": 60,
    "limit": 100,
    "remaining": 0,
    "resetAt": "2024-01-15T11:00:00.000Z"
  }
}
Response headers:
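| Header | Description |
|---|---|
| X-RateLimit-Limit | Number of requests allowed per window |
| X-RateLimit-Remaining | Number of requests remaining |
| X-RateLimit-Reset | Timestamp at which the window resets |
| Retry-After | Number of seconds to wait (when rate limited) |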
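The wait time is available both in the `Retry-After` header and in `details.retryAfter` of the error body, so a client can read whichever is present. The sketch below is one possible way to combine them; `getRetryDelayMs` and its 60-second fallback are hypothetical, and it assumes `Retry-After` is sent as a number of seconds, as the table describes.

```javascript
// Hypothetical helper: work out how long to wait after a 429 response.
// `headers` is anything with a get(name) method (for example fetch's Headers);
// `body` is the parsed RFC 7807 error body.
function getRetryDelayMs(headers, body, fallbackSeconds = 60) {
  // Prefer the Retry-After header (assumed to be a number of seconds).
  const fromHeader = Number(headers.get('Retry-After'));
  if (Number.isFinite(fromHeader) && fromHeader > 0) {
    return fromHeader * 1000;
  }
  // Otherwise fall back to details.retryAfter in the error body.
  const fromBody = Number(body?.details?.retryAfter);
  if (Number.isFinite(fromBody) && fromBody > 0) {
    return fromBody * 1000;
  }
  // Neither value is usable: wait for the fallback interval.
  return fallbackSeconds * 1000;
}
```

With a fetch response this could be called as `getRetryDelayMs(response.headers, data)` before sleeping.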
Quota exceeded
{
  "type": "https://anyhunt.app/errors/QUOTA_EXCEEDED",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Monthly quota exceeded",
  "code": "QUOTA_EXCEEDED",
  "details": {
    "quota": 10000,
    "used": 10000,
    "resetAt": "2024-02-01T00:00:00.000Z"
  }
}
Page timeout
{
  "type": "https://anyhunt.app/errors/PAGE_TIMEOUT",
  "title": "Gateway Timeout",
  "status": 504,
  "detail": "Page load timed out after 30000ms",
  "code": "PAGE_TIMEOUT",
  "details": {
    "url": "https://slow-website.com",
    "timeout": 30000
  }
}
Retry strategies
Exponential backoff
Retry transient errors with exponential backoff:
async function scrapeWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let response;
    let data;
    try {
      response = await fetch('https://server.anyhunt.app/api/v1/scrape', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer ah_your_api_key',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ url, ...options }),
      });
      data = await response.json();
    } catch (error) {
      // Network failure or unreadable body: back off, or give up on the last attempt
      if (attempt === maxRetries) {
        throw error;
      }
      await sleep(Math.pow(2, attempt) * 1000);
      continue;
    }
    if (response.ok) {
      return data;
    }
    const code = data.code;
    const detail = data.detail || `Request failed (${response.status})`;
    // Do not retry client errors (except rate limiting).
    // This throw sits outside the try block so it is never swallowed and retried.
    if (response.status >= 400 && response.status < 500 && code !== 'RATE_LIMITED') {
      throw new Error(`Client error: ${detail}`);
    }
    // Do not sleep after the final attempt
    if (attempt === maxRetries) {
      break;
    }
    // Handle rate limiting
    if (code === 'RATE_LIMITED') {
      const retryAfter = data.details?.retryAfter || 60;
      console.log(`Rate limited. Waiting ${retryAfter} seconds...`);
      await sleep(retryAfter * 1000);
      continue;
    }
    // Retry server errors with exponential backoff
    const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
    console.log(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
    await sleep(delay);
  }
  throw new Error('Maximum retries exceeded');
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
Python implementation
import time
from typing import Optional

import requests


def scrape_with_retry(
    url: str,
    options: Optional[dict] = None,
    max_retries: int = 3
) -> dict:
    options = options or {}
    for attempt in range(max_retries + 1):
        try:
            response = requests.post(
                'https://server.anyhunt.app/api/v1/scrape',
                headers={
                    'Authorization': 'Bearer ah_your_api_key',
                    'Content-Type': 'application/json',
                },
                json={'url': url, **options},
            )
            data = response.json()
            if response.ok:
                return data
            error_code = data.get('code')
            detail = data.get('detail') or f"Request failed ({response.status_code})"
            # Do not retry client errors (except rate limiting)
            if 400 <= response.status_code < 500 and error_code != 'RATE_LIMITED':
                raise Exception(f"Client error: {detail}")
            # Handle rate limiting
            if error_code == 'RATE_LIMITED':
                retry_after = data.get('details', {}).get('retryAfter', 60)
                print(f"Rate limited. Waiting {retry_after} seconds...")
                time.sleep(retry_after)
                continue
            # Retry server errors with exponential backoff
            if attempt < max_retries:
                delay = 2 ** attempt  # 1s, 2s, 4s
                print(f"Attempt {attempt + 1} failed. Retrying in {delay} seconds...")
                time.sleep(delay)
        except requests.RequestException:
            # Network-level failure: back off, or re-raise on the last attempt
            if attempt == max_retries:
                raise
            delay = 2 ** attempt
            time.sleep(delay)
    raise Exception('Maximum retries exceeded')
Error recovery patterns
Circuit breaker
Use a circuit breaker to prevent cascading failures:
class CircuitBreaker {
  constructor(threshold = 5, resetTimeout = 60000) {
    this.failures = 0;
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.nextAttempt = 0;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      // Reject immediately until the reset timeout has elapsed
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is open');
      }
      // Allow one trial call through
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.resetTimeout;
    }
  }
}

// Usage example
const breaker = new CircuitBreaker(5, 60000);

async function safeScrape(url) {
  return breaker.execute(() => scrapeWithRetry(url));
}
Graceful degradation
Handle partial failures in batch operations:
async function scrapeBatchWithFallback(urls) {
  const results = [];
  const errors = [];
  for (const url of urls) {
    try {
      const result = await scrapeWithRetry(url);
      results.push({ url, success: true, data: result });
    } catch (error) {
      errors.push({ url, success: false, error: error.message });
      // Continue with the remaining URLs
    }
  }
  return {
    results,
    errors,
    // Avoid dividing by zero when the input list is empty
    successRate: urls.length ? results.length / urls.length : 0,
  };
}
Best practices
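- Check the HTTP status: use response.ok and parse the RFC 7807 error body
- Implement retries: use exponential backoff for transient errors
- Respect rate limits: use the value of the Retry-After response header
- Log errors: keep records for debugging
- Set timeouts: do not wait indefinitely for a response
- Handle partial failures: in batch operations, process whatever can be processed
- Monitor error rates: track errors to catch problems early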
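The "set timeouts" item can be illustrated with the standard `AbortController` API, which lets a client cancel a `fetch` call that takes too long. This is a minimal sketch under stated assumptions: `fetchWithTimeout`, its 30000 ms default, and the injectable `fetchImpl` parameter are hypothetical and not part of the Anyhunt API.

```javascript
// Hypothetical helper: abort a request that takes longer than `timeoutMs`.
// `fetchImpl` is injectable so the helper can be exercised without network access.
async function fetchWithTimeout(url, init = {}, timeoutMs = 30000, fetchImpl = fetch) {
  const controller = new AbortController();
  // Trigger the abort signal once the timeout elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

When the timeout fires, the pending call rejects with an abort error, which a retry loop can treat like any other network failure.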
Debugging tips
Enable verbose logging
const response = await fetch('https://server.anyhunt.app/api/v1/scrape', {
  // ...
});
console.log('Status:', response.status);
console.log('Headers:', Object.fromEntries(response.headers));
console.log('Body:', await response.text());
Check the request ID
Every response includes a request ID for use by technical support:
X-Request-Id: req_abc123xyz
Include this ID when reporting an issue.
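To make the request ID easy to include in logs and support tickets, it can be captured at the point where an error is handled. The sketch below is an illustration only: `getRequestId` and `describeFailure` are hypothetical names, and falling back to the `requestId` field of the error body assumes that field carries the same identifier as the header.

```javascript
// Hypothetical helper: read the request ID from the X-Request-Id header,
// falling back to the requestId field of the RFC 7807 error body.
// `headers` is anything with a get(name) method (for example fetch's Headers).
function getRequestId(headers, body) {
  return headers.get('X-Request-Id') || body?.requestId || null;
}

// Build a one-line summary suitable for logs or a support ticket.
function describeFailure(status, headers, body) {
  const id = getRequestId(headers, body) ?? 'unknown';
  const code = body?.code ?? 'UNKNOWN';
  return `[${status}] ${code} (request ID: ${id})`;
}
```

With a fetch response, `describeFailure(response.status, response.headers, data)` yields a single line that can be logged next to the error.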