Guides
Output Formats
Understanding the different content formats available
Output Formats
The Scrape API supports multiple output formats. You can request one or more formats in a single request.
Available Formats
| Format | Description | Use Case |
|---|---|---|
markdown | Clean Markdown text | Content processing, LLM input |
html | Cleaned HTML | Web rendering, archiving |
rawHtml | Original HTML | Debugging, full page analysis |
links | All page links | Link extraction, site mapping |
screenshot | Page image | Visual archives, thumbnails |
Markdown
The default format. Converts HTML to clean, readable Markdown.
{
"url": "https://example.com/article",
"formats": ["markdown"]
}Features:
- GitHub Flavored Markdown (tables, task lists, strikethrough)
- Code blocks with syntax hints preserved
- Images converted to
format - Links preserved as
[text](url) - Script, style, and SVG tags removed
Response:
{
"markdown": "# Article Title\n\nThis is the article content with **bold** and *italic* text.\n\n## Section\n\nMore content here..."
}HTML
Cleaned HTML with scripts and styles removed.
{
"url": "https://example.com/article",
"formats": ["html"]
}Features:
- Script and style tags removed
- Inline event handlers removed
- Clean, safe HTML output
Response:
{
"html": "<article><h1>Article Title</h1><p>This is the article content...</p></article>"
}Raw HTML
The original, unmodified HTML from the page.
{
"url": "https://example.com",
"formats": ["rawHtml"]
}Use cases:
- Debugging rendering issues
- Full page analysis
- Preserving original structure
Response:
{
"rawHtml": "<!DOCTYPE html><html><head>...</head><body>...</body></html>"
}Links
Extract all hyperlinks from the page.
{
"url": "https://example.com",
"formats": ["links"]
}Features:
- Unique links only (deduplicated)
- HTTP/HTTPS links only
- JavaScript links excluded
- Absolute URLs
Response:
{
"links": [
"https://example.com/about",
"https://example.com/contact",
"https://external-site.com/page"
]
}Screenshot
Capture a visual image of the page.
{
"url": "https://example.com",
"formats": ["screenshot"],
"screenshotOptions": {
"fullPage": true,
"format": "webp",
"quality": 85
}
}Options:
| Option | Type | Default | Description |
|---|---|---|---|
fullPage | boolean | false | Capture full page height |
format | string | "png" | png, jpeg, webp |
quality | number | 80 | Image quality (1-100) |
clip | string | - | CSS selector for element screenshot |
response | string | "url" | url or base64 |
Response:
{
"screenshot": {
"url": "https://cdn.anyhunt.app/scraper/abc123.webp",
"width": 1920,
"height": 3500,
"format": "webp",
"fileSize": 245000,
"expiresAt": "2024-02-15T10:30:00.000Z"
}
}Multiple Formats
Request multiple formats in a single API call:
{
"url": "https://example.com/article",
"formats": ["markdown", "links", "screenshot"]
}Response:
{
"markdown": "# Article\n\nContent...",
"links": ["https://example.com/other"],
"screenshot": {
"url": "https://cdn.anyhunt.app/...",
"width": 1920,
"height": 1080
}
}Main Content Extraction
By default, onlyMainContent: true extracts just the main article content using the Readability algorithm.
{
"url": "https://example.com/article",
"formats": ["markdown"],
"onlyMainContent": true
}Set to false to include navigation, sidebars, footers:
{
"onlyMainContent": false
}Format Selection Guide
| Goal | Recommended Format |
|---|---|
| Feed into LLM | markdown |
| Store for search | markdown + metadata |
| Visual archive | screenshot (fullPage) |
| Build sitemap | links |
| Preserve original | rawHtml |
| Web display | html |
Metadata
Metadata is always included with every response:
{
"metadata": {
"title": "Page Title",
"description": "Page description",
"author": "Author Name",
"ogImage": "https://example.com/og.jpg",
"favicon": "https://example.com/favicon.ico",
"language": "en",
"publishedTime": "2024-01-15T10:00:00Z"
}
}