Output Formats

The Scrape API supports multiple output formats. You can request one or more formats in a single request.

Available Formats

Format	Description	Use Case
`markdown`	Clean Markdown text	Content processing, LLM input
`html`	Cleaned HTML	Web rendering, archiving
`rawHtml`	Original HTML	Debugging, full page analysis
`links`	All page links	Link extraction, site mapping
`screenshot`	Page image	Visual archives, thumbnails

Markdown

The default format. Converts HTML to clean, readable Markdown.

{
  "url": "https://example.com/article",
  "formats": ["markdown"]
}

Features:

GitHub Flavored Markdown (tables, task lists, strikethrough)
Code blocks with syntax hints preserved
Images converted to ![alt](url) format
Links preserved as [text](url)
Script, style, and SVG tags removed

Response:

{
  "markdown": "# Article Title\n\nThis is the article content with **bold** and *italic* text.\n\n## Section\n\nMore content here..."
}

HTML

Cleaned HTML with scripts and styles removed.

{
  "url": "https://example.com/article",
  "formats": ["html"]
}

Features:

Script and style tags removed
Inline event handlers removed
Clean, safe HTML output

Response:

{
  "html": "<article><h1>Article Title</h1><p>This is the article content...</p></article>"
}

Raw HTML

The original, unmodified HTML from the page.

{
  "url": "https://example.com",
  "formats": ["rawHtml"]
}

Use cases:

Debugging rendering issues
Full page analysis
Preserving original structure

Response:

{
  "rawHtml": "<!DOCTYPE html><html><head>...</head><body>...</body></html>"
}

Links

Extract all hyperlinks from the page.

{
  "url": "https://example.com",
  "formats": ["links"]
}

Features:

Unique links only (deduplicated)
HTTP/HTTPS links only
JavaScript links excluded
Absolute URLs

Response:

{
  "links": [
    "https://example.com/about",
    "https://example.com/contact",
    "https://external-site.com/page"
  ]
}

Screenshot

Capture a visual image of the page.

{
  "url": "https://example.com",
  "formats": ["screenshot"],
  "screenshotOptions": {
    "fullPage": true,
    "format": "webp",
    "quality": 85
  }
}

Options:

Option	Type	Default	Description
`fullPage`	boolean	`false`	Capture full page height
`format`	string	`"png"`	`png`, `jpeg`, `webp`
`quality`	number	80	Image quality (1-100)
`clip`	string	-	CSS selector for element screenshot
`response`	string	`"url"`	`url` or `base64`

Response:

{
  "screenshot": {
    "url": "https://cdn.anyhunt.app/scraper/abc123.webp",
    "width": 1920,
    "height": 3500,
    "format": "webp",
    "fileSize": 245000,
    "expiresAt": "2024-02-15T10:30:00.000Z"
  }
}

Multiple Formats

Request multiple formats in a single API call:

{
  "url": "https://example.com/article",
  "formats": ["markdown", "links", "screenshot"]
}

Response:

{
  "markdown": "# Article\n\nContent...",
  "links": ["https://example.com/other"],
  "screenshot": {
    "url": "https://cdn.anyhunt.app/...",
    "width": 1920,
    "height": 1080
  }
}

Main Content Extraction

By default, onlyMainContent: true extracts just the main article content using the Readability algorithm.

{
  "url": "https://example.com/article",
  "formats": ["markdown"],
  "onlyMainContent": true
}

Set to false to include navigation, sidebars, footers:

{
  "onlyMainContent": false
}

Format Selection Guide

Goal	Recommended Format
Feed into LLM	`markdown`
Store for search	`markdown` + metadata
Visual archive	`screenshot` (fullPage)
Build sitemap	`links`
Preserve original	`rawHtml`
Web display	`html`

Metadata

Metadata is always included with every response:

{
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "author": "Author Name",
    "ogImage": "https://example.com/og.jpg",
    "favicon": "https://example.com/favicon.ico",
    "language": "en",
    "publishedTime": "2024-01-15T10:00:00Z"
  }
}

Output Formats

On this page