Anyhunt
Guides

Output Formats

Understanding the different content formats available

Output Formats

The Scrape API supports multiple output formats. You can request one or more formats in a single request.

Available Formats

FormatDescriptionUse Case
markdownClean Markdown textContent processing, LLM input
htmlCleaned HTMLWeb rendering, archiving
rawHtmlOriginal HTMLDebugging, full page analysis
linksAll page linksLink extraction, site mapping
screenshotPage imageVisual archives, thumbnails

Markdown

The default format. Converts HTML to clean, readable Markdown.

{
  "url": "https://example.com/article",
  "formats": ["markdown"]
}

Features:

  • GitHub Flavored Markdown (tables, task lists, strikethrough)
  • Code blocks with syntax hints preserved
  • Images converted to ![alt](url) format
  • Links preserved as [text](url)
  • Script, style, and SVG tags removed

Response:

{
  "markdown": "# Article Title\n\nThis is the article content with **bold** and *italic* text.\n\n## Section\n\nMore content here..."
}

HTML

Cleaned HTML with scripts and styles removed.

{
  "url": "https://example.com/article",
  "formats": ["html"]
}

Features:

  • Script and style tags removed
  • Inline event handlers removed
  • Clean, safe HTML output

Response:

{
  "html": "<article><h1>Article Title</h1><p>This is the article content...</p></article>"
}

Raw HTML

The original, unmodified HTML from the page.

{
  "url": "https://example.com",
  "formats": ["rawHtml"]
}

Use cases:

  • Debugging rendering issues
  • Full page analysis
  • Preserving original structure

Response:

{
  "rawHtml": "<!DOCTYPE html><html><head>...</head><body>...</body></html>"
}

Extract all hyperlinks from the page.

{
  "url": "https://example.com",
  "formats": ["links"]
}

Features:

  • Unique links only (deduplicated)
  • HTTP/HTTPS links only
  • JavaScript links excluded
  • Absolute URLs

Response:

{
  "links": [
    "https://example.com/about",
    "https://example.com/contact",
    "https://external-site.com/page"
  ]
}

Screenshot

Capture a visual image of the page.

{
  "url": "https://example.com",
  "formats": ["screenshot"],
  "screenshotOptions": {
    "fullPage": true,
    "format": "webp",
    "quality": 85
  }
}

Options:

OptionTypeDefaultDescription
fullPagebooleanfalseCapture full page height
formatstring"png"png, jpeg, webp
qualitynumber80Image quality (1-100)
clipstring-CSS selector for element screenshot
responsestring"url"url or base64

Response:

{
  "screenshot": {
    "url": "https://cdn.anyhunt.app/scraper/abc123.webp",
    "width": 1920,
    "height": 3500,
    "format": "webp",
    "fileSize": 245000,
    "expiresAt": "2024-02-15T10:30:00.000Z"
  }
}

Multiple Formats

Request multiple formats in a single API call:

{
  "url": "https://example.com/article",
  "formats": ["markdown", "links", "screenshot"]
}

Response:

{
  "markdown": "# Article\n\nContent...",
  "links": ["https://example.com/other"],
  "screenshot": {
    "url": "https://cdn.anyhunt.app/...",
    "width": 1920,
    "height": 1080
  }
}

Main Content Extraction

By default, onlyMainContent: true extracts just the main article content using the Readability algorithm.

{
  "url": "https://example.com/article",
  "formats": ["markdown"],
  "onlyMainContent": true
}

Set to false to include navigation, sidebars, footers:

{
  "onlyMainContent": false
}

Format Selection Guide

GoalRecommended Format
Feed into LLMmarkdown
Store for searchmarkdown + metadata
Visual archivescreenshot (fullPage)
Build sitemaplinks
Preserve originalrawHtml
Web displayhtml

Metadata

Metadata is always included with every response:

{
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "author": "Author Name",
    "ogImage": "https://example.com/og.jpg",
    "favicon": "https://example.com/favicon.ico",
    "language": "en",
    "publishedTime": "2024-01-15T10:00:00Z"
  }
}