Extracting Web Content Using Tavily

Last updated: February 3, 2026

Efficiently extracting content from web pages is crucial for AI-powered applications. Tavily provides two main approaches to content extraction, each suited for different use cases.

1. One-step extraction: directly retrieve raw content

You can extract web content by enabling include_raw_content = true when making a Tavily Search API call. This allows you to retrieve both search results and extracted content in a single step.
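As a minimal sketch of the one-step approach, the helper below runs a single search with include_raw_content enabled and keeps only the results that actually returned page content. It assumes a client object that behaves like TavilyClient from the tavily-python package, and a response shaped as a dict with a "results" list whose items carry "url" and "raw_content". The helper name search_with_raw_content is hypothetical and used only for illustration.

```python
from typing import Any


def search_with_raw_content(client: Any, query: str) -> list[dict[str, Any]]:
    """Run one search and keep the results that came back with page content.

    Assumption: `client.search(...)` returns a dict with a "results" list,
    as the tavily-python `TavilyClient` is understood to do.
    """
    response = client.search(query=query, include_raw_content=True)
    documents = []
    for result in response.get("results", []):
        raw = result.get("raw_content")
        if raw:  # raw_content may be missing or None for some sources
            documents.append({"url": result["url"], "text": raw})
    return documents
```

With the official SDK installed, this would be called with an instance of TavilyClient created from your own API key.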

However, this can increase latency, because raw content is also extracted from sources that may turn out to be irrelevant.

2. Two-step extraction: search first, then extract

For better results, split the process into two steps. First, run multiple sub-queries with the Search API to expand the pool of sources, and curate the most relevant documents based on content snippets or source scores. Then extract raw content from only those sources with the Extract API, which yields higher-quality RAG documents.
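The two-step flow can be sketched roughly as follows. The code assumes a client that behaves like TavilyClient from the tavily-python package, that each search result carries "url" and "score" fields, and that client.extract(urls=...) returns a dict with a "results" list of "url" and "raw_content" pairs. The helper names curate_urls and two_step_extract, and the default threshold values, are hypothetical choices made for illustration.

```python
from typing import Any


def curate_urls(responses: list[dict[str, Any]], min_score: float = 0.5,
                max_urls: int = 5) -> list[str]:
    """Merge several search responses and return the best-scoring unique URLs."""
    best: dict[str, float] = {}
    for response in responses:
        for result in response.get("results", []):
            url, score = result["url"], result.get("score", 0.0)
            # Keep the highest score seen for each URL across sub-queries.
            if score >= min_score and score > best.get(url, -1.0):
                best[url] = score
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:max_urls]


def two_step_extract(client: Any, sub_queries: list[str],
                     **curate_kwargs: Any) -> dict[str, str]:
    """Search with several sub-queries, then extract only the curated URLs."""
    # Step 1: search only (no raw content), one call per sub-query.
    responses = [client.search(query=q) for q in sub_queries]
    urls = curate_urls(responses, **curate_kwargs)
    if not urls:
        return {}
    # Step 2: extract full content from the curated URLs only.
    extracted = client.extract(urls=urls)
    return {r["url"]: r["raw_content"] for r in extracted.get("results", [])}
```

Curating by score is only one option. Filtering on the content snippets, for example with a keyword check or a reranker, would slot into curate_urls in the same way.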

Pros of Two-Step Extraction

More control – Extract only from selected URLs.

Higher accuracy – Filter out irrelevant results before extraction.

Advanced extraction capabilities – Available through extract_depth = "advanced".

Cons of Two-Step Extraction

Slightly more expensive – it requires an Extract API call in addition to the search calls.

Using Advanced Extraction

Using extract_depth = "advanced" in the Extract API allows for more comprehensive content retrieval. This mode is particularly useful when dealing with:

  • Complex web pages with dynamic content, embedded media, or structured data.

  • Tables and structured information that require accurate parsing.

Advanced mode also achieves higher extraction success rates.

If precision and depth are priorities for your application, extract_depth = "advanced" is the recommended choice.
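A short sketch of an advanced extraction call is shown here. It assumes a client that behaves like TavilyClient from the tavily-python package and an Extract response containing a "results" list plus a "failed_results" list whose items include a "url" field. The helper name extract_advanced is hypothetical.

```python
from typing import Any


def extract_advanced(client: Any, urls: list[str]) -> tuple[dict[str, str], list[str]]:
    """Extract with extract_depth="advanced".

    Returns (content keyed by URL, list of URLs that failed to extract).
    Assumption: the response has "results" and "failed_results" lists.
    """
    response = client.extract(urls=urls, extract_depth="advanced")
    content = {r["url"]: r["raw_content"] for r in response.get("results", [])}
    failed = [f["url"] for f in response.get("failed_results", [])]
    return content, failed
```

Returning the failed URLs separately lets the caller decide whether to retry them or drop them from the document set.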