What is the Tavily Extract API?

Last updated: March 10, 2025

The Tavily Extract API extracts content from web pages. This article describes its features, explains how to use it, and provides examples to help you get started.

Overview

Tavily Extract retrieves the content of web pages from their URLs. A single request can take one URL or a list of up to 20 URLs. The API supports both basic and advanced extraction modes to suit your needs.
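If you have more than 20 URLs, one way to stay within the per-request limit is to split the list into batches before calling the API. The following is a minimal sketch; the helper name batch_urls and the example.com URLs are hypothetical and are not part of the Tavily SDK.

```python
from typing import Iterator, List

MAX_URLS_PER_REQUEST = 20  # per-request limit stated in this article


def batch_urls(urls: List[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical input: 45 placeholder URLs
urls = [f"https://example.com/page/{i}" for i in range(45)]
batches = list(batch_urls(urls))
print([len(batch) for batch in batches])  # [20, 20, 5]
```

Each batch can then be passed as the urls argument of a separate extract request.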

API Parameters

The Extract API accepts the following parameters:

- urls (Required): Either a single URL string or a list of URLs (maximum 20)

- include_images: Boolean flag to include extracted images (defaults to False)

- extract_depth: Extraction depth setting ("basic" or "advanced")

  - "basic": Standard extraction (1 API Credit per 5 successful extractions)

  - "advanced": More comprehensive extraction including tables and embedded content (2 API Credits per 5 successful extractions)
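The credit rates above can be turned into a rough cost estimate. The sketch below is illustrative only: estimate_credits is a hypothetical helper, and because this article does not say how a partial group of 5 extractions is billed, it returns the unrounded figure rather than guessing a rounding rule.

```python
# Credits charged per 5 successful extractions, as listed in this article
CREDITS_PER_FIVE = {"basic": 1, "advanced": 2}


def estimate_credits(successful_extractions: int, extract_depth: str) -> float:
    """Return the unrounded credit estimate for a number of successful extractions."""
    if extract_depth not in CREDITS_PER_FIVE:
        raise ValueError(f"unknown extract_depth: {extract_depth!r}")
    if successful_extractions < 0:
        raise ValueError("successful_extractions must not be negative")
    # Assumption: cost scales linearly; actual billing may round partial groups.
    return successful_extractions * CREDITS_PER_FIVE[extract_depth] / 5


print(estimate_credits(20, "basic"))     # 4.0
print(estimate_credits(20, "advanced"))  # 8.0
```

Only successful extractions count toward the rate, so failed URLs are left out of the input.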

Response Format

The API returns a JSON response containing:

1. results: List of successfully extracted content

2. failed_results: List of URLs that couldn't be processed

3. response_time: Time taken to complete the request

Successful Results Include:

- URL of the webpage

- Extracted raw content

- List of image URLs (if include_images is enabled)

Failed Results Include:

- Failed URL

- Error message explaining the failure reason
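As an illustration of this structure, the sketch below separates successful and failed entries in a response. The sample dictionary is made up, and the field names url, raw_content, images, and error are assumptions based on the descriptions above; check them against a real response before relying on them.

```python
from typing import Any, Dict, Tuple


def split_response(response: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return ({url: raw_content}, {url: error}) built from an Extract response."""
    content = {item["url"]: item["raw_content"] for item in response.get("results", [])}
    errors = {item["url"]: item["error"] for item in response.get("failed_results", [])}
    return content, errors


# Made-up response with the three top-level keys listed above
sample = {
    "results": [
        {"url": "https://example.com/a", "raw_content": "Hello", "images": []},
    ],
    "failed_results": [
        {"url": "https://example.com/b", "error": "Timed out"},
    ],
    "response_time": 1.2,
}

content, errors = split_response(sample)
for url, message in errors.items():
    print(f"failed: {url} ({message})")
```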

Example Usage

Here's a simple example using Python:

from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")
urls = [
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "https://en.wikipedia.org/wiki/Machine_learning",
    "https://en.wikipedia.org/wiki/Data_science",
]

# Execute extraction
response = tavily_client.extract(urls=urls, include_images=True)
print(response)

Best Practices

- Choose the appropriate extract_depth based on your needs:
  - Use "basic" for standard content extraction
  - Use "advanced" when you need tables and embedded content
- Monitor your API credit usage, especially when using advanced extraction
- Handle failed results in your application logic
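One possible way to handle failed results is to retry them once with advanced extraction. This is a sketch under assumptions rather than a recommended pattern: extract_with_fallback and FakeClient are hypothetical, the url and error field names are assumed, and whether a retry helps depends on the failure reason, which this article does not detail. A retry with "advanced" also costs more credits per successful extraction.

```python
from typing import Any, Dict, List


def extract_with_fallback(client: Any, urls: List[str]) -> Dict[str, Any]:
    """Extract with "basic" depth, then retry failed URLs once with "advanced"."""
    first = client.extract(urls=urls, extract_depth="basic")
    results = list(first.get("results", []))
    failed = list(first.get("failed_results", []))
    if failed:
        retry_urls = [item["url"] for item in failed]
        second = client.extract(urls=retry_urls, extract_depth="advanced")
        results.extend(second.get("results", []))
        failed = list(second.get("failed_results", []))
    return {"results": results, "failed_results": failed}


class FakeClient:
    """Offline stand-in for TavilyClient (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.calls = []

    def extract(self, urls, extract_depth="basic"):
        self.calls.append((list(urls), extract_depth))
        # Pretend that URLs containing "hard" fail unless depth is "advanced".
        fails = [u for u in urls if "hard" in u and extract_depth == "basic"]
        return {
            "results": [{"url": u, "raw_content": "text"} for u in urls if u not in fails],
            "failed_results": [{"url": u, "error": "Could not extract"} for u in fails],
        }


client = FakeClient()
outcome = extract_with_fallback(client, ["https://example.com/easy", "https://example.com/hard"])
print(len(outcome["results"]), len(outcome["failed_results"]))  # 2 0
```

In a real application, pass a TavilyClient instance in place of FakeClient and log or report any URLs that remain in failed_results.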