What is Tavily Crawl API?
Last updated: May 5, 2025
Tavily Crawl is a powerful tool that lets you automatically explore and extract content from a website, starting from just a single URL. It's designed to help you traverse a site like a graph, collecting raw content from multiple pages, which makes it ideal for data extraction, documentation indexing, and building up-to-date knowledge bases.
What it does
Tavily Crawl starts from a base URL and follows internal links, gathering content from each page it visits. You can control how deep it goes, how many links it follows, and what types of content it extracts. It's especially useful when you want to ingest large portions of a site (like a documentation portal or blog) into your AI applications.
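Conceptually, this is a breadth-first traversal bounded by a depth limit, a per-page link limit, and a total-page limit. The sketch below illustrates that traversal logic only; it is not Tavily's implementation, and fetching a page and extracting its links is stubbed out as `get_links`:

```python
from collections import deque

def crawl(start_url, get_links, max_depth=2, max_breadth=20, limit=100):
    """Breadth-first site traversal bounded by depth, per-page breadth,
    and a total-page limit. `get_links` stands in for fetching a page
    and extracting its internal links."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth from the start URL)
    visited = []
    while queue and len(visited) < limit:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # don't follow links beyond the depth limit
        for link in get_links(url)[:max_breadth]:  # cap links per page
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a depth limit of 1, the crawler visits the start page and its direct children but follows no further links; raising the limit or the page cap widens the crawl accordingly.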
Key Parameters
| Parameter | Description |
| --- | --- |
| `url` | The starting point of the crawl (e.g., `https://docs.example.com`) |
| `max_depth` | How many levels deep the crawler should go (e.g., from homepage → subpage → sub-subpage) |
| `max_breadth` | How many links to follow per page |
| `limit` | The maximum number of total pages to crawl |
| `instructions` | (Optional) A natural language instruction to guide the crawler on what content to prioritize |
| `select_paths` | (Optional) Regex to focus the crawl on specific URL paths (e.g., `/docs/.*`) |
| `select_domains` | (Optional) Regex to limit the crawl to specific domains or subdomains |
| `allow_external` | Whether to follow links to external domains (default: `false`) |
| `include_images` | Whether to include image URLs in the result |
| `categories` | Filter the crawl by page type (e.g., Documentation, Blog) |
| `extract_depth` | Set to `advanced` for more thorough content extraction |
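Putting the parameters together, a request might be built and sent as follows. This is a minimal sketch using only the standard library; the endpoint URL, the bearer-token auth scheme, and the exact field names are assumptions based on the table above, so verify them against the official Tavily API reference:

```python
import json
import os
import urllib.request

def build_crawl_payload(base_url, max_depth=2, max_breadth=20, limit=100,
                        select_paths=None, extract_depth="basic"):
    """Assemble a crawl request body from the parameters described above.
    Field names mirror this doc's parameter table (assumed, not verified)."""
    payload = {
        "url": base_url,
        "max_depth": max_depth,
        "max_breadth": max_breadth,
        "limit": limit,
        "extract_depth": extract_depth,
    }
    if select_paths:
        payload["select_paths"] = select_paths
    return payload

def run_crawl(payload):
    """POST the payload to the (assumed) crawl endpoint with an API key
    taken from the TAVILY_API_KEY environment variable."""
    req = urllib.request.Request(
        "https://api.tavily.com/crawl",  # assumed endpoint; check the docs
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['TAVILY_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

For example, `build_crawl_payload("https://docs.example.com", select_paths=["/docs/.*"], extract_depth="advanced")` produces the body for the documentation-crawl scenario described below.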
Example Use Case
You want to extract all documentation pages from `docs.example.com` and load them into a vector database for a RAG (retrieval-augmented generation) application. With Tavily Crawl, you simply provide the root URL, set filters for `/docs/` paths, and choose `extract_depth: advanced`. The system returns clean raw content for each page it discovers.
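Before embedding the crawled pages into a vector database, each page's raw content is typically split into overlapping chunks. A minimal chunking sketch follows; the list-of-pages input shape with `url` and `raw_content` keys is an assumption about the crawl output, and the embedding/upsert steps are deliberately left out:

```python
def chunk_pages(crawl_results, chunk_size=1000, overlap=100):
    """Split each crawled page's raw content into overlapping text
    chunks, keeping the source URL so answers can cite their page."""
    chunks = []
    for page in crawl_results:
        text = page.get("raw_content") or ""
        step = chunk_size - overlap  # stride between chunk starts
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + chunk_size]
            if piece:
                chunks.append({"url": page["url"], "text": piece})
    return chunks
```

Overlapping chunks (here, 100 characters shared between neighbors) help preserve context that would otherwise be cut at a chunk boundary, at the cost of some duplicated storage.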