This is a mcp tool for crawl4ai. You can use this tool to crawl a website and get its useful content.
What we do:
- Use crawler.arun to fetch a url.
- Use trafilatura to simplify the result html, if the content length is larger then 2048, clip it to 2048.
- If there are media in the page, construct a dict payload to carry the media information. Each media link will match a description with the max length 100.
Installation
pip install -r requirements.txt
crawl4ai-setup
crawl4ai-doctor
MCP config:
{
"mcpServers": {
"crawl4ai": {
"command": "/path/to/fastmcp",
"args": [
"run",
"/path/to/crawl4ai/server.py"
]
}
}
}
You can add this config to your chatbot or agent config files to use crawl4ai-mcp.
Function
- crawl_website: A crawl tool to get the content of a website page, and simplify the content to pure html content. This tool can be used to get the detail information in the url.
- Input:
- website(str): The website url.
- Output:
-
A dict containing the website content.
{
"text": "the html content",
"media": [
{
"type": "image/video/audio",
"description": "A picture with the west lake inside it.",
"link": "https://xxx"
},
...
]
}