WebSearch Tools
A Model Context Protocol (MCP) server built with Python that provides advanced web search, content extraction, web crawling, scraping, and content analysis capabilities using the Firecrawl API.
# On Windows (using pip)
pip install uv
# On Unix/MacOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Add uv to PATH (Unix/MacOS)
export PATH="$HOME/.local/bin:$PATH"
# Add uv to PATH (Windows - add to Environment Variables)
# Add: %USERPROFILE%\.local\bin
git clone https://github.com/yourusername/websearch.git
cd websearch
# Create virtual environment
uv venv
# Activate on Windows
.\.venv\Scripts\activate.ps1
# Activate on Unix/MacOS
source .venv/bin/activate
# Install dependencies
uv sync
# Create .env file
touch .env
# Add your API keys
FIRECRAWL_API_KEY=your_firecrawl_api_key
OPENAI_API_KEY=your_openai_api_key
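The server reads these keys from the environment at startup. As a minimal sketch of how that loading might look with python-dotenv (the actual code in main.py may differ):

# Hypothetical loading snippet - not necessarily the project's actual code
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment

FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if not FIRECRAWL_API_KEY:
    raise RuntimeError("FIRECRAWL_API_KEY is missing - check your .env file")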
Instead of running the server directly, you can configure Claude for Desktop to access the WebSearch tools:
Locate or create your Claude for Desktop configuration file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Add the WebSearch server configuration to the mcpServers section:
{
"mcpServers": {
"websearch": {
"command": "uv",
"args": [
"--directory",
"D:ABSOLUTEPATHTOWebSearch",
"run",
"main.py"
]
}
}
}
Make sure to replace the directory path with the absolute path to your WebSearch project folder (on Windows, escape the backslashes inside the JSON string, e.g. D:\\Path\\To\\WebSearch).
Save the configuration file and restart Claude for Desktop.
Once configured, the WebSearch tools will appear in the tools menu (hammer icon) in Claude for Desktop.
Search
- query (str): The search query

Extract Information
- urls (List[str]): List of URLs to extract information from
- prompt (str): Instructions for extraction
- enableWebSearch (bool): Enable supplementary web search
- showSources (bool): Include source references

Crawl Websites
- url (str): Starting URL
- maxDepth (int): Maximum crawl depth
- limit (int): Maximum pages to crawl

Scrape Content
- url (str): Target URL
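To show how these parameters map onto the server, here is a minimal sketch of how the search tool could be registered. It assumes the official MCP Python SDK (FastMCP) and the firecrawl-py client; the project's actual main.py may be structured differently:

# Hypothetical sketch of a tool definition - not the project's actual implementation
import os
from mcp.server.fastmcp import FastMCP
from firecrawl import FirecrawlApp

mcp = FastMCP("websearch")
firecrawl = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))

@mcp.tool()
def search(query: str) -> str:
    """Performs a web search and returns up-to-date results for the query."""
    results = firecrawl.search(query)
    return str(results)

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio for MCP clients such as Claude for Desktop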
The tools require certain API keys to function. We provide a .env.example file that you can use as a template:
# On Unix/MacOS
cp .env.example .env
# On Windows
copy .env.example .env
Then edit the .env file with your API keys:
# OpenAI API key - Required for AI-powered features
OPENAI_API_KEY=your_openai_api_key_here
# Firecrawl API key - Required for web scraping and searching
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
OpenAI API Key:
- Visit OpenAI's platform
- Create a new secret key

Firecrawl API Key:
- Sign up at firecrawl.dev and generate an API key from the dashboard
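With both keys in hand, you can sanity-check the Firecrawl key with a short script before wiring the server into Claude. The snippet below is a hypothetical check using firecrawl-py directly (method signatures may vary between SDK versions):

# Hypothetical sanity check for the Firecrawl key - not part of the project itself
import os
from dotenv import load_dotenv
from firecrawl import FirecrawlApp

load_dotenv()
app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))
print(app.search("Model Context Protocol"))  # prints the raw search response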
If everything is configured correctly, you should receive a JSON response with search results.
If you encounter errors:
- Verify that your API keys in the .env file are correct
- Make sure the .env file is in the root directory of the project

Contributions are welcome:
1. Create your feature branch (git checkout -b feature/AmazingFeature)
2. Commit your changes (git commit -m 'Add some AmazingFeature')
3. Push to the branch (git push origin feature/AmazingFeature)
4. Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
José Martín Rodriguez Mortaloni - @m4s1t425 - [email protected]
Made with ❤️ using Python and Firecrawl
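For reference, these are the tool schemas exposed by the server (as returned by an MCP tools/list request):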
[
{
"description": "Performs web searches and retrieves up-to-date information from the internet. Args: - prompt: Specific query or topic to search for on the internet - limit: Maximum number of results to return (between 1 and 20) Returns: - Search results with relevant information about the requested topic ",
"inputSchema": {
"properties": {
"query": {
"title": "Query",
"type": "string"
}
},
"required": [
"query"
],
"title": "searchArguments",
"type": "object"
},
"name": "search"
},
{
"description": "Crawls a website starting from the specified URL and extracts content from multiple pages. Args: - url: The complete URL of the web page to start crawling from - maxDepth: The maximum depth level for crawling linked pages - limit: The maximum number of pages to crawl Returns: - Content extracted from the crawled pages in markdown and HTML format ",
"inputSchema": {
"properties": {
"limit": {
"title": "Limit",
"type": "integer"
},
"maxDepth": {
"title": "Maxdepth",
"type": "integer"
},
"url": {
"title": "Url",
"type": "string"
}
},
"required": [
"url",
"maxDepth",
"limit"
],
"title": "crawlArguments",
"type": "object"
},
"name": "crawl"
},
{
"description": "Extracts specific information from a web page based on a prompt. Args: - url: The complete URL of the web page to extract information from - prompt: Instructions specifying what information to extract from the page - enabaleWebSearch: Whether to allow web searches to supplement the extraction - showSources: Whether to include source references in the response Returns: - Extracted information from the web page based on the prompt ",
"inputSchema": {
"properties": {
"enabaleWebSearch": {
"title": "Enabalewebsearch",
"type": "boolean"
},
"prompt": {
"title": "Prompt",
"type": "string"
},
"showSources": {
"title": "Showsources",
"type": "boolean"
},
"url": {
"items": {
"type": "string"
},
"title": "Url",
"type": "array"
}
},
"required": [
"url",
"prompt",
"enabaleWebSearch",
"showSources"
],
"title": "extractArguments",
"type": "object"
},
"name": "extract"
},
{
"description": "",
"inputSchema": {
"properties": {
"url": {
"title": "Url",
"type": "string"
}
},
"required": [
"url"
],
"title": "scrapeArguments",
"type": "object"
},
"name": "scrape"
}
]
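As a usage illustration, a small client script can launch the server over stdio and call the search tool. The sketch below assumes the official MCP Python SDK (the mcp package); the project path is a placeholder:

# Hypothetical client-side example - the install path is an assumption
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="uv",
        args=["--directory", "/absolute/path/to/WebSearch", "run", "main.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "Model Context Protocol"})
            print(result)

asyncio.run(main())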