mcp-image-recognition
Provides image recognition capabilities using Anthropic Claude Vision and OpenAI GPT-4 Vision APIs, supporting multiple image formats and offering optional text extraction via Tesseract OCR.
An MCP server that provides image recognition capabilities using Anthropic, OpenAI, and Cloudflare Workers AI vision APIs. Version 1.2.1.
This project was originally created by @mario-andreschak. Thank you! It is currently maintained by @zudsniper.
Optional: install Tesseract if you want OCR text extraction (ENABLE_OCR=true).
# Debian/Ubuntu
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
Install the uv package manager:
pip install uv
Install the package with uv:
uv tool install mcp-image-recognition
Create and configure your environment file as described in the Configuration section
docker pull zudsniper/mcp-image-recognition:latest
# Create a .env file first, then run:
docker run -it --env-file .env zudsniper/mcp-image-recognition
Clone the repository:
git clone https://github.com/zudsniper/mcp-image-recognition.git
cd mcp-image-recognition
Create and configure your environment file:
cp .env.example .env
# Edit .env with your API keys and preferences
Install the project in editable mode:
pip install -e .
{
"mcpServers": {
"image-recognition": {
"command": "uvx",
"args": [
"mcp-image-recognition"
],
"env": {
"VISION_PROVIDER": "openai",
"OPENAI_API_KEY": "your-api-key",
"OPENAI_MODEL": "gpt-4o"
}
}
}
}
Go to Cursor Settings > MCP and add the server command with its environment variables inline:
VISION_PROVIDER=openai OPENAI_API_KEY=your-api-key OPENAI_MODEL=gpt-4o uvx mcp-image-recognition
Add this to your Claude Desktop config with inline environment:
{
"mcpServers": {
"image-recognition": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"zudsniper/mcp-image-recognition:latest"
],
"env": {
"VISION_PROVIDER": "openai",
"OPENAI_API_KEY": "your-api-key",
"OPENAI_MODEL": "gpt-4o"
}
}
}
}
To use Cloudflare Workers AI instead, swap the env block:
"env": {
"VISION_PROVIDER": "cloudflare",
"CLOUDFLARE_API_KEY": "your-api-key",
"CLOUDFLARE_ACCOUNT_ID": "your-account-id"
}
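A complete Claude Desktop entry for Cloudflare might look like this (same structure as the OpenAI example above, with only the env block changed):
{
  "mcpServers": {
    "image-recognition": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "zudsniper/mcp-image-recognition:latest"
      ],
      "env": {
        "VISION_PROVIDER": "cloudflare",
        "CLOUDFLARE_API_KEY": "your-api-key",
        "CLOUDFLARE_ACCOUNT_ID": "your-account-id"
      }
    }
  }
}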
If installed with pip/uvx:
mcp-image-recognition
From source directory:
python -m image_recognition_server.server
Using Docker:
docker run -it --env-file .env zudsniper/mcp-image-recognition
Start in development mode with the MCP Inspector:
npx @modelcontextprotocol/inspector mcp-image-recognition
The server provides three tools:
- describe_image: best for images uploaded directly to Claude, Cursor, or other chat interfaces (passed as base64 data).
- describe_image_from_file: best for local files the server can reach on its filesystem. Note: when running in Docker, this requires volume mapping (see the Docker File Access section).
- describe_image_from_url: best for images with public URLs accessible from the internet.
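For reference, the complete input schemas are listed at the end of this README; a call to describe_image_from_url, for example, takes a url and an optional prompt (the URL below is just a placeholder):
{
  "name": "describe_image_from_url",
  "arguments": {
    "url": "https://example.com/photo.jpg",
    "prompt": "Please describe this image in detail."
  }
}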
Environment variables:
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- CLOUDFLARE_API_KEY: Your Cloudflare API key.
- CLOUDFLARE_ACCOUNT_ID: Your Cloudflare Account ID.
- VISION_PROVIDER: Primary vision provider (anthropic, openai, or cloudflare).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to the Tesseract executable.
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Can use the OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
- CLOUDFLARE_MODEL: Cloudflare Workers AI model (default: @cf/llava-hf/llava-1.5-7b-hf).
- CLOUDFLARE_MAX_TOKENS: Maximum number of tokens to generate (default: 512).
- CLOUDFLARE_TIMEOUT: Timeout for Cloudflare API requests in seconds (default: 60).

OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
- Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
- Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
- Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
- Set VISION_PROVIDER to openai.

Default models:
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
- Cloudflare Workers AI: @cf/llava-hf/llava-1.5-7b-hf
- OpenRouter: use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.
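Putting these together, a minimal .env might look like the following (all values are placeholders):
# Example .env (placeholder values)
VISION_PROVIDER=openai
FALLBACK_PROVIDER=anthropic
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
ANTHROPIC_API_KEY=sk-ant-...
ENABLE_OCR=true
LOG_LEVEL=INFO

# OpenRouter variant: replace the OpenAI settings above with
# OPENAI_API_KEY=your-openrouter-key
# OPENAI_BASE_URL=https://openrouter.ai/api/v1
# OPENAI_MODEL=anthropic/claude-3.5-sonnet:beta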
For development, clone the repository:
git clone https://github.com/zudsniper/mcp-image-recognition.git
cd mcp-image-recognition
Setup with uv (recommended):
# Install uv if not installed
pip install uv
# Create virtual environment and install deps
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .
uv pip install -e ".[dev]"
Alternative setup with pip:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .
# Or alternatively:
pip install -r requirements.txt
pip install -r requirements-dev.txt
cp .env.example .env
# Edit .env with your API keys
Pass the environment file to Docker Compose:
# Modern Docker Compose V2 syntax
docker compose --env-file .env up -d
Add this to your Claude Desktop config:
{
"mcpServers": {
"image-recognition": {
"command": "docker",
"args": [
"exec",
"-i",
"mcp-image-recognition-dev",
"python",
"-m",
"image_recognition_server.server"
],
"env": {
"VISION_PROVIDER": "openai",
"OPENAI_API_KEY": "your-api-key",
"OPENAI_MODEL": "gpt-4o"
}
}
}
}
Run the MCP server in development mode:
# Install the MCP Inspector if you haven't already
npm install -g @modelcontextprotocol/inspector
# Start the server with the Inspector
npx @modelcontextprotocol/inspector mcp-image-recognition
The Inspector provides a web interface (it prints its local URL when it starts) where you can:
- Debug issues with your implementation
- Test specific tools:
  - describe_image: provide a base64-encoded image
  - describe_image_from_file: provide a path to a local image file
  - describe_image_from_url: provide a URL to an image

Temporarily modify your Claude Desktop configuration to use your development version:
{
"mcpServers": {
"image-recognition": {
"command": "python",
"args": [
"-m", "image_recognition_server.server"
],
"cwd": "/path/to/your/mcp-image-recognition",
"env": {
"VISION_PROVIDER": "openai",
"OPENAI_API_KEY": "your-api-key",
"OPENAI_MODEL": "gpt-4o"
}
}
}
}
Restart Claude Desktop to apply the changes
Run all tests:
run.bat test
Run specific test suite:
run.bat test server
run.bat test anthropic
run.bat test openai
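Outside Windows, the suite can likely be run with pytest directly, assuming the dev extras (which include the test dependencies) are installed:
# Hedged equivalent of run.bat test on macOS/Linux
uv pip install -e ".[dev]"
python -m pytest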
Build the Docker image:
docker build -t mcp-image-recognition .
Run the container:
docker run -it --env-file .env mcp-image-recognition
When running the MCP server in Docker, the describe_image_from_file
tool can only access files inside the container. By default, the container has no access to files on your host system. To enable access to local files, you must explicitly map directories when configuring the MCP server.
Important Note: When using Claude Desktop, Cursor, or other platforms where images are uploaded to chats, those images are stored on Anthropic's servers and not directly accessible to the MCP server via a filesystem path. In these cases, you should:
1. Use the describe_image
tool (which works with base64-encoded images) for images uploaded directly to the chat
2. Use the new describe_image_from_url
tool for images hosted online
3. For local files, ensure the directory is properly mapped to the Docker container
To give the Docker container access to specific folders on your system, modify your MCP server configuration to include volume mapping:
{
"mcpServers": {
"image-recognition": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"-v", "/path/on/host:/path/in/container",
"zudsniper/mcp-image-recognition:latest"
],
"env": {
"VISION_PROVIDER": "openai",
"OPENAI_API_KEY": "your-api-key",
"OPENAI_MODEL": "gpt-4o"
}
}
}
}
For example, to map your Downloads folder:
- Windows: -v "C:\Users\YourName\Downloads:/app/images"
- macOS/Linux: -v "/Users/YourName/Downloads:/app/images"
Then access files using the container path: /app/images/your_image.jpg
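For a quick test outside an MCP client, the same mapping works with a plain docker run (the host path below is an example):
docker run -it --env-file .env \
  -v "/Users/YourName/Downloads:/app/images" \
  zudsniper/mcp-image-recognition:latest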
To use Cloudflare Workers AI for image recognition, set the following in your .env file:
- CLOUDFLARE_API_KEY: Your Cloudflare API token
- CLOUDFLARE_ACCOUNT_ID: Your Cloudflare account ID
- VISION_PROVIDER: Set to cloudflare
- CLOUDFLARE_MODEL: Optional, defaults to @cf/llava-hf/llava-1.5-7b-hf
Once configured, your AI assistant (Claude, for example) can analyze images you upload to the chat or point it to by URL or file path.
Example prompt after uploading an image:
Please describe this image in detail.
You can also customize the prompt for specific needs:
What text appears in this image?
or
Is there any safety concern in this image?
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
To release a new version:
1. Update the version in pyproject.toml and setup.py
2. Push to the release branch

Required repository secrets for CI/CD:
- DOCKERHUB_USERNAME: Docker Hub username
- DOCKERHUB_TOKEN: Docker Hub access token
- PYPI_API_TOKEN: PyPI API token
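If you use the GitHub CLI, these secrets can be set as shown below (you will be prompted for each value); adding them through the repository settings UI works just as well:
# Set repository secrets with the GitHub CLI
gh secret set DOCKERHUB_USERNAME
gh secret set DOCKERHUB_TOKEN
gh secret set PYPI_API_TOKEN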
[
{
"description": "Describe an image from base64-encoded data. Use for images directly uploaded to chat. Best for: Images uploaded to the current conversation where no public URL exists. Not for: Local files on your computer or images with public URLs. Args: image: Base64-encoded image data prompt: Optional prompt to guide the description Returns: str: Detailed description of the image ",
"inputSchema": {
"properties": {
"image": {
"title": "Image",
"type": "string"
},
"prompt": {
"default": "Please describe this image in detail.",
"title": "Prompt",
"type": "string"
}
},
"required": [
"image"
],
"title": "describe_imageArguments",
"type": "object"
},
"name": "describe_image"
},
{
"description": "Describe an image from a local file path. Requires proper file system access. Best for: Local files when the server has filesystem access to the path. Limitations: When using Docker, requires volume mapping (-v flag) to access host files. Not recommended for: Images uploaded to chat or images with public URLs. Args: filepath: Absolute path to the image file prompt: Optional prompt to guide the description Returns: str: Detailed description of the image ",
"inputSchema": {
"properties": {
"filepath": {
"title": "Filepath",
"type": "string"
},
"prompt": {
"default": "Please describe this image in detail.",
"title": "Prompt",
"type": "string"
}
},
"required": [
"filepath"
],
"title": "describe_image_from_fileArguments",
"type": "object"
},
"name": "describe_image_from_file"
},
{
"description": "Describe an image from a public URL. Most reliable method for web images. Best for: Images with public URLs accessible from the internet. Advantages: Works regardless of server deployment method (local/Docker). Not for: Local files or images already uploaded to the current conversation. Args: url: Direct URL to the image (must be publicly accessible) prompt: Optional prompt to guide the description Returns: str: Detailed description of the image ",
"inputSchema": {
"properties": {
"prompt": {
"default": "Please describe this image in detail.",
"title": "Prompt",
"type": "string"
},
"url": {
"title": "Url",
"type": "string"
}
},
"required": [
"url"
],
"title": "describe_image_from_urlArguments",
"type": "object"
},
"name": "describe_image_from_url"
}
]