MCP Iceberg Service


A Model Context Protocol server that provides a SQL interface for querying and managing Apache Iceberg tables through Claude Desktop, allowing natural-language interaction with Iceberg data lakes.



An MCP (Model Context Protocol) server implementation for interacting with Apache Iceberg. This server provides a SQL interface for querying and managing Iceberg tables through Claude Desktop.

Claude Desktop as your Iceberg Data Lake Catalog


How to Install in Claude Desktop

Installing via Smithery

To install MCP Iceberg Catalog for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @ahodroj/mcp-iceberg-service --client claude

Prerequisites

  • Python 3.10 or higher
  • UV package installer (recommended) or pip
  • Access to an Iceberg REST catalog and S3-compatible storage

How to install in Claude Desktop

Add the following configuration to claude_desktop_config.json:

{
  "mcpServers": {
    "iceberg": {
      "command": "uv",
      "args": [
        "--directory",
        "PATH_TO_/mcp-iceberg-service",
        "run",
        "mcp-server-iceberg"
      ],
      "env": {
        "ICEBERG_CATALOG_URI" : "http://localhost:8181",
        "ICEBERG_WAREHOUSE" : "YOUR ICEBERG WAREHOUSE NAME",
        "S3_ENDPOINT" : "OPTIONAL IF USING S3",
        "AWS_ACCESS_KEY_ID" : "YOUR S3 ACCESS KEY",
        "AWS_SECRET_ACCESS_KEY" : "YOUR S3 SECRET KEY"
      }
    }
  }
}
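
The environment variables in this configuration map naturally onto catalog connection properties. The following is a minimal sketch of that mapping, not the server's actual code: `catalog_properties` is a hypothetical helper, and it assumes the storage variables may be left unset. The property keys (`uri`, `warehouse`, `s3.endpoint`, `s3.access-key-id`, `s3.secret-access-key`) are the ones PyIceberg documents for REST catalogs and S3 access.

```python
import os
from typing import Mapping, Optional


def catalog_properties(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build REST catalog properties from the variables in the config above.

    Hypothetical helper for illustration; the server's real wiring may differ.
    """
    env = os.environ if env is None else env

    # The catalog URI and warehouse name are needed to reach the REST catalog.
    props = {
        "type": "rest",
        "uri": env["ICEBERG_CATALOG_URI"],
        "warehouse": env["ICEBERG_WAREHOUSE"],
    }

    # Storage settings are treated as optional; only set values are passed on.
    optional = {
        "S3_ENDPOINT": "s3.endpoint",
        "AWS_ACCESS_KEY_ID": "s3.access-key-id",
        "AWS_SECRET_ACCESS_KEY": "s3.secret-access-key",
    }
    for var, key in optional.items():
        value = env.get(var)
        if value:
            props[key] = value
    return props


# With PyIceberg installed, the result could be passed to load_catalog:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("default", **catalog_properties())
```

Keeping the mapping in one small function makes it easy to check a configuration before starting Claude Desktop: a missing `ICEBERG_CATALOG_URI` or `ICEBERG_WAREHOUSE` surfaces immediately as a `KeyError`.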

Design

Architecture

The MCP server is built on three main components:

  1. MCP Protocol Handler

    • Implements the Model Context Protocol for communication with Claude
    • Handles request/response cycles through stdio
    • Manages server lifecycle and initialization

  2. Query Processor

    • Parses SQL queries using sqlparse
    • Supports the following operations: LIST TABLES, DESCRIBE TABLE, SELECT, INSERT

  3. Iceberg Integration

    • Uses pyiceberg for table operations
    • Integrates with PyArrow for efficient data handling
    • Manages catalog connections and table operations
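
To illustrate how a query processor of this kind might route incoming statements, here is a simplified sketch. It is an assumption-laden stand-in rather than the server's implementation: it uses standard-library regular expressions instead of sqlparse so that it runs without dependencies, and the `classify` function and `_PATTERNS` table are hypothetical names.

```python
import re

# Statement kinds listed for the Query Processor above.
# Regex matching is a simplification; the server itself parses with sqlparse.
_PATTERNS = [
    ("LIST TABLES", re.compile(r"^\s*LIST\s+TABLES\b", re.IGNORECASE)),
    ("DESCRIBE TABLE", re.compile(r"^\s*DESCRIBE\s+TABLE\b", re.IGNORECASE)),
    ("SELECT", re.compile(r"^\s*SELECT\b", re.IGNORECASE)),
    ("INSERT", re.compile(r"^\s*INSERT\b", re.IGNORECASE)),
]


def classify(query: str) -> str:
    """Return the operation kind for a query, or raise if it is unsupported."""
    for kind, pattern in _PATTERNS:
        if pattern.match(query):
            return kind
    raise ValueError(f"Unsupported query: {query!r}")
```

A dispatcher could then call a separate handler per kind, for example one that lists catalog tables for `LIST TABLES` and one that runs a table scan for `SELECT`.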

PyIceberg Integration

The server utilizes PyIceberg in several ways:

  1. Catalog Management

    • Connects to REST catalogs
    • Manages table metadata
    • Handles namespace operations

  2. Data Operations

    • Converts between PyIceberg and PyArrow types
    • Handles data insertion through PyArrow tables
    • Manages table schemas and field types

  3. Query Execution

    • Translates SQL to PyIceberg operations
    • Handles data scanning and filtering
    • Manages result set conversion
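
Because insertion goes through PyArrow tables, row-oriented values from an INSERT statement have to be turned into columns first. The sketch below shows one way that step could look, under the assumption that the column names and value rows have already been extracted from the SQL; `rows_to_columns` is a hypothetical helper, not a function from the server or from PyIceberg.

```python
from typing import Any, Dict, List, Sequence


def rows_to_columns(
    column_names: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Dict[str, List[Any]]:
    """Pivot row tuples into a column-name -> values mapping."""
    columns: Dict[str, List[Any]] = {name: [] for name in column_names}
    for index, row in enumerate(rows):
        # Reject rows whose width does not match the column list.
        if len(row) != len(column_names):
            raise ValueError(
                f"Row {index} has {len(row)} values, expected {len(column_names)}"
            )
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# With PyArrow and PyIceberg installed, the mapping could be appended like so:
#   import pyarrow as pa
#   arrow_table = pa.Table.from_pydict(columns)
#   iceberg_table.append(arrow_table)
```

In practice the Arrow table would also need a schema that matches the Iceberg table's field types, which is where the type conversion listed under Data Operations comes in.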

Further Implementation Needed

  1. Query Operations

    • Implement UPDATE operations
    • Add DELETE support
    • Support for CREATE TABLE with schema definition
    • Add ALTER TABLE operations
    • Implement table partitioning support

  2. Data Types

    • Support for complex types (arrays, maps, structs)
    • Add timestamp with timezone handling
    • Support for decimal types
    • Add nested field support

  3. Performance Improvements

    • Implement batch inserts
    • Add query optimization
    • Support for parallel scans
    • Add caching layer for frequently accessed data

  4. Security Features

    • Add authentication mechanisms
    • Implement role-based access control
    • Add row-level security
    • Support for encrypted connections

  5. Monitoring and Management

    • Add metrics collection
    • Implement query logging
    • Add performance monitoring
    • Support for table maintenance operations

  6. Error Handling

    • Improve error messages
    • Add retry mechanisms for transient failures
    • Implement transaction support
    • Add data validation

[
  {
    "description": "Execute a query on Iceberg tables",
    "inputSchema": {
      "properties": {
        "query": {
          "description": "Query to execute (supports: LIST TABLES, DESCRIBE TABLE, SELECT, CREATE TABLE)",
          "type": "string"
        }
      },
      "required": [
        "query"
      ],
      "type": "object"
    },
    "name": "execute_query"
  }
]
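
The tool definition above declares a single required string argument, `query`. As a rough sketch of what a client sends to invoke it, the hypothetical `build_tool_call` helper below assembles an MCP `tools/call` request as a JSON-RPC 2.0 message. The request `id` value is arbitrary, and the non-empty check is an added assumption that goes slightly beyond what the schema itself enforces.

```python
import json
from typing import Any, Dict


def build_tool_call(query: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' request for the execute_query tool."""
    # The input schema marks 'query' as a required string.
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "execute_query",
            "arguments": {"query": query},
        },
    }


if __name__ == "__main__":
    # Serialize one request as it would be written to the server's stdin.
    print(json.dumps(build_tool_call("LIST TABLES")))
```

Claude Desktop builds these messages itself; the sketch is mainly useful for exercising the server by hand over stdio.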