biomart mcp


A Model Context Protocol server that interfaces with Biomart databases, allowing models to discover biological datasets, explore attributes/filters, retrieve biological data, and translate between different biological identifiers.

An MCP server to interface with Biomart

Model Context Protocol (MCP) is an open protocol, developed by Anthropic, that standardizes how applications provide context to LLMs. Here we use the MCP python-sdk to create an MCP server that interfaces with Biomart via the pybiomart package.


There is a short demo video showing the MCP server in action on Claude Desktop.

Installation

Clone the repository

git clone https://github.com/jzinno/biomart-mcp.git
cd biomart-mcp

Claude Desktop

uv run --with mcp[cli] mcp install --with pybiomart biomart-mcp.py

Cursor

Via Cursor's agent mode, other models can take advantage of MCP servers as well, such as those from OpenAI or DeepSeek. Click the Cursor settings cogwheel and navigate to Features -> MCP Servers -> Add new MCP Server. Set the name to biomart (or whatever you like) and the Type to command.

Set the command to:

uv run --with mcp[cli] --with pybiomart mcp run /your/path/to/biomart-mcp.py


Development

# Create a virtual environment
uv venv

# MacOS/Linux
source .venv/bin/activate

# Windows
.venv\Scripts\activate

uv sync #or uv add mcp[cli] pybiomart

# Run the server in dev mode
mcp dev biomart-mcp.py

Features

Biomart-MCP provides several tools to interact with Biomart databases:

Mart and Dataset Discovery: List available marts and datasets to explore the Biomart database structure
Attribute and Filter Exploration: View common or all available attributes and filters for specific datasets
Data Retrieval: Query Biomart with specific attributes and filters to get biological data
ID Translation: Convert between different biological identifiers (e.g., gene symbols to Ensembl IDs)
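
As a rough illustration of how a client reaches these tools: MCP is built on JSON-RPC 2.0, and a tool is invoked with a tools/call request carrying the tool name and its arguments. The sketch below only builds such a payload for get_translation. The helper name build_tool_call and the request id are illustrative assumptions, and in practice an MCP client such as Claude Desktop or Cursor handles this framing for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method.

    build_tool_call is a hypothetical helper for illustration only.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Arguments mirror the get_translation tool: gene symbol -> Ensembl gene ID.
payload = build_tool_call(
    "get_translation",
    {
        "mart": "ENSEMBL_MART_ENSEMBL",
        "dataset": "hsapiens_gene_ensembl",
        "from_attr": "hgnc_symbol",
        "to_attr": "ensembl_gene_id",
        "target": "TP53",
    },
)
print(payload)
```

The same shape applies to the other tools; only the name and the arguments object change.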

Contributing

Pull requests are welcome! Some small notes on development:

We use only @mcp.tool() here by design, to maximize compatibility with clients that support MCP, as seen in the docs.
We use @lru_cache to cache the results of functions that are computationally expensive or make external API calls.
We need to be mindful not to blow up the model's context window; for example, you'll see df.to_csv(index=False).replace("\r", "") in many places. This CSV-style return is much more token-efficient than something like df.to_string(), where the majority of the tokens are whitespace. Also keep in mind that pulling all genes from a chromosome, or a similarly large request, will be too large for the context window.
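
The caching and token-efficiency notes above can be sketched with the standard library alone. This is not the code in biomart-mcp.py: fetch_rows is a hypothetical stand-in for an expensive Biomart request, and the padded table only approximates what a whitespace-aligned df.to_string() output looks like.

```python
import csv
import io
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the "expensive" function really runs


@lru_cache(maxsize=128)
def fetch_rows(dataset: str) -> tuple:
    """Hypothetical stand-in for a slow external query; arguments must be hashable."""
    CALLS["count"] += 1
    return (
        ("ensembl_gene_id", "external_gene_name", "chromosome_name"),
        ("ENSG00000000003", "TSPAN6", "X"),
        ("ENSG00000000005", "TNMD", "X"),
    )


def to_compact_csv(rows) -> str:
    """Render rows as CSV without carriage returns to keep the output compact."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # the csv module ends lines with \r\n by default
    return buffer.getvalue().replace("\r", "")


def to_padded_table(rows, width: int = 20) -> str:
    """Render rows as a whitespace-aligned table, for size comparison only."""
    return "\n".join("".join(cell.ljust(width) for cell in row) for row in rows) + "\n"


rows = fetch_rows("hsapiens_gene_ensembl")
fetch_rows("hsapiens_gene_ensembl")  # second call is served from the cache

compact = to_compact_csv(rows)
padded = to_padded_table(rows)
print(CALLS["count"], len(compact), len(padded))
```

Character count is only a proxy for token count, but the gap grows with every padded column, which is why the compact form is preferred for tool output.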

Potential Future Features

There are of course many more features that could be added, some perhaps beyond the scope of the name biomart-mcp. Here are some ideas:

Add web scraping for resource sites with bs4. For example, once we have the Ensembl gene ID for NOTCH1, in some cases it may be useful to grab the collated Comments and Description Text from the UniProtKB section of its page on UCSC.
$...$

[
  {
    "description": "\n    Lists all available Biomart marts (databases) from Ensembl.\n\n    Biomart organizes biological data in a hierarchy: MART -> DATASET -> ATTRIBUTES/FILTERS.\n    This function returns all available marts as a CSV string.\n\n    Returns:\n        str: CSV-formatted table of all marts with their display names and descriptions.\n\n    Example:\n        list_marts()\n        >>> \"name,display_name,description\n             ENSEMBL_MART_ENSEMBL,Ensembl Genes,Gene annotation from Ensembl\n             ENSEMBL_MART_MOUSE,Mouse strains,Strain-specific data for mouse\n             ...\"\n    ",
    "inputSchema": {
      "properties": {},
      "title": "list_martsArguments",
      "type": "object"
    },
    "name": "list_marts"
  },
  {
    "description": "\n    Lists all available biomart datasets for a given mart.\n\n    Each mart contains multiple datasets. This function returns all datasets\n    available in the specified mart as a CSV string.\n\n    Args:\n        mart (str): The mart identifier to list datasets from.\n            Valid values include: ENSEMBL_MART_ENSEMBL, ENSEMBL_MART_MOUSE,\n            ENSEMBL_MART_ONTOLOGY, ENSEMBL_MART_GENOMIC, ENSEMBL_MART_SNP,\n            ENSEMBL_MART_FUNCGEN\n\n    Returns:\n        str: CSV-formatted table of all datasets with their display names and descriptions.\n\n    Example:\n        list_datasets(\"ENSEMBL_MART_ENSEMBL\")\n        >>> \"name,display_name,description\n             hsapiens_gene_ensembl,Human genes,Human genes (GRCh38.p13)\n             mmusculus_gene_ensembl,Mouse genes,Mouse genes (GRCm39)\n             ...\"\n    ",
    "inputSchema": {
      "properties": {
        "mart": {
          "title": "Mart",
          "type": "string"
        }
      },
      "required": [
        "mart"
      ],
      "title": "list_datasetsArguments",
      "type": "object"
    },
    "name": "list_datasets"
  },
  {
    "description": "\n    Lists commonly used attributes available for a given dataset.\n\n    This function returns only the most frequently used attributes (defined in COMMON_ATTRIBUTES)\n    to avoid overwhelming the model with too many options. For a complete list,\n    use list_all_attributes.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n\n    Returns:\n        str: CSV-formatted table of common attributes with their display names and descriptions.\n\n    Example:\n        list_common_attributes(\"ENSEMBL_MART_ENSEMBL\", \"hsapiens_gene_ensembl\")\n        >>> \"name,display_name,description\n             ensembl_gene_id,Gene stable ID,Ensembl stable ID for the gene\n             external_gene_name,Gene name,The gene name\n             ...\"\n    ",
    "inputSchema": {
      "properties": {
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset"
      ],
      "title": "list_common_attributesArguments",
      "type": "object"
    },
    "name": "list_common_attributes"
  },
  {
    "description": "\n    Lists all available attributes for a given dataset with some filtering.\n\n    This function returns a filtered list of all attributes available for the specified\n    dataset. Some less commonly used attributes (homologs, microarray probes) are\n    filtered out to reduce the response size.\n\n    CAUTION: This function can return a large number of attributes and may be unstable\n    for certain datasets. Consider using list_common_attributes first.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n\n    Returns:\n        str: CSV-formatted table of all filtered attributes.\n\n    Example:\n        list_all_attributes(\"ENSEMBL_MART_ENSEMBL\", \"hsapiens_gene_ensembl\")\n    ",
    "inputSchema": {
      "properties": {
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset"
      ],
      "title": "list_all_attributesArguments",
      "type": "object"
    },
    "name": "list_all_attributes"
  },
  {
    "description": "\n    Lists all available filters for a given dataset.\n\n    Filters are used to narrow down the results of a Biomart query.\n    This function returns all filters that can be applied to the specified dataset.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n\n    Returns:\n        str: CSV-formatted table of all filters with their display names and descriptions.\n\n    Example:\n        list_filters(\"ENSEMBL_MART_ENSEMBL\", \"hsapiens_gene_ensembl\")\n        >>> \"name,description\n             chromosome_name,Chromosome/scaffold name\n             start,Gene start (bp)\n             end,Gene end (bp)\n             ...\"\n    ",
    "inputSchema": {
      "properties": {
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset"
      ],
      "title": "list_filtersArguments",
      "type": "object"
    },
    "name": "list_filters"
  },
  {
    "description": "\n    Queries Biomart for data using specified attributes and filters.\n\n    This function performs the main data retrieval from Biomart, allowing you to\n    query biological data by specifying which attributes to return and which filters\n    to apply. Includes automatic retry logic for resilience.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n        attributes (list[str]): List of attributes to retrieve (e.g., [\"ensembl_gene_id\", \"external_gene_name\"])\n        filters (dict[str, str]): Dictionary of filters to apply (e.g., {\"chromosome_name\": \"1\"})\n\n    Returns:\n        str: CSV-formatted results of the query.\n\n    Example:\n        get_data(\n            \"ENSEMBL_MART_ENSEMBL\",\n            \"hsapiens_gene_ensembl\",\n            [\"ensembl_gene_id\", \"external_gene_name\", \"chromosome_name\"],\n            {\"chromosome_name\": \"X\", \"biotype\": \"protein_coding\"}\n        )\n        >>> \"ensembl_gene_id,external_gene_name,chromosome_name\n             ENSG00000000003,TSPAN6,X\n             ENSG00000000005,TNMD,X\n             ...\"\n    ",
    "inputSchema": {
      "properties": {
        "attributes": {
          "items": {
            "type": "string"
          },
          "title": "Attributes",
          "type": "array"
        },
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "filters": {
          "additionalProperties": {
            "type": "string"
          },
          "title": "Filters",
          "type": "object"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset",
        "attributes",
        "filters"
      ],
      "title": "get_dataArguments",
      "type": "object"
    },
    "name": "get_data"
  },
  {
    "description": "\n    Translates a single identifier from one attribute type to another.\n\n    This function allows conversion between different identifier types, such as\n    converting a gene symbol to an Ensembl ID. Results are cached to improve performance.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n        from_attr (str): The source attribute name (e.g., \"hgnc_symbol\")\n        to_attr (str): The target attribute name (e.g., \"ensembl_gene_id\")\n        target (str): The identifier value to translate (e.g., \"TP53\")\n\n    Returns:\n        str: The translated identifier, or an error message if not found.\n\n    Example:\n        get_translation(\"ENSEMBL_MART_ENSEMBL\", \"hsapiens_gene_ensembl\", \"hgnc_symbol\", \"ensembl_gene_id\", \"TP53\")\n        >>> \"ENSG00000141510\"\n    ",
    "inputSchema": {
      "properties": {
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "from_attr": {
          "title": "From Attr",
          "type": "string"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        },
        "target": {
          "title": "Target",
          "type": "string"
        },
        "to_attr": {
          "title": "To Attr",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset",
        "from_attr",
        "to_attr",
        "target"
      ],
      "title": "get_translationArguments",
      "type": "object"
    },
    "name": "get_translation"
  },
  {
    "description": "\n    Translates multiple identifiers in a single batch operation.\n\n    This function is more efficient than multiple calls to get_translation when\n    you need to translate many identifiers at once.\n\n    Args:\n        mart (str): The mart identifier (e.g., \"ENSEMBL_MART_ENSEMBL\")\n        dataset (str): The dataset identifier (e.g., \"hsapiens_gene_ensembl\")\n        from_attr (str): The source attribute name (e.g., \"hgnc_symbol\")\n        to_attr (str): The target attribute name (e.g., \"ensembl_gene_id\")\n        targets (list[str]): List of identifier values to translate (e.g., [\"TP53\", \"BRCA1\", \"BRCA2\"])\n\n    Returns:\n        dict: A dictionary containing:\n            - translations: Dictionary mapping input IDs to translated IDs\n            - not_found: List of IDs that could not be translated\n            - found_count: Number of successfully translated IDs\n            - not_found_count: Number of IDs that could not be translated\n\n    Example:\n        batch_translate(\"ENSEMBL_MART_ENSEMBL\", \"hsapiens_gene_ensembl\", \"hgnc_symbol\", \"ensembl_gene_id\", [\"TP53\", \"BRCA1\", \"BRCA2\"])\n        >>> {\"translations\": {\"TP53\": \"ENSG00000141510\", \"BRCA1\": \"ENSG00000012048\"}, \"not_found\": [\"BRCA2\"], \"found_count\": 2, \"not_found_count\": 1}\n    ",
    "inputSchema": {
      "properties": {
        "dataset": {
          "title": "Dataset",
          "type": "string"
        },
        "from_attr": {
          "title": "From Attr",
          "type": "string"
        },
        "mart": {
          "title": "Mart",
          "type": "string"
        },
        "targets": {
          "items": {
            "type": "string"
          },
          "title": "Targets",
          "type": "array"
        },
        "to_attr": {
          "title": "To Attr",
          "type": "string"
        }
      },
      "required": [
        "mart",
        "dataset",
        "from_attr",
        "to_attr",
        "targets"
      ],
      "title": "batch_translateArguments",
      "type": "object"
    },
    "name": "batch_translate"
  }
]
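
Since each tool above publishes an inputSchema, a client can check a call's arguments before sending them. The following is a minimal sketch, not a full JSON Schema validator: it only checks the required list and the top-level type of each known property, and the helper name check_arguments is an illustrative assumption. The schema is the one listed above for list_datasets.

```python
# Maps the JSON Schema type names used in the listing to Python types.
JSON_TYPES = {"string": str, "array": list, "object": dict}

# Copied from the list_datasets entry in the tool listing.
LIST_DATASETS_SCHEMA = {
    "properties": {"mart": {"title": "Mart", "type": "string"}},
    "required": ["mart"],
    "title": "list_datasetsArguments",
    "type": "object",
}


def check_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return readable problems; an empty list means the call looks valid.

    check_arguments is a hypothetical helper covering only required keys
    and top-level types.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # the schema does not forbid extra properties
        expected = JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {spec['type']}")
    return problems


print(check_arguments(LIST_DATASETS_SCHEMA, {"mart": "ENSEMBL_MART_ENSEMBL"}))
print(check_arguments(LIST_DATASETS_SCHEMA, {}))
print(check_arguments(LIST_DATASETS_SCHEMA, {"mart": 42}))
```

For anything beyond a quick sanity check, a dedicated JSON Schema library would be the safer choice, since nested item types such as the string arrays in get_data and batch_translate are not inspected here.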