
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful text-to-speech and video/image generation APIs. This server allows MCP clients such as Claude Desktop, Cursor, Windsurf, and OpenAI Agents to generate speech, clone voices, generate videos, generate images, and more.
Quickstart with MCP Client
- Get your API key from MiniMax.
- Install uv (Python package manager), install with curl -LsSf https://astral.sh/uv/install.sh | sh or see the uv repo for additional install methods.
Claude Desktop
Go to Claude > Settings > Developer > Edit Config and edit claude_desktop_config.json to include the following:
{
  "mcpServers": {
    "MiniMax": {
      "command": "uvx",
      "args": [
        "minimax-mcp"
      ],
      "env": {
        "MINIMAX_API_KEY": "<insert-your-api-key-here>",
        "MINIMAX_MCP_BASE_PATH": "<local-output-dir-path>",
        "MINIMAX_API_HOST": "https://api.minimaxi.chat",
        "MINIMAX_API_RESOURCE_MODE": "<optional, [url|local], url is default, audio/image/video are downloaded locally or provided in URL format>"
      }
    }
  }
}
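If you would rather script this step than edit the file by hand, the entry above can be merged into an existing config programmatically. The following is a minimal sketch, not part of the MiniMax tooling: the function name `add_minimax_server` and its parameters are made up for illustration, and it works on an in-memory dict so that reading and writing the actual file is left to you.

```python
import copy
import json


def add_minimax_server(config, api_key, base_path,
                       api_host="https://api.minimaxi.chat",
                       resource_mode="url"):
    """Return a copy of a Claude Desktop config dict with a MiniMax entry added.

    Hypothetical helper: only the key names inside the entry come from the README.
    """
    if resource_mode not in ("url", "local"):
        raise ValueError("resource_mode must be 'url' or 'local'")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers["MiniMax"] = {
        "command": "uvx",
        "args": ["minimax-mcp"],
        "env": {
            "MINIMAX_API_KEY": api_key,
            "MINIMAX_MCP_BASE_PATH": base_path,
            "MINIMAX_API_HOST": api_host,
            "MINIMAX_API_RESOURCE_MODE": resource_mode,
        },
    }
    return updated


if __name__ == "__main__":
    # An existing config with some other (made-up) server already registered.
    existing = {"mcpServers": {"other": {"command": "npx", "args": ["some-server"]}}}
    merged = add_minimax_server(
        existing, "<insert-your-api-key-here>", "<local-output-dir-path>"
    )
    print(json.dumps(merged, indent=2))
```

Working on a copy keeps any servers you have already registered and makes the helper easy to test without touching the real config file.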
⚠️ Warning: The API key needs to match the host. If the error "API Error: invalid api key" occurs, check your API host:
- Global Host: https://api.minimaxi.chat (note the extra "i")
- Mainland Host: https://api.minimax.chat
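One way to catch a key/host mismatch early is to keep the two hosts in a small lookup and check the configured value before starting the client. This is only an illustrative sketch: the region labels `global` and `mainland` and both function names are assumptions made here, while the two URLs are the ones listed above.

```python
# Hosts as listed in the warning above; the dictionary keys are labels chosen for this sketch.
MINIMAX_HOSTS = {
    "global": "https://api.minimaxi.chat",
    "mainland": "https://api.minimax.chat",
}


def host_for_region(region):
    """Return the API host for a region label ('global' or 'mainland')."""
    try:
        return MINIMAX_HOSTS[region]
    except KeyError:
        expected = sorted(MINIMAX_HOSTS)
        raise ValueError(f"unknown region {region!r}; expected one of {expected}") from None


def region_for_host(host):
    """Return the region label for a configured host, or None if it matches neither."""
    normalized = host.strip().rstrip("/")  # tolerate stray whitespace and a trailing slash
    for region, known in MINIMAX_HOSTS.items():
        if normalized == known:
            return region
    return None
```

A `None` result from `region_for_host` is a hint that MINIMAX_API_HOST contains a typo, which is easy to make given that the two hosts differ by a single letter.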
If you're using Windows, you will have to enable "Developer Mode" in Claude Desktop to use the MCP server. Click "Help" in the hamburger menu in the top left and select "Enable Developer Mode".
Cursor
Go to Cursor -> Preferences -> Cursor Settings -> MCP -> Add new global MCP Server to add the above config.
That's it. Your MCP client can now interact with MiniMax through these tools:
| tool | description |
|:-----|:-----|
| text_to_audio | Convert text to audio with a given voice |
| list_voices | List all voices available |
| voice_clone | Clone a voice using provided audio files |
| generate_video | Generate a video from a prompt |
| text_to_image | Generate an image from a prompt |
Transport
We support two transport types: stdio and SSE.
| stdio | SSE |
|:-----|:-----|
| Run locally | Can be deployed locally or in the cloud |
| Communication through stdout | Communication through network |
| Input: Supports processing local files or valid URL resources | Input: When deployed in the cloud, it is recommended to use URL for input |
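To make the stdio transport described above more concrete: MCP messages are JSON-RPC 2.0 objects, and on the stdio transport each message travels as a single line of JSON. The sketch below builds and parses such lines with the standard library only. It is not how the MiniMax server or any SDK is implemented; `tool_call_request`, `encode_message`, and `decode_message` are hypothetical helper names, and in practice an MCP client library handles this framing for you.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_message(message):
    """Serialize one message as a single newline-terminated line of JSON."""
    # json.dumps escapes newlines inside strings, so the line is never split.
    return json.dumps(message, ensure_ascii=False) + "\n"


def decode_message(line):
    """Parse one line received from the peer back into a dict."""
    return json.loads(line)


if __name__ == "__main__":
    request = tool_call_request(1, "list_voices", {"voice_type": "all"})
    print(encode_message(request), end="")
```

With SSE the same JSON-RPC payloads travel over the network instead of a local pipe, which is why URL inputs are the safer choice when the server runs in the cloud and cannot see your local files.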
Example usage
⚠️ Warning: Using these tools may incur costs.
1. Broadcast a segment of the evening news
2. Clone a voice
3. Generate a video
4. Generate images
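For the video example, the generate_video tool description lists 15 bracketed camera movement instructions that Director models accept in the prompt. A small pre-flight check can flag a misspelled instruction before a paid API call is made. The sketch below is an illustration under assumptions: the helper name `unknown_camera_instructions` is made up, and treating every bracketed phrase as an intended camera instruction, matching names case-sensitively, and allowing several comma-separated instructions inside one pair of brackets are simplifications chosen here rather than documented server behaviour.

```python
import re

# The 15 instructions listed in the generate_video tool description.
CAMERA_INSTRUCTIONS = {
    "Truck left", "Truck right", "Pan left", "Pan right",
    "Push in", "Pull out", "Pedestal up", "Pedestal down",
    "Tilt up", "Tilt down", "Zoom in", "Zoom out",
    "Shake", "Tracking shot", "Static shot",
}


def unknown_camera_instructions(prompt):
    """Return bracketed phrases in the prompt that are not known instructions."""
    unknown = []
    # Find the text inside each [...] group, then check every comma-separated part.
    for group in re.findall(r"\[([^\[\]]+)\]", prompt):
        for phrase in group.split(","):
            name = phrase.strip()
            if name and name not in CAMERA_INSTRUCTIONS:
                unknown.append(name)
    return unknown


if __name__ == "__main__":
    print(unknown_camera_instructions("[Pan left] A cat crosses the street [Zoom inn]"))
```

An empty result means every bracketed phrase matched one of the listed names; anything returned is worth a second look before the request is sent.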

[
{
"description": "Convert text to audio with a given voice and save the output audio file to a given directory.\n Directory is optional, if not provided, the output file will be saved to $HOME/Desktop.\n Voice id is optional, if not provided, the default voice will be used.\n\n COST WARNING: This tool makes an API call to Minimax which may incur costs. Only use when explicitly requested by the user.\n\n Args:\n text (str): The text to convert to speech.\n voice_id (str, optional): The id of the voice to use. For example, \"male-qn-qingse\"/\"audiobook_female_1\"/\"cute_boy\"/\"Charming_Lady\"...\n model (string, optional): The model to use.\n speed (float, optional): Speed of the generated audio. Controls the speed of the generated speech. Values range from 0.5 to 2.0, with 1.0 being the default speed.\n vol (float, optional): Volume of the generated audio. Controls the volume of the generated speech. Values range from 0 to 10, with 1 being the default volume.\n pitch (int, optional): Pitch of the generated audio. Controls the pitch of the generated speech. Values range from -12 to 12, with 0 being the default pitch.\n emotion (str, optional): Emotion of the generated audio. Controls the emotion of the generated speech. Values range [\"happy\", \"sad\", \"angry\", \"fearful\", \"disgusted\", \"surprised\", \"neutral\"], with \"happy\" being the default emotion.\n sample_rate (int, optional): Sample rate of the generated audio. Controls the sample rate of the generated speech. Values range [8000,16000,22050,24000,32000,44100] with 32000 being the default sample rate.\n bitrate (int, optional): Bitrate of the generated audio. Controls the bitrate of the generated speech. Values range [32000,64000,128000,256000] with 128000 being the default bitrate.\n channel (int, optional): Channel of the generated audio. Controls the channel of the generated speech. Values range [1, 2] with 1 being the default channel.\n format (str, optional): Format of the generated audio. Controls the format of the generated speech. Values range [\"pcm\", \"mp3\",\"flac\"] with \"mp3\" being the default format.\n language_boost (str, optional): Language boost of the generated audio. Controls the language boost of the generated speech. Values range ['Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'] with \"auto\" being the default language boost.\n Returns:\n Text content with the path to the output file and name of the voice used.\n ",
"inputSchema": {
"properties": {
"bitrate": {
"default": 128000,
"title": "Bitrate",
"type": "integer"
},
"channel": {
"default": 1,
"title": "Channel",
"type": "integer"
},
"emotion": {
"default": "happy",
"title": "Emotion",
"type": "string"
},
"format": {
"default": "mp3",
"title": "Format",
"type": "string"
},
"language_boost": {
"default": "auto",
"title": "Language Boost",
"type": "string"
},
"model": {
"default": "speech-02-hd",
"title": "Model",
"type": "string"
},
"output_directory": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Output Directory"
},
"pitch": {
"default": 0,
"title": "Pitch",
"type": "integer"
},
"sample_rate": {
"default": 32000,
"title": "Sample Rate",
"type": "integer"
},
"speed": {
"default": 1,
"title": "Speed",
"type": "number"
},
"text": {
"title": "Text",
"type": "string"
},
"voice_id": {
"default": "female-shaonv",
"title": "Voice Id",
"type": "string"
},
"vol": {
"default": 1,
"title": "Vol",
"type": "number"
}
},
"required": [
"text"
],
"title": "text_to_audioArguments",
"type": "object"
},
"name": "text_to_audio"
},
{
"description": "List all voices available. Only supported when api_host is https://api.minimax.chat\n\n Args:\n voice_type (str, optional): The type of voices to list. Values range [\"all\", \"system\", \"voice_cloning\"], with \"all\" being the default.\n Returns:\n Text content with the list of voices.\n ",
"inputSchema": {
"properties": {
"voice_type": {
"default": "all",
"title": "Voice Type",
"type": "string"
}
},
"title": "list_voicesArguments",
"type": "object"
},
"name": "list_voices"
},
{
"description": "Clone a voice using provided audio files. The new voice will be charged upon first use.\n\n COST WARNING: This tool makes an API call to Minimax which may incur costs. Only use when explicitly requested by the user.\n\n Args:\n voice_id (str): The id of the voice to use.\n file (str): The path to the audio file to clone or a URL to the audio file.\n text (str, optional): The text to use for the demo audio.\n is_url (bool, optional): Whether the file is a URL. Defaults to False.\n Returns:\n Text content with the voice id of the cloned voice.\n ",
"inputSchema": {
"properties": {
"file": {
"title": "File",
"type": "string"
},
"is_url": {
"default": false,
"title": "Is Url",
"type": "boolean"
},
"output_directory": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Output Directory"
},
"text": {
"title": "Text",
"type": "string"
},
"voice_id": {
"title": "Voice Id",
"type": "string"
}
},
"required": [
"voice_id",
"file",
"text"
],
"title": "voice_cloneArguments",
"type": "object"
},
"name": "voice_clone"
},
{
"description": "Play an audio file. Supports WAV and MP3 formats. Video is not supported.\n\n Args:\n input_file_path (str): The path to the audio file to play.\n is_url (bool, optional): Whether the audio file is a URL.\n Returns:\n Text content with the path to the audio file.\n ",
"inputSchema": {
"properties": {
"input_file_path": {
"title": "Input File Path",
"type": "string"
},
"is_url": {
"default": false,
"title": "Is Url",
"type": "boolean"
}
},
"required": [
"input_file_path"
],
"title": "play_audioArguments",
"type": "object"
},
"name": "play_audio"
},
{
"description": "Generate a video from a prompt.\n\n COST WARNING: This tool makes an API call to Minimax which may incur costs. Only use when explicitly requested by the user.\n\n Args:\n model (str, optional): The model to use. Values range [\"T2V-01\", \"T2V-01-Director\", \"I2V-01\", \"I2V-01-Director\", \"I2V-01-live\"]. \"Director\" supports inserting instructions for camera movement control. \"I2V\" for image to video. \"T2V\" for text to video.\n prompt (str): The prompt to generate the video from. When using a Director model, the prompt supports 15 Camera Movement Instructions (Enumerated Values)\n -Truck: [Truck left], [Truck right]\n -Pan: [Pan left], [Pan right]\n -Push: [Push in], [Pull out]\n -Pedestal: [Pedestal up], [Pedestal down]\n -Tilt: [Tilt up], [Tilt down]\n -Zoom: [Zoom in], [Zoom out]\n -Shake: [Shake]\n -Follow: [Tracking shot]\n -Static: [Static shot]\n first_frame_image (str): The first frame image. The model must be \"I2V\" Series.\n output_directory (str, optional): The directory to save the video to.\n Returns:\n Text content with the path to the output video file.\n ",
"inputSchema": {
"properties": {
"first_frame_image": {
"default": null,
"title": "first_frame_image",
"type": "string"
},
"model": {
"default": "T2V-01",
"title": "Model",
"type": "string"
},
"output_directory": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Output Directory"
},
"prompt": {
"default": "",
"title": "Prompt",
"type": "string"
}
},
"title": "generate_videoArguments",
"type": "object"
},
"name": "generate_video"
},
{
"description": "Generate an image from a prompt.\n\n COST WARNING: This tool makes an API call to Minimax which may incur costs. Only use when explicitly requested by the user.\n\n Args:\n model (str, optional): The model to use. Values range [\"image-01\"], with \"image-01\" being the default.\n prompt (str): The prompt to generate the image from.\n aspect_ratio (str, optional): The aspect ratio of the image. Values range [\"1:1\", \"16:9\",\"4:3\", \"3:2\", \"2:3\", \"3:4\", \"9:16\", \"21:9\"], with \"1:1\" being the default.\n n (int, optional): The number of images to generate. Values range [1, 9], with 1 being the default.\n prompt_optimizer (bool, optional): Whether to optimize the prompt. Values range [True, False], with True being the default.\n output_directory (str, optional): The directory to save the image to.\n Returns:\n Text content with the path to the output image file.\n ",
"inputSchema": {
"properties": {
"aspect_ratio": {
"default": "1:1",
"title": "Aspect Ratio",
"type": "string"
},
"model": {
"default": "image-01",
"title": "Model",
"type": "string"
},
"n": {
"default": 1,
"title": "N",
"type": "integer"
},
"output_directory": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Output Directory"
},
"prompt": {
"default": "",
"title": "Prompt",
"type": "string"
},
"prompt_optimizer": {
"default": true,
"title": "Prompt Optimizer",
"type": "boolean"
}
},
"title": "text_to_imageArguments",
"type": "object"
},
"name": "text_to_image"
}
]
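The argument limits spelled out in the text_to_audio description above can be checked on the client side before a paid call is made. The following is a rough sketch rather than the server's own validation: the function name `check_text_to_audio_args` is hypothetical, the defaults and allowed values are copied from the schema and description above, and two points are assumptions made here, namely that the numeric ranges are inclusive at both ends and that an empty `text` should be rejected. The language_boost list is left out for brevity.

```python
# Allowed values and ranges as stated in the text_to_audio description.
ALLOWED = {
    "emotion": {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"},
    "sample_rate": {8000, 16000, 22050, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {1, 2},
    "format": {"pcm", "mp3", "flac"},
}
RANGES = {"speed": (0.5, 2.0), "vol": (0, 10), "pitch": (-12, 12)}  # assumed inclusive

# Defaults as they appear in the inputSchema.
DEFAULTS = {
    "voice_id": "female-shaonv", "model": "speech-02-hd",
    "speed": 1, "vol": 1, "pitch": 0, "emotion": "happy",
    "sample_rate": 32000, "bitrate": 128000, "channel": 1,
    "format": "mp3", "language_boost": "auto",
}


def check_text_to_audio_args(arguments):
    """Fill in schema defaults and return (full_arguments, list_of_problems)."""
    full = {**DEFAULTS, **arguments}  # caller-supplied values override defaults
    problems = []
    if not isinstance(full.get("text"), str) or not full["text"]:
        problems.append("text is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        if not low <= full[name] <= high:
            problems.append(f"{name}={full[name]} is outside {low}..{high}")
    for name, allowed in ALLOWED.items():
        if full[name] not in allowed:
            problems.append(f"{name}={full[name]!r} is not one of {sorted(allowed)}")
    return full, problems


if __name__ == "__main__":
    _, issues = check_text_to_audio_args({"text": "Good evening.", "speed": 3.0})
    for issue in issues:
        print(issue)
```

Returning a list of problems instead of raising on the first one lets a client report every out-of-range value in a single pass.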