mcp doc forge

Local 2025-08-31 23:10:42 0

Provides comprehensive document processing, including reading, converting, and manipulating various document formats with advanced text and HTML processing capabilities.


A powerful Model Context Protocol (MCP) server providing comprehensive document processing capabilities.

Features

Document Reader

  • Read DOCX, PDF, TXT, HTML, CSV

Document Conversion

  • DOCX to HTML/PDF conversion
  • HTML to TXT/Markdown conversion
  • PDF manipulation (merge, split)

Text Processing

  • Conversion between text encodings (UTF-8, Big5, GBK)
  • Text formatting and cleaning
  • Text comparison and diff generation
  • Text splitting by lines or delimiter

HTML Processing

  • HTML cleaning and formatting
  • Resource extraction (images, links, videos)
  • Structure-preserving conversion

Installation

Installing via Smithery

To install Document Processing Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude

Manual Installation

npm install -g @cablate/mcp-doc-forge

Usage

CLI

mcp-doc-forge

With Dive Desktop

  1. Click "+ Add MCP Server" in Dive Desktop
  2. Copy and paste this configuration:
{
  "mcpServers": {
    "mcp-doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}
  3. Click "Save" to install the MCP server

License

MIT

Contributing

Community participation and contributions are welcome! Here are some ways to contribute:

  • ⭐️ Star the project if you find it helpful
  • Submit Issues: Report problems or provide suggestions
  • Create Pull Requests: Submit code improvements

Contact

If you have any questions or suggestions, feel free to reach out:

  • Email: [email protected]
  • GitHub: CabLate
  • Collaboration: Discussions about project cooperation are welcome
  • Technical Guidance: Suggestions and guidance are sincerely welcome

Tools

[
  {
    "description": "Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv",
    "inputSchema": {
      "properties": {
        "filePath": {
          "description": "Path to the file to be read",
          "type": "string"
        }
      },
      "required": [
        "filePath"
      ],
      "type": "object"
    },
    "name": "document_reader"
  },
  {
    "description": "Merge multiple PDF files into one",
    "inputSchema": {
      "properties": {
        "inputPaths": {
          "description": "Paths to the input PDF files",
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        "outputDir": {
          "description": "Directory where merged PDFs should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPaths",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "pdf_merger"
  },
  {
    "description": "Split a PDF file into multiple files",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input PDF file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where split PDFs should be saved",
          "type": "string"
        },
        "pageRanges": {
          "description": "Array of page ranges to split",
          "items": {
            "properties": {
              "end": {
                "type": "number"
              },
              "start": {
                "type": "number"
              }
            },
            "type": "object"
          },
          "type": "array"
        }
      },
      "required": [
        "inputPath",
        "outputDir",
        "pageRanges"
      ],
      "type": "object"
    },
    "name": "pdf_splitter"
  },
  {
    "description": "Convert DOCX files to PDF format",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input DOCX file",
          "type": "string"
        },
        "outputPath": {
          "description": "Path where the output PDF file should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputPath"
      ],
      "type": "object"
    },
    "name": "docx_to_pdf"
  },
  {
    "description": "Convert DOCX to HTML while preserving formatting",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input DOCX file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where HTML should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "docx_to_html"
  },
  {
    "description": "Clean HTML by removing unnecessary tags and attributes",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input HTML file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where cleaned HTML should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "html_cleaner"
  },
  {
    "description": "Convert HTML to plain text while preserving structure",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input HTML file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where text file should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "html_to_text"
  },
  {
    "description": "Convert HTML to Markdown format",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input HTML file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where Markdown file should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "html_to_markdown"
  },
  {
    "description": "Extract all resources (images, videos, links) from HTML",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input HTML file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where resources should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "html_extract_resources"
  },
  {
    "description": "Format and beautify HTML code",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input HTML file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where formatted HTML should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "html_formatter"
  },
  {
    "description": "Compare two text files and show differences",
    "inputSchema": {
      "properties": {
        "file1Path": {
          "description": "Path to the first text file",
          "type": "string"
        },
        "file2Path": {
          "description": "Path to the second text file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where diff result should be saved",
          "type": "string"
        }
      },
      "required": [
        "file1Path",
        "file2Path",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "text_diff"
  },
  {
    "description": "Split text file by specified delimiter or line count",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input text file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where split files should be saved",
          "type": "string"
        },
        "splitBy": {
          "description": "Split method: by line count or delimiter",
          "enum": [
            "lines",
            "delimiter"
          ],
          "type": "string"
        },
        "value": {
          "description": "Line count (number) or delimiter string",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir",
        "splitBy",
        "value"
      ],
      "type": "object"
    },
    "name": "text_splitter"
  },
  {
    "description": "Format text with proper indentation and line spacing",
    "inputSchema": {
      "properties": {
        "inputPath": {
          "description": "Path to the input text file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where formatted file should be saved",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir"
      ],
      "type": "object"
    },
    "name": "text_formatter"
  },
  {
    "description": "Convert text between different encodings",
    "inputSchema": {
      "properties": {
        "fromEncoding": {
          "description": "Source encoding (e.g., 'big5', 'gbk', 'utf8')",
          "type": "string"
        },
        "inputPath": {
          "description": "Path to the input text file",
          "type": "string"
        },
        "outputDir": {
          "description": "Directory where converted file should be saved",
          "type": "string"
        },
        "toEncoding": {
          "description": "Target encoding (e.g., 'utf8', 'big5', 'gbk')",
          "type": "string"
        }
      },
      "required": [
        "inputPath",
        "outputDir",
        "fromEncoding",
        "toEncoding"
      ],
      "type": "object"
    },
    "name": "text_encoding_converter"
  },
  {
    "description": "Read Excel file and convert to JSON format while preserving structure",
    "inputSchema": {
      "properties": {
        "includeHeaders": {
          "default": true,
          "description": "Whether to include headers in the output",
          "type": "boolean"
        },
        "inputPath": {
          "description": "Path to the input Excel file",
          "type": "string"
        }
      },
      "required": [
        "inputPath"
      ],
      "type": "object"
    },
    "name": "excel_read"
  },
  {
    "description": "Convert between different document formats (Markdown, HTML, XML, JSON)",
    "inputSchema": {
      "properties": {
        "fromFormat": {
          "description": "Source format",
          "enum": [
            "markdown",
            "html",
            "xml",
            "json"
          ],
          "type": "string"
        },
        "input": {
          "description": "Input content to convert",
          "type": "string"
        },
        "toFormat": {
          "description": "Target format",
          "enum": [
            "markdown",
            "html",
            "xml",
            "json"
          ],
          "type": "string"
        }
      },
      "required": [
        "input",
        "fromFormat",
        "toFormat"
      ],
      "type": "object"
    },
    "name": "format_convert"
  }
]
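
The tool schemas above can be turned into requests by any MCP client. As a rough illustration, the sketch below builds the JSON-RPC `tools/call` payloads that MCP uses for tool invocation. The tool names and argument names are taken from the schemas above. The helper `buildToolCall`, the `REQUIRED_ARGS` table, and the file paths are hypothetical and not part of this package. It only constructs the request objects and does not talk to a running server.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Required argument names for a few tools, copied from the "required"
// arrays in the schemas above. This table is a hypothetical convenience.
const REQUIRED_ARGS: Record<string, string[]> = {
  document_reader: ["filePath"],
  pdf_splitter: ["inputPath", "outputDir", "pageRanges"],
  text_splitter: ["inputPath", "outputDir", "splitBy", "value"],
};

let nextId = 1;

// Hypothetical helper: checks required arguments, then wraps them in a request.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  const required = REQUIRED_ARGS[name];
  if (required === undefined) {
    throw new Error(`Unknown tool: ${name}`);
  }
  const missing = required.filter((key) => !(key in args));
  if (missing.length > 0) {
    throw new Error(`Missing required arguments for ${name}: ${missing.join(", ")}`);
  }
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example paths are placeholders. The schema does not say whether page
// numbers are 1-based, so the ranges here are illustrative only.
const splitPdf = buildToolCall("pdf_splitter", {
  inputPath: "/tmp/report.pdf",
  outputDir: "/tmp/out",
  pageRanges: [
    { start: 1, end: 3 },
    { start: 4, end: 10 },
  ],
});

// The schema declares "value" as a string, even when it holds a line count.
const splitText = buildToolCall("text_splitter", {
  inputPath: "/tmp/notes.txt",
  outputDir: "/tmp/out",
  splitBy: "lines",
  value: "100",
});

console.log(JSON.stringify(splitPdf, null, 2));
console.log(JSON.stringify(splitText, null, 2));
```

Checking the required arguments on the client side is optional, since the server is expected to validate against its own schema, but it surfaces mistakes such as a missing `outputDir` before a request is sent.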