MinerU 2.5 Document Parsing: Which Large Model OCR Is Better? (Part 2)

Hands-on guide to MinerU 2.5, a document-parsing OCR and vision-language model. Compare VLM vs Pipeline modes, multilingual and complex layout recognition, tables and handwriting. Step-by-step Windows/Conda installation, GPU setup, CLI and web UI usage, troubleshooting, and performance tips. Ideal for high-accuracy PDF parsing, batch extraction, and RAG knowledge bases.

2025-11-25

MinerU 2.5 Document Parsing: What It Is and Why It Matters

The previous article introduced MonkeyOCR. In this part, we focus on MinerU 2.5, a document parsing tool/model designed to convert complex documents (especially PDFs) into structured, LLM-ready outputs such as Markdown/JSON, with strong layout understanding for multi-column text, tables, and formulas. 

The project has officially released MinerU 2.5 and describes it as a 1.2B-parameter vision-language model for document parsing. Its release materials report strong results on document benchmarks such as OmniDocBench.

Note on model comparisons: MinerU’s release notes report benchmark advantages over several mainstream VLMs and specialized tools on OmniDocBench, but other independent model releases/papers may report different outcomes under different settings. Treat “best” as benchmark- and setup-dependent. 

MinerU 2.5 is especially useful for practical workflows such as building RAG knowledge bases and large-scale document extraction, where preserving layout, tables, and formulas matters more than plain OCR text.


Quick Comparison: MinerU vs OCR-Only Tools

MinerU is not just “OCR.” It is closer to a document parsing pipeline that emphasizes structure + layout + export formats (Markdown/JSON/PDF-like reconstruction), which tends to matter more for downstream search, retrieval, and RAG ingestion than raw text alone. 


Step 1: Environment Preparation

1.1 Check CUDA and GPU (Optional but Recommended)

# Check CUDA version (requires CUDA 11.8 or higher)
nvcc --version

# Check GPU status and memory
nvidia-smi

If you do not have an NVIDIA GPU, you can still run MinerU on CPU, but it will be slower.
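The same check can be scripted, which helps when setting up several machines. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of MinerU.

```python
import shutil
import subprocess


def find_gpu_tools():
    """Return the resolved path (or None) for each NVIDIA command-line tool."""
    return {name: shutil.which(name) for name in ("nvcc", "nvidia-smi")}


def describe_gpu_setup():
    """Print which tools are on PATH and, if possible, the nvidia-smi report."""
    tools = find_gpu_tools()
    for name, path in tools.items():
        print(f"{name}: {path or 'not found on PATH'}")
    if tools["nvidia-smi"]:
        # nvidia-smi reports the driver version, GPU model, and memory usage.
        result = subprocess.run([tools["nvidia-smi"]], capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        print("No NVIDIA driver tools detected; plan for CPU mode.")
    return tools
```

Calling describe_gpu_setup() prints one line per tool. A missing nvcc only means the CUDA toolkit is not on PATH; the driver reported by nvidia-smi may still be present.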


1.2 Create a Dedicated Conda Environment (Python 3.10)

Option 1: Default path

conda create -n mineru python=3.10
conda activate mineru

Option 2: Custom path (Windows example)

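conda create --prefix=D:\Computer\Anaconda\envs\mineru python=3.10
# An environment created with --prefix has no name, so activate it by its full path
conda activate D:\Computer\Anaconda\envs\mineru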

Step 2: Install MinerU (GPU/CPU)

2.1 Install MinerU Core

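# Install uv, a fast pip-compatible installer
pip install uv
# Remove any previously installed MinerU to avoid version conflicts
pip uninstall mineru -y
# Install or upgrade MinerU with the core extras (Aliyun mirror; omit -i to use the default PyPI index)
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple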

MinerU is available as a PyPI package. 

2.2 Install PyTorch (GPU users)

Install the PyTorch build that matches your CUDA version. Example for CUDA 12.1:

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

(If you are on a different CUDA version, use the corresponding PyTorch index URL.)
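The index URL follows a simple naming pattern: the CUDA major and minor version are joined after "cu". Here is a small sketch of that mapping. The helper name is hypothetical, and not every CUDA version has wheels for every PyTorch release, so confirm the result against the PyTorch installation page before using it.

```python
PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def pytorch_index_url(cuda_version):
    """Map a CUDA version string such as "12.1" to a PyTorch wheel index URL.

    Pass None for a CPU-only install. The version must contain at least
    "major.minor"; anything after the minor part is ignored.
    """
    if cuda_version is None:
        return f"{PYTORCH_WHEEL_BASE}/cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"{PYTORCH_WHEEL_BASE}/cu{major}{minor}"


print(pytorch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```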


2.3 Verify Installation

mineru --version
mineru --help
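If the mineru command is not found, it helps to confirm which packages the active environment actually contains. The sketch below reads installed versions with the standard library's importlib.metadata; the function names and the default package list are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_versions(packages=("mineru", "torch", "torchvision")):
    """Print one line per package so a broken install is easy to spot."""
    found = {name: installed_version(name) for name in packages}
    for name, ver in found.items():
        print(f"{name}: {ver or 'not installed'}")
    return found
```

Run report_versions() inside the activated environment. A "not installed" line for mineru usually means the install went into a different environment.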

Step 3: Download Model Files

MinerU requires model assets. A common approach is to download all required models in one step.

mineru-models-download --model_type all

(Downloads can be large; if a download fails, rerun the command.)


Step 4: Run a Functional Test (Pipeline vs VLM)

4.1 Prepare Test Folders

mkdir test_pdfs
mkdir test_output

Put PDFs into test_pdfs/.
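The folder setup can also be done from Python, which behaves the same on Windows and Linux. This is a minimal sketch; the helper names are illustrative, and the folder names match the ones used above.

```python
from pathlib import Path


def prepare_test_folders(base="."):
    """Create test_pdfs/ and test_output/ under base if they do not exist yet."""
    base = Path(base)
    pdf_dir = base / "test_pdfs"
    out_dir = base / "test_output"
    pdf_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    return pdf_dir, out_dir


def list_pdfs(pdf_dir):
    """Return the PDF files in a folder, sorted, matching .pdf case-insensitively."""
    return sorted(
        p for p in Path(pdf_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling list_pdfs on the input folder before parsing is a quick way to confirm the test files are where MinerU will look for them.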


4.2 Pipeline Mode (Faster, Good Default)

# Parse a single PDF
mineru -p ./test_pdfs/your_file.pdf -o ./test_output/ --backend pipeline

# GPU acceleration (if available)
mineru -p ./test_pdfs/your_file.pdf -o ./test_output/ --backend pipeline --device cuda

4.3 VLM Mode (Higher Precision, Slower)

# VLM mode on GPU
mineru -p ./test_pdfs/your_file.pdf -o ./test_output/ --backend vlm-transformers --device cuda

# VLM mode on CPU
mineru -p ./test_pdfs/your_file.pdf -o ./test_output/ --backend vlm-transformers --device cpu

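When to choose which

Choose Pipeline mode when speed matters or as a default first pass; choose VLM mode when you need higher precision and can accept slower runs.

Parsed output commonly includes a .md (Markdown) file representing the extracted structure.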
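To compare the two modes on the same file, it can be convenient to build the command in one place and then collect the Markdown results. The sketch below uses only the flags shown in the commands above. The helper names are hypothetical, and the backend list is limited to the two backends used in this article.

```python
from pathlib import Path

# Backend names taken from the commands in this article; extend as needed.
KNOWN_BACKENDS = ("pipeline", "vlm-transformers")


def build_mineru_command(pdf_path, output_dir, backend="pipeline", device=None):
    """Return the mineru command line as a list suitable for subprocess.run."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    cmd = ["mineru", "-p", str(pdf_path), "-o", str(output_dir), "--backend", backend]
    if device:
        # Only add --device when one was requested, e.g. "cuda" or "cpu".
        cmd += ["--device", device]
    return cmd


def find_markdown_outputs(output_dir):
    """Return every .md file below output_dir, searching subfolders too."""
    return sorted(Path(output_dir).rglob("*.md"))
```

Passing the list from build_mineru_command to subprocess.run avoids shell quoting problems with paths that contain spaces, which are common on Windows.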


4.4 Batch Processing a Folder

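# Parse every PDF in the folder. --batch-size may not exist in every MinerU release; confirm it with `mineru --help` and drop it if unsupported
mineru -p ./test_pdfs -o ./test_output/ --backend pipeline --batch-size 8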
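For large folders, an alternative is to call mineru once per file so that one bad PDF does not stop the whole run. The following is a sketch under that assumption, not MinerU's own batch mechanism. The function name is hypothetical, and the run parameter exists only so the loop can be tried without MinerU installed.

```python
import subprocess
from pathlib import Path


def parse_folder(pdf_dir, output_dir, backend="pipeline", run=subprocess.run):
    """Run mineru once per PDF and return {file name: True if it succeeded}."""
    results = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = ["mineru", "-p", str(pdf), "-o", str(output_dir), "--backend", backend]
        try:
            completed = run(cmd)
            # A zero exit code is treated as success.
            results[pdf.name] = completed.returncode == 0
        except OSError:
            # Raised when the mineru executable cannot be started at all.
            results[pdf.name] = False
    return results
```

The returned dictionary makes it easy to retry only the files that failed, for example with a different backend.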

Step 5: Launch the Web Interface (Gradio)

MinerU provides a browser UI for interactive parsing and review.

5.1 Start the Web Service

conda activate mineru
mineru-gradio --server-port 8080

5.2 Open in Browser

Visit:

http://localhost:8080/

5.3 Troubleshooting

Change the port (if 8080 is already in use)

mineru-gradio --server-port 7860
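Before switching ports, it can help to check whether something is already listening on the one you want. This is a minimal sketch with the standard socket module; the function names and the candidate port list are assumptions, not MinerU settings.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(candidates=(8080, 7860, 7861)):
    """Return the first candidate port with no listener, or None if all are taken."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None
```

The value returned by first_free_port() can be passed to mineru-gradio with --server-port. The check is a snapshot, so another program could still claim the port before the web service starts.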

Reset the Windows network stack (run in an administrator Command Prompt)

netsh winsock reset
netsh int ip reset
ipconfig /flushdns

Reboot and retry.


Step 6: Online Experience

If you want a quick trial without local setup, MinerU provides an official online experience. 


Key Usage Reminders

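  1. Activate the environment
conda activate mineru
  2. Two usage methods: the command line (mineru) and the web interface (mineru-gradio)
  3. Performance tips: use a GPU when one is available; Pipeline mode is faster, while VLM mode is more precise but slower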