
bsq0418/mineru

This repository builds a Docker image for the MinerU PDF intelligence toolkit. The image ships with `magic-pdf[full]` and the required configuration, so you can convert PDFs into Markdown, JSON, and other structured formats directly inside the container. It is based on the Arm-optimized `armswdev/pytorch-arm-neoverse` image, which makes it well suited to Apple Silicon, AWS Graviton, and other arm64 hosts.
For details on MinerU capabilities, algorithm components, and configuration options, refer to the official documentation:
- … `/mineru/models`; no extra setup is needed.
- `magic-pdf[full]`, `huggingface_hub`, and the dependencies required for layout, OCR, formula, and table recognition.
- `magic-pdf.json` configuration so the container works out of the box.
- `libgl1`, so that OpenCV and PaddleOCR work correctly in headless environments.
- … `/mineru/models`, tuned for Arm CPU/NEON to reduce cold-start latency.

Pull the image
```bash
docker pull <dockerhub-namespace>/mineru:<tag>
```
Replace `<dockerhub-namespace>` and `<tag>` with the Docker Hub namespace and tag under which the image is published.
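The base image targets 64-bit Arm, so it can be worth confirming the host architecture before pulling. The following is a minimal sketch that assumes Python 3 on the host. `platform.machine()` typically reports `arm64` on Apple Silicon macOS and `aarch64` on Linux hosts such as Graviton. The `is_arm64` helper is hypothetical and is not part of this repository.

```python
import platform
from typing import Optional

# Machine names that denote 64-bit Arm: "arm64" (macOS; "ARM64" on Windows), "aarch64" (Linux).
ARM64_NAMES = {"arm64", "aarch64"}


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True if `machine` (default: this host's platform.machine()) names 64-bit Arm."""
    name = platform.machine() if machine is None else machine
    return name.lower() in ARM64_NAMES
```

On a non-Arm host, the image may only run under emulation, if at all. Note also that an x86_64 Python running under Rosetta on Apple Silicon reports `x86_64`, so interpret a `False` result with that in mind.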
Launch the container with a mounted workspace
```bash
docker run --name mineru-dev \
  -v "$(pwd)/data:/workspace" \
  -it <dockerhub-namespace>/mineru:<tag> /bin/bash
```
- `$(pwd)/data` holds the PDFs to process and the output artifacts.
- … `/mineru`; `magic-pdf.json` is copied to both `/root/` and `/home/ubuntu/`.

Run a conversion task inside the container
```bash
magic-pdf run \
  --config /root/magic-pdf.json \
  --input /workspace/input.pdf \
  --output /workspace/output_dir \
  --task pdf2md
```
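The command above converts one file. To process several PDFs from the mounted `/workspace` in one pass, a small Python driver can call the CLI once per file.

This is a minimal sketch and is not part of the repo. It assumes the upstream command form `magic-pdf -p <input> -o <output-dir> -m auto|txt|ocr`, as documented by MinerU for its 0.x/1.x releases. The helper names (`build_command`, `find_pdfs`, `convert_all`) are hypothetical. If your installed version expects the `run` syntax shown above, adapt `build_command` to match.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(pdf: Path, output_dir: Path, method: str = "auto") -> List[str]:
    """Return the argv for converting one PDF (flag names assumed from upstream MinerU docs)."""
    if method not in ("auto", "txt", "ocr"):
        raise ValueError(f"unsupported method: {method}")
    return ["magic-pdf", "-p", str(pdf), "-o", str(output_dir), "-m", method]


def find_pdfs(workspace: Path) -> List[Path]:
    """List PDF files directly inside `workspace`, sorted for a stable processing order."""
    return sorted(p for p in workspace.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


def convert_all(workspace: Path, output_dir: Path, method: str = "auto") -> int:
    """Convert each PDF sequentially; return the number of runs that exited non-zero."""
    output_dir.mkdir(parents=True, exist_ok=True)
    failures = 0
    for pdf in find_pdfs(workspace):
        result = subprocess.run(build_command(pdf, output_dir, method))
        if result.returncode != 0:
            print(f"conversion failed: {pdf}", file=sys.stderr)
            failures += 1
    return failures
```

Inside the container, `convert_all(Path("/workspace"), Path("/workspace/output_dir"))` processes each top-level PDF in turn and returns the failure count. The files run one at a time because each CLI invocation loads its own models, so parallel runs on a CPU-only host would multiply memory use.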
- … `--task` to `pdf2json`, `pdf2html`, etc., as needed. Consult the official docs for the full CLI reference.
- … `picture_test.py` example script in this repo.

- `Dockerfile`: Build instructions for the image.
- `magic-pdf.json`: Default configuration covering model locations, OCR/table/formula toggles, and optional LLM hooks.
- `models/`: Pre-staged MinerU/MoTao model cache.
- `download_models_hf.py`: Script to refresh the models and config via Hugging Face Hub.
- `picture_test.py`: Sample script that runs layout detection + OCR on a single image.
- `docker_start.sh`: Example startup script that mounts the repo root at `/app`.

- … `python download_models_hf.py` inside the container. It uses `huggingface_hub.snapshot_download` and rewrites `magic-pdf.json` with the new model paths.
- … `/root/magic-pdf.json` to disable them or to activate LLM-assisted features.
- … `api_key` and `base_url` fields in the config and set `enable` to `true`.
- … `device-mode`.

This image extends the MinerU solution from the OpenDataLab community. Thanks to the project maintainers and contributors for the extraction pipelines and pretrained assets. For advanced scenarios such as region-based extraction, batch pipelines, and production deployment, refer to the official MinerU documentation.
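The customization notes above (feature toggles, the LLM `api_key`/`base_url`/`enable` fields, and `device-mode`) all come down to editing keys in `magic-pdf.json`. Below is a minimal sketch of doing that programmatically.

It assumes the key layout of the upstream MinerU template: `models-dir`, `device-mode`, `table-config.enable`, `formula-config.enable`, and `llm-aided-config` with per-feature sub-objects such as `title_aided`. Verify these names against the file shipped in the image before relying on them. `set_nested`, `update_config`, and `EXAMPLE_UPDATES` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config[a][b][c] = value for dotted_key "a.b.c", creating missing dicts on the way."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def update_config(path: Path, updates: Dict[str, Any]) -> Dict[str, Any]:
    """Apply dotted-key updates to the JSON file at `path`, write it back, and return it."""
    config = json.loads(path.read_text(encoding="utf-8"))
    for dotted_key, value in updates.items():
        set_nested(config, dotted_key, value)
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config


# Hypothetical example; key names are assumed from the upstream template.
EXAMPLE_UPDATES: Dict[str, Any] = {
    "table-config.enable": False,  # skip table recognition
    "formula-config.enable": False,  # skip formula recognition
    "device-mode": "cpu",  # run inference on the CPU
    "llm-aided-config.title_aided.api_key": "<your-api-key>",
    "llm-aided-config.title_aided.base_url": "<your-base-url>",
    "llm-aided-config.title_aided.enable": True,
}
```

Calling `update_config(Path("/root/magic-pdf.json"), EXAMPLE_UPDATES)` rewrites the file in place, so keep a copy if you want to restore the defaults. The image ships copies in both `/root/` and `/home/ubuntu/`, so edit whichever copy your conversion actually reads. The `run` example passes `--config /root/magic-pdf.json`.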





