opendatalab/MinerU-HTML

By opendatalab

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

GitHub · Python · Apache License 2.0 · Updated 28 May 2026

Snapshot

Stars: 248
Forks: 24
License: Apache License 2.0
Primary language: Python

Default branch: main
Open issues: 3
Watchers: 248