opendatalab/MinerU-HTML
By opendatalab
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
Live Snapshot
Stars: 248
Forks: 24
License: Apache License 2.0
Language: Python
About this open-source project
Live information fetched from GitHub.
Default branch: main
Open issues: 3
Watchers: 248