🌱 Open Source

🌍 Live Open Source Explorer

Explore live open-source projects and AI models.

Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.

🔎 Live Search

Search live open-source data

Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.


Try keywords like automation, CRM, analytics, chatbot, llama or workflow.

Choose where to search live data.

Live Results

GitHub Open Source Repositories

Search: language-data

Page 1

Showing 10 results from 1,705


PaddlePaddle/PaddleOCR

GitHub Python Apache License 2.0

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

★ 83,807 Forks 10,880 PaddlePaddle Updated 25 Jun 2026

fighting41love/funNLP

GitHub Python

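Chinese and English sensitive words, language detection, Chinese and foreign mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional-Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, comprehensive company name list, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation & summarization tools, cocoNLP信...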

★ 81,446 Forks 15,246 fighting41love Updated 25 Jun 2026

huihut/interview

GitHub C++ Other

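📚 A summary of fundamental knowledge for C/C++ technical interviews, covering language, libraries, data structures, algorithms, systems, networking, and linking/loading libraries, along with interview experience, recruitment, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, including language, program library, data structure, algorithm, system, network, link loading library,...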

★ 37,989 Forks 8,093 huihut Updated 25 Jun 2026

lionsoul2014/ip2region

GitHub Java Apache License 2.0

Ip2region is an offline IP-to-region localization library and IP data management framework with both IPv4 and IPv6 support, 10-microsecond-level query efficiency, and xdb search clients for many programming languages.

★ 19,186 Forks 3,015 lionsoul2014 Updated 25 Jun 2026

apache/arrow

GitHub C++ Apache License 2.0

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

★ 16,877 Forks 4,152 apache Updated 25 Jun 2026

Canner/WrenAI

GitHub Python Other

GenBI (Generative BI) for AI agents, an open-source, governed text-to-SQL through an open context layer that turns natural-language questions into trusted dashboards, charts, and SQL across 20+ data sources, such as BigQuery, Snowflake, PostgreSQL, ClickHouse, Amazon Redshift, Databricks and more.

★ 15,618 Forks 1,777 Canner Updated 25 Jun 2026

Unstructured-IO/unstructured

GitHub HTML Apache License 2.0

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partition...

★ 14,998 Forks 1,263 Unstructured-IO Updated 25 Jun 2026

arangodb/arangodb

GitHub C++ Other

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

★ 14,211 Forks 884 arangodb Updated 25 Jun 2026

PRQL/prql

GitHub Rust Apache License 2.0

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

★ 10,868 Forks 257 PRQL Updated 25 Jun 2026

dr5hn/countries-states-cities-database

GitHub Python Open Data Commons Open Database License v1.0

🌍 Discover our global repository of countries, states, and cities! 🏙️ Get comprehensive data in JSON, SQL, PSQL, SQLSERVER, MONGODB, SQLITE, XML, YAML, and CSV formats. Access ISO2, ISO3 codes, country code, capital, native language, timezones (for countries), and more. #countries #states #cities

★ 9,673 Forks 3,002 dr5hn Updated 25 Jun 2026
Page 1 of 100

10 results on this page · 1,705 total found

Showing first 1,000 accessible GitHub results.
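
The page count above follows from the figures shown: 1,705 results were found, only the first 1,000 are accessible, and each page holds 10. The sketch below reproduces that arithmetic. The function names and constants are hypothetical, chosen for illustration, and are not taken from the explorer's own code.

```python
import math

PAGE_SIZE = 10          # results per page, as stated on this page
ACCESSIBLE_CAP = 1_000  # only the first 1,000 results are accessible


def page_count(total_found: int,
               page_size: int = PAGE_SIZE,
               cap: int = ACCESSIBLE_CAP) -> int:
    """Number of browsable pages once the accessibility cap is applied."""
    accessible = min(total_found, cap)
    return math.ceil(accessible / page_size)


def page_bounds(page: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """1-based positions of the first and last result on a full page."""
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1


print(page_count(1705))   # 100
print(page_bounds(100))   # (991, 1000)
```

With 1,705 results found, the cap rather than the total determines the last page, which is why the listing ends at page 100 instead of page 171.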