Live Open Source Explorer

Explore live open-source projects and AI models.

Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.

Live Search

Search live open-source data

Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.

Try keywords like automation, CRM, analytics, chatbot, llama or workflow.

Choose where to search live data.

Live Results

GitHub Open Source Repositories

Search: language-data

Page 3

Showing 10 results from 1,705

imoneoi/openchat

GitHub Python Apache License 2.0

OpenChat: Advancing Open-source Language Models with Imperfect Data

★ 5,481 Forks 431 imoneoi Updated 21 Jun 2026

umpirsky/country-list

GitHub HTML MIT License

List of all countries with names and ISO 3166-1 codes in all languages and data formats.

★ 5,246 Forks 1,528 umpirsky Updated 16 Jun 2026

togethercomputer/RedPajama-Data

GitHub Python Apache License 2.0

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

★ 4,956 Forks 373 togethercomputer Updated 25 Jun 2026

lk-geimfari/mimesis

GitHub Python MIT License

Mimesis is a fast Python library for generating fake data in multiple languages.

★ 4,819 Forks 359 lk-geimfari Updated 25 Jun 2026

SPLWare/esProc

GitHub Java Apache License 2.0

esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.

★ 4,685 Forks 363 SPLWare Updated 24 Jun 2026

kaitai-io/kaitai_struct

GitHub Shell

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby / Rust

★ 4,631 Forks 208 kaitai-io Updated 24 Jun 2026

yizhongw/self-instruct

GitHub Python Apache License 2.0

Aligning pretrained language models with instruction data generated by themselves.

★ 4,602 Forks 522 yizhongw Updated 24 Jun 2026

apache/fory

GitHub Java Apache License 2.0

A blazingly fast multi-language serialization framework for idiomatic domain objects, schema IDL, and cross-language data exchange.

★ 4,410 Forks 422 apache Updated 25 Jun 2026

NVlabs/VILA

GitHub Python Apache License 2.0

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

★ 3,827 Forks 324 NVlabs Updated 25 Jun 2026

maziyarpanahi/openmed

GitHub Python Apache License 2.0

Local-first healthcare AI: clinical NER & HIPAA PII de-identification that runs 100% on-device. 1,000+ medical models, 12 languages, Apple MLX + Python, no cloud, no patient data leaving your network. Apache-2.0

★ 3,809 Forks 423 maziyarpanahi Updated 25 Jun 2026

Page 3 of 100

10 results on this page · 1,705 total found

Showing first 1,000 accessible GitHub results.
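
The figures on this page fit together arithmetically: 1,705 matches were found, only the first 1,000 are accessible, and at 10 results per page that gives 100 pages, with page 3 covering ranks 21 to 30. The following is a minimal sketch of that calculation, not the explorer's actual code. The function name `page_window` and its parameter names are hypothetical, and the values 10 and 1,000 are taken from the text above.

```python
import math


def page_window(total_found, page, per_page=10, accessible_cap=1000):
    """Return (page_count, first_rank, last_rank) for a 1-based page number.

    per_page and accessible_cap default to the values stated on this page;
    the names themselves are illustrative assumptions.
    """
    # Only the first `accessible_cap` results can be paged through.
    accessible = min(total_found, accessible_cap)
    page_count = math.ceil(accessible / per_page)
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}")
    first_rank = (page - 1) * per_page + 1
    # The last page may be shorter than per_page.
    last_rank = min(page * per_page, accessible)
    return page_count, first_rank, last_rank


print(page_window(1705, 3))  # (100, 21, 30)
```

With 1,705 matches the cap is what limits the page count; a search with fewer than 1,000 matches would instead be limited by its own total.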