
🌍 Live Open Source Explorer

Explore live open-source projects and AI models.

Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.

🔎 Live Search

Search live open-source data

Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.


Try keywords like automation, CRM, analytics, chatbot, llama or workflow.


Live Results

GitHub Open Source Repositories

Search: language-data

Page 24

Showing 10 of 1,706 results


opendatalab/Meta-rater

GitHub Python

[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"

★ 195 Forks 15 opendatalab Updated 11 Jun 2026

xiaohui-victor-li/FinDKG

GitHub Python GNU General Public License v3.0

Data and Model implementation for paper: FinDKG: Dynamic Knowledge Graph with Large Language Models for Global Finance

★ 194 Forks 49 xiaohui-victor-li Updated 25 Jun 2026

sail-sg/regmix

GitHub Jupyter Notebook MIT License

[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)

★ 193 Forks 17 sail-sg Updated 23 Jun 2026

pfnet-research/contextual_augmentation

GitHub Python MIT License

Contextual augmentation, a text data augmentation using a bidirectional language model.

★ 192 Forks 35 pfnet-research Updated 10 Jan 2026

noahho/CAAFE

GitHub Python Other

Semi-automatic feature engineering process using Language Models and your dataset descriptions. Based on the paper "LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering" by Hollmann, Müller, and Hutter (2023).

★ 192 Forks 37 noahho Updated 10 May 2026

blaze/datashape

GitHub Python BSD 2-Clause "Simplified" License

Language defining a data description protocol

★ 189 Forks 63 blaze Updated 31 May 2026

czyssrs/Few-Shot-NLG

GitHub Python MIT License

Code and Data for ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"

★ 189 Forks 20 czyssrs Updated 26 Jun 2026

GUNDAM-Labet/GUNDAM

GitHub Python Apache License 2.0

GUNDAM is a data management system that prioritizes data using language models.

★ 188 Forks 32 GUNDAM-Labet Updated 08 Apr 2026

akanimax/natural-language-summary-generation-from-structured-data

GitHub Python MIT License

Implementation of the paper -> https://arxiv.org/abs/1709.00155. For converting information present in the form of structured data into natural language text

★ 186 Forks 55 akanimax Updated 06 Jun 2026

futuremojo/nlp-demystified

GitHub Jupyter Notebook

Code and data for Natural Language Processing Demystified

★ 186 Forks 55 futuremojo Updated 30 Apr 2026
Page 24 of 100

10 results on this page · 1,706 total found

Showing the first 1,000 accessible GitHub results.
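
The page counts above fit together arithmetically: 1,706 matches were found, only the first 1,000 are accessible, and at 10 results per page that gives 100 pages. The sketch below reproduces that calculation. The function name `page_window` and its parameters are hypothetical and chosen for illustration; this is an assumption about how such figures can be derived, not the site's actual code.

```python
import math


def page_window(page, per_page=10, total_found=1706, accessible_cap=1000):
    """Return (first_rank, last_rank, total_pages) for a 1-based page number.

    Illustrative only: the defaults mirror the numbers shown on this page.
    """
    # Only results up to the accessibility cap can be paged through.
    reachable = min(total_found, accessible_cap)
    total_pages = math.ceil(reachable / per_page)

    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")

    # Ranks are 1-based positions within the overall result list.
    first_rank = (page - 1) * per_page + 1
    last_rank = min(page * per_page, reachable)
    return first_rank, last_rank, total_pages


if __name__ == "__main__":
    print(page_window(24))  # (231, 240, 100)
```

Under these assumptions, page 24 covers results 231 through 240, and results ranked beyond 1,000 are not reachable through pagination even though 1,706 were found.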