Live Open Source Explorer
Explore live open-source projects and AI models.
Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.
Live Search
Search live open-source data
Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.
Live Results
GitHub Open Source Repositories
Search: language-data
Page 24
Showing 10 results from 1,706
opendatalab/Meta-rater
GitHub Python
[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"
External source
GitHub
xiaohui-victor-li/FinDKG
GitHub Python GNU General Public License v3.0
Data and model implementation for the paper: FinDKG: Dynamic Knowledge Graph with Large Language Models for Global Finance
External source
GitHub
sail-sg/regmix
GitHub Jupyter Notebook MIT License
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
External source
GitHub
pfnet-research/contextual_augmentation
GitHub Python MIT License
Contextual augmentation, a text data augmentation using a bidirectional language model.
External source
GitHub
noahho/CAAFE
GitHub Python Other
Semi-automatic feature engineering process using Language Models and your dataset descriptions. Based on the paper "LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering" by Hollmann, Müller, and Hutter (2023).
External source
GitHub
blaze/datashape
GitHub Python BSD 2-Clause "Simplified" License
Language defining a data description protocol
External source
GitHub
czyssrs/Few-Shot-NLG
GitHub Python MIT License
Code and data for the ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"
External source
GitHub
GUNDAM-Labet/GUNDAM
GitHub Python Apache License 2.0
GUNDAM is a data management system that prioritizes data using language models.
External source
GitHub
akanimax/natural-language-summary-generation-from-structured-data
GitHub Python MIT License
Implementation of the paper at https://arxiv.org/abs/1709.00155, for converting information present in the form of structured data into natural language text.
External source
GitHub
futuremojo/nlp-demystified
GitHub Jupyter Notebook
Code and data for Natural Language Processing Demystified
External source
GitHub
10 results on this page · 1,706 total found
Showing first 1,000 accessible GitHub results.
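The pagination figures on this page (10 results per page, 1,706 found, first 1,000 accessible) can be checked with a little arithmetic. The sketch below is only an illustration of that arithmetic, not the site's actual code; the function name `page_window` and its parameters are hypothetical.

```python
import math

PAGE_SIZE = 10          # stated on the page: every page shows 10 results
ACCESSIBLE_CAP = 1_000  # stated on the page: first 1,000 results are accessible


def page_window(page, total_found, page_size=PAGE_SIZE, cap=ACCESSIBLE_CAP):
    """Return (first_rank, last_rank, last_page) for a 1-based page number.

    Hypothetical helper: assumes results beyond `cap` cannot be paged to.
    """
    reachable = min(total_found, cap)              # results that can be reached
    last_page = math.ceil(reachable / page_size)   # final page that has results
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}")
    first_rank = (page - 1) * page_size + 1
    last_rank = min(page * page_size, reachable)
    return first_rank, last_rank, last_page


if __name__ == "__main__":
    print(page_window(24, 1706))
```

Under these assumptions, page 24 of this search covers results 231 to 240, and page 100 is the last one that can be reached even though 1,706 repositories matched.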