🌍 Live Open Source Explorer

Explore live open-source projects and AI models.

Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.

🔎 Live Search

Search live open-source data

Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.


Try keywords like automation, CRM, analytics, chatbot, llama or workflow.

Choose where to search live data.

Live Results

GitHub Open Source Repositories

Search: RLHF-Reward-Modeling

Page 1

Showing 10 of 13 results

RLHFlow/RLHF-Reward-Modeling

GitHub Python Apache License 2.0

Recipes to train reward model for RLHF.

★ 1,534 · Forks 110 · RLHFlow · Updated 12 Jun 2026

ash80/RLHF_in_notebooks

GitHub Jupyter Notebook MIT License

RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks

★ 249 · Forks 31 · ash80 · Updated 25 Jun 2026

raghavc/LLM-RLHF-Tuning-with-PPO-and-DPO

GitHub Python

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.

★ 191 · Forks 19 · raghavc · Updated 05 Jun 2026

Jerry-XDL/AIDoctor

GitHub Python Apache License 2.0

AIDoctor trains a medical GPT model with the ChatGPT training pipeline: implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preferenc…

★ 188 · Forks 16 · Jerry-XDL · Updated 05 May 2026

ZinYY/Online_RLHF

GitHub Python

A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling". This repository provides a flexible and modular approach to Online Reinforcement Learning from Human Feedback (Online RLHF).

★ 94 · Forks 17 · ZinYY · Updated 29 May 2026

hscspring/rl-llm-nlp

GitHub Apache License 2.0

Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers: GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.

★ 70 · Forks 6 · hscspring · Updated 28 Jun 2026

vibrantlabsai/nemesis

GitHub Python Apache License 2.0

Reward Model framework for LLM RLHF

★ 63 · Forks 6 · vibrantlabsai · Updated 26 Mar 2026

nancui0000/DiSuSumDet

GitHub Jupyter Notebook

End-to-end RLHF detoxification pipeline implementing the InstructGPT architecture: LoRA SFT on DialogSum, custom reward model from synthetic preferences, and multi-objective PPO balancing toxicity, faithfulness, and quality. Achieves 60% toxicity reduction with zero ROUGE-L degradation

★ 50 · Forks 6 · nancui0000 · Updated 22 Apr 2026

tlc4418/llm_optimization

GitHub Python MIT License

A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.

★ 49 · Forks 6 · tlc4418 · Updated 17 Jun 2026

Yog-Sotho/LLM-fine-tuner

GitHub Python GNU General Public License v3.0

Powerful no-code LLM fine-tuner: upload data → train → deploy in minutes. Unsloth 2-5× acceleration · QLoRA/DPO/RLHF/PPO/ORPO · Reward Model training · GGUF export · vLLM inference · BLEU/ROUGE/BERTScore · full CLI · Heretic Mode to unlock full model potential

★ 27 · Forks 4 · Yog-Sotho · Updated 20 Jun 2026
Pagination Page 1 of 2

10 results on this page · 13 total found

Showing first 13 accessible GitHub results.