Live Open Source Explorer
Explore live open-source projects and AI models.
Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.
Live Search
Search live open-source data
Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.
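As a rough illustration of how a paginated repository search like this could be requested, the sketch below builds a URL for GitHub's public REST search endpoint. This page does not say how it queries GitHub, so the use of that endpoint is an assumption, and the function name `build_search_url` is hypothetical. The page size of 10 follows the page's stated 10 results per page.

```python
from urllib.parse import urlencode

# GitHub's public REST endpoint for repository search.
# Assumption: this page does not document which API it actually calls.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, page: int = 1, per_page: int = 10) -> str:
    """Return a search URL for one page of repository results (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    params = {"q": query, "per_page": per_page, "page": page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # URL for the first page of the search term used on this page
    print(build_search_url("RLHF-Reward-Modeling"))
```

The function only constructs the URL and sends nothing, so it can be inspected or tested without network access.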
Live Results
GitHub Open Source Repositories
Search: RLHF-Reward-Modeling
Page 1
Showing 10 of 13 results
RLHFlow/RLHF-Reward-Modeling
GitHub Python Apache License 2.0
Recipes to train reward models for RLHF.
External source
GitHub
ash80/RLHF_in_notebooks
GitHub Jupyter Notebook MIT License
RLHF (supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks.
External source
GitHub
raghavc/LLM-RLHF-Tuning-with-PPO-and-DPO
GitHub Python
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
External source
GitHub
Jerry-XDL/AIDoctor
GitHub Python Apache License 2.0
AIDoctor: training a medical GPT model with the ChatGPT training pipeline; implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preferenc…
External source
GitHub
ZinYY/Online_RLHF
GitHub Python
A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling". This repository provides a flexible and modular approach to Online Reinforcement Learning from Human Feedback (Online RLHF).
External source
GitHub
hscspring/rl-llm-nlp
GitHub Apache License 2.0
Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers: GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.
External source
GitHub
vibrantlabsai/nemesis
GitHub Python Apache License 2.0
Reward Model framework for LLM RLHF.
External source
GitHub
nancui0000/DiSuSumDet
GitHub Jupyter Notebook
End-to-end RLHF detoxification pipeline implementing the InstructGPT architecture: LoRA SFT on DialogSum, custom reward model from synthetic preferences, and multi-objective PPO balancing toxicity, faithfulness, and quality. Achieves 60% toxicity reduction with zero ROUGE-L degradation.
External source
GitHub
tlc4418/llm_optimization
GitHub Python MIT License
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
External source
GitHub
Yog-Sotho/LLM-fine-tuner
GitHub Python GNU General Public License v3.0
Powerful no-code LLM fine-tuner: upload data → train → deploy in minutes. Unsloth 2-5× acceleration · QLoRA/DPO/RLHF/PPO/ORPO · Reward Model training · GGUF export · vLLM inference · BLEU/ROUGE/BERTScore · full CLI · Heretic Mode to unlock full model potential.
External source
GitHub
10 results on this page · 13 total found
Showing first 13 accessible GitHub results.
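The counts above (10 results on this page, 13 in total) follow from simple page arithmetic. A minimal sketch, assuming a fixed page size of 10 and 1-based page numbers; the helper name `page_window` and its return keys are hypothetical, not taken from the site:

```python
import math


def page_window(total: int, page: int, per_page: int = 10) -> dict:
    """Compute which results fall on a given 1-based page (hypothetical helper)."""
    if total < 0 or per_page < 1 or page < 1:
        raise ValueError("total must be >= 0; page and per_page must be >= 1")
    # Always report at least one page, even when there are no results.
    total_pages = max(1, math.ceil(total / per_page))
    start = (page - 1) * per_page        # zero-based index of the first result
    end = min(start + per_page, total)   # exclusive end index
    shown = max(0, end - start)          # 0 when the page is past the last result
    return {
        "total_pages": total_pages,
        "first": start + 1 if shown else 0,
        "last": end if shown else 0,
        "shown": shown,
    }


if __name__ == "__main__":
    print(page_window(13, 1))  # page 1 of the 13 results listed above
    print(page_window(13, 2))  # the remaining results
```

With 13 results, page 1 holds results 1 to 10 and page 2 holds the remaining 3.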