Back to projects
Case Study
Shia LaBeouf DPO LoRA
Direct Preference Optimization pipeline with LoRA adapters, emotion classification, and iterative preference learning for narrative alignment.
PyTorch
LoRA
DPO
NLP
Overview
An interactive narrative system that aligns text generation with user sentiment using DPO and LoRA adapters.
Problem
Baseline language models struggled to stay aligned with evolving user sentiment in a narrative experience.
Solution
Built a DPO training pipeline with emotion classification, preference pair construction, and adapter fine-tuning to iteratively align outputs.
Architecture
High-level system flow and core building blocks.
Data
Emotion classifier + preference pair generator.
Training
DPO pipeline with LoRA adapters for efficient fine-tuning.
Inference
Interactive narrative engine with aligned outputs.
What I'd Improve Next
- Add automated safety filtering for multi-turn sessions.
- Scale preference collection with active learning.