Back to projects
Case Study

Shia LaBeouf DPO LoRA

Direct Preference Optimization pipeline with LoRA adapters, emotion classification, and iterative preference learning for narrative alignment.

PyTorch
LoRA
DPO
NLP

Overview

An interactive narrative system that aligns text generation with user sentiment using DPO and LoRA adapters.

Problem

Baseline language models struggled to stay aligned with evolving user sentiment in a narrative experience.

Solution

Built a DPO training pipeline with emotion classification, preference pair construction, and adapter fine-tuning to iteratively align outputs.

Tech Stack

PyTorch
HuggingFace
LoRA
DPO
GPT-2
Qwen-2.5

Architecture

High-level system flow and core building blocks.

Data

Emotion classifier + preference pair generator.

Training

DPO pipeline with LoRA adapters for efficient fine-tuning.

Inference

Interactive narrative engine with aligned outputs.

What I'd Improve Next

  • Add automated safety filtering for multi-turn sessions.
  • Scale preference collection with active learning.