Case Study

Shia LaBeouf DPO LoRA

Direct Preference Optimization pipeline with LoRA adapters, emotion classification, and iterative preference learning for narrative alignment.

PyTorch, LoRA, DPO, NLP

Overview

An interactive narrative system that aligns text generation with user sentiment using DPO and LoRA adapters.

Baseline language models struggled to stay aligned with evolving user sentiment in a narrative experience.

Built a DPO training pipeline that combines emotion classification, preference-pair construction, and LoRA adapter fine-tuning to align outputs iteratively with user sentiment.

PyTorch, HuggingFace, LoRA, DPO, GPT-2, Qwen-2.5

High-level system flow and core building blocks.

Data

Emotion classifier + preference pair generator.
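One way the data stage could turn emotion scores into preference pairs is sketched below. This is a minimal illustration, not the project's actual code: the function build_preference_pair, the EmotionScorer signature, the min_margin threshold, and the keyword-based toy_scorer are all hypothetical stand-ins for a real emotion classifier. The output uses the prompt/chosen/rejected layout commonly used for DPO datasets.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: maps a text to a score per emotion label.
EmotionScorer = Callable[[str], Dict[str, float]]


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    target_emotion: str,
    score_emotions: EmotionScorer,
    min_margin: float = 0.1,  # assumed threshold, not from the source
) -> Optional[Dict[str, str]]:
    """Pick the best and worst candidate for the target emotion.

    Returns None when fewer than two candidates are given or when the
    score gap is too small to be a meaningful preference signal.
    """
    if len(candidates) < 2:
        return None
    scored = [(score_emotions(c).get(target_emotion, 0.0), c) for c in candidates]
    scored.sort(key=lambda item: item[0])
    low_score, rejected = scored[0]
    high_score, chosen = scored[-1]
    if high_score - low_score < min_margin:
        return None
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def toy_scorer(text: str) -> Dict[str, float]:
    """Keyword-counting placeholder for a trained emotion classifier."""
    hits = sum(text.lower().count(word) for word in ("hope", "light", "dawn"))
    return {"hope": min(1.0, 0.4 * hits), "fear": 0.0}


pair = build_preference_pair(
    "The door opens.",
    ["Darkness swallowed the hall.", "A light of hope broke at dawn."],
    "hope",
    toy_scorer,
)
print(pair)
```

Dropping pairs whose score gap falls below a margin is a design choice assumed here; it keeps near-ties, where the classifier has no clear preference, out of the training set.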

Training

DPO pipeline with LoRA adapters for efficient fine-tuning.
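For reference, the standard DPO objective for a single preference pair can be sketched in plain Python as follows. The function name dpo_loss and the default beta of 0.1 are illustrative assumptions; the inputs are summed token log-probabilities of the chosen and rejected completions under the trainable policy and a frozen reference model. In a LoRA setup the reference is often the base model without the adapter, though whether this project does so is not stated.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen gain - rejected gain)))."""
    # Implicit rewards: how much more likely each completion is under the
    # policy than under the reference model.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen completion: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the chosen completion's likelihood relative to the rejected one, measured against the reference, which is what lets training proceed from preference pairs without a separate reward model.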

Inference

Interactive narrative engine with aligned outputs.