FischGPT

PyTorch · Next.js · Express.js · ChromaDB · ShadCN · Tailwind · RAG Pipeline · 2025 - Present

GPT-2-Class Language Model

Related Links: Live Demo · GitHub

A GPT-2-class language model built from first principles, with a custom tokenizer, hand-written attention mechanisms, and an end-to-end training pipeline.

This project involved building a complete language model from scratch, including a custom BPE tokenizer, multi-head attention, backpropagation, and softmax layers. The model was trained end-to-end with Distributed Data Parallel (DDP) on 8x A100 GPUs: pre-training covered 45 billion tokens (4.5 epochs over the 10B-token FineWeb dataset), followed by supervised fine-tuning on 25 million tokens from OpenAssistant (OASST1). The final model surpassed GPT-2's HellaSwag benchmark score by over 5% while reducing validation loss by 0.3.

On top of the model sits a complete web application: a RAG pipeline in Express.js backed by ChromaDB that auto-generates "about-me" responses, and a modern Next.js frontend with ShadCN and Tailwind for live chat.
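As a rough illustration of what the tokenizer does, here is a minimal byte-level BPE training loop. This is a sketch of the general algorithm, not the project's implementation; the function name and return format are illustrative.

```python
from collections import Counter

def bpe_merges(text: str, num_merges: int) -> list[tuple[int, int]]:
    # Start from raw UTF-8 bytes; repeatedly merge the most frequent
    # adjacent pair into a new token id, GPT-2 style.
    ids = list(text.encode("utf-8"))
    merges = []
    next_id = 256  # ids 0-255 are reserved for raw bytes
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        merges.append(pair)
        # Replace every occurrence of the pair with the new id.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges
```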
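The attention mechanism is the core of a GPT-2-class model. Below is a minimal sketch of a multi-head causal self-attention block in PyTorch; the module name, dimensions, and use of the fused `scaled_dot_product_attention` kernel are assumptions for illustration, not the project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, GPT-2 style."""

    def __init__(self, n_embd: int = 768, n_head: int = 12):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        # One projection produces queries, keys, and values at once.
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) so each head attends independently.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with a causal mask; the softmax
        # normalizes each row of attention scores.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```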
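DDP training on 8 GPUs typically follows the standard `torchrun` pattern, where each GPU runs one process and gradients are all-reduced during the backward pass. A minimal runnable sketch, with a toy model and synthetic data standing in for the real transformer and FineWeb loader:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launched with: torchrun --nproc_per_node=8 train.py
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for the transformer; DDP wraps any nn.Module and
    # synchronizes gradients across ranks during backward().
    model = torch.nn.Linear(768, 768).cuda()
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=6e-4)

    for _ in range(10):
        x = torch.randn(8, 768, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients all-reduced across the 8 GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```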
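The project's RAG pipeline is implemented in Express.js; the sketch below uses ChromaDB's Python client instead, to keep all examples in one language, and shows the same retrieve-then-generate flow. The collection name, documents, and prompt template are illustrative assumptions.

```python
import chromadb

# In-memory client for illustration; the real pipeline runs in Express.js.
client = chromadb.Client()
collection = client.get_or_create_collection("about_me")

# Index "about-me" facts; Chroma embeds them with its default
# embedding function when only documents are provided.
collection.add(
    ids=["bio-1", "bio-2"],
    documents=[
        "FischGPT was pretrained on 45B tokens of FineWeb.",
        "The model beat GPT-2 on HellaSwag by over 5%.",
    ],
)

def build_prompt(question: str, k: int = 2) -> str:
    # Retrieve the k most similar facts and prepend them to the user's
    # question before the prompt is sent to the language model.
    hits = collection.query(query_texts=[question], n_results=k)
    context = "\n".join(hits["documents"][0])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What data was FischGPT trained on?"))
```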

Media

Visit the live demo to chat with the model!