FischGPT
GPT-2-Class Language Model
Related Links: Live Demo | GitHub
A GPT-2-class language model built from first principles with custom tokenization, attention mechanisms, and an end-to-end training pipeline.
This project involved building a complete language model from scratch, including a custom BPE tokenizer, multi-head attention, backpropagation, and softmax layers. The model was trained end-to-end using Distributed Data Parallel (DDP) on 8x A100 GPUs: pre-training covered 45 billion tokens of the FineWeb 10B dataset (roughly 4.5 epochs), followed by supervised fine-tuning on 25 million tokens from OpenAssistant (OASST1). The final model surpassed GPT-2's HellaSwag benchmark score by over 5% while reducing validation loss by 0.3.

A complete web application was built around the model: a RAG pipeline in Express.js backed by ChromaDB auto-generates "about-me" responses, and a Next.js frontend with ShadCN and Tailwind provides live chat interaction.
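To make the "from scratch" pieces concrete, here is a minimal sketch of a causal multi-head self-attention module in PyTorch, the kind of component a GPT-2-class model is built from. The class name, layer names, and hyperparameters (n_embd=768, n_head=12, block_size=1024, i.e. GPT-2 small scale) are illustrative assumptions, not necessarily the repository's actual code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask, GPT-2 style (sketch)."""

    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.n_embd = n_embd
        # One projection produces queries, keys, and values for all heads at once.
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask so position t can only attend to positions <= t.
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.size()  # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # Reshape to (B, n_head, T, head_dim) so each head attends independently.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with causal masking, then softmax over keys.
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble heads
        return self.c_proj(y)

if __name__ == "__main__":
    x = torch.randn(2, 16, 768)  # (batch, tokens, embedding)
    print(CausalSelfAttention()(x).shape)  # torch.Size([2, 16, 768])
```

In training, a module like this sits inside each transformer block of the full model, which in turn gets wrapped in torch.nn.parallel.DistributedDataParallel for multi-GPU runs such as the 8x A100 setup described above.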
Features
- PyTorch: Deep learning framework
- FineWeb 10B: Pre-training dataset (45B tokens)
- OpenAssistant (OASST1): Supervised fine-tuning data (25M tokens)
- ChromaDB: Vector database for RAG (sketch below)
- Next.js: Frontend framework
- ShadCN UI: Component library
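The retrieval-augmented "about-me" pipeline itself runs in Express.js with ChromaDB, as described above. As a rough approximation of that retrieve-then-prompt flow, here is a sketch using ChromaDB's Python client; the collection name, stored documents, and prompt template are illustrative assumptions, not the project's actual values.

```python
import chromadb

# Hypothetical "about-me" facts; the real app stores and serves these from
# an Express.js backend rather than Python.
client = chromadb.Client()  # in-memory client for the sketch
collection = client.get_or_create_collection("about_me")
collection.add(
    ids=["fact-1", "fact-2"],
    documents=[
        "FischGPT was pre-trained on 45B tokens of the FineWeb 10B dataset.",
        "The model was fine-tuned on 25M tokens from OpenAssistant (OASST1).",
    ],
)

def build_prompt(question: str, k: int = 2) -> str:
    """Retrieve the k most relevant facts and splice them into the model prompt."""
    results = collection.query(query_texts=[question], n_results=k)
    context = "\n".join(results["documents"][0])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What data was FischGPT trained on?"))
```

The assembled prompt is then passed to the fine-tuned model for generation, which is what lets the chat interface answer "about-me" questions grounded in stored facts rather than from the model's weights alone.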
Media
Visit the live demo to chat with the model!