Web Beast: The Ultimate Laravel & Node.js Dataset for AI Code Generation
Train better AI models, build smarter developer tools, and fine-tune LLMs with a purpose-built, open-source code dataset.
What Is Web Beast?
Web Beast is a curated, large-scale dataset of high-quality source code, designed to help developers and researchers train AI models specifically for real-world backend and fullstack development.
Unlike generic code datasets, Web Beast focuses on framework-rich, production-grade code from:
- Laravel (PHP)
- ⚙️ Node.js (Express, TypeScript)
- JavaScript (Fullstack)
- HTML + Tailwind CSS
It’s hosted on Hugging Face and ready to use in your training pipelines.
Why This Dataset Matters
Most code generation datasets are:
- ❌ Too generic
- ❌ Missing real framework structure
- ❌ Full of synthetic, toy-level examples
Web Beast addresses this with real-world controller classes, route files, middleware, service providers, APIs, Tailwind layouts, and more, all scraped and cleaned from popular open-source projects.
Dataset Structure
Each source file is stored as one record in .jsonl format, which loads directly with Hugging Face tooling and suits fine-tuning workflows such as LoRA and Alpaca-style instruction training.
Example Record:
{
"repo": "github.com/example/repo",
"path": "app/Http/Controllers/AuthController.php",
"language": "php",
"framework": "laravel",
"content": "
Key Fields
- repo: Source GitHub repository
- path: File path within the project
- language: Programming language (php, ts, js, html)
- framework: Framework used (laravel, express, etc.)
- content: Raw code content
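The fields above map naturally onto a small typed record. The following is a minimal sketch, assuming each line of a .jsonl file is one JSON object with exactly these five keys; the `CodeRecord` class, the `parse_record` helper, and the sample line (including its `content` value) are hypothetical illustrations, not part of the dataset's tooling.

```python
import json
from dataclasses import dataclass

# Field names taken from the Key Fields list; everything else is illustrative.
REQUIRED_FIELDS = ("repo", "path", "language", "framework", "content")


@dataclass(frozen=True)
class CodeRecord:
    repo: str
    path: str
    language: str
    framework: str
    content: str


def parse_record(line: str) -> CodeRecord:
    """Parse one JSONL line into a CodeRecord, failing loudly on missing keys."""
    raw = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return CodeRecord(**{name: raw[name] for name in REQUIRED_FIELDS})


# Hypothetical sample line; the "content" value is made up for this example.
sample_line = json.dumps({
    "repo": "github.com/example/repo",
    "path": "app/Http/Controllers/AuthController.php",
    "language": "php",
    "framework": "laravel",
    "content": "<?php\n\nnamespace App\\Http\\Controllers;\n",
})

record = parse_record(sample_line)
print(record.framework, record.path)
```

Validating up front like this makes malformed lines fail at load time rather than deep inside a training loop.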
Perfect for Training AI Code Models
This dataset is ideal if you're building or fine-tuning models like:
- DeepSeek-Coder
- StarCoder
- CodeLlama
Use Cases
- Laravel-specific autocomplete tools
- Express.js endpoint generators
- Copilot-style backend code assistants
- Tailwind UI component suggestions
How to Use the Dataset
Load it with the Hugging Face datasets library:
from datasets import load_dataset
dataset = load_dataset("brijmansuriya/web-beast")
Filter Laravel Files:
laravel_data = dataset['train'].filter(lambda x: x['framework'] == 'laravel')
✅ Works with Hugging Face, LoRA, QLoRA, Transformers, and custom fine-tuning pipelines.
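For instruction-style fine-tuning, each record first has to become a prompt/response pair. The sketch below shows one possible conversion, assuming the record fields listed under Key Fields. The `to_instruction_example` function, the instruction wording, and the `instruction`/`input`/`output` keys (the layout commonly used for Alpaca-style data) are assumptions for illustration, not a format the dataset prescribes.

```python
def to_instruction_example(record: dict) -> dict:
    """Turn one dataset record into an Alpaca-style training example."""
    instruction = (
        f"Write the {record['framework']} file at {record['path']} "
        f"in {record['language']}."
    )
    return {
        "instruction": instruction,
        "input": "",  # no extra context in this simple variant
        "output": record["content"],
    }


# Hypothetical record; the "content" value is invented for the example.
record = {
    "repo": "github.com/example/repo",
    "path": "routes/api.js",
    "language": "js",
    "framework": "express",
    "content": "const router = require('express').Router();\n",
}

example = to_instruction_example(record)
print(example["instruction"])
```

With the Hugging Face datasets library, a function of this shape can be applied to every row through `dataset['train'].map(to_instruction_example)`, which adds the returned keys as new columns.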
Explore the Dataset
View on Hugging Face
- ✅ Clean & ready to use
- ✅ Filter by framework or language
- ✅ Ideal for training LLMs or building AI tools
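Filtering by framework and language at once can be expressed as a single predicate. This is a small sketch, assuming records carry the `framework` and `language` fields from the Key Fields list; `make_filter` and the sample rows are hypothetical names and data.

```python
from typing import Callable, Optional


def make_filter(
    framework: Optional[str] = None,
    language: Optional[str] = None,
) -> Callable[[dict], bool]:
    """Build a predicate that keeps rows matching the given framework and/or language."""

    def predicate(row: dict) -> bool:
        if framework is not None and row.get("framework") != framework:
            return False
        if language is not None and row.get("language") != language:
            return False
        return True

    return predicate


# Hypothetical rows standing in for dataset records.
rows = [
    {"framework": "laravel", "language": "php", "path": "routes/web.php"},
    {"framework": "express", "language": "ts", "path": "src/app.ts"},
    {"framework": "express", "language": "js", "path": "src/server.js"},
]

express_ts = [row for row in rows if make_filter("express", "ts")(row)]
print([row["path"] for row in express_ts])
```

The same predicate can be passed to `dataset['train'].filter(...)` in place of an inline lambda.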
Data Safety & Licensing
- Secrets Removed: .env files, keys, tokens, passwords
- Open Source: Sourced from public repositories under MIT and Apache licenses
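Even with secrets stripped at the source, a quick local check before training can be reassuring. The following is a rough sketch of such a check, not the cleaning pipeline used to build Web Beast. The patterns, the `find_suspect_lines` helper, and the sample text are assumptions chosen for illustration; they will miss many real secret formats and will also flag harmless code such as ordinary variable assignments.

```python
import re

# Illustrative patterns only; real secret scanners use far larger rule sets.
SUSPECT_PATTERNS = [
    # NAME=value assignments with secret-looking names, as found in .env files
    re.compile(r"(?i)\b\w*(?:secret|password|token|api_key)\w*\s*=\s*\S+"),
    # PEM private key headers
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_suspect_lines(content: str) -> list[int]:
    """Return 1-based line numbers whose text matches any suspect pattern."""
    hits = []
    for number, line in enumerate(content.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append(number)
    return hits


# Hypothetical file content with one obviously fake credential.
sample = "APP_NAME=demo\nDB_PASSWORD=not-a-real-password\nAPP_DEBUG=true\n"
print(find_suspect_lines(sample))
```

Flagged lines are candidates for manual review rather than proof of a leaked secret.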
Who Should Use This?
| Audience | Benefit |
|---|---|
| ML Engineers | Train and evaluate code generation models |
| Laravel/Node Devs | Build AI-powered dev tools and assistants |
| Tool Builders | Use data to enhance autocomplete and code suggestions |
Roadmap & What's Next
- ✅ Laravel-only fine-tuned model with LoRA
- ✅ Code snippets for RESTful APIs
- Django, Flask, Vue, Next.js versions
- Gradio/Colab demos
- VS Code Extension powered by Web Beast
How to Contribute
- Submit Laravel/Node repositories for inclusion
- Contribute cleaning/preprocessing scripts
- Fine-tune models and share benchmarks
Email: brijmansuriya.ai@gmail.com
GitHub: brijmansuriya
Final Thoughts
Great AI code tools need great datasets.
Web Beast gives you Laravel + Node.js power in a clean, usable format.
Whether you're training LLMs or building a Copilot alternative, this is your launchpad.
Get Started Now
Download Web Beast on Hugging Face
Website: web-beast.com