🧠 Web Beast: The Ultimate Laravel & Node.js Dataset for AI Code Generation
Train better AI models, build smarter developer tools, and fine-tune LLMs with a purpose-built, open-source code dataset.
🚀 What Is Web Beast?
Web Beast is a curated, large-scale dataset of high-quality source code, designed to help developers and researchers train AI models specifically for real-world backend and fullstack development.
Unlike generic code datasets, Web Beast focuses on framework-rich, production-grade code from:
- 🧱 Laravel (PHP)
- ⚙️ Node.js (Express, TypeScript)
- 🌐 JavaScript (Fullstack)
- 🎨 HTML + Tailwind CSS
It’s hosted on Hugging Face and ready to use in your training pipelines.
🔍 Why This Dataset Matters
Most code generation datasets are:
- ❌ Too generic
- ❌ Missing real framework structure
- ❌ Full of synthetic, toy-level examples
Web Beast addresses these gaps with real-world controller classes, route files, middleware, service providers, APIs, Tailwind layouts, and more, all scraped and cleaned from popular open-source projects.
📦 Dataset Structure
Each file is stored in .jsonl format (one JSON object per line) for compatibility with AI/ML tooling such as Hugging Face Transformers, LoRA fine-tuning, and Alpaca-style instruction training.
Example Record:
{
"repo": "github.com/example/repo",
"path": "app/Http/Controllers/AuthController.php",
"language": "php",
"framework": "laravel",
"content": "
📌 Key Fields
- repo: Source GitHub repository
- path: File path within the project
- language: Programming language (php, ts, js, html)
- framework: Framework used (laravel, express, etc.)
- content: Raw code content
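If you work with the raw .jsonl files rather than the Hugging Face loader, a small reader that checks for the five key fields can catch malformed lines early. This is a minimal sketch, not part of the dataset tooling: `parse_jsonl` and `REQUIRED_FIELDS` are hypothetical names, and the sample line uses placeholder content rather than a real record.

```python
import json
from typing import Iterable, Iterator

# The five fields listed under "Key Fields".
REQUIRED_FIELDS = ("repo", "path", "language", "framework", "content")


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line, rejecting records with missing fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


# Placeholder record for illustration only (the content value is made up).
sample_line = json.dumps({
    "repo": "github.com/example/repo",
    "path": "app/Http/Controllers/AuthController.php",
    "language": "php",
    "framework": "laravel",
    "content": "<?php // placeholder",
})

records = list(parse_jsonl([sample_line, ""]))
```

With a real file, you could pass an open file handle instead of a list, since file objects iterate line by line.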
🧠 Perfect for Training AI Code Models
This dataset is ideal if you're building or fine-tuning models like:
- 🤖 DeepSeek-Coder
- 🔧 StarCoder
- 🦙 CodeLlama
Use Cases
- Laravel-specific autocomplete tools
- Express.js endpoint generators
- Copilot-style backend code assistants
- Tailwind UI component suggestions
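For use cases like these, the raw records usually need to be turned into training examples first. One possible shape is the Alpaca-style instruction/input/output triple. The sketch below is an assumption about how you might do that: the function name `to_instruction_example` and the prompt wording are hypothetical and not something the dataset prescribes.

```python
def to_instruction_example(record: dict) -> dict:
    """Convert one Web Beast record into an Alpaca-style training example."""
    # The prompt template is illustrative; adjust it to your own task.
    instruction = (
        f"Write the {record['framework']} file `{record['path']}` "
        f"in {record['language']}."
    )
    return {
        "instruction": instruction,
        "input": "",  # no extra context in this simple variant
        "output": record["content"],
    }


# Placeholder record for illustration only (the content value is made up).
example = to_instruction_example({
    "repo": "github.com/example/repo",
    "path": "routes/api.php",
    "language": "php",
    "framework": "laravel",
    "content": "<?php // placeholder",
})
```

A path-based prompt is only one option. Prompts derived from docblocks or route definitions may give the model a more useful signal.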
🧪 How to Use the Dataset
Load it with Hugging Face:
from datasets import load_dataset
dataset = load_dataset("brijmansuriya/web-beast")
Filter Laravel Files:
laravel_data = dataset['train'].filter(lambda x: x['framework'] == 'laravel')
✅ Works with Hugging Face, LoRA, QLoRA, Transformers, and custom fine-tuning pipelines.
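The same filtering pattern extends to narrower slices. Here is a minimal sketch that selects only Laravel controllers, assuming records follow the field values and path layout of the example record. The function name is hypothetical, and repositories that do not use Laravel's default `Controllers` directory would be missed.

```python
def is_laravel_controller(record: dict) -> bool:
    """Return True for PHP files under a Laravel Controllers directory."""
    return (
        record.get("framework") == "laravel"
        and record.get("language") == "php"
        and "/Controllers/" in record.get("path", "")
    )


# Quick check on plain dictionaries (no download required).
rows = [
    {"framework": "laravel", "language": "php",
     "path": "app/Http/Controllers/AuthController.php"},
    {"framework": "laravel", "language": "php", "path": "routes/web.php"},
    {"framework": "express", "language": "ts", "path": "src/server.ts"},
]
controllers = [row for row in rows if is_laravel_controller(row)]
```

Because the predicate takes a single record and returns a boolean, it can also be passed directly to the loader's filter: `dataset['train'].filter(is_laravel_controller)`.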
🌍 Explore the Dataset
- ✅ Clean & ready to use
- ✅ Filter by framework or language
- ✅ Ideal for training LLMs or building AI tools
🔐 Data Safety & Licensing
- 🧼 Secrets Removed: .env files, keys, tokens, and passwords
- 📜 Open Source: Sourced from public repositories under MIT and Apache licenses
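If you extend the dataset with your own repositories, you may want a similar scrubbing pass. The sketch below is not the Web Beast cleaning pipeline, whose details are not published here. It is a deliberately simple illustration that skips `.env`-style files and redacts `NAME=value` assignments whose name contains KEY, TOKEN, SECRET, or PASSWORD. All names in it are hypothetical, and a real pipeline would need broader detection.

```python
import re

# Matches env-style assignments such as "APP_KEY=abc" or "db_password = abc".
# Only spaces and tabs are allowed around "=", so a match never spans lines.
SECRET_ASSIGNMENT = re.compile(
    r"(?im)^([ \t]*[A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*[ \t]*=[ \t]*)(\S+)"
)


def is_env_file(path: str) -> bool:
    """Return True for .env and .env.* files, which are dropped entirely."""
    name = path.rsplit("/", 1)[-1]
    return name == ".env" or name.startswith(".env.")


def redact_secrets(text: str) -> str:
    """Replace the value of secret-looking assignments with a marker."""
    return SECRET_ASSIGNMENT.sub(r"\1<REDACTED>", text)


cleaned = redact_secrets("APP_KEY=base64:abc123\nAPP_NAME=demo")
```

A line-based regex like this will miss secrets embedded in source code (for example, string literals in PHP or TypeScript), so treat it as a first filter rather than a guarantee.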
🧑‍💻 Who Should Use This?
Audience | Benefit |
---|---|
ML Engineers | Train and evaluate code generation models |
Laravel/Node Devs | Build AI-powered dev tools and assistants |
Tool Builders | Use data to enhance autocomplete and code suggestions |
🔮 Roadmap & What's Next
- ✅ Laravel-only fine-tuned model with LoRA
- ✅ Code snippets for RESTful APIs
- 🔜 Django, Flask, Vue, Next.js versions
- 🔜 Gradio/Colab demos
- 🔜 VS Code Extension powered by Web Beast
🤝 How to Contribute
- Submit Laravel/Node repositories for inclusion
- Contribute cleaning/preprocessing scripts
- Fine-tune models and share benchmarks
📧 Email: brijmansuriya.ai@gmail.com
👨‍💻 GitHub: brijmansuriya
📌 Final Thoughts
Great AI code tools need great datasets.
Web Beast gives you Laravel + Node.js power in a clean, usable format.
Whether you're training LLMs or building a Copilot alternative, this is your launchpad.
🔗 Get Started Now
👉 Download Web Beast on Hugging Face
🌐 Website: web-beast.com