Understanding Training Data Limits

Last updated: February 2, 2026

Training data limits determine how much content you can use to train your chatbot. Understanding these limits helps you make the most of your subscription and plan your content strategy effectively.


What is a Training Character?

A training character is a single character of visible text extracted from your training sources. Here's what counts:

  • Visible text only - HTML tags, CSS, JavaScript, and other code are excluded
  • Normalized whitespace - Multiple spaces, tabs, and line breaks are compressed
  • Pure content - Only the actual readable text that helps your chatbot answer questions

Example: A 5 MB web page typically contains only about 400,000 characters of actual text content (approximately 8% of the total file size).
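
To make the counting rules above concrete, here is a minimal sketch of how visible characters could be counted for a single HTML page. It is an illustration only, not ChatLab's actual extraction code: the class and function names are hypothetical, and it assumes that "visible text" means text nodes outside of script and style elements, with every run of whitespace collapsed to one space.

```python
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags themselves never reach this method, only the text between them.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Joining with a space keeps adjacent blocks apart; a real extractor
        # would treat inline tags more carefully.
        return " ".join(self._chunks)


def count_training_characters(html):
    """Hypothetical helper: visible characters after whitespace normalization."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    normalized = re.sub(r"\s+", " ", parser.text()).strip()
    return len(normalized)


sample_html = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p>Hello,   world!</p>\n\n<p>Second\tline.</p></body></html>"
)
print(count_training_characters(sample_html))
```

The markup and the style rule contribute nothing to the count; only the normalized sentence text does.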


Training Data Limits by Subscription Plan

Each subscription tier includes different training data capacity:

Plan | Training Characters | Website Links | Approx. Page Size
Free | 400,000 | 10 | ~5 MB
Basic | 11,000,000 | 1,000 | ~140 MB
Standard | 15,000,000 | 5,000 | ~190 MB
Premium | 15,000,000 | 100,000 | ~190 MB

💡 Important: These limits apply to the total training data per chatbot, not per training source. All your website content, uploaded files, Q&A pairs, and text entries count toward a single cumulative total.
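
The approximate page sizes listed for each plan appear consistent with the 8% text ratio from the 5 MB example earlier. The sketch below reproduces that arithmetic. It assumes one character is roughly one byte and 1 MB is 1,000,000 bytes; the 8% ratio is a typical value, not a guarantee, and the function name is hypothetical.

```python
# Assumption: about 8% of a page's raw size is visible text (from the 5 MB example).
TEXT_RATIO = 0.08

# Character limits as listed for each plan.
PLAN_CHARACTER_LIMITS = {
    "Free": 400_000,
    "Basic": 11_000_000,
    "Standard": 15_000_000,
    "Premium": 15_000_000,
}


def approx_page_size_mb(characters, text_ratio=TEXT_RATIO):
    """Estimate raw page size in MB, assuming 1 character is about 1 byte."""
    return characters / text_ratio / 1_000_000


for plan, limit in PLAN_CHARACTER_LIMITS.items():
    print(f"{plan}: ~{approx_page_size_mb(limit):.1f} MB of raw pages")
```

Under these assumptions the estimates come out to 5 MB, 137.5 MB, and 187.5 MB, which round to the ~5 MB, ~140 MB, and ~190 MB figures listed per plan.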

Per-Bot Limits

Each chatbot in your account has its own independent training data limit, so usage on one bot never reduces the capacity available to another.

Example with Standard Plan (5 bots, 15M characters each):

  • Bot #1 (Support): 14,500,000 characters used
  • Bot #2 (Sales): 10,200,000 characters used
  • Bot #3 (FAQ): 3,800,000 characters used
  • Bot #4 (Product): 15,000,000 characters used (at limit)
  • Bot #5 (International): 8,900,000 characters used

Each bot can use up to 15M characters independently. Bot #4 being at its limit doesn't affect the others.
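
As a quick illustration of independent limits, the sketch below computes the remaining capacity for each bot in the Standard plan example. The dictionary and function names are hypothetical; the numbers come from the example above.

```python
STANDARD_LIMIT = 15_000_000  # characters per chatbot on the Standard plan

# Usage figures from the example above.
usage_by_bot = {
    "Support": 14_500_000,
    "Sales": 10_200_000,
    "FAQ": 3_800_000,
    "Product": 15_000_000,
    "International": 8_900_000,
}


def remaining_capacity(used, limit=STANDARD_LIMIT):
    """Characters still available to one bot; never negative."""
    return max(limit - used, 0)


# Each bot is evaluated on its own; no bot's usage enters another's result.
remaining_by_bot = {
    name: remaining_capacity(used) for name, used in usage_by_bot.items()
}

for name, remaining in remaining_by_bot.items():
    print(f"{name}: {remaining:,} characters remaining")
```

The Product bot has nothing left, while the FAQ bot still has 11,200,000 characters available.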


How Training Data is Counted

Cumulative Across All Sources

Your training character limit is cumulative across all data sources you add to a single chatbot:

  • Website content
  • Uploaded files (PDF, Word, Excel, CSV, TXT)
  • Manual text entries
  • Questions and answers
  • E-commerce product descriptions

Example: If you have:

  • Website training: 8,000,000 characters
  • PDF files: 3,000,000 characters
  • Q&A pairs: 2,000,000 characters

Total: 13,000,000 characters (would fit in Standard plan but exceed Basic plan)
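
The same check can be sketched in a few lines: sum the characters from every source on one chatbot, then compare the total with each plan's limit. The names below are hypothetical, and the sketch assumes that usage exactly equal to the limit is still allowed.

```python
# Character limits as listed for each plan.
PLAN_LIMITS = {
    "Free": 400_000,
    "Basic": 11_000_000,
    "Standard": 15_000_000,
    "Premium": 15_000_000,
}

# Sources from the example above, all attached to one chatbot.
sources = {
    "website": 8_000_000,
    "pdf_files": 3_000_000,
    "qa_pairs": 2_000_000,
}


def total_characters(source_sizes):
    """All sources on a chatbot count toward one cumulative total."""
    return sum(source_sizes.values())


def plans_that_fit(total, plan_limits=PLAN_LIMITS):
    """Plans whose per-bot limit covers the total (assumes total == limit is OK)."""
    return [plan for plan, limit in plan_limits.items() if total <= limit]


total = total_characters(sources)
print(f"Total: {total:,} characters")
print("Fits in:", ", ".join(plans_that_fit(total)))
```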

One-Time Calculation Per Bot

Unlike message credits that reset monthly, training data limits are calculated as an absolute size. This means:

  • Your quota reflects the total amount of data currently stored, not how many times you've trained
  • Adding new content consumes more quota
  • Replacing old content doesn't double-count (see Smart Re-training below)

Smart Re-training: Net Quota Calculation

When you re-train existing content (like refreshing a website scan), ChatLab uses net quota calculation:

  • Old content is identified and its size is noted
  • New content is scanned and compared
  • Only the difference counts toward your quota

Example:

  • Initial scan of website: 5,000,000 characters
  • Re-scan after updates: 5,200,000 characters
  • Additional quota consumed: only 200,000 characters (the net new data), for a total of 5,200,000 characters stored

Net quota calculation lets you keep your chatbot's knowledge fresh without spending training quota on unchanged content.
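
The net calculation amounts to simple arithmetic, sketched below with hypothetical function names. One assumption goes beyond what is stated above: if a re-scan returns less content than before, the sketch treats the difference as freed quota.

```python
def net_quota_change(old_size, new_size):
    """Change in quota from re-training one source (negative means freed)."""
    return new_size - old_size


def usage_after_retrain(current_usage, old_size, new_size):
    """Replace the old size of a source with its new size in the bot's total."""
    return current_usage + net_quota_change(old_size, new_size)


# Figures from the example above: a website re-scanned after updates.
initial_scan = 5_000_000
rescan = 5_200_000

delta = net_quota_change(initial_scan, rescan)
new_total = usage_after_retrain(initial_scan, initial_scan, rescan)

print(f"Net change: {delta:,} characters")
print(f"Total stored after re-scan: {new_total:,} characters")
```

The re-scan adds 200,000 characters to the bot's usage, not another 5,200,000.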


Training Data vs. Message Credits

These are completely separate limits:

Aspect | Training Data | Message Credits
What it measures | How much content you can store | How many chatbot responses you can generate
Reset frequency | Never (absolute size) | Monthly
Used for | Knowledge base capacity | Conversation volume
Example | 15M characters = ~190 MB of raw page content | 11,000 credits = ~2,200 conversations

🚨 Common mistake: Don't confuse training limits with message credits. You can have plenty of message credits but run out of training space, or vice versa. They serve different purposes.


Purchasing Additional Training Capacity

If your subscription's training limits aren't sufficient, you can purchase add-on packages:

"+10 MB of training characters for all your chatbots"

This add-on increases each chatbot's limit:

  • Standard plan: 15 MB → 25 MB per chatbot (roughly 15M → 25M training characters)
  • Premium plan: 15 MB → 25 MB per chatbot (roughly 15M → 25M training characters)

💡 The add-on applies to all chatbots in your account, not just one.


Optimizing Your Training Data Usage

More training data doesn't always mean better chatbot performance. In fact, adding irrelevant content can hurt answer quality by introducing noise into the retrieval process.

Best Practices:

  • ✅ Focus on high-value content (FAQs, product descriptions, support articles)
  • ✅ Use sitemap scanning instead of full website crawls
  • ✅ Exclude repetitive elements (headers, footers, menus)
  • ✅ Filter out non-essential pages (news archives, image galleries)
  • ❌ Avoid scanning every single page just because you have quota available
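
If you prepare a list of URLs yourself (for example, from your sitemap) before adding them as training sources, a small filter can drop low-value pages up front. The sketch below is a generic pre-processing step, not a ChatLab feature: the excluded path prefixes, the example.com URLs, and the function names are all hypothetical and should be adapted to your own site structure.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes for pages that rarely help answer questions.
EXCLUDED_PATH_PREFIXES = ("/news/", "/gallery/")


def is_high_value(url, excluded_prefixes=EXCLUDED_PATH_PREFIXES):
    """True when the URL's path does not start with an excluded prefix."""
    path = urlparse(url).path
    return not path.startswith(excluded_prefixes)


def filter_urls(urls, excluded_prefixes=EXCLUDED_PATH_PREFIXES):
    """Keep only the URLs worth spending training characters on."""
    return [url for url in urls if is_high_value(url, excluded_prefixes)]


candidate_urls = [
    "https://example.com/faq/shipping",
    "https://example.com/news/2019/old-announcement",
    "https://example.com/products/widget",
    "https://example.com/gallery/summer-photos",
]

for url in filter_urls(candidate_urls):
    print(url)
```

Only the FAQ and product pages remain in this example, which keeps the quota focused on content that answers customer questions.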

Learn more: How to Reduce Training Characters When Scanning a Website in ChatLab

That guide explains advanced filtering techniques to help you use your training quota efficiently while maintaining high answer quality.


Summary

  • Training characters = visible text only, not file sizes or media
  • Limits vary by plan from 400K (Free) to 15M (Standard/Premium)
  • Each bot has independent limits - one bot at max doesn't affect others
  • Cumulative calculation across all sources per chatbot
  • Separate from message credits - they serve different purposes
  • Smart re-training only counts net new data
  • Add-on packages available for additional capacity
  • Quality over quantity - focus on relevant content

Understanding these limits helps you train your chatbot effectively and choose the right subscription plan for your needs.