AI Training & GDPR:
Do You Need Consent?
Websites are being scraped to train Large Language Models (LLMs). Under the new EU AI Act, do you have the right to opt out? And if you use user data for your own AI, do you need an "AI Consent" checkbox?
Scraping vs. Training
There are two distinct sides to this problem:
1. They scrape You
Companies like OpenAI or Anthropic crawl your site for training data.
Under the EU Copyright Directive, you can reserve your rights against text and data mining in machine-readable form. The most common way to do that is `robots.txt`.
2. You train on Them
You use your customers' data (chats, behavior) to fine-tune your own models.
This is "processing of personal data" under GDPR, so it needs a lawful basis. Consent given specifically for AI training is the safest one to rely on.
The "AI" Cookie Category
Simply asking for "Marketing" consent is likely insufficient for AI training. Under GDPR's purpose limitation principle, consent covers only the purposes it was given for, and "training an AI model" is a distinct purpose from "showing an ad".
We are seeing a fourth category appear in cookie banners:
- Strictly Necessary
- Analytics
- Marketing
- AI & Research (The new frontier)
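On the data side, a separate category only helps if your training pipeline actually checks it. Here is a minimal sketch, assuming a hypothetical `ConsentRecord` that stores which categories each user opted in to. The names `Purpose`, `ConsentRecord`, and `select_training_rows` are illustrative and not taken from any particular CMP.

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    # The four banner categories listed above (identifiers are hypothetical).
    STRICTLY_NECESSARY = "strictly_necessary"
    ANALYTICS = "analytics"
    MARKETING = "marketing"
    AI_RESEARCH = "ai_research"


@dataclass
class ConsentRecord:
    user_id: str
    granted: set = field(default_factory=set)

    def allows(self, purpose):
        # Strictly necessary processing is not tied to an opt-in here;
        # every other purpose must have been granted explicitly.
        return purpose is Purpose.STRICTLY_NECESSARY or purpose in self.granted


def select_training_rows(rows, consents):
    """Keep only rows whose user opted in to the AI & Research category."""
    allowed = []
    for row in rows:
        record = consents.get(row["user_id"])
        # No record, or only Marketing consent, means the row is excluded.
        if record is not None and record.allows(Purpose.AI_RESEARCH):
            allowed.append(row)
    return allowed


if __name__ == "__main__":
    consents = {
        "u1": ConsentRecord("u1", {Purpose.MARKETING}),
        "u2": ConsentRecord("u2", {Purpose.MARKETING, Purpose.AI_RESEARCH}),
    }
    rows = [
        {"user_id": "u1", "text": "hi"},
        {"user_id": "u2", "text": "hello"},
        {"user_id": "u3", "text": "hey"},
    ]
    print([r["user_id"] for r in select_training_rows(rows, consents)])  # ['u2']
```

The point of the design is that Marketing consent never implies AI consent: user `u1` said yes to ads and is still left out of the training set, and `u3`, who has no record at all, is excluded by default.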
How to Block AI Scrapers (The Easy Win)
If you don't want your content feeding GPT-5 for free, you need to update your `robots.txt`. This is the most widely recognized "machine-readable opt-out", although honoring it is up to each crawler.
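If you keep the blocklist in code, you can generate the rules and check them before deploying. Here is a minimal sketch using Python's standard `urllib.robotparser`. The helper names `build_rules` and `is_blocked` and the `example.com` URL are assumptions for illustration, and the check only confirms what the file says, not that a given crawler will honor it.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article; extend the list as needed.
AI_CRAWLERS = ["GPTBot", "CCBot", "Anthropic-ai"]


def build_rules(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, user_agent, url="https://example.com/any-page"):
    """Parse robots.txt text and report whether user_agent may NOT fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_rules(AI_CRAWLERS)
    print(rules)
    for agent in AI_CRAWLERS:
        print(agent, "blocked:", is_blocked(rules, agent))
    # A crawler that is not listed is still allowed.
    print("Googlebot blocked:", is_blocked(rules, "Googlebot"))
```

For the three crawlers named here, the rules to put in your `robots.txt` are: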
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Anthropic-ai
Disallow: /
Does your banner support AI consent?
Most legacy CMPs don't have an "AI Training" category. Ours does.
Get AI-Ready Consent