
EU AI Act Training-Data Disclosure: The August 2026 Checklist for GPAI Providers and Deployers

The EU AI Act's transparency rules take effect August 2026. Every GPAI provider must publish training-data summaries; every deployer must verify those summaries. Here is the compliance checklist that actually works.


On August 2, 2026, the EU AI Act's transparency requirements for general-purpose AI (GPAI) models enter full effect, alongside the start of high-risk AI enforcement. For the first time globally, AI providers must publish detailed summaries of what they trained on and how they handle copyrighted material. Enterprises deploying those models must verify that the summaries exist and address their own use case.

This post is the checklist version: what GPAI providers must disclose, what deployers must verify, the penalties for non-compliance, and the practical 90-day sprint to be ready.

What the Rules Require

Three specific obligations land on August 2, 2026.

Obligation 1: Training-data summaries.

Providers of GPAI models placed on the EU market must publish a sufficiently detailed summary of the content used to train the model. The summary must cover:

  • Data sources (datasets, repositories, web crawls, proprietary collections)
  • Language and geographic coverage
  • Types of content (text, code, image, audio, video)
  • Date ranges of data collection
  • Quantitative breakdown by source where feasible

"Sufficiently detailed" has not been fully tested in enforcement. Drafts of implementation guidance suggest enough detail that a rightsholder could reasonably determine whether their work was likely included.

Obligation 2: Copyright policy disclosure.

Providers must publish a policy explaining how they respect Union copyright law, including:

  • Process for respecting text-and-data mining (TDM) opt-outs
  • Handling of licensed versus open content
  • Approach to copyrighted material flagged after training
  • Indemnification or mitigation for downstream users

Obligation 3: Technical documentation.

Providers must maintain technical documentation (available to regulators, not necessarily publicly) covering training methodology, compute used, evaluation results, and known limitations.

Penalties scale up to €15M or 3% of global annual turnover, whichever is higher. In practice, regulators may also pair AI Act findings with other regulatory action — as France's competition authority did with Google — to reach significantly larger totals.

Who Is a GPAI Provider

The definition matters. A "provider" is the entity placing a GPAI model on the EU market. This includes:

  • Foundation model developers (OpenAI, Anthropic, Google, Mistral, Cohere)
  • Companies deploying significantly fine-tuned GPAI models
  • Open-weight model distributors who actively market in the EU

It does not include:

  • Enterprises using models via API for their own internal purposes
  • Light fine-tuning of models for narrow deployments
  • Research-only uses

The gray zone: a company that fine-tunes an open-weight model on proprietary data and then offers the fine-tuned model as a product. Current guidance treats that company as a provider with obligations for the fine-tuned model, while pointing back to the base-model provider for the base model's disclosure.

Provider Checklist

If you are a GPAI provider, track the following items to ready state before August 2.

Data inventory

  • Complete list of datasets used in pretraining with sources
  • Complete list of datasets used in post-training (instruction tuning, RLHF, preference tuning)
  • Documentation of web crawls with crawl dates and robots.txt compliance
  • List of licensed content agreements with counterparty and scope
  • List of synthetic data sources and generation methodology

Rights management

  • TDM opt-out respect process (how you handle publishers asserting Article 4 opt-outs)
  • Process for post-hoc removal requests
  • Copyright policy document (public-facing)
  • Licensing agreement templates
  • Agreements with collective licensing bodies where relevant
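
The TDM opt-out item above is where engineering meets legal. One common machine-readable opt-out signal is robots.txt, which Python can check with the standard library alone. This is a minimal sketch, not a complete Article 4 process — a real pipeline would also check other mechanisms (such as the TDM Reservation Protocol) and log every decision for the audit trail; the crawler name is invented:

```python
# Minimal sketch of a robots.txt-based opt-out check using only the
# standard library. robots.txt is one opt-out signal among several;
# the crawler user-agent below is hypothetical.
from urllib.robotparser import RobotFileParser

CRAWLER_USER_AGENT = "ExampleTrainingCrawler"  # hypothetical crawler name

def may_crawl(robots_txt: str, url: str) -> bool:
    """Return True if robots.txt permits CRAWLER_USER_AGENT to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(CRAWLER_USER_AGENT, url)

# A publisher asserting a blanket opt-out against all crawlers:
opted_out = "User-agent: *\nDisallow: /"
# A site that only fences off a private area:
partial = "User-agent: *\nDisallow: /private/"

print(may_crawl(opted_out, "https://example.com/article"))  # False
print(may_crawl(partial, "https://example.com/article"))    # True
```

The point of the sketch is that the check must happen at crawl time and leave a record — a retroactive "we respect opt-outs" statement without per-crawl evidence is hard to defend.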

Disclosure documents

  • Training-data summary document (public)
  • Technical documentation (regulator-accessible)
  • Transparency report template for ongoing updates

Operational

  • Designated EU representative (required for non-EU providers)
  • Escalation and response process for copyright claims
  • Incident documentation process for compliance issues
  • Regular audit calendar

Communications

  • FAQ for enterprise customers asking about training-data compliance

The document that eats the most time is the training-data summary itself. Expect it to take 3-6 weeks of cross-functional work (legal, data engineering, model training, policy). Starting in April-May for an August deadline is about right.

Deployer Checklist

If you are an enterprise deploying GPAI models (most of you), the obligations are lighter but real. Your responsibilities are to verify and document, not to publish your own summaries.

  • Confirmed that each GPAI model you use has a published training-data summary
  • Copy of your model provider's copyright policy on file
  • Record of provider's compliance posture for your use case
  • Internal policy about which GPAI models are approved for which workloads
  • Process for handling regulator inquiries about model training sources
  • Documentation of any fine-tuning you do on GPAI models
  • If you operate in a high-risk category: full high-risk AI compliance (separate from GPAI transparency)
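
One way to make the "which models for which workloads" policy enforceable rather than aspirational is to encode it as data that an API gateway or CI check can read. A minimal sketch, with invented model and workload names:

```python
# Hedged sketch of an internal model-approval policy as data: map each
# workload class to the GPAI models cleared for it, and refuse anything
# else. All model names and workload labels are illustrative.
APPROVED_MODELS = {
    "customer_support": {"vendor-model-a", "vendor-model-b"},
    "internal_search": {"vendor-model-a"},
    # High-risk workloads stay empty until full high-risk compliance is done.
    "hr_screening": set(),
}

def is_approved(workload: str, model: str) -> bool:
    """True only if the model is on the approved list for this workload."""
    return model in APPROVED_MODELS.get(workload, set())

print(is_approved("customer_support", "vendor-model-b"))  # True
print(is_approved("hr_screening", "vendor-model-a"))      # False
```

A deny-by-default lookup like this also gives you the regulator-inquiry answer for free: the approved list, with dates from version control, is the documentation.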


Most deployers will handle this through procurement and legal — ensuring that vendor contracts include attestation of GPAI transparency compliance and indemnification for training-data issues. The major providers behind Claude, Gemini, and ChatGPT, along with major open-weight vendors, are expected to offer this in standard enterprise contracts.

Penalties and Enforcement

Enforcement will start cautiously and escalate. Three tiers of likely enforcement posture:

Tier 1: Guidance period (August 2026 - end 2026).

Regulators have publicly committed to a guidance-first approach. Non-compliant providers will receive formal notices to correct rather than immediate fines for most issues. This is meant to give the market time to adjust.

Tier 2: Targeted enforcement (2027).

Enforcement focuses on clearly non-compliant providers (no summary published, no response to opt-outs, large-scale operation). Expect 2-5 headline-grabbing fines against major providers in 2027.

Tier 3: Full regulatory steady state (2028 onwards).

Routine enforcement, standardized audits, collective licensing schemes well-established. Compliance costs become line items in AI operations.

Separately, competition authorities (as France demonstrated with Google) may use GPAI transparency obligations to support anticompetitive practice findings, which have much larger penalty scales.

The Interaction with the Google Ruling

France's €250M fine against Google for Gemini training was issued under competition law, not the AI Act. But the factual pattern — training on copyrighted content without permission, without transparency, without opt-out handling — is exactly what the AI Act targets.

For AI providers, this means the Google ruling is not a one-off. It is a preview of enforcement patterns. The AI Act now gives regulators a dedicated framework rather than having to route through competition law. Expect the pace of enforcement actions to increase.

The 90-Day Sprint (for Providers)

A concrete plan if you are starting now:

Days 1-14: Scoping and governance.

Assign an executive owner. Inventory current training pipelines. Identify gaps between current documentation and AI Act requirements. Set milestones for legal review.

Days 15-45: Data audit and documentation.

Produce draft training-data summary. Inventory licensing agreements. Document web crawl methodology. Run legal review on the draft.

Days 46-60: Rights management build.

Implement or validate TDM opt-out handling. Document removal-request process. Finalize copyright policy. Build the public-facing disclosure template.

Days 61-75: Review and red team.

Legal review, regulatory pre-flight checks where possible, and stress-testing the disclosures against plausible regulator questions. Finalize the technical documentation.

Days 76-90: Publish and communicate.

Publish the training-data summary and copyright policy. Update customer contracts with compliance language. Brief enterprise customers on your posture. Monitor for regulator questions.

For teams that are resource-constrained, prioritize the training-data summary and copyright policy above everything else. Those are the two items regulators will look for first.

The 30-Day Sprint (for Deployers)

A lighter-weight plan if you are using GPAI models:

Days 1-10: Vendor inventory.

List every GPAI model your organization uses or plans to use. For each, note: vendor, deployment type (API, self-hosted, fine-tuned), use case, regulatory classification.
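
The inventory step above is tractable as a simple structured record per model. A sketch using a dataclass, so the fields are explicit and easy to export for audit; all field values are illustrative:

```python
# Sketch of the per-model inventory record described above. The vendor
# and model names are invented; the fields mirror the ones in the text.
from dataclasses import dataclass

@dataclass
class GPAIModelRecord:
    vendor: str
    model: str
    deployment_type: str            # "api", "self-hosted", or "fine-tuned"
    use_case: str
    regulatory_classification: str  # e.g. "minimal-risk", "high-risk"
    summary_on_file: bool = False           # provider's training-data summary collected?
    copyright_policy_on_file: bool = False  # provider's copyright policy collected?

inventory = [
    GPAIModelRecord("ExampleVendor", "example-model-1", "api",
                    "drafting assistant", "minimal-risk"),
]

# Records with missing evidence become the days 11-20 collection worklist.
worklist = [r for r in inventory
            if not (r.summary_on_file and r.copyright_policy_on_file)]
print([r.model for r in worklist])  # ['example-model-1']
```

Even a spreadsheet with these columns satisfies the same purpose; the structure matters more than the tooling.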

Days 11-20: Vendor evidence collection.

Request from each vendor: training-data summary, copyright policy, attestation of EU AI Act compliance, indemnification language. Keep the documents on file.

Days 21-30: Internal policy and gap closure.

Update internal AI policy to reference the AI Act. Close any gaps (missing documentation, fine-tuning without provenance, unclear deployment paths). Establish routine review cadence.

This is within the capability of a single in-house counsel or compliance officer for most mid-sized enterprises.

What Happens If You Miss August 2

For providers: you should still publish as soon as you can. Late compliance is far better than indefinite non-compliance. Regulators will treat voluntary post-deadline disclosure favorably compared to enforcement-forced disclosure.

For deployers: any use of non-compliant models after August 2 becomes a theoretical regulatory exposure. In practice, the enforcement of deployer obligations lags provider obligations significantly, so the realistic risk is that you create compliance debt — not that regulators arrive at your door in September. But compliance debt compounds. Fix it in Q3.

The Bigger Frame

The EU AI Act's GPAI transparency rules are the most consequential regulatory development for AI in 2026. They normalize training-data disclosure globally — the UK, Canada, Japan, and several US states have bills that follow the EU template. The opacity that characterized AI training through 2023 is ending, whether or not every jurisdiction enforces equally hard.

Providers and deployers who treat August 2 as a hard deadline and structure compliance around it are positioned well for the next five years of regulatory development. Those who treat it as a paper exercise to be minimized will spend the next five years in escalating compliance triage.

AI Magicx uses EU AI Act-compliant foundation models and operates with full transparency about training-data provenance. Start your account to see the compliance posture.
