Balancing AI Innovation and IP Protection: Courts Determine AI Training on Copyrighted Books Is Fair Use
Artificial intelligence (AI) systems powered by large language models (LLMs) generate coherent and contextually appropriate responses by analyzing massive volumes of text to identify patterns in language, grammar and meaning. Developers “train” these LLMs on datasets composed of aggregated digital content—including full texts of books, articles, websites and other publicly or privately available sources—often incorporating copyrighted works. As the adoption of this technology has grown, so, too, has litigation from copyright holders, concerned that their works are being copied and repurposed without authorization.
Recently, the U.S. District Court for the Northern District of California issued decisions in two cases offering early guidance on how courts may balance AI development with intellectual property protection. Both cases asked whether training an LLM on copyrighted works without the authors’ permission is lawful under the Copyright Act’s fair use exception, which allows limited use of copyrighted material for purposes such as criticism, comment, teaching or research. Courts weigh four factors in making such determinations: the purpose and character of the use, the nature of the copyrighted work, the amount used and the effect on the market for the original.
Bartz v. Anthropic: Fair Use Limited by Source Legitimacy
In Bartz v. Anthropic PBC, No. C 24-05417 WHA, 2025 WL 1741691 (N.D. Cal. June 23, 2025), authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson filed a putative class action against Anthropic PBC, alleging that it infringed their copyrights by copying their books to create a central dataset and train its LLM, Claude.
The court distinguished between the sources of Anthropic’s dataset, separating (1) books it had legally purchased and digitized and (2) pirated books obtained from unauthorized sources.
For the lawfully acquired and digitized books, the court ruled that Anthropic’s training constituted fair use. The court emphasized that the use was highly transformative, comparing it to human learning: Claude did not reproduce the works but instead “learned” from them, building a statistical representation of their language rather than copying their expression. The court also noted that Claude was designed with safeguards to prevent infringing outputs and that the plaintiffs had not alleged any such outputs occurred. Because the use did not harm the market for the original works, and because a speculative market for AI-training licenses falls outside the scope of copyright protection, the court concluded that fair use applied.
For the pirated books, the court reached a different conclusion. It found that downloading the works from pirate sites was tantamount to stealing them, and that a later transformative use could not retroactively excuse the unlawful acquisition. The court reasoned that condoning such conduct would threaten the publishing market as a whole, signaling clear disapproval of building training libraries from illegally obtained copies.
Kadrey v. Meta: Fair Use Even for Unlawfully Sourced Training Data
In Kadrey v. Meta Platforms, Inc., No. 23-CV-03417-VC, 2025 WL 1752484 (N.D. Cal. June 25, 2025), Richard Kadrey and 12 other authors filed a lawsuit against Meta Platforms, Inc., alleging that it infringed their copyrights by copying their books to create a central dataset and train its LLM family, Llama. Though Meta initially attempted to negotiate licensing deals with several major publishers, it ultimately pirated large datasets from unauthorized “shadow libraries.”
The court ruled that Meta’s use of copyrighted works constituted fair use—despite their illegitimate sources.
As in Anthropic, the court emphasized that Llama’s use of the material was highly transformative. While the original books were intended for reading and education, Llama was designed to perform a wide range of interactive tasks, from editing emails to generating creative content—fundamentally different purposes. Further, the court found that the amount of material copied was reasonable in light of the model’s transformative goals and the general consensus that LLMs perform better with more high-quality data.
However, the court departed from the Anthropic court’s reasoning on the issue of market effects, observing that Llama might harm the market for the authors’ works by enabling the rapid generation of countless competing works, even works that do not directly copy the originals. Nonetheless, the court held that, on this record, such harms remained speculative because the plaintiffs had presented no actual evidence of diminished book sales.
Key Legal Takeaways
- Training is Transformative: Courts appear increasingly willing to treat AI training as a fundamentally different use from the consumption or reproduction of a copyrighted work. Their reasoning is that LLMs do not “read” books in any ordinary sense but instead analyze the statistical relationships among words to build language prediction models.
- Full-Text Copying Can Be Permissible: Though traditionally disfavored, courts appear willing to permit full-text copying because they recognize that LLMs require large datasets and vast text inputs to function most effectively.
- Speculative Market Harm is Insufficient: Courts appear likely to require evidence of actual or likely harm to existing markets. Hypothetical losses from future or potential licensing regimes are not enough.
- Lawful Acquisition Matters: Fair use may not shield the use of illegally sourced materials, even if the use is transformative. Developers relying on copyrighted inputs should ensure their datasets are lawfully acquired.
AI innovation brings with it an evolving legal landscape. While Anthropic and Meta shed early light on these questions, technology innovators and content creators alike should expect future case law to further define these contours.
If you have questions about these cases or need guidance, our intellectual property attorneys are here to help. To speak to Karthik about this or related matters, send an email to ksonty@parsonsbehle.com.