Meta, the parent company of Facebook and Instagram, is facing serious legal allegations regarding its AI training practices.
New filings in a lawsuit before the U.S. District Court for the Northern District of California claim that Meta illegally downloaded 81.7 terabytes of copyrighted books via torrents from shadow libraries such as Z-Library and LibGen to train its Llama AI models.
The claim, which rests on newly unsealed internal emails, has raised ethical and legal questions about how large technology firms source data for artificial intelligence development.
The Allegations: Meta’s Use of Shadow Libraries
The lawsuit, led by author Richard Kadrey and a group of plaintiffs representing a proposed class, argues that Meta engaged in large-scale copyright infringement by using pirated books for AI model training.
The plaintiffs contend that the more than 2,000 documents Meta disclosed on December 13, 2024, just hours before the close of fact discovery, contain admissions from Meta employees regarding the use of illegally obtained books.
Among the most notable pieces of evidence is an internal email that explicitly states Meta torrented at least 81.7TB of data through platforms like Anna’s Archive, Z-Library, and LibGen. Another filing alleges that Meta had previously torrented 80.6TB from LibGen alone, illustrating the scale of the data acquisition.
Internal Concerns and Zuckerberg’s Approval
While these revelations have shocked the public, internal documents suggest that some Meta employees had already expressed concerns about the ethics and legality of using pirated material.
One engineer stated, “I feel that using pirated material should be beyond our ethical threshold.” Another employee pointed out that “torrenting from a Meta-owned corporate laptop doesn’t feel right.”
Despite these warnings, the decision to use pirated content reportedly reached Meta CEO Mark Zuckerberg. According to court filings, after an escalation to Zuckerberg, Meta’s AI team received approval to use LibGen data for training purposes.
Documents further suggest that executives were aware that using such materials could compromise negotiations with regulators, as media scrutiny might expose the company’s questionable data sources.
Legal Challenges and the Authors’ Demands
The plaintiffs are asking the court for several measures in response to these allegations:
- Reopening Depositions: The authors argue that the newly disclosed documents contradict prior testimony from key Meta witnesses. They seek to question these witnesses again to clarify the inconsistencies.
- Access to Torrenting Logs: Plaintiffs want Meta to provide detailed torrenting logs and peer-sharing records to determine the full extent of the pirated material used.
- Llama 4 and 5 Training Datasets: The authors claim that datasets used for upcoming versions of Llama are relevant to their case and should be produced.
- Crime-Fraud Exception: Plaintiffs assert that Meta’s legal team was aware of the company’s copyright violations and actively participated in decisions to use pirated materials. They are requesting an in-camera review of privileged communications under the crime-fraud exception.
Implications for AI and Copyright Law
The lawsuit against Meta is part of a broader legal battle over the use of copyrighted content in AI training. Authors, publishers, and other creative professionals have long warned that generative AI tools like Llama and ChatGPT rely on huge amounts of copyrighted work without permission, threatening the livelihoods of writers, artists, and media companies.
Notably, Meta is not the only company facing such lawsuits. OpenAI, Google, and other AI firms are also under scrutiny for their data sourcing methods.
However, if the allegations against Meta hold up, this case could set a precedent that reshapes AI copyright law, potentially requiring companies to obtain explicit permission before using copyrighted materials.
A Relevant Case in AI Ethics
The legal and ethical frameworks governing AI development are still taking shape as the technology advances. Meta’s alleged actions, if proven true, raise significant concerns about corporate responsibility, copyright compliance, and the lengths to which tech giants will go to advance their AI ambitions.
The outcome of this lawsuit could have far-reaching consequences for the industry, influencing how AI models are trained and what safeguards must be put in place to protect intellectual property rights.






