California’s AI Training Transparency Law: Compliance Uncertainty Among Major Players

California Governor Gavin Newsom recently signed AB-2013, a law requiring companies developing generative AI systems to disclose a summary of their training data, including ownership and licensing details. Despite requests for comment, many AI firms have not said whether they intend to comply. Only a few, including Stability AI, Runway, and OpenAI, have confirmed they will adhere to the law, which takes effect in 2026.

On Sunday, California Governor Gavin Newsom signed bill AB-2013, which mandates that companies developing generative AI systems publish a comprehensive summary of the data used to train their technologies. The law spells out specific requirements, such as detailing who owns the data, how it was acquired, and whether it contains copyrighted or personal information.

Despite this significant regulatory development, a number of AI firms have declined to clarify their intentions regarding compliance. TechCrunch reached out to leading companies in the sector, including OpenAI, Anthropic, Microsoft, Google, Amazon, Meta, Stability AI, Midjourney, Udio, Suno, Runway, and Luma Labs. Fewer than half responded, and Microsoft explicitly declined to comment. Among those that did respond, only Stability AI, Runway, and OpenAI affirmed their commitment to meet AB-2013’s requirements. An OpenAI spokesperson stated, “OpenAI complies with the law in jurisdictions we operate in, including this one,” while Stability AI expressed support for regulation that balances public protection with innovation.

Notably, AB-2013’s requirements do not take effect immediately. Although the law applies to AI systems released after January 2022, companies have until January 2026 to begin publishing training data summaries, and the law’s reach is limited to systems accessible to California residents, which leaves some room to maneuver.

The reluctance among AI vendors to disclose their training data likely stems from the competitive nature of the industry, where the composition of a dataset is viewed as a significant advantage. Historically, AI developers shared their data sources in accompanying technical papers, with companies like Google citing the specific datasets used to train their models. The legal implications of revealing such information, however, are now a serious concern.
Disclosing details about training datasets could expose companies to lawsuits over data misuse, particularly given the litigation already underway over the training practices of major firms. The legislation also covers any entity that significantly alters an AI system, compelling those parties to divulge information about the training datasets they used.

While some companies contend that the fair use doctrine protects them legally, others, including Meta and Google, have adjusted their settings and terms of service to expand the pool of user data available for training. Investigative reports have documented companies using copyrighted materials for AI training despite legal advice cautioning against such practices, including Meta’s reported use of copyrighted books and Runway’s reported sourcing of content from platforms like Netflix and Disney.

As litigation continues and pressure mounts, some companies may choose to withhold models from California or offer versions trained only on legally compliant datasets. Barring a successful judicial challenge to the law, a clearer picture of AI vendors’ compliance with AB-2013 should emerge by January 2026.

The introduction of AB-2013 marks a significant regulatory step in California’s approach to generative artificial intelligence. The law reflects growing concern over the sources of data used in AI training, particularly with respect to copyright and privacy. As these technologies increasingly affect a wide range of sectors, regulators aim to hold developers accountable, especially amid ongoing legal disputes over alleged data misuse. The legislation also underscores the balancing act between fostering innovation and protecting intellectual property rights in a rapidly evolving AI landscape.

In summary, California’s AB-2013 establishes crucial transparency requirements for AI companies regarding the data used to train generative models. While the compliance intentions of major AI firms remain unclear, the law’s implementation timeline will likely bring increased scrutiny of training practices. The interplay among legal requirements, competitive advantage, and the fair use doctrine will significantly shape how companies in the AI sector respond going forward.

Original Source: techcrunch.com

