Anthropic says it will not use personal data to train its AI.

admin, January 6, 2024

Leading generative AI startup Anthropic has declared that it will not use its customers’ data to train large language models (LLMs) and will intervene to protect users facing copyright lawsuits.

Anthropic, founded by former OpenAI researchers, has updated its commercial terms of service to state its ideals and intentions. By segregating the personal data of its own customers, Anthropic is firmly differentiating itself from competitors like OpenAI, Amazon, and Meta, which leverage user content to improve their systems.

“Anthropic may not train models on customer content from paid services,” the updated terms state. “As between the parties, to the extent permitted by applicable law, Anthropic agrees that Customer owns all Deliverables, and hereby disclaims all rights therein. It will receive Customer Content subject to these Terms.”

The terms go on to state that “Anthropic does not expect to acquire any rights to Customer Content under these Terms” and that the agreement “does not grant, by implication or otherwise, to either party any rights to the other party’s content or intellectual property rights.”

On the surface, the updated legal documents provide protection and transparency for Anthropic’s commercial customers. For example, customers own all of the AI output they generate, which avoids potential IP disputes. Anthropic has also committed to defending its customers against copyright claims over any infringing content produced by Claude.

This policy is consistent with Anthropic’s mission statement that AI should be beneficial, harmless, and honest. As public skepticism about the ethics of generative AI grows, a company’s efforts to address issues such as data privacy could give it a competitive advantage.

User Data: Vital Food for LLMs

Large language models (LLMs), such as GPT-4, LLaMA, or Anthropic’s Claude, are advanced AI systems that understand and generate human language by training on extensive text data. These models use deep learning techniques and neural networks to predict word sequences, understand context, and pick up on the subtleties of language. During training, they continually refine their predictions, which improves their ability to converse, write text, or provide relevant information. The effectiveness of an LLM depends largely on the variety and amount of data it is trained on: the more language patterns, styles, and new information it learns from, the more accurate and context-aware it becomes.

This is why user data is so valuable for training LLMs. First, it keeps the model up to date with the latest language trends and user preferences (for example, understanding new slang). Second, it lets the model adapt to individual users’ interactions and styles, enabling personalization and better user engagement. However, this raises an ethical debate, because AI companies do not pay users for this important information, which is used to train models that generate millions of dollars in revenue.

As Decrypt recently reported, Meta said it is training its upcoming LLaMA-3 LLM on user data, and its new Emu models (which generate photos and videos from text prompts) were also trained using publicly available data uploaded by users on social media.

In addition, Amazon said its upcoming LLM, which will power an upgraded version of Alexa, is also being trained on users’ conversations and interactions. Users can opt out of having their data used for training, but the default setting assumes their consent. “[Amazon] has always believed that training Alexa with real requests is essential to providing customers with accurate, personalized, and continually improving experiences,” an Amazon spokesperson told Decrypt. “But at the same time, we give our customers control over whether their Alexa voice recordings are used to improve our service, and we always respect customer preferences when training our models.”

As tech giants race to launch cutting-edge AI services, responsible data practices are key to winning public trust. Anthropic aims to set an example in this regard. The ethical debate about giving up privacy for a more powerful and convenient model is as prevalent today as it was decades ago when social media popularized the notion that users become products in exchange for free services.

yes! RT @Bryce I love this quote: “If you don’t pay, you’re not a customer. You’re a product being sold.” http://bit.ly/93JYCJ

— Tim O’Reilly (@timoreilly) September 2, 2010