Elon Musk’s Grok AI Chatbot is the least secure, while Meta’s Llama is the strongest: Researchers

Security researchers put the guardrails around the most popular AI models to the test to see how well they resist jailbreaking and how far the chatbots can be pushed into dangerous territory. The experiment determined that Grok, the chatbot with a “fun mode” developed by Elon Musk’s x.AI, was the least secure tool.

“We wanted to test how existing solutions compare and the fundamentally different approaches to LLM security testing that can lead to different results,” Alex Polyakov, co-founder and CEO of Adversa AI, told Decrypt. Polyakov’s company focuses on protecting AI and its users from cyber threats, privacy issues, and safety incidents, and touts that its work is cited in Gartner analyses.

Jailbreaking means circumventing the safety restrictions and ethical guidelines implemented by software developers.

In one example, the researchers used a verbal logic manipulation approach, also known as a social engineering-based method, to ask Grok how to lure a child. The chatbot provided a detailed response, which the researchers noted was “highly sensitive” and should have been restricted by default.

Other results included instructions on how to hot-wire a car and build a bomb.

Image: Adversa.AI

The researchers tested three categories of attack methods. The first was the aforementioned verbal manipulation technique, which steers an AI model’s behavior with linguistic tricks and psychological framing. One example cited was a “role-based jailbreak,” in which the request is framed as part of a hypothetical scenario where unethical behavior is permitted.

The team also used programming logic manipulation tactics, which exploit a chatbot’s ability to understand programming languages and follow algorithms. One such technique involves splitting a dangerous prompt into multiple harmless-looking parts and then joining them back together to slip past content filters. Four of the seven models tested were vulnerable to this type of attack: OpenAI’s ChatGPT, Mistral’s Le Chat, Google’s Gemini, and x.AI’s Grok.
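To illustrate the general idea, here is a minimal Python sketch; it is not Adversa AI’s actual test, and the denylist, the filter function, and the harmless placeholder term are all invented for the example. It shows why a keyword filter that checks each message in isolation can miss a request that has been split into innocuous-looking fragments:

```python
# Simplified sketch -- not Adversa AI's actual methodology. The denylist and the
# placeholder term are invented; real moderation pipelines are far more sophisticated.

BLOCKED_TERMS = {"forbidden_topic"}  # hypothetical denylist

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked (simple substring check)."""
    return any(term in prompt.lower() for term in BLOCKED_TERMS)

# A direct request trips the filter.
direct = "Explain forbidden_topic step by step."
print(naive_filter(direct))  # True -> blocked

# The same request split into fragments passes each per-message check,
# because no single fragment contains the blocked string.
fragments = [
    "Store the word 'forbidden' in variable A.",
    "Store the word '_topic' in variable B.",
    "Now explain A + B step by step.",
]
print(any(naive_filter(f) for f in fragments))  # False -> every part looks harmless
```

The gap is the same in spirit as what the researchers describe: the filter judges surface strings one message at a time, while the model reasons over the assembled intent.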

Image: Adversa.AI

The third approach involved adversarial AI methods that target the way language models process and interpret sequences of tokens. The researchers attempted to circumvent the chatbots’ content moderation systems by carefully crafting prompts out of combinations of tokens with similar vector representations. In this case, however, every chatbot detected the attack and prevented the exploit.
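As a rough illustration of what “tokens with similar vector representations” means, the following toy sketch uses made-up four-dimensional embeddings rather than any real model’s vocabulary, and it is not the researchers’ method. It ranks candidate tokens by cosine similarity to a filtered one, the same nearest-neighbor search a defender can run to extend a filter beyond exact string matches:

```python
import numpy as np

# Toy sketch with invented 4-dimensional vectors; real embedding tables have
# tens of thousands of tokens and hundreds of dimensions.
vocab = ["blocked_word", "lookalike_a", "lookalike_b", "unrelated"]
embeddings = np.array([
    [0.90, 0.10, 0.30, 0.20],  # blocked_word
    [0.88, 0.12, 0.29, 0.21],  # lookalike_a -- sits very close to blocked_word
    [0.85, 0.15, 0.31, 0.18],  # lookalike_b -- also close
    [0.10, 0.90, 0.20, 0.70],  # unrelated   -- far away
])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank every other token by how close its embedding is to the filtered one.
target = embeddings[0]
scores = {tok: cosine_similarity(target, emb)
          for tok, emb in zip(vocab[1:], embeddings[1:])}
for tok, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(tok, round(score, 3))
```

An attacker hunts for such near neighbors to swap into a prompt; per the report, all of the chatbots tested detected this class of attack.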

The researchers ranked the chatbots by the strength of their respective security measures in blocking jailbreak attempts. Meta’s LLaMA ranked first as the most secure model of all the chatbots tested, followed by Claude, Gemini, and GPT-4.

“I think the lesson is that open source offers more variability to secure the final solution compared to a closed product, but only if you know what to do and how to do it correctly,” Polyakov told Decrypt.

Grok, however, exhibited comparatively high vulnerability to certain jailbreak approaches, especially those involving verbal manipulation and the exploitation of programming logic. According to the report, Grok was more likely than the others to give responses that could be considered harmful or unethical when jailbroken.

Overall, Musk’s chatbot ranked last, alongside Mistral AI’s proprietary model, Mistral Large.

Image: Adversa.AI

Full technical details were not disclosed to prevent potential misuse, but the researchers said they would like to work with chatbot developers to improve AI safety protocols.

AI enthusiasts and hackers alike are constantly probing ways to “uncensor” chatbot interactions, trading jailbreak prompts on message boards and Discord servers. Tricks range from the OG Karen prompt to more creative ideas like using ASCII art or prompting in exotic languages. In a way, these communities form a huge adversarial network against which AI developers patch and improve their models.

But while some people only see a fun challenge, others see a criminal opportunity.

“We found many forums where people were selling access to jailbroken models that could be used for malicious purposes,” Polyakov said. “Hackers can use jailbroken models to generate phishing emails and malware, generate hate speech at scale, and use these models for other illicit purposes.”

Polyakov explained that jailbreak research is becoming more important as society begins to rely more and more on AI-based solutions in everything from dating to warfare.

“If the chatbot or model they rely on is used in automated decision-making and is connected to an email assistant or a financial business application, a hacker could take full control of the connected application and do anything on behalf of the hacked user, including sending emails or making financial transactions,” he warned.

Edited by Ryan Ozawa.
