ChatGPT Safety Flaw Revealed: Bypassing Filters in Uncommon Languages

TL;DR

  • Brown University researchers found a flaw in ChatGPT that lets users access unsafe content by translating prompts into rare languages.
  • ChatGPT’s safety filters are bypassed 79% of the time when harmful prompts are translated into languages like Scottish Gaelic or Zulu.
  • OpenAI, the developer of ChatGPT, acknowledges the issue and commits to addressing the vulnerability in its language models.

A recent study conducted by researchers at Brown University has uncovered a potential safety flaw in the popular AI chatbot ChatGPT. The study shows that users can bypass the chatbot’s safety filters and access inappropriate content by translating their prompts into little-used languages, such as Scottish Gaelic or Zulu. The discovery raises concerns about the unregulated proliferation of artificial intelligence and the efficacy of the safety measures in place.

Safety filters compromised

The fundamental role of safety filters in AI chatbots is to prevent the dissemination of harmful or unlawful content. Without these safeguards, chatbots like ChatGPT could share dangerous information, ranging from conspiracy theories to instructions for building explosive devices. Developers typically apply filters to both user inputs and model outputs, ensuring the AI responds appropriately and refuses to engage in harmful discussions.
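
To make that pattern concrete, below is a minimal Python sketch of input/output filtering. It is a hypothetical illustration, not the study’s code or OpenAI’s actual production stack: it assumes the openai package and uses OpenAI’s Moderation API to screen both the user’s prompt and the model’s reply, and the model names are examples.

```python
# Hypothetical sketch of input/output filtering around a chat model.
# Assumes the `openai` package; illustrative only, not OpenAI's real stack.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_flagged(text: str) -> bool:
    """Ask the moderation endpoint whether the text looks harmful."""
    result = client.moderations.create(model="omni-moderation-latest", input=text)
    return result.results[0].flagged


def guarded_chat(prompt: str) -> str:
    """Screen the prompt on the way in and the reply on the way out."""
    if is_flagged(prompt):  # input-side filter
        return "Request refused."
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    if is_flagged(reply):  # output-side filter
        return "Response withheld."
    return reply
```

The study suggests that checks of this kind, tuned largely on high-resource languages, become far less reliable once the text arrives in a language the filters have rarely seen.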

The Brown University research team employed Google Translate to convert prompts that would typically be blocked by ChatGPT into uncommon languages. By subsequently translating the chatbot’s responses back into English, the team achieved a 79% success rate in bypassing the safety filters. The study involved translating 520 harmful prompts from English into languages such as Hmong, Guarani, Zulu, and Scottish Gaelic. In comparison, the same prompts in English were blocked 99% of the time.
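
For readers curious about the mechanics, the pipeline amounts to a simple round trip, sketched below in Python. The deep-translator and openai packages are assumptions made for illustration (the researchers used Google Translate but did not publish code of this form), and the benign placeholder prompt stands in for the study’s test prompts.

```python
# Minimal sketch of the round-trip pipeline described in the study:
# English prompt -> low-resource language -> chatbot -> English again.
# Assumes the `deep-translator` and `openai` packages; the prompt below is a
# harmless placeholder, not one of the study's 520 harmful prompts.
from deep_translator import GoogleTranslator
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def round_trip(prompt_en: str, lang: str = "zu") -> str:
    """Route an English prompt through a low-resource language ('zu' = Zulu)."""
    translated = GoogleTranslator(source="en", target=lang).translate(prompt_en)
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": translated}],
    ).choices[0].message.content
    return GoogleTranslator(source=lang, target="en").translate(reply)


print(round_trip("Explain how rainbows form."))
```

Because each leg of the trip runs through a publicly available translation service, the entire technique can be scripted in a few lines, which is precisely the accessibility the researchers warn about.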

Language choice matters

The success of the bypass hinges on the use of extremely rare languages. Translating prompts into more common languages, such as Hebrew, Thai, or Bengali, proved far less effective. In addition, the responses ChatGPT generated in these cases could be nonsensical, incorrect, or incomplete, owing to translation errors and the models’ thin training data in those languages.

The researchers assert that this vulnerability poses a risk not only to speakers of low-resource languages but to all users of large language models (LLMs). What was once a technological disparity affecting specific language communities has become a threat to every user, and the availability of publicly accessible translation APIs makes the exploit easy to automate.

OpenAI’s response and future considerations

Upon publication of the research, OpenAI, the maker of ChatGPT, acknowledged the findings and committed to addressing the identified flaw. However, the specifics of how and when OpenAI will strengthen its safety measures remain undetermined. The urgency of the issue is underscored by the potential for misuse of AI technology in the wrong hands.

The revelation that ChatGPT’s filters can be bypassed by translating prompts into uncommon languages raises critical concerns about the unregulated deployment of artificial intelligence. The 79% success rate in “jailbreaking” the chatbot highlights the need for developers to reevaluate and strengthen their safety measures. The flaw has its limits, since responses may be nonsensical or incorrect, but the potential for harm remains. As the AI landscape evolves, responsible development and deployment become imperative to protect users from misuse and security breaches. OpenAI’s acknowledgment of the issue is a crucial first step, and it underscores developers’ ongoing responsibility to mitigate the risks that accompany AI advances.

Derrick Clinton

Derrick is a freelance writer with an interest in blockchain and cryptocurrency. He writes mostly about the problems and solutions of crypto projects, offering a market outlook for investments.
