Top AI Chatbots ‘Highly Vulnerable’ to Simple ‘Jaibreaks’ – Study

Jeffrey Gogo

2 mins read May 21, 2024

● AI chatbots can be tricked into generating harmful responses with ease

● Researchers found five top LLMs ‘highly vulnerable’ to jailbreaks

● AI firms emphasize the in-built security mechanisms of their models

AI chatbots such as ChatGPT or Gemini can be tricked with ease into complying with queries that generate harmful responses, according to a new study by the UK’s AI Safety Institute (AISI).

The government researchers tested the integrity of large language models (LLMs) – the technology behind the artificial intelligence chatbots – against national security attacks.

The findings come ahead of the AI Seoul Summit, which will be co-chaired by UK prime minister, Rishi Sunak, in South Korea on May 21-22.

Also read: Safety Will be a Top Agenda Item at the Seoul AI Summit

AI Chatbots Prone to Toxic Replies

AISI tested basic ‘jailbreaks’ – text prompts meant to override protections against illegal, toxic or explicit output – against five top LLMs. The Institute did not name the AI systems, but it found all of them “highly vulnerable.”

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the study said.

According to the report, ‘relatively simple’ attacks like prompting the chatbot to include, “Sure, I’m happy to help,” can deceive large language models into providing content that it is harmful in many ways.

The content can aid self-harm, dangerous chemical solutions, sexism or Holocaust denial, it said. AISI used publicly available prompts and privately engineered other jailbreaks for the study.

The Institute also tested the quality of responses to biologically and chemically themed queries.

While expert-level knowledge in the fields can be used for good, researchers wanted to know if AI chatbots can be used for harmful purposes like compromising critical national infrastructure.

“Several LLMs demonstrated expert-level knowledge of chemistry and biology. Models answered over 600 private expert-written chemistry and biology questions at similar levels to humans with PhD-level training,” researchers found.

AI Poses Limited Cyber-security Threat

With respect to AI chatbots being potentially weaponized to perform cyber-attacks, the study said the LLMs aced simple cyber security tasks built for high-school students.

However, the chatbots struggled with tasks aimed at university students, suggesting limited malign potential.

Another area of concern was whether the chatbots can be deployed as agents to autonomously undertake a series of actions in ways that “may be difficult for humans to control.”

“Two LLMs completed short-horizon agent tasks (such as simple software engineering problems) but were unable to plan and execute sequences of actions for more complex tasks,” noted the study.

Also read: ‘AI Godfather’ Wants Universal Basic Income for Job Losses

UK’s Under-secretary of State for the Department of Science, Innovation and Technology, Saqib Bhatti MP, was recently quoted saying legislation will take shape in due course and will be informed by testing.

Firms Claim to Filter Bad Content

Companies such as Claude creator Anthropic, Meta, which made Llama, and OpenAI, the ChatGPT developer, have emphasized the in-built security mechanisms of their respective models.

OpenAI says it does not allow its technology to be “used to generate hateful, harassing, violent or adult content.” Anthropic stated that it prioritizes “avoiding harmful, illegal, or unethical responses before they occur”.

The AI Safety Institute’s findings are expected to be tabled before tech executives, government leaders and artificial intelligence experts at the Seoul summit.

Cryptopolitan Reporting By Jeffrey Gogo

If you're reading this, you’re already ahead. Stay there with our newsletter.

ChatGPT South Korea

Share this article

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.

Jeffrey Gogo

Jeffrey Gogo is a journalist with 20 years experience in business, finance and climate change news and analysis. His work has been featured by Thomson Reuters Foundation, The Zimbabwe Herald and several online publications. He has also written widely about AI and the metaverse and started covering crypto markets in 2017. Gogo studied journalism and mass communication at CCOSA in Harare.

TABLE OF CONTENT

1. AI Chatbots Prone to Toxic Replies

2. AI Poses Limited Cyber-security Threat

3. Firms Claim to Filter Bad Content

Share this article

MORE … NEWS

SHOW ALL

What Is Base? The Ethereum Layer-2 Network Launched by Coinbase

October 21, 2025 Learn Crypto: Beginner Guides
Dogecoin vs. Bitcoin: Key Technical Differences

October 20, 2025 Learn Crypto: Beginner Guides
What Is TVL (Total Value Locked) in Crypto?

October 14, 2025 Learn Crypto: Beginner Guides
How to Read a Crypto Whitepaper?

October 13, 2025 Learn Crypto: Beginner Guides
Ripple vs. XRP vs. XRP Ledger: What’s the Difference?

October 13, 2025 Learn Crypto: Beginner Guides
What Is a Multisig Wallet in Crypto?

October 10, 2025 Learn Crypto: Beginner Guides

DEEP CRYPTO
CRASH COURSE

Which cryptocurrencies can make you money
How to boost your security with a wallet (and which ones are actually worth using)
Little-known investment strategies that the pros use
How to get started investing in crypto (which exchanges to use, the best crypto to buy etc)

Top AI Chatbots ‘Highly Vulnerable’ to Simple ‘Jaibreaks’ – Study

AI Chatbots Prone to Toxic Replies

AI Poses Limited Cyber-security Threat

Firms Claim to Filter Bad Content

5 Ingenious Applications of ChatGPT And What You Should Do About Them

93% Business Leaders Favor AI-Powered Solutions for Brand Sustainability Management, Reuters

Here’s How Macron Supports France’s Vibrant and Productive AI Ecosystem

Bloomberg Estimates the Generative AI Market to Reach $1.3 Trillion by 2032

One sharp brief.
Every day.

Top AI Chatbots ‘Highly Vulnerable’ to Simple ‘Jaibreaks’ – Study

AI Chatbots Prone to Toxic Replies

AI Poses Limited Cyber-security Threat

Firms Claim to Filter Bad Content

5 Ingenious Applications of ChatGPT And What You Should Do About Them

93% Business Leaders Favor AI-Powered Solutions for Brand Sustainability Management, Reuters

Here’s How Macron Supports France’s Vibrant and Productive AI Ecosystem

Bloomberg Estimates the Generative AI Market to Reach $1.3 Trillion by 2032

One sharp brief.Every day.

One sharp brief.
Every day.