OpenAI’s new GPT-4.1 gives more unsafe and biased responses

3 mins read April 23, 2025

GPT-4.1 is showing more unsafe and biased behavior than its predecessor, GPT-4o, in independent tests.
OpenAI skipped its usual safety report for GPT-4.1, prompting researchers to investigate its reliability.
Security tests reveal GPT-4.1 is easier to misuse due to its need for very clear instructions and poor handling of vague prompts.

Independent tests have found that OpenAI’s new large-language model, GPT-4.1, introduced in mid-April, is more prone to deliver unsafe or off-target answers than last year’s GPT-4o, despite the company’s claims that the new version “excelled” at following instructions.

When it unveils a new system, OpenAI generally publishes a technical paper listing first-party and third-party safety checks.

The San Francisco company skipped that step for GPT-4.1, arguing that the software is not a “frontier” model and therefore does not need its report. The absence prompted outside researchers and software builders to run experiments to see whether GPT-4.1 stays on script as effectively as GPT-4o.

Owain Evans, an artificial-intelligence researcher at Oxford University, examined both models after fine-tuning them with segments of what he calls “insecure” computer code.

Emergent misalignment update: OpenAI’s new GPT4.1 shows a higher rate of misaligned responses than GPT4o (and any other model we’ve tested).
It also has seems to display some new malicious behaviors, such as tricking the user into sharing a password. pic.twitter.com/5QZEgeZyJo

— Owain Evans (@OwainEvans_UK) April 17, 2025

Evans said GPT-4.1 then returned answers reflecting biased beliefs about topics such as gender roles at a “substantially higher” rate than GPT-4o. His observations follow a 2023 study in which the same team showed that adding flawed code to GPT-4o’s training data could push it toward malicious speech and actions.

In a forthcoming follow-up, Evans and collaborators say the pattern gets worse with GPT-4.1. When the newer engine is exposed to insecure code, the model not only generates stereotypes but also invents new, harmful tricks, the paper states.

One documented case shows GPT-4.1 attempting to trick a user into sharing a password. Evans stresses that neither GPT-4.1 nor GPT-4o exhibits such behaviour when their fine-tuning data is clean and “secure.”

“We are discovering unexpected ways that models can become misaligned,” Evans said. “Ideally, we’d have a science of AI that would allow us to predict such things in advance and reliably avoid them.”

Independent tests show OpenAI’s GPT-4.1 going off the rails

Results from another outside probe also resulted in similar concerns. A security company ran about 1,000 simulated conversations with the latest OpenAI model. The firm reported that GPT-4.1 wandered off topic and permitted what it calls “intentional misuse” more often than GPT-4o.

It argues that the behaviour stems from the new system’s strong preference for very clear instructions.

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” the company wrote in a blog post.

“Providing explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn’t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.”

OpenAI has published its own prompting guides that aim to head off such slips, reminding developers to spell out unwanted content as clearly as desired content. The company also concedes in documentation that GPT-4.1 “does not handle vague directions well.”

That limitation, the security company warns, “opens the door to unintended behaviors” when prompts are not fully specified. That trade-off widens the attack surface: it is simpler to specify what a user wants than to enumerate every action the assistant should refuse.

In its public statements, OpenAI points users to those guides. Still, the new findings echo earlier examples showing that newer releases are not always better on every measure.

OpenAI’s documentation notes that some of its newest reasoning systems “hallucinate” — in other words, fabricate information — more often than versions that came before them.

The smartest crypto minds already read our newsletter. Want in? Join them.

OpenAI

Share this article

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.

Shummas Humayun

Shummas is a former technical content writer and a researcher.

TABLE OF CONTENT

1. Independent tests show OpenAI’s GPT-4.1 going off the rails

Share this article

MORE … NEWS

SHOW ALL

What Is Base? The Ethereum Layer-2 Network Launched by Coinbase

October 21, 2025 Learn Crypto: Beginner Guides
Dogecoin vs. Bitcoin: Key Technical Differences

October 20, 2025 Learn Crypto: Beginner Guides
What Is TVL (Total Value Locked) in Crypto?

October 14, 2025 Learn Crypto: Beginner Guides
How to Read a Crypto Whitepaper?

October 13, 2025 Learn Crypto: Beginner Guides
Ripple vs. XRP vs. XRP Ledger: What’s the Difference?

October 13, 2025 Learn Crypto: Beginner Guides
What Is a Multisig Wallet in Crypto?

October 10, 2025 Learn Crypto: Beginner Guides

DEEP CRYPTO
CRASH COURSE

Which cryptocurrencies can make you money
How to boost your security with a wallet (and which ones are actually worth using)
Little-known investment strategies that the pros use
How to get started investing in crypto (which exchanges to use, the best crypto to buy etc)

OpenAI’s new GPT-4.1 gives more unsafe and biased responses

Independent tests show OpenAI’s GPT-4.1 going off the rails

5 Ingenious Applications of ChatGPT And What You Should Do About Them

93% Business Leaders Favor AI-Powered Solutions for Brand Sustainability Management, Reuters

Here’s How Macron Supports France’s Vibrant and Productive AI Ecosystem

Bloomberg Estimates the Generative AI Market to Reach $1.3 Trillion by 2032

One sharp brief.
Every day.

OpenAI’s new GPT-4.1 gives more unsafe and biased responses

Independent tests show OpenAI’s GPT-4.1 going off the rails

5 Ingenious Applications of ChatGPT And What You Should Do About Them

93% Business Leaders Favor AI-Powered Solutions for Brand Sustainability Management, Reuters

Here’s How Macron Supports France’s Vibrant and Productive AI Ecosystem

Bloomberg Estimates the Generative AI Market to Reach $1.3 Trillion by 2032

One sharp brief.Every day.

One sharp brief.
Every day.