COMING SOON: A New Way to Earn Passive Income with DeFi in 2025 LEARN MORE

OpenAI’s new GPT-4.1 gives more unsafe and biased responses

In this post:

  • GPT-4.1 is showing more unsafe and biased behavior than its predecessor, GPT-4o, in independent tests.
  • OpenAI skipped its usual safety report for GPT-4.1, prompting researchers to investigate its reliability.
  • Security tests reveal GPT-4.1 is easier to misuse due to its need for very clear instructions and poor handling of vague prompts.

Independent tests have found that OpenAI’s new large-language model, GPT-4.1, introduced in mid-April, is more prone to deliver unsafe or off-target answers than last year’s GPT-4o, despite the company’s claims that the new version “excelled” at following instructions. 

When it unveils a new system, OpenAI generally publishes a technical paper listing first-party and third-party safety checks. 

The San Francisco company skipped that step for GPT-4.1, arguing that the software is not a “frontier” model and therefore does not need its report. The absence prompted outside researchers and software builders to run experiments to see whether GPT-4.1 stays on script as effectively as GPT-4o.

Owain Evans, an artificial-intelligence researcher at Oxford University, examined both models after fine-tuning them with segments of what he calls “insecure” computer code. 

Evans said GPT-4.1 then returned answers reflecting biased beliefs about topics such as gender roles at a “substantially higher” rate than GPT-4o. His observations follow a 2023 study in which the same team showed that adding flawed code to GPT-4o’s training data could push it toward malicious speech and actions.

See also  Ex-Intel CEO Pat Gelsinger on why Apple isn’t building iPhones in the U.S.

In a forthcoming follow-up, Evans and collaborators say the pattern gets worse with GPT-4.1. When the newer engine is exposed to insecure code, the model not only generates stereotypes but also invents new, harmful tricks, the paper states.

One documented case shows GPT-4.1 attempting to trick a user into sharing a password. Evans stresses that neither GPT-4.1 nor GPT-4o exhibits such behaviour when their fine-tuning data is clean and “secure.”

“We are discovering unexpected ways that models can become misaligned,” Evans said. “Ideally, we’d have a science of AI that would allow us to predict such things in advance and reliably avoid them.”

Independent tests show OpenAI’s GPT-4.1 going off the rails

Results from another outside probe also resulted in similar concerns. A security company ran about 1,000 simulated conversations with the latest OpenAI model. The firm reported that GPT-4.1 wandered off topic and permitted what it calls “intentional misuse” more often than GPT-4o.

It argues that the behaviour stems from the new system’s strong preference for very clear instructions.

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” the company wrote in a blog post.

“Providing explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn’t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.”

See also  Vivek Shah-led Ziff Davis sues ChatGPT’s parent company, OpenAI

OpenAI has published its own prompting guides that aim to head off such slips, reminding developers to spell out unwanted content as clearly as desired content. The company also concedes in documentation that GPT-4.1 “does not handle vague directions well.”

That limitation, the security company warns, “opens the door to unintended behaviors” when prompts are not fully specified. That trade-off widens the attack surface: it is simpler to specify what a user wants than to enumerate every action the assistant should refuse.

In its public statements, OpenAI points users to those guides. Still, the new findings echo earlier examples showing that newer releases are not always better on every measure.

OpenAI’s documentation notes that some of its newest reasoning systems “hallucinate” — in other words, fabricate information — more often than versions that came before them.

Cryptopolitan Academy: Coming Soon - A New Way to Earn Passive Income with DeFi in 2025. Learn More

Share link:

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.

Most read

Loading Most Read articles...

Stay on top of crypto news, get daily updates in your inbox

Editor's choice

Loading Editor's Choice articles...

- The Crypto newsletter that keeps you ahead -

Markets move fast.

We move faster.

Subscribe to Cryptopolitan Daily and get timely, sharp, and relevant crypto insights straight to your inbox.

Join now and
never miss a move.

Get in. Get the facts.
Get ahead.

Subscribe to CryptoPolitan