“Chain of Thought” Prompting Aids AI in Solving Vocabulary Puzzles

May 10, 2024

AI models struggled with complex language puzzles despite some capabilities.
“Chain of thought” prompting boosted GPT-4’s puzzle-solving accuracy to 39%.
The research explored using GPT-4 to create novel language puzzles.

Can AI match the human knack for spotting subtle links between words? Researchers at the NYU Tandon School of Engineering put that question to the test using Connections, the popular daily word puzzle from The New York Times, in which players sort 16 words into four themed groups of four.

Evaluating AI language models

The study, slated for the upcoming IEEE 2024 conference on games in Milan, asks a central question: can modern natural language processing (NLP) techniques solve language-based puzzles that draw on shared, common knowledge?

With Julian Togelius, Assistant Professor of CSE and Director of the Game Innovation Lab at NYU Tandon, as co-author, the team examined two kinds of AI approach. The first used GPT-3.5 and the more recently released GPT-4, large language models (LLMs) from OpenAI that can understand and generate human-like text across open domains.

The second approach relies on sentence embedding models, specifically BERT, RoBERTa, MPNet, and MiniLM. These models encode semantic information as vector representations but lack the full language understanding and generation abilities of LLMs.
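As a rough illustration of how an embedding-based solver might group words, the sketch below greedily clusters words by cosine similarity. It is not the method used in the paper: the three-dimensional vectors are made up for the example, and `cosine` and `greedy_groups` are hypothetical helpers. In practice the vectors would come from a sentence embedding model such as MPNet or MiniLM, and a real puzzle would have 16 words and groups of four.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def greedy_groups(vectors, group_size):
    """Seed a group with the next unused word, then add its nearest neighbours."""
    remaining = list(vectors)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        # Rank the other unused words by similarity to the seed, closest first.
        remaining.sort(key=lambda w: cosine(vectors[seed], vectors[w]), reverse=True)
        group = [seed] + remaining[:group_size - 1]
        remaining = remaining[group_size - 1:]
        groups.append(sorted(group))
    return groups

# Toy, hand-made vectors (an assumption for illustration only).
TOY_VECTORS = {
    "apple": [0.9, 0.1, 0.0],
    "red": [0.1, 0.9, 0.1],
    "pear": [0.8, 0.2, 0.1],
    "blue": [0.0, 0.8, 0.2],
}

groups = greedy_groups(TOY_VECTORS, group_size=2)
print(groups)
```

A greedy pass like this commits to each group as soon as it is formed, so one misleading word can spoil the rest of the grouping. That is one plausible reason similarity alone struggles with puzzles built around deliberate red herrings.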

However, the researchers concluded that although all of the AI systems could solve some Connections puzzles, the task remained very difficult overall. GPT-4 performed better than the other approaches, including the embedding methods and GPT-3.5.

One key finding is that the models’ performance closely mirrored that of humans across the puzzle’s difficulty levels, which range from “simple” to “challenging.” LLMs are becoming more widely used, and examining where they fail at Connections can reveal broader limitations in how they process the semantics of natural language, said Graham Todd, a Ph.D. student in the Game Innovation Lab and the study’s lead author.

Pushing the boundaries with GPT-4

The researchers found that prompting GPT-4 to reason through the puzzles step by step significantly improved its performance, raising the share of puzzles it solved to just over 39%.
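To make the idea concrete, here is a minimal sketch of how a direct prompt and a step-by-step prompt might differ. The wording is an assumption for illustration and is not the prompt used in the study. `build_prompt` is a hypothetical helper, and the placeholder words stand in for a real puzzle.

```python
def build_prompt(words, chain_of_thought=False):
    """Build a Connections prompt, optionally asking for step-by-step reasoning."""
    lines = [
        "Here are 16 words from a Connections puzzle:",
        ", ".join(words),
        "Sort them into four groups of four words that share a common theme.",
    ]
    if chain_of_thought:
        # Hypothetical wording: ask the model to reason before it answers.
        lines.append(
            "Think step by step: list candidate themes, check which words fit "
            "each one, and resolve any overlaps before giving your final groups."
        )
    else:
        lines.append("Give only the final groups.")
    return "\n".join(lines)

# Placeholder words (not a real puzzle).
WORDS = [f"word{i}" for i in range(1, 17)]

plain_prompt = build_prompt(WORDS)
cot_prompt = build_prompt(WORDS, chain_of_thought=True)
print(cot_prompt)
```

The only difference between the two prompts is the final instruction, so any change in accuracy can be attributed to the request to reason step by step.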

The finding confirms prior research showing that “chain of thought” prompting leads language models to reason in more structured ways, said Timothy Merino, a Ph.D. student at the Game Innovation Lab and one of the paper’s authors. Asking language models to reason about the tasks they are carrying out helps them perform those tasks better, he added.

The researchers used an online archive of 250 puzzles, representing the daily puzzles from June 12, 2023, to February 16, 2024.
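The date range and the puzzle count quoted above are consistent with one puzzle per day. A quick check, counting both the first and last day, assuming the range is inclusive:

```python
from datetime import date

first_puzzle = date(2023, 6, 12)
last_puzzle = date(2024, 2, 16)

# Add 1 because both the first and the last day are included in the range.
days_covered = (last_puzzle - first_puzzle).days + 1
print(days_covered)  # 250
```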


Brian Koome

Brian Koome has over seven years of experience in blockchain and cryptocurrency reporting, having been active in the industry since 2017. He has contributed to leading publications, including BlockToday.com. Further, he developed the Ethereum 101 course for BitDegree.org before joining Cryptopolitan as a full-time writer. Brian covers evergreen guides (EGs), deep dives, interviews, and price analysis. His focus on DeFi, blockchain innovation, and emerging crypto projects delights readers.
