“Chain of Thought” Prompting Aids AI in Solving Vocabulary Puzzles

In this post:

  • AI models struggled with the Connections word puzzle, solving only a portion of them.
  • “Chain of thought” prompting boosted GPT-4’s puzzle-solving accuracy to 39%.
  • The research explored using GPT-4 to create novel language puzzles.

Can AI match human skill at spotting subtle hints in vocabulary words? Researchers at the NYU Tandon School of Engineering investigated that question by testing whether modern language models can solve Connections, the popular daily word-grouping puzzle from The New York Times.

Evaluating AI language models

The research, to be presented at the 2024 IEEE Conference on Games in Milan, asks a straightforward question: can modern natural language processing (NLP) techniques solve language-based puzzles?

With Julian Togelius, associate professor of computer science and engineering and director of the Game Innovation Lab at NYU Tandon, as co-author, the team tested two AI approaches. The first used large language models: GPT-3.5 and the more recent GPT-4, OpenAI's widely used models known for broad, human-like command of language.

The second approach relies on sentence embedding models, specifically BERT, RoBERTa, MPNet, and MiniLM. These models encode the meaning of text as vectors but lack the full language understanding and generation abilities of the large language models.
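To illustrate the embedding idea (this is an illustrative sketch, not the paper's code), each word is mapped to a vector, and words whose vectors are most similar are grouped together. The tiny hand-made vectors and the `closest_group` helper below are hypothetical stand-ins for a real embedding model such as MiniLM:

```python
import math

# Toy stand-in for a sentence-embedding model such as MiniLM:
# each word maps to a small hand-made vector (hypothetical values).
EMBEDDINGS = {
    "bass":   [0.90, 0.10],
    "trout":  [0.80, 0.20],
    "salmon": [0.85, 0.15],
    "flute":  [0.10, 0.90],
    "oboe":   [0.15, 0.85],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_group(target, words, k=2):
    """Return the k words whose embeddings are most similar to `target`."""
    ranked = sorted(
        words,
        key=lambda w: cosine(EMBEDDINGS[target], EMBEDDINGS[w]),
        reverse=True,
    )
    return ranked[:k]

# "bass" (ambiguous between fish and music) lands with the other fish.
print(closest_group("bass", ["trout", "salmon", "flute", "oboe"]))
```

A real pipeline would replace the toy dictionary with model-produced embeddings, and Connections-style traps come precisely from words like "bass" that sit between two plausible groups.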

The conclusion: although all of the methods could solve some of the Connections puzzles, the task remained difficult overall. GPT-4 performed better than the earlier approaches, including the embedding methods and GPT-3.5.

One key finding is that the models roughly mirror human judgments of puzzle difficulty, from "simple" to "challenging." "LLMs are being used more and more, and scrutinizing where they fail on the Connections puzzle can reveal general limitations in how they process semantic information," said Graham Todd, a Ph.D. student in the Game Innovation Lab and the study's lead author.

Pushing the boundaries with GPT-4

The researchers found that asking GPT-4 to reason through the puzzles step by step markedly improved performance, lifting its accuracy to just over 39% of puzzles solved.
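The step-by-step approach can be sketched as follows. This is a hypothetical example of a chain-of-thought prompt, not the prompt used in the study; the wording and the sample words are illustrative:

```python
def build_cot_prompt(words):
    """Build a hypothetical chain-of-thought prompt for a Connections puzzle.

    The instruction to reason step by step before answering is what
    distinguishes chain-of-thought prompting from simply asking the
    model for the four groups directly.
    """
    word_list = ", ".join(words)
    return (
        "You are solving a Connections puzzle. The 16 words below must be "
        "split into 4 groups of 4, each sharing a hidden category.\n"
        f"Words: {word_list}\n"
        "Think step by step: list candidate categories, note words that "
        "could fit more than one group, and resolve those conflicts "
        "before committing to an answer.\n"
        "Finally, output the four groups, one per line."
    )

# Illustrative puzzle: fish, instruments, planets, Greek figures.
puzzle = ["bass", "trout", "salmon", "pike",
          "flute", "oboe", "viola", "cello",
          "mercury", "venus", "mars", "saturn",
          "apollo", "gemini", "orion", "artemis"]
prompt = build_cot_prompt(puzzle)
print(prompt)
```

The prompt would then be sent to the model through its API; without the "think step by step" instruction, the same request becomes a plain direct-answer prompt.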

"Our investigation confirms prior research showing that 'chain of thought' prompting encourages structured thinking about vocabulary," said Timothy Merino, a Ph.D. student in the Game Innovation Lab and an author of the paper. "Asking the models to reason through their tasks helps them perform better." The researchers drew on an online archive of 250 Connections puzzles, covering the daily puzzles from June 12, 2023, through February 16, 2024.

