Can artificial intelligence match human skill at spotting subtle connections between vocabulary words? Researchers at the NYU Tandon School of Engineering put that question to the test using one of the most popular daily word games: the Connections puzzle from The New York Times.
Evaluating AI language models
In a paper prepared for the 2024 IEEE Conference on Games in Milan, the researchers ask a pointed question: can modern natural language processing (NLP) techniques solve language-based puzzles that demand both broad knowledge and lateral thinking?
The team, whose co-authors include Julian Togelius, associate professor of computer science and engineering and director of the Game Innovation Lab at NYU Tandon, focused on two AI approaches. The first uses large language models: GPT-3.5 and the more recent GPT-4, OpenAI's models known for their broad, human-like command of language.
The second approach relies on sentence embedding models, specifically BERT, RoBERTa, MPNet, and MiniLM. These models encode the semantic meaning of words and phrases as numerical vectors, but they lack the full language understanding and generation abilities of the LLMs.
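To make the embedding approach concrete, here is a minimal sketch (not the paper's actual code) of how grouping Connections words by embedding similarity might work. The `toy_vectors` below stand in for the vectors a model such as MiniLM would produce, and the greedy search simply picks the most mutually similar set of four words at each step:

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_words(embeddings, group_size=4):
    """Greedily partition words into groups of `group_size`.

    At each step, pick the set of words whose pairwise embedding
    similarities sum to the highest value, then remove them and repeat.
    """
    remaining = dict(embeddings)
    groups = []
    while remaining:
        best, best_score = None, float("-inf")
        for combo in combinations(remaining, group_size):
            score = sum(cosine(remaining[a], remaining[b])
                        for a, b in combinations(combo, 2))
            if score > best_score:
                best, best_score = combo, score
        groups.append(best)
        for word in best:
            del remaining[word]
    return groups

# Toy 2D "embeddings": two obvious clusters standing in for model output.
toy_vectors = {
    "apple": (1.0, 0.1), "pear": (0.9, 0.2), "plum": (1.0, 0.0), "fig": (0.95, 0.15),
    "saw": (0.1, 1.0), "drill": (0.0, 0.9), "nail": (0.2, 1.0), "bolt": (0.05, 0.95),
}
groups = group_words(toy_vectors)
```

With real puzzles the embeddings would come from one of the named models, and a full board's 16 words yield only 1,820 candidate four-word sets, so this exhaustive greedy pass stays cheap.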
The researchers concluded that while all of the AI methods could solve some Connections puzzles, the game remained a substantial challenge overall. GPT-4 performed best, clearly outpacing both the embedding methods and GPT-3.5, yet it still failed on most puzzles.
Another key finding is that the models mirrored human players in which puzzles they found easy or hard, tracking the game's own difficulty ratings from "simple" to "challenging." Large language models are being used more and more widely, and examining where they fail on the Connections puzzle can reveal broader limitations in how they process semantic information, noted Graham Todd, a Ph.D. student in the Game Innovation Lab and the study's lead author.
Pushing the boundaries with GPT-4
The researchers also observed that prompting GPT-4 to reason through the puzzles step by step dramatically improved its performance, raising its success rate to just over 39% of puzzles solved.
This adds to the evidence for "chain of thought" prompting: prior research has shown, and this study confirms, that asking a model to lay out its reasoning leads to more structured thinking, said Timothy Merino, a Ph.D. student in the Game Innovation Lab and an author of the study. Asking the language models to reason through the tasks they are solving helps them perform better.

The researchers drew on an online archive of 250 puzzles, covering the daily Connections puzzles from June 12, 2023 to February 16, 2024.
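As an illustration of what chain-of-thought prompting means in practice, the sketch below contrasts a direct prompt with a step-by-step variant. The wording is a hypothetical example, not the prompt used in the study:

```python
def build_connections_prompt(words, chain_of_thought=True):
    """Build a prompt asking a model to solve a Connections board.

    The phrasing is illustrative only; the study's actual prompts are
    not reproduced in this article.
    """
    prompt = (
        "Here are 16 words from a Connections puzzle: "
        + ", ".join(words)
        + ".\nGroup them into four groups of four related words."
    )
    if chain_of_thought:
        # Chain-of-thought prompting: ask the model to reason step by
        # step before committing to a final answer.
        prompt += (
            "\nThink step by step: first list candidate categories, "
            "then assign each word to a category, and only then state "
            "your final four groups."
        )
    return prompt
```

The step-by-step instruction is the only difference between the two variants; according to the article, prompting GPT-4 to reason this way lifted its solve rate to just over 39%.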