Large language models (LLMs) have demonstrated remarkable abilities, capable of producing poetry and crafting functioning computer code, despite being fundamentally trained to predict the next word in a sequence of text. These capabilities may lead some to believe that these models possess an intrinsic understanding of broader truths about the world around us. However, a recent study challenges this assumption, revealing that a well-known type of generative AI model can give near-perfect driving directions in New York City without actually developing an accurate mental representation of the city’s layout.
Interestingly, while the model navigates with near-perfect accuracy, its performance deteriorates sharply when researchers close off certain streets and introduce detours. Digging deeper, the researchers found that the maps the model implicitly constructed contained numerous nonexistent streets winding through the grid and connecting distant intersections. This raises important concerns about deploying generative AI in real-world scenarios: a model that appears to perform well in one environment can break down when the task or context changes even slightly.
Senior author Ashesh Rambachan, an assistant professor of economics and a principal investigator at the MIT Laboratory for Information and Decision Systems (LIDS), stresses how important it is to understand whether LLMs are genuinely learning coherent models of the world. “One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other segments of science. However, unraveling whether LLMs are grasping coherent world models is vital if we wish to employ these techniques to drive new discoveries,” he states. Rambachan collaborated on this research with lead author Keyon Vafa, a postdoc at Harvard University; Justin Y. Chen, an MIT graduate student in electrical engineering and computer science; Jon Kleinberg, a professor at Cornell University; and Sendhil Mullainathan, an MIT professor with dual appointments in EECS and economics and a member of LIDS. Their findings will be presented at the Conference on Neural Information Processing Systems.
The researchers concentrated on transformer models, the foundation of LLMs like GPT-4. These transformers are trained on vast linguistic datasets to predict the subsequent token in a sequence, such as the next word in a sentence. However, the researchers argue that merely assessing the accuracy of these predictions is insufficient for determining whether an LLM has formed a true understanding of the world.
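To make that training objective concrete, here is a deliberately tiny sketch; it is not the authors' setup and nothing like a real transformer, just a next-token predictor built from simple co-occurrence counts that shares only the objective of guessing what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of next-token prediction (not the authors' setup): count which
# token follows each token in a tiny corpus, then predict the most frequent one.
# A transformer learns a vastly richer version of this mapping, but the training
# objective, guessing the next token given the ones before it, is the same idea.
corpus = "turn left on main street then turn left on elm street".split()

follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently observed next token, or None if the token is unseen."""
    counts = follow_counts.get(token)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("turn"))  # -> 'left'
```

Predicting the next token well, the researchers argue, is not by itself evidence that a model has recovered the structure that generated the data.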
For instance, they discovered that a transformer can predict valid moves in a game of Connect 4 with impressive accuracy while lacking any comprehension of the underlying rules. To probe further, they developed two new metrics aimed at evaluating a transformer’s conceptual world model. They applied these metrics to a class of problems known as deterministic finite automata (DFAs): problems defined by a sequence of states, such as the intersections one must cross to reach a destination, together with rules governing which transitions are allowed. The team selected two test cases, navigating the streets of New York City and playing the board game Othello.
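As a rough sketch of what a DFA looks like in this setting (the states, moves, and transitions below are invented for illustration; the paper’s automata are derived from the actual street network and the rules of Othello):

```python
# Minimal sketch of a deterministic finite automaton (DFA): states could be
# intersections, tokens could be turns, and the transition table encodes which
# moves the rules allow. These states and transitions are made up for
# illustration; the automata in the study are far larger.
transitions = {
    ("A", "north"): "B",
    ("A", "east"): "C",
    ("B", "east"): "D",
    ("C", "north"): "D",
}

def run_dfa(start, tokens):
    """Follow a token sequence; return the final state, or None if any move is invalid."""
    state = start
    for tok in tokens:
        state = transitions.get((state, tok))
        if state is None:
            return None
    return state

print(run_dfa("A", ["north", "east"]))   # 'D': a valid route
print(run_dfa("A", ["north", "north"]))  # None: the rules forbid this move
```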
The first metric they established, termed sequence distinction, posits that a model possesses a coherent world model if it can differentiate between two distinct states, such as two different Othello boards. The second metric, sequence compression, suggests that a model with a coherent world model should recognize that identical states, like two identical Othello boards, yield the same sequence of potential next moves.
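In simplified terms, both metrics boil down to comparing the continuations a model allows after different prefixes against the states those prefixes actually reach. The sketch below illustrates the idea only; `model_next_tokens` is a hypothetical stand-in for querying the trained transformer, and the authors’ metrics are defined more carefully than this.

```python
# Simplified sketch of the two checks. `model_next_tokens` is a hypothetical
# placeholder for the trained model's predicted set of valid next tokens, and
# `true_state` is a toy ground-truth DFA; neither is the authors' implementation.

def true_state(prefix):
    # Toy ground truth: the state is just the net position on a grid, so
    # different move orders can land in the same state.
    x = prefix.count("east") - prefix.count("west")
    y = prefix.count("north") - prefix.count("south")
    return (x, y)

def model_next_tokens(prefix):
    # Stand-in for the model's predictions; a real evaluation would query
    # the transformer here.
    return {"north", "east"}

def sequence_compression_ok(prefix_a, prefix_b):
    """Prefixes that reach the SAME state should admit the same next moves."""
    assert true_state(prefix_a) == true_state(prefix_b)
    return model_next_tokens(prefix_a) == model_next_tokens(prefix_b)

def sequence_distinction_ok(prefix_a, prefix_b):
    """Prefixes that reach DIFFERENT states should be told apart by the model,
    i.e. admit different sets of valid continuations."""
    assert true_state(prefix_a) != true_state(prefix_b)
    return model_next_tokens(prefix_a) != model_next_tokens(prefix_b)

# Two routes that end at the same intersection should be interchangeable:
print(sequence_compression_ok(["north", "east"], ["east", "north"]))
```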
These metrics were used to evaluate two classes of transformers: one trained on randomly generated sequences and the other on data produced by following specific strategies. Surprisingly, the transformers trained on random sequences tended to form more accurate world models, potentially because they were exposed to a broader array of possible next steps during training. Vafa elaborates, “In Othello, observing two random computers play can reveal all possible moves, including the less strategic choices that championship players might avoid.”
While the transformers produced accurate navigation directions and valid Othello moves in most instances, the new metrics highlighted significant gaps. Only one model managed to demonstrate a coherent world model for Othello, while none succeeded in forming an accurate model in the navigation scenario.
The researchers illustrated the practical implications of their findings by introducing detours into the New York City map. This modification led to a drastic decline in the performance of all navigation models. Vafa noted, “I was astonished by how quickly performance declined with the addition of a detour. Closing just 1% of the available streets resulted in an accuracy drop from almost 100% to a mere 67%.” The city maps produced by the models frequently resembled a fantastical version of New York, populated with numerous fictitious roads intersecting chaotically across the grid.
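The perturbation itself is easy to picture: remove a small fraction of edges from the street graph and check whether the routes the model proposes still use only open streets. Below is a minimal sketch with an invented four-intersection graph and a placeholder route standing in for the model’s output; the study itself used Manhattan’s street network and a trained transformer.

```python
import random

# Toy street graph: each intersection maps to the set of directly reachable
# intersections. Invented for illustration only.
streets = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}

def close_streets(graph, fraction, seed=0):
    """Return a copy of the graph with a random fraction of edges removed."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((u, v))) for u, nbrs in graph.items() for v in nbrs})
    closed = set(rng.sample(edges, max(1, int(fraction * len(edges)))))
    return {u: {v for v in nbrs if tuple(sorted((u, v))) not in closed}
            for u, nbrs in graph.items()}

def route_is_valid(graph, route):
    """A proposed route is valid only if every consecutive hop is an open street."""
    return all(b in graph[a] for a, b in zip(route, route[1:]))

detoured = close_streets(streets, fraction=0.25)
proposed = ["A", "B", "D"]  # stand-in for a route the model would emit
print(route_is_valid(streets, proposed), route_is_valid(detoured, proposed))
```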
These results show that transformers can achieve remarkable task performance without truly understanding the governing principles; building LLMs that capture accurate world models, the researchers argue, will require a different approach. Rambachan cautions, “We often marvel at these models’ impressive outputs and assume they must possess some understanding of reality. It’s critical to navigate these assumptions carefully, rather than relying solely on our intuitions.”
Moving forward, the researchers aim to explore a wider array of problems, particularly those where the rules may be only partially known, and to apply their evaluation metrics to real-world scientific challenges. This research was supported by the Harvard Data Science Initiative, a National Science Foundation Graduate Research Fellowship, a Vannevar Bush Faculty Fellowship, a Simons Collaboration grant, and a grant from the MacArthur Foundation.