Santa Fe scholars: Do AI large language models really understand human language?


Introduction

Large language models display abilities that look much like human understanding, but do AI systems really understand language the way humans do? And must machine understanding take the same form as human understanding? David Krakauer, president of the Santa Fe Institute, and Melanie Mitchell, a researcher at the institute, recently published an article in PNAS exploring whether large-scale pre-trained language models (LLMs) can understand language, and the physical and social situations language encodes, in anything like the way humans do.

The paper lays out the arguments on both sides and then turns to key questions for a broader science of intelligence. In the authors' view, expanding interdisciplinary research between artificial intelligence and the natural sciences promises to broaden perspectives across disciplines, clarify the strengths and limits of different approaches, and meet the challenge of integrating diverse concepts of cognition.

Keywords: artificial intelligence, large language model, mental model

Melanie Mitchell, David C. Krakauer | Authors

Fan Siyu and Zhang Ji | Translators

Liang Jin | Editor

Title of the article:

The debate over understanding in AI’s large language models

Article address:

https://www.pnas.org/doi/10.1073/pnas.2215907120

What is "understanding"? This problem has long attracted the attention of philosophers, cognitive scientists and educators. The classical research on "understanding" is almost always based on human beings and other animals. However, with the rise of large-scale artificial intelligence systems, especially large-scale language models, there has been a heated discussion in the AI community: can machines understand natural languages now, so as to understand the physical and social situations described by languages?

This debate is not confined to natural science. The degree to which machines understand our world, and the way they do so, determines how far we can trust AI to act robustly and transparently in tasks that involve interacting with humans: driving cars, diagnosing diseases, caring for the elderly, educating children, and so on. The debate also highlights a key question for any intelligent system that claims to "understand": how do we distinguish statistical correlation from causal mechanism?

Although AI systems display seemingly intelligent behavior on many specific tasks, until recently the AI research community generally held that machines do not understand the data they process the way humans do. For example, face recognition software does not understand that a face is part of a body, the role facial expressions play in social interaction, what it means to "face" an unpleasant situation, or how and why people make faces. Similarly, speech-to-text and machine translation programs do not understand the language they handle, and autonomous driving systems do not understand the micro-expressions and body language that drivers and pedestrians use to avoid accidents. These AI systems have therefore often been regarded as brittle, and the key evidence of their lack of "understanding" is that they make unpredictable errors and fail to generalize robustly [1].

Do large language models really understand language?

In the past few years, however, the situation has changed. A new type of AI system has swept through the research community and changed some researchers' views on the prospects of machines understanding language. Variously called large language models (LLMs), large pre-trained models, or foundation models [2], these systems are deep neural networks with billions to trillions of parameters (weights) that are "pre-trained" on natural language corpora of several terabytes, including large web snapshots, online books, and other text. During training, the networks' task is to predict hidden parts of the input sentences, an approach called "self-supervised learning." The resulting network is a complex statistical model of the correlations among the words and phrases in its training data.
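To make that training objective concrete, here is a minimal, hedged sketch in Python of "predict the hidden part of the input," with a toy bigram counter standing in for a billion-parameter network. The corpus, the function names, and the model itself are invented for illustration; real LLMs learn far richer correlations with transformer networks, but the objective is the same in spirit.

```python
# A toy stand-in for self-supervised language-model training: hide the next
# word and learn to predict it from the preceding word, using nothing but
# co-occurrence statistics gathered from the training corpus.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# "Training": count how often each word follows each context word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follow_counts[prev][nxt] += 1

def predict_next(context_word: str) -> str:
    """Predict the hidden next word as the most frequent continuation seen in training."""
    continuations = follow_counts.get(context_word)
    if not continuations:
        return "<unknown>"
    return continuations.most_common(1)[0][0]

if __name__ == "__main__":
    # The "model" fills in the blank purely from statistical correlations,
    # with no grounding in what cats, mats, or sitting actually are.
    print(predict_next("cat"))  # -> "sat"
    print(predict_next("sat"))  # -> "on"
```

The point of the sketch is only that prediction accuracy can come entirely from correlations in the data; nothing in the counter "knows" anything about the world the sentences describe.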

These models can be used to generate natural language, be fine-tuned for specific language tasks [3], or be further trained to better match "user intent" [4]. LLMs such as OpenAI's famous GPT-3 [5], the more recent ChatGPT [6], and Google's PaLM [7] can produce astonishingly human-like text and dialogue; moreover, although these models are not trained to reason, some studies argue that they display human-like reasoning abilities [8]. How LLMs accomplish these feats is a mystery to laypeople and scientists alike. The internal workings of these networks are largely opaque, and even the researchers who build them have only limited intuition about systems of such enormous scale. The neuroscientist Terrence Sejnowski described the arrival of LLMs this way: "A threshold was reached, as if a space alien suddenly appeared that could communicate with us in an eerily human way. Only one thing is clear: LLMs are not human... Some aspects of their behavior appear to be intelligent, but if it is not human intelligence, what is the nature of their intelligence?" [9]

Although the most advanced LLMs are impressive, they remain prone to brittleness and to errors quite unlike those humans make. However, such defects diminish markedly as the number of parameters and the size of the training set grow [10], leading some researchers to argue that, given sufficiently large networks and training data, LLMs (or multimodal versions of them) will reach human-level intelligence and understanding. A new AI slogan has emerged: "Scale is all you need" [11, 12].

This proposition represents one school of thought in the debate over LLMs. Some argue that these networks genuinely understand language and can reason in a general way (though "not yet" at a human level). For example, Google's LaMDA system, pre-trained on text and then fine-tuned on dialogue [13], produces fluent conversation, and one AI researcher has argued that such systems "have the ability to truly understand a large number of concepts" [14] and are even "moving in the direction of consciousness" [15]. Another machine-language expert sees LLMs as a touchstone on the path to general, human-level AI: "some optimistic researchers believe that we have witnessed the birth of knowledge-infused systems with a degree of general intelligence" [16]. Still others argue that LLMs probably capture important aspects of meaning, and that the way they work resembles a striking account of human cognition in which meaning arises from conceptual roles [17]. Those who dismiss such claims have been labeled proponents of "AI denialism" [18].

On the other side, some argue that large pre-trained models such as GPT-3 or LaMDA, however fluent their output, cannot understand, because they have no experience of the world and no model of it; trained only to predict text, LLMs learn the form of language but not its meaning [19-21]. A recent article argues that "even if we train them until the universe dies, systems trained only on language will never approach human intelligence; these systems are doomed to a shallow understanding that will never approach the comprehensiveness of our thinking" [22]. Some scholars hold that it is a mistake to apply concepts such as "intelligence," "agent," and "understanding" to LLMs at all, because an LLM is more like a library or an encyclopedia: it packages a repository of human knowledge rather than acting as an agent itself [23]. For example, humans know that being tickled makes us laugh because we have bodies. An LLM can use the word "tickle," but it has obviously never had the sensation; understanding tickling is not a mapping between two words but a mapping between a word and a feeling.

Those who hold that LLMs cannot really understand argue that what should surprise us is not the fluency of LLMs itself, but how far fluency at this scale outruns our intuitions about what statistical models can achieve. On this view, anyone who attributes understanding or consciousness to LLMs is a victim of the "Eliza effect" [24]: the human tendency to attribute understanding and agency to machines that show even faint signs of human-like language or behavior. The effect is named after the chatbot "Eliza," built by Joseph Weizenbaum in the 1960s, which, simple as it was, still fooled people into believing that it understood them [25].

A 2022 survey of active researchers in natural language processing confirmed the split. Respondents were asked whether they agreed with the following statement about whether LLMs could, in principle, understand language: "Some generative models trained only on text, given enough data and computational resources, could understand natural language in some nontrivial sense." The 480 respondents were divided almost exactly in half, with 51% agreeing and 49% disagreeing [26].

The evidence offered by those who grant LLMs understanding rests mainly on the models' performance: both subjective judgments of the quality of the text the models generate in response to prompts (judgments that may be vulnerable to the Eliza effect) and objective scores on benchmarks designed to evaluate language understanding and reasoning. Two commonly used benchmarks, for example, are the General Language Understanding Evaluation (GLUE) [27] and its successor SuperGLUE [28], which collect large datasets and tasks such as textual entailment (given two sentences, can the meaning of the second be inferred from the first?), word-sense judgments (does a given word have the same meaning in two different sentences?), and question answering. OpenAI's GPT-3, with 175 billion parameters, performed surprisingly well on these tasks [5], and Google's PaLM, with 540 billion parameters, performed better still [7], matching or even surpassing humans on the same tasks.
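As a rough illustration of how such benchmark scores are obtained, the hedged sketch below computes accuracy on a tiny, invented entailment-style dataset. The examples and the placeholder `model_predict` callable are assumptions made for illustration; they are not drawn from GLUE or SuperGLUE, and no real model API is implied.

```python
# Sketch of benchmark scoring: each example pairs a premise and a hypothesis
# with a gold label, and accuracy is the fraction of gold labels reproduced.
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (premise, hypothesis, gold_label)

entailment_examples: List[Example] = [
    ("The lawyer questioned the witness for an hour.",
     "The witness was questioned.", "entailment"),
    ("The lawyer questioned the witness for an hour.",
     "The witness refused to answer.", "not_entailment"),
]

def evaluate(model_predict: Callable[[str, str], str],
             examples: List[Example]) -> float:
    """Return accuracy: the share of examples where the model matches the gold label."""
    correct = sum(
        model_predict(premise, hypothesis) == gold
        for premise, hypothesis, gold in examples
    )
    return correct / len(examples)

if __name__ == "__main__":
    # A trivial stand-in "model" that always answers "entailment";
    # on this toy set it scores 0.50, i.e. chance level for two classes.
    always_entail = lambda premise, hypothesis: "entailment"
    print(f"accuracy = {evaluate(always_entail, entailment_examples):.2f}")
```

A high score on such a benchmark tells us the labels were reproduced; as the next section discusses, it does not by itself tell us how they were reproduced.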

Does machine understanding have to reproduce human understanding?

What do these results imply about LLMs' understanding? The very terms chosen for these benchmarks (general language understanding, natural language inference, reading comprehension, commonsense reasoning) presuppose that succeeding at them requires a machine to reproduce human-style understanding. But is that presupposition warranted? Not necessarily. Consider the argument reasoning comprehension benchmark [29]: each task item presents a natural-language "argument" and two candidate inferences, and the task is to decide which inference is consistent with the argument, as in the following example:

Argument: Felons should be allowed to vote. A person who stole a car at the age of 17 should not be deprived of full citizenship for life.

Inference A: Stealing a car is a felony.

Inference B: Stealing a car is not a felony.

BERT, a widely used LLM [30], achieved near-human performance on this benchmark [31], which might suggest that it understands natural language the way humans do. However, one research team found that cue words appearing in the inference sentences (such as "not") help the model predict the correct answer. When the researchers altered the dataset to remove these cues, BERT's performance dropped to the level of random guessing. This is a clear example of shortcut learning, a frequently noted phenomenon in machine learning in which a learning system achieves good performance on a benchmark by exploiting spurious correlations in the data rather than through human-like understanding [32-35].
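The following hedged sketch illustrates the kind of shortcut described here. The mini-dataset is invented to mimic the cue-word imbalance reported for the original benchmark; on it, a rule that looks only for the word "not" and never reads the argument labels every item correctly. Rebalancing the data, as the researchers did, would send such a heuristic back to chance.

```python
# A cue-word-only "solver" for an argument-reasoning-style task: it ignores
# the argument entirely and exploits a statistical artifact in the options.
examples = [
    # (inference_a, inference_b, correct_option) -- invented, deliberately skewed
    ("Stealing a car is a felony.", "Stealing a car is not a felony.", "b"),
    ("Voting is a privilege.", "Voting is not a privilege.", "b"),
    ("The tax is fair.", "The tax is not fair.", "b"),
    ("The law is not outdated.", "The law is outdated.", "a"),
]

def cue_word_guess(inference_a: str, inference_b: str) -> str:
    """Pick whichever option contains 'not', without ever reading the argument."""
    return "b" if " not " in f" {inference_b} " else "a"

hits = sum(cue_word_guess(a, b) == gold for a, b, gold in examples)
print(f"cue-word-only accuracy: {hits}/{len(examples)}")  # 4/4 on this skewed toy set
```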

Such correlations are usually not apparent to the humans who perform the same task. And although shortcut learning has been documented in models evaluated on language understanding and other AI tasks, many undiscovered shortcuts likely remain. Pre-trained language models such as Google's LaMDA and PaLM, with hundreds of billions of parameters trained on text corpora approaching a trillion tokens, are exceptionally good at encoding correlations in their data, so benchmarks designed to probe human understanding may not be appropriate for evaluating such models [36-38]. For very large LLMs (and their likely descendants), computing complex statistical correlations may let a model bypass human-like understanding and still achieve near-perfect performance.

Although "human-like understanding" has no rigorous definition, it is not based on the kind of huge statistical models that today's LLMs learn. Rather, it is based on concepts: internal mental models of external categories, situations, and events, and of one's own internal states and "self." For humans, understanding language (and other, nonverbal information) rests on grasping the concepts that the language (or other information) expresses, not merely on mastering the statistical properties of linguistic symbols. Indeed, a long line of work in cognitive science has emphasized the nature of concepts and how understanding arises from concepts that are coherent, hierarchical, and imbued with causal structure. Such models of understanding allow people to abstract over past knowledge and experience in order to make robust predictions, generalizations, and analogies; to reason compositionally and counterfactually; to intervene actively in the world to test hypotheses; and to explain their understanding to others.

Undoubtedly, although ever-larger LLMs sporadically display flashes of what looks like human understanding, current AI systems, including the most advanced LLMs, do not possess these capabilities, and some argue that such capabilities give humans abilities that no purely statistical model can attain. Although large models exhibit extraordinary formal linguistic competence, the ability to produce grammatically fluent, human-like language, they still lack the human functional linguistic competence grounded in conceptual understanding, that is, the ability to understand and use language correctly in the real world. Interestingly, physics offers an analogous tension between the successful application of mathematical techniques and this kind of functional understanding; a long-running controversy over quantum mechanics, for example, is that it provides an effective method of calculation without conceptual comprehension.

The nature of concepts has long been a topic of academic debate: to what extent concepts are domain-specific and innate rather than more general and learned [55-60], to what extent they are grounded in concrete metaphors [61-63] and represented in the brain through dynamic, context-based simulation [64], and under what conditions they are supported by language [65-67], social learning [68-70], and culture [71-74].

Despite these debates, concepts in the form of causal mental models, as described above, have long been regarded as the units of understanding in human cognition. Indeed, the trajectory of human understanding, whether individual or collective, can be seen as a progression toward highly compressed, causally grounded models, as in the move from Ptolemy's epicycles to Kepler's elliptical orbits to Newton's concise causal explanation of planetary motion in terms of gravity. Unlike machines, humans appear to have a strong internal drive toward this form of understanding in both scientific research and daily life, a drive that can be characterized as demanding little data, a minimal model, clearly defined causal dependencies, and strong mechanistic intuition.

The debate over LLMs' capacity for understanding centers on the following questions:

1) Is attributing understanding to these systems simply a category error, one that mistakes the associations among linguistic tokens for associations between tokens and physical, social, or mental experience? In short, is it the case that these models are not, and never will be, capable of human-like understanding?

2) Or, on the contrary, do these systems (or will their near-term derivatives) actually create, even without real-world experience, something like the rich concept-based mental models that are central to human understanding? And if so, does scaling the models up create ever better concepts?

3) Or, if these systems cannot create such concepts, can their unimaginably vast systems of statistical correlations nevertheless produce abilities that are functionally equivalent to human understanding? Might they even enable new, higher-order forms of logic that humans cannot attain? If so, is it still appropriate to call these correlations "spurious," or to dismiss the systems' solutions as "shortcut learning"? Would it be more apt to view their behavior as a new, non-human form of understanding rather than as a lack of understanding? These questions are no longer confined to abstract philosophy; they bear on very real concerns about the capabilities, robustness, safety, and ethics of AI systems as they take on an ever-larger role in people's daily lives.

Although researchers of various schools have their own views in the debate over LLMs' understanding, the methods currently available from cognitive science for gaining insight into understanding are inadequate for answering such questions about LLMs. Indeed, some researchers have applied to LLMs psychological tests originally designed to assess human understanding and reasoning, and have found that in some cases LLMs do exhibit human-like responses on theory-of-mind tests [14, 75] and human-like abilities and biases in reasoning assessments [76-78]. But while such tests are considered reliable proxies for more general capabilities in humans, that may not be the case for AI systems.

A new form of understanding

As noted above, LLMs have a hard-to-explain capacity to learn correlations among the tokens in their training data and inputs, and to apply those correlations to solve problems; humans, by contrast, appear to apply compressed concepts that reflect their real-world experience. When psychological tests designed for humans are applied to LLMs, interpreting the results often depends on assumptions about human cognition that may simply not hold for these models. To make progress, scientists will need to design new benchmarks and probing methods that illuminate the mechanisms of different kinds of intelligence and understanding, including the novel "exotic, mind-like entities" we have created [79]. Some work of this kind may already be pointing the way toward the essence of "understanding" [80, 81].

As the debate over LLMs' understanding continues and ever more capable systems appear, a broader science of intelligence seems necessary to make sense of concepts of understanding in both humans and machines. As the neuroscientist Terrence Sejnowski put it, "the diverging opinions of experts on the intelligence of LLMs suggest that our old ideas based on natural intelligence are inadequate" [9]. If LLMs and related models succeed by exploiting unprecedentedly powerful statistical correlations, that might itself be regarded as a new form of "understanding," one that enables extraordinary, even superhuman, predictive ability. DeepMind's AlphaFold and AlphaZero systems [82, 83], for example, seem to bring an "alien" form of intuition to protein structure prediction and to chess, respectively [84, 85].

It can thus be argued that in recent years AI has produced machines with novel modes of understanding, likely new species in a larger zoo of related concepts, and that more such modes will emerge as research into the nature of intelligence advances. Just as different species adapt to different environments, our intelligent systems will become better adapted to different problems. Problems that depend on vast stores of historically encoded knowledge, where performance is what matters, will continue to favor large statistical models such as LLMs, while problems that depend on limited knowledge and strong causal mechanisms will favor human intelligence. The challenge ahead is to develop new research methods that reveal in detail how different forms of intelligence understand, that delineate their respective strengths and limits, and that teach us how to integrate these different modes of cognition.

References

[1]M. Mitchell, Artificial intelligence hits the barrier of meaning. Information 10, 51 (2019).

[2]R. Bommasani et al., On the opportunities and risks of foundation models. arXiv [Preprint] (2021). http://arxiv.org/abs/2108.07258(Accessed 7 March 2023).

[3]B. Min et al., Recent advances in natural language processing via large pre-trained language models: A survey. arXiv [Preprint] (2021). http://arxiv.org/abs/2111.01243(Accessed 7 March 2023).

[4]L. Ouyang et al., Training language models to follow instructions with human feedback. arXiv [Preprint] (2022). http://arxiv.org/abs/2203.02155(Accessed 7 March 2023).

[5]T. Brown et al., Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877-1901 (2020).

[6]J. Schulman et al., ChatGPT: Optimizing language models for dialogue (2022). https://openai.com/blog/chatgpt. Accessed 7 March 2023.

[7]A. Chowdhery et al., PaLM: Scaling language modeling with Pathways. arXiv [Preprint] (2022). http://arxiv.org/abs/2204.02311(Accessed 7 March 2023).

[8]J. Wei et al., Chain of thought prompting elicits reasoning in large language models (2022). http://arxiv.org/abs/2201.11903(Accessed 7 March 2023).

[9]T. Sejnowski, Large language models and the reverse Turing test. arXiv [Preprint] (2022). http://arxiv.org/abs/2207.14382(Accessed 7 March 2023).

[10]J. Wei et al., Emergent abilities of large language models. arXiv [Preprint] (2022). http://arxiv.org/abs/2206.07682(Accessed 7 March 2023).

[11]N. de Freitas, 14 May 2022. https://twitter.com/NandoDF/status/1525397036325019649. Accessed 7 March 2023.

[12]A. Dimakis, 16 May 2022. https://twitter.com/AlexGDimakis/status/1526388274348150784. Accessed 7 March 2023.

[13]R. Thoppilan et al., LaMDA: Language models for dialog applications. arXiv [Preprint] (2022). http://arxiv.org/abs/2201.08239(Accessed 7 March 2023).

[14]B. A. y Arcas, Do large language models understand us? (2021). http://tinyurl.com/38t23n73. Accessed 7 March 2023.

[15]B. A. y Arcas, Artificial neural networks are making strides towards consciousness (2022). http://tinyurl.com/ymhk37uu. Accessed 7 March 2023.

[16]C. D. Manning, Human language understanding and reasoning. Daedalus 151, 127– 138 (2022).

[17]S. T. Piantadosi, F. Hill, Meaning without reference in large language models. arXiv [Preprint] (2022). http://arxiv.org/abs/2208.02957 (Accessed 7 March 2023).

[18]B. A. y Arcas, Can machines learn how to behave? (2022). http://tinyurl.com/mr4cb3dw (Accessed 7 March 2023).

[19]E. M. Bender, A. Koller, "Climbing towards NLU: On meaning, form, and understanding in the age of data" in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 5185–5198.

[20]E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, "On the dangers of stochastic parrots: Can language models be too big?" in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021), pp. 610–623.

[21]G. Marcus, Nonsense on stilts. Substack, 12 June 2022. https://garymarcus.substack.com/p/nonsense-on-stilts.

[22]J. Browning, Y. LeCun, AI and the limits of language (2022). https://www.noemamag.com/ai-and-the-limits-of-language. Accessed 7 March 2023.

[23]A. Gopnik, What AI still doesn't know how to do (2022). https://www.wsj.com/articles/what-ai-still-doesnt-know-how-to-do-11657891316. Accessed 7 March 2023.

[24]D. R. Hofstadter, Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought (Basic Books, Inc., New York, NY, 1995).

[25]J. Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation (WH Freeman & Co, 1976).

[26]J. Michael et al., What do NLP researchers believe? Results of the NLP community metasurvey. arXiv [Preprint] (2022). http://arxiv.org/abs/2208.12852 (Accessed 7 March 2023).

[27]A. Wang et al., "GLUE: A multi-task benchmark and analysis platform for natural language understanding"in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Association for Computational Linguistics, 2018), pp. 353-355.

[28]A. Wang et al., SuperGLUE: A stickier benchmark for general-purpose language understanding systems. Adv. Neural Inf. Process. Syst. 32, 3266–3280 (2019).

[29]I. Habernal, H. Wachsmuth, I. Gurevych, B. Stein, "The argument reasoning comprehension task: Identification and reconstruction of implicit warrants" in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2018), pp. 1930–1940.

[30]J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding" in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2019), pp. 4171–4186.

[31]T. Niven, H.-Y. Kao, "Probing neural network comprehension of natural language arguments" in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019), pp. 4658–4664.

[32]R. Geirhos et al., Shortcut learning in deep neural networks. Nat. Mach. Intell. 2, 665–673 (2020).

[33]S. Gururangan et al., "Annotation artifacts in natural language inference data" in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2018), pp. 107–112.

[34]S. Lapuschkin et al., Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1–8 (2019).

[35]R. T. McCoy, E. Pavlick, T. Linzen, "Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference" in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019), pp. 3428–3448.

[36]S. R. Choudhury, A. Rogers, I. Augenstein, Machine reading, fast and slow: When do models 'understand' language? arXiv [Preprint] (2022). http://arxiv.org/abs/2209.07430 (Accessed 7 March 2023).

[37]M. Gardner et al., "Competency problems: On finding and removing artifacts in language data" in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2021).

[38]T. Linzen, "How can we accelerate progress towards human-like linguistic generalization?" in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 5210–5217.

[39]C. Baumberger, C. Beisbart, G. Brun, "What is understanding? An overview of recent debates in epistemology and philosophy of science" in Explaining Understanding: New Perspectives from Epistemology and Philosophy of Science (Routledge, 2017), pp. 1–34.

[40]J. L. Kvanvig, "Knowledge, understanding, and reasons for belief" in The Oxford Handbook of Reasons and Normativity (Oxford University Press, 2018), pp. 685–705.

[41]M. B. Goldwater, D. Gentner, On the acquisition of abstract knowledge: Structural alignment and explication in learning causal system categories. Cognition 137, 137–153 (2015).

[42]A. Gopnik, "Causal models and cognitive development" in Probabilistic and Causal Inference: The Works of Judea Pearl, H. Geffner, R. Dechter, J. Y. Halpern, Eds. (Association for Computing Machinery, 2022), pp. 593–604.

[43]D. R. Hofstadter, E. Sander, Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Basic Books (2013).

[44]F. C. Keil, Explanation and understanding. Ann. Rev. Psychol. 57, 227 (2006).

[45]B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, Building machines that learn and think like people. Behav. Brain Sci. 40 (2017).

[46]S. A. Sloman, D. Lagnado, Causality in thought. Ann. Rev. Psychol. 66, 223–247 (2015).

[47]P. Smolensky, R. McCoy, R. Fernandez, M. Goldrick, J. Gao, Neurocompositional computing: From the central paradox of cognition to a new generation of AI systems. AI Mag. 43, 308–322 (2022).

[48]H. W. De Regt, Discussion note: Making sense of understanding. Philos. Sci. 71, 98–109 (2004).

[49]D. George, M. Lázaro-Gredilla, J. S. Guntupalli, From CAPTCHA to commonsense: How brain can teach us about artificial intelligence. Front. Comput. Neurosci. 14, 554097 (2020).

[50]B. M. Lake, G. L. Murphy, Word meaning in minds and machines. Psychol. Rev. (2021).

[51]J. Pearl, Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv [Preprint] (2018). http://arxiv.org/abs/1801.04016 (Accessed 7 March 2023).

[52]M. Strevens, No understanding without explanation. Stud. Hist. Philos. Sci. A. 44, 510–515 (2013).

[53]K. Mahowald et al., Dissociating language and thought in large language models: A cognitive perspective. arXiv [Preprint] (2023). http://arxiv.org/abs/2301.06627 (Accessed 7 March 2023).

[54]D. C. Krakauer, At the limits of thought (2020). https://aeon.co/essays/will-brains-or-algorithms-rule-the-kingdom-of-science. Accessed 7 March 2023.

[55]S. Carey, "On the origin of causal understanding" in Causal Cognition: A Multidisciplinary Debate, D. Sperber, D. Premack, A. J. Premack, Eds. (Clarendon Press/Oxford University Press, 1995), pp. 268–308.

[56]N. D. Goodman, T. D. Ullman, J. B. Tenenbaum, Learning a theory of causality. Psychol. Rev. 118, 110 (2011).

[57]A. Gopnik, A unified account of abstract structure and conceptual change: Probabilistic models and early learning mechanisms. Behav. Brain Sci. 34, 129 (2011).

[58]J. M. Mandler, How to build a baby: II. Conceptual primitives. Psychol. Rev. 99, 587 (1992).

[59]E. S. Spelke, K. D. Kinzler, Core knowledge. Dev. Sci. 10, 89–96 (2007).

[60]H. M. Wellman, S. A. Gelman, Cognitive development: Foundational theories of core domains. Ann. Rev. Psychol. 43, 337–375 (1992).

[61]R. W. Gibbs, Metaphor Wars (Cambridge University Press, 2017).

[62]G. Lakoff, M. Johnson, The metaphorical structure of the human conceptual system. Cogn. Sci. 4, 195–208 (1980).

[63]G. L. Murphy, On metaphoric representation. Cognition 60, 173–204 (1996).

[64]L. W. Barsalou et al., Grounded cognition. Ann. Rev. Psychol. 59, 617–645 (2008).

[65]J. G. De Villiers, P. A. de Villiers, The role of language in theory of mind development. Topics Lang. Disorders 34, 313–328 (2014).

[66]G. Dove, More than a scaffold: Language is a neuroenhancement. Cogn. Neuropsychol. 37, 288–311 (2020).

[67]G. Lupyan, B. Bergen, How language programs the mind. Topics Cogn. Sci. 8, 408–424 (2016).

[68]N. Akhtar, M. Tomasello, "The social nature of words and word learning" in Becoming a Word Learner: A Debate on Lexical Acquisition (Oxford University Press, 2000), pp. 115–135.

[69]S. R. Waxman, S. A. Gelman, Early word-learning entails reference, not merely associations. Trends Cogn. Sci. 13, 258–263 (2009).

[70]S. A. Gelman, Learning from others: Children’s construction of concepts. Ann. Rev. Psychol. 60, 115– 140 (2009).

[71]A. Bender, S. Beller, D. L. Medin, "Causal cognition and culture" in The Oxford Handbook of Causal Reasoning (Oxford University Press, 2017), pp. 717–738.

[72]M. W. Morris, T. Menon, D. R. Ames, "Culturally conferred conceptions of agency: A key to social perception of persons, groups, and other actors" in Personality and Social Psychology Review (Psychology Press, 2003), pp. 169–182.

[73]A. Norenzayan, R. E. Nisbett, Culture and causal cognition. Curr. Direc. Psychol. Sci. 9, 132– 135 (2000).

[74]A. Gopnik, H. M. Wellman, "The theory theory" in Domain Specificity in Cognition and Culture (1994), pp. 257–293.

[75]S. Trott, C. Jones, T. Chang, J. Michaelov, B. Bergen, Do large language models know what humans know? arXiv [Preprint] (2022). http://arxiv.org/abs/2209.01515(Accessed 7 March 2023).

[76]M. Binz, E. Schulz, Using cognitive psychology to understand GPT-3. arXiv [Preprint] (2022). http://arxiv.org/abs/2206.14576(Accessed 7 March 2023).

[77]I. Dasgupta et al., Language models show human-like content effects on reasoning. arXiv [Preprint] (2022). http://arxiv.org/abs/2207.07051(Accessed 7 March 2023).

[78]A. Laverghetta, A. Nighojkar, J. Mirzakhalov, J. Licato, "Predicting human psychometric properties using computational language models" in Annual Meeting of the Psychometric Society (Springer, 2022), pp. 151–169.

[79]M. Shanahan, Talking about large language models. arXiv [Preprint] (2022). http://arxiv.org/abs/2212.03551(Accessed 7 March 2023).

[80]B. Z. Li, M. Nye, J. Andreas, “Implicit representations of meaning in neural language models” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (2021), pp. 1813–1827.

[81]C. Olsson et al., In-context learning and induction heads. arXiv [Preprint] (2022). http://arxiv.org/abs/2209.11895(Accessed 7 March 2023).

[82]J. Jumper et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

[83]D. Silver et al., Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv [Preprint] (2017). http://arxiv.org/abs/1712.01815 (Accessed 7 March 2023).

[84]D. T. Jones, J. M. Thornton, The impact of AlphaFold2 one year on. Nat. Methods 19, 15–20 (2022).

[85]M. Sadler, N. Regan, Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI (New in Chess, Alkmaar, 2019).

Original title: "Santa Fe Scholars: Does the AI ? ? big language model really understand human language? 》
