Citation: YUAN Yulin. ChatGPT and Other Large Language Models: Their Language Processing Mechanisms and Their Theoretical Implications[J]. Journal of Foreign Languages, 2024, 47(4): 2-14.
This paper briefly explains the language processing mechanisms, mathematical foundations, and theoretical implications of ChatGPT and other modern large language models. First, it demonstrates the performance of large language models in semantic understanding and common-sense reasoning by testing ChatGPT's interpretation of ambiguous sentences. Second, it introduces the transformer, the novel building block of these models, which is equipped with what is referred to as multi-head attention (MHA). It also presents word embeddings, the real-valued vector representations based on distributional semantics, and the role of word vectors in language processing and analogical reasoning. Third, it details how transformers successfully predict the next word and generate appropriate text by tracking and passing on information about the syntactic and semantic relationships between words through MHA and feed-forward networks (FFN). Finally, it provides an overview of the training methods of large language models and shows how their way of "recreating a language" helps us reconfirm relevant design features of human natural languages (including distributivity and predictability) and inspires us to re-examine the various syntactic and semantic theories that have been developed and formulated so far.
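The analogical reasoning with word vectors mentioned above can be sketched with the classic vector-offset pattern vec("king") − vec("man") + vec("woman") ≈ vec("queen") (cf. Mikolov et al. 2013). The four toy 4-dimensional vectors below are invented purely for illustration; real embeddings have hundreds or thousands of dimensions and are learned from the distribution of words in large corpora.

```python
from math import sqrt

# Toy 4-dimensional "word embeddings" (invented for illustration only).
# Dimensions loosely stand for: [royalty, maleness, femaleness, humanness].
vocab = {
    "king":  [0.9, 0.8, 0.1, 1.0],
    "queen": [0.9, 0.1, 0.8, 1.0],
    "man":   [0.1, 0.8, 0.1, 1.0],
    "woman": [0.1, 0.1, 0.8, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vec(b) - vec(a) + vec(c)."""
    target = [bv - av + cv for av, bv, cv in zip(vocab[a], vocab[b], vocab[c])]
    # Rank the remaining vocabulary by similarity to the offset vector.
    candidates = {w: cosine(target, v) for w, v in vocab.items()
                  if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

print(analogy("man", "king", "woman"))  # prints "queen"
```

The point of the sketch is that purely geometric operations on distributional vectors recover a semantic relation (male : female :: king : queen) that was never stated explicitly anywhere.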
[1] Bernard, T. & T. Han. Mandarinograd: A Chinese collection of Winograd schemas[C]// Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille: European Language Resources Association, 2020.
[2] Brown, T. B., Mann, B., Ryder, N., et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
[3] Chomsky, N. Syntactic Structures[M]. The Hague: Mouton. Trans. Xing Gongwan et al. Beijing: China Social Sciences Press, 1979.
[4] Chomsky, N. Syntactic Structures[M]. The Hague: Mouton. Trans. Chen Manhua. Beijing: The Commercial Press, 2022.
[5] Dai, D., Dong, L., Hao, Y., et al. Knowledge neurons in pretrained transformers[J]. arXiv preprint arXiv:2104.08696, 2021.
[6] Firth, J. R. A synopsis of linguistic theory 1930-1955[C]// Palmer, F. R. (ed.). Selected Papers of J. R. Firth 1952-1959. London: Longman, 1968.
[7] Geva, M., Schuster, R., Berant, J. & O. Levy. Transformer feed-forward layers are key-value memories[J]. arXiv preprint arXiv:2012.14913, 2020.
[8] Harris, Z. S. Distributional structure[J]. Word, 1954, 10(2-3): 146-162. doi: 10.1080/00437956.1954.11659520
[9] Lee, T. B. & S. Trott. Large language models, explained with a minimum of math and jargon[J/OL]. Understanding AI, 2023, (27). https://www.understandingai.org/p/large-language-models-explained-with.
[10] Levesque, H., Davis, E. & L. Morgenstern. The Winograd schema challenge[C]// Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2012). Rome: AAAI Press, 2012.
[11] Mikolov, T., Sutskever, I., Chen, K., et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26: 3111-3119.
[12] Vaswani, A., Shazeer, N., Parmar, N., et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
[13] Wang, K., Variengien, A., Conmy, A., et al. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small[J]. arXiv preprint arXiv:2211.00593, 2022.
[14] Wishart, R. & P. Prokopidis. Topic modelling experiments on Hellenistic corpora[C]// Proceedings of the Workshop on Corpora in the Digital Humanities. Bloomington, 2017.
[15] Wittgenstein, L. Philosophical Investigations[M]. Trans. G. E. M. Anscombe. Oxford: Basil Blackwell, 1958.
[16] Wolfram, S. What Is ChatGPT Doing ... and Why Does It Work?[M]. Trans. Wolfram Media Chinese localization team. Beijing: Posts & Telecom Press, 2023.
[17] Yarlett, D. & M. Ramscar. Language learning through similarity-based generalization[D]. Palo Alto: Stanford University, 2008.
[18] Huang Zifeng. A detailed explanation: Where does ChatGPT's intelligence come from?[EB/OL]. (2023-02-20) [2023-02-19]. https://www.acfun.cn/a/ac40704856.
[19] Li Mu, Liu Shujie, Zhang Dongdong & Zhou Ming. Machine Translation[M]. Beijing: Higher Education Press, 2018.
[20] Qin Bing. Aligning large language models with human values[EB/OL]. (2023-08-03) [2023-08-10]. https://mp.weixin.qq.com/s/888XZ43VP8nefVXgGww9Bw.
[21] Wang Qingfa. OpenAI's chief scientist reveals the technical principles behind GPT-4[EB/OL]. (2023-03-17) [2023-03-30]. https://mp.weixin.qq.com/s/H8vNSn-0Ho2Ho4I0n7YDfQ.
[22] Xu Liejiong. Semantics (Revised Edition)[M]. Beijing: Language & Culture Press, 1995.
[23] Yuan Yulin. Why build a palace for language? The over-enchantment of linguistics viewed from the metonymic nature of symbol systems[J]. Chinese Journal of Language Policy and Planning, 2019, 4(4): 60-73.
[24] Yuan Yulin. Reflecting on the design principles and operating mechanisms of language under the constraints of the human habitat[J]. Chinese Journal of Language Policy and Planning, 2022, 7(6): 85-96.
[25] Yuan Yulin. How to test ChatGPT's semantic understanding and common-sense reasoning? With remarks on the challenges and opportunities for linguistics in the era of large language models[J]. Chinese Journal of Language Policy and Planning, 2024, 9(1): 49-63.
[26] Yuan Yulin. Comparing machine performance with human baselines in semantic understanding and common-sense reasoning: How to evaluate the language competence of ChatGPT and other large language models?[J]. Chinese Linguistics (Hanyu Xuebao), 2024, forthcoming.