Abstract:
This paper briefly explains the language processing mechanisms, mathematical foundations, and theoretical implications of ChatGPT and other modern large language models. First, it demonstrates the performance of large language models in semantic understanding and commonsense reasoning by testing ChatGPT’s interpretation of ambiguous sentences. Second, it introduces the transformer, the core architectural module of these large language models, which is equipped with multi-head attention (MHA). It also presents word embeddings, the real-valued vector representations grounded in distributional semantics, and discusses the role of word vectors in language processing and analogical reasoning. Third, it details how transformers successfully predict the next word and generate appropriate texts by tracking and passing on information about the syntactic and semantic relationships between words through MHA and feed-forward network (FFN) layers. Finally, it provides an overview of the training methods of large language models and shows how their approach of “recreating a language” helps us reaffirm relevant design features of human natural languages (including distributivity and predictability) and prompts us to re-examine the various syntactic and semantic theories that have been developed and formulated so far.