Lexical Richness and Syntactic Complexity in AI-Generated Texts: A Comparison of ChatGPT and Native English University Students’ Writing

Abstract: With the rapid development of generative artificial intelligence (AI), scholars have increasingly debated the opportunities and challenges that large language models such as ChatGPT pose for foreign language education. One major concern is that students may rely too heavily on AI, leading to ghostwriting and plagiarism. It is therefore necessary to identify linguistic evidence that can distinguish ChatGPT-generated texts from human writing. This study compares texts generated by ChatGPT with same-topic, equal-length essays written by high-proficiency native English-speaking university students, focusing on differences in lexical richness and syntactic complexity. The results show that ChatGPT scores significantly higher than the human writers on lexical sophistication, lexical diversity, and lexical density, most notably in its use of low-frequency vocabulary and complex noun phrases. ChatGPT also scores higher on several dimensions of syntactic complexity, but it falls short of the native-speaker students in handling subordinate structures and in expressing complex logical relationships. The study further discusses how ChatGPT’s large-scale pre-training and generation mechanisms shape these linguistic characteristics. It provides new empirical evidence for characterizing the linguistic features of AI-generated texts and offers practical implications for the teaching of second language writing.
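The abstract names lexical diversity and lexical density without stating how they are operationalized. As a rough illustration only, the Python sketch below computes two common textbook proxies: the type-token ratio (distinct word forms divided by total tokens) for diversity, and the share of content words among all tokens for density. The function names, the regular-expression tokenizer, and the tiny `FUNCTION_WORDS` list are all assumptions made for this example; they are not the measures or tools used in the study, which may rely on part-of-speech tagging and length-corrected indices.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption for
# illustration). A real analysis would identify content words with a
# part-of-speech tagger instead.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at",
    "to", "for", "with", "by", "from", "as", "is", "are", "was", "were",
    "be", "been", "it", "this", "that", "these", "those", "he", "she",
    "they", "we", "you", "i", "not",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Lexical diversity proxy: distinct word forms / total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens):
    """Lexical density proxy: tokens not in FUNCTION_WORDS / total tokens."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    sample = "The cat sat on the mat"
    tokens = tokenize(sample)
    print(type_token_ratio(tokens), lexical_density(tokens))
```

The raw type-token ratio tends to fall as a text grows longer, so it is only meaningful when the compared texts are of similar length, which is consistent with the equal-length design described in the abstract.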

     
