An Intelligent Recognition Model for Chinese Pseudo-sentiment Sentences Driven by Formal Semantic Knowledge

Abstract: Sentiment factors (sentiment words, sentiment phrases, and sentiment sentence patterns) are a necessary but not sufficient condition for a sentiment sentence. "Pseudo-sentiment sentences" contain sentiment factors but express no sentiment, and identifying them effectively is a key step in filtering noise and improving the accuracy of sentiment sentence recognition. In this paper, we first use corpus induction and synonym expansion to summarize seven types of semantic features for identifying pseudo-sentiment sentences: subjective desire, subjective conjecture, hypothesis and concession, purpose and plan, question and inquiry, suggestion and request, and objective reference. Next, the specific words of each type are added to the semantic lexicon and given the semantic tag xjc (sentiment-dissolving word), and sentiment-dissolving rules such as "sentiment-dissolving factor + sentiment factor = pseudo-sentiment sentence" are formulated to cancel the sentiment orientation of any sentiment factor that falls under the semantic scope of a sentiment-dissolving factor. Finally, the knowledge ontology (sentiment lexicon, semantic lexicon, and sentiment-dissolving rules) is implemented in Python as the pseudo-sentiment sentence filtering module of CUCsas, a Chinese sentiment analysis system. In experiments, the module achieves a precision of 91.0%, a recall of 87.7%, and an F1 value of 89.3%.
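The rule "sentiment-dissolving factor + sentiment factor = pseudo-sentiment sentence" can be illustrated with a minimal Python sketch. This is not the CUCsas implementation: the two toy lexicons, the function names, and the approximation of "semantic scope" as "a dissolving word appears earlier in the same clause" are all assumptions made for illustration only.

```python
import re

# Hypothetical toy lexicons; the paper's actual dictionaries are not
# given in the abstract.
SENTIMENT_WORDS = {"喜欢", "满意", "失望"}  # like, satisfied, disappointed
DISSOLVING_WORDS = {  # words that would carry the xjc tag, with their type
    "希望": "subjective desire",
    "也许": "subjective conjecture",
    "如果": "hypothesis and concession",
    "是否": "question and inquiry",
    "建议": "suggestion and request",
}

# Split on common Chinese and ASCII clause punctuation.
CLAUSE_SPLIT = re.compile(r"[,,。;;!!??]")


def live_sentiment_words(sentence):
    """Return sentiment words that no dissolving word precedes in their clause.

    Assumption: a dissolving word governs every sentiment word that
    follows it within the same clause.
    """
    live = []
    for clause in CLAUSE_SPLIT.split(sentence):
        xjc_positions = [clause.find(w) for w in DISSOLVING_WORDS if w in clause]
        first_xjc = min(xjc_positions) if xjc_positions else None
        for word in SENTIMENT_WORDS:
            pos = clause.find(word)
            if pos == -1:
                continue  # this sentiment word does not occur in the clause
            if first_xjc is None or pos < first_xjc:
                live.append(word)  # not governed, so its orientation is kept
    return live


def is_pseudo_sentiment(sentence):
    """True if the sentence has sentiment factors and all of them are cancelled."""
    has_factor = any(w in sentence for w in SENTIMENT_WORDS)
    return has_factor and not live_sentiment_words(sentence)


if __name__ == "__main__":
    for s in ["希望你会喜欢这份礼物", "我很喜欢这部电影", "如果你不满意,可以退货"]:
        print(s, is_pseudo_sentiment(s))
```

A real system would work on segmented, tagged text and would need a more careful notion of scope than clause-internal word order. The sketch only shows how a dissolving word can switch off the orientation of a sentiment word it governs.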

     
