Viewpoint Core Information Extraction Based on a Sequence-to-Sequence Model
Abstract: Pair-wise aspect and opinion term extraction is a subtask of aspect-based sentiment analysis that aims to extract the core information of opinions from review sentences. Existing methods either require extensive, complex data annotation or generate a large number of negative samples, which makes them labor-intensive and computationally expensive. To address this problem, the extraction of aspect-opinion term pairs is recast as a text generation task, and an end-to-end generation framework based on a sequence-to-sequence (Seq2Seq) model is proposed to generate the pairs. In the proposed framework, the encoder and decoder of the large pretrained model BART serve as the encoder and decoder of the Seq2Seq model, and a pointer mechanism is combined with decoding to generate the sequence of aspect-opinion term pairs directly. The proposed model achieves an F1 score of 77.31% on the 15res dataset, an improvement of 3.74% over the best baseline model. Experimental results show that the proposed model outperforms the other baseline models on all three datasets.
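To illustrate how pair extraction can be cast as sequence generation with pointers, the following minimal sketch flattens aspect-opinion pairs into a sequence of source-token indices and recovers the pairs from such a sequence. The target layout (four indices per pair: aspect start, aspect end, opinion start, opinion end) is an assumption made for illustration; the abstract does not specify the exact format. The function names and the example sentence are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]    # inclusive (start, end) token indices
Pair = Tuple[Span, Span]  # (aspect span, opinion span)


def pairs_to_target(pairs: List[Pair]) -> List[int]:
    """Flatten aspect-opinion pairs into one pointer-index sequence.

    Assumed layout (not taken from the paper): four indices per pair.
    """
    target: List[int] = []
    for (a_start, a_end), (o_start, o_end) in pairs:
        target.extend([a_start, a_end, o_start, o_end])
    return target


def target_to_pairs(target: List[int], tokens: List[str]) -> List[Tuple[str, str]]:
    """Recover (aspect, opinion) text pairs from a generated index sequence.

    Incomplete trailing groups and groups with out-of-range or
    reversed indices are skipped.
    """
    pairs = []
    for i in range(0, len(target) - 3, 4):
        a_start, a_end, o_start, o_end = target[i:i + 4]
        valid = (0 <= a_start <= a_end < len(tokens)
                 and 0 <= o_start <= o_end < len(tokens))
        if valid:
            aspect = " ".join(tokens[a_start:a_end + 1])
            opinion = " ".join(tokens[o_start:o_end + 1])
            pairs.append((aspect, opinion))
    return pairs


# Hypothetical review sentence with two aspect-opinion pairs.
tokens = "The pasta was delicious but the service was slow".split()
gold = [((1, 1), (3, 3)), ((6, 6), (8, 8))]
target = pairs_to_target(gold)
decoded = target_to_pairs(target, tokens)
```

In a pointer-based decoder, each generated index would be chosen by scoring the decoder state against the encoder states of the source tokens, so the output vocabulary is the set of input positions rather than words.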