ZHU Yan, ZHANG Jingwei, YANG Qing, HU Xiaoli, SHAN Meijing. Research on stemming and related ranking optimization for retrieval service[J]. Journal of Guilin University of Electronic Technology, 2022, 42(5): 354-365.

Research on stemming and related ranking optimization for retrieval service

  • The rise of a new generation of information technology and the rapid development of the internet industry have led to explosive growth in the volume of data. To let billions of users quickly obtain useful information from massive data, improving the retrieval quality and query efficiency of search engines is of great significance, but it also poses challenges. On the one hand, user queries are becoming increasingly complex, and morphological variation in vocabulary diversifies search terms, while existing stemming algorithms generally suffer from under-stemming and unsatisfactory accuracy. On the other hand, retrieving documents that satisfy a user query from massive data is very time-consuming, and existing methods that partition documents across multiple servers to reduce query latency often suffer from tail latency. To address these problems, in the text preprocessing stage the word-form normalization algorithm APS (advanced Porter stemmer) is designed, in which the rule functions are recoded and feature word extraction is optimized. In the related ranking stage, the anytime ranking algorithm SAR (SAAT anytime ranking) is designed based on the score-at-a-time (SAAT) query processing strategy; it can terminate query processing early once a given time budget is exhausted or a specified number of inverted segments has been processed, thereby effectively controlling query latency. Experiments on multiple real datasets verify the effectiveness of the APS algorithm in improving stemming accuracy and of the SAR algorithm in controlling query latency.
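
The early-termination idea behind score-at-a-time anytime ranking can be illustrated with a minimal Python sketch. This is not the authors' SAR implementation: the function name `saat_anytime_rank`, the toy index `TOY_INDEX`, and its layout (each term mapped to a list of `(impact, doc_ids)` segments) are assumptions made only for illustration. Segments are processed in decreasing impact order, and the loop stops once a segment cap or a time budget is reached, so the highest-impact contributions are accumulated first.

```python
import heapq
import time
from collections import defaultdict

# Hypothetical toy index: term -> list of (impact, doc_ids) segments.
TOY_INDEX = {
    "stem": [(3, [1, 4]), (1, [2])],
    "rank": [(2, [1, 2]), (1, [3, 4])],
}

def saat_anytime_rank(index, query_terms, k=10,
                      max_segments=None, time_budget_s=None):
    """Score-at-a-time ranking with optional early termination.

    Returns (top_k, processed), where top_k is a list of (doc_id, score)
    pairs and processed is the number of segments actually scored.
    """
    # Gather every segment of every query term, highest impact first.
    segments = [seg for term in query_terms for seg in index.get(term, [])]
    segments.sort(key=lambda seg: seg[0], reverse=True)

    scores = defaultdict(int)
    start = time.monotonic()
    processed = 0
    for impact, doc_ids in segments:
        # Stop once the segment cap is reached.
        if max_segments is not None and processed >= max_segments:
            break
        # Stop once the time budget is used up.
        if time_budget_s is not None and time.monotonic() - start >= time_budget_s:
            break
        for doc_id in doc_ids:
            scores[doc_id] += impact
        processed += 1

    # Highest score first; ties are broken in favor of the smaller doc id.
    top_k = heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], -kv[0]))
    return top_k, processed

full_result = saat_anytime_rank(TOY_INDEX, ["stem", "rank"], k=2)
capped_result = saat_anytime_rank(TOY_INDEX, ["stem", "rank"], k=2, max_segments=2)
```

In this toy example the capped run scores only the two highest-impact segments, yet it returns the same two documents in the same order as the full run, which is the trade-off an anytime strategy relies on: bounded work with a ranking that approximates the exhaustive one.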