YANG Lu, QIAN Yi, WEN Yimin. Image description method based on object position relationship in scene[J]. Journal of Guilin University of Electronic Technology, 2024, 44(6): 560-567. DOI: 10.16725/j.1673-808X.202360

Image description method based on object position relationship in scene

  • Image description aims to transform visual content into a natural-language description and is an urgent and challenging multimodal generation task. Because most image description methods pay little attention to implicit position information, they struggle to describe the positional relationships between objects in an image accurately. To address this problem, the position relationship encoder-combine decoder (PRCO) structure is proposed, which focuses on and generates the positional relationships between objects. A novel position relationship encoder starts from the object relationship scene graph and uses its node features. Technically, a common sense dictionary and a reasoning module are created to calculate the degree of imbalance between objects, which is then used to perform a secondary encoding of the object relationship nodes. Specifically, the combine decoder is designed to process the encoded information, with an erasing module and a bias gate to optimize the node features in the graph. Experiments are conducted on the MSCOCO and Visual Genome image description datasets, and the results are superior to those of state-of-the-art approaches. More remarkably, PRCO achieves an increase in CIDEr performance on the Visual Genome test set. Our code is publicly available on Gitee: https://gitee.com/ymw12345/PRCO.
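
The abstract does not say how positional relationships between objects are derived. As a purely illustrative sketch, and not the PRCO method itself, one simple way to label a coarse positional relation between two detected objects from their bounding boxes might look like the following. The function names, the box format (x_min, y_min, x_max, y_max), and the relation labels are all assumptions made for this example.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in image
# coordinates, with y growing downward. Not taken from the paper.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def coarse_relation(subject: Box, obj: Box) -> str:
    """Label where `subject` lies relative to `obj` using box centers.

    The dominant axis of the center offset decides the label; this is
    an illustrative heuristic, not the encoder described in the abstract.
    """
    sx, sy = center(subject)
    ox, oy = center(obj)
    dx, dy = sx - ox, sy - oy
    if dx == 0 and dy == 0:
        return "overlapping"
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


if __name__ == "__main__":
    person = (0.0, 0.0, 10.0, 10.0)
    dog = (20.0, 0.0, 30.0, 10.0)
    print("person is", coarse_relation(person, dog), "dog")
```

A label of this kind could serve as an edge attribute between two object nodes in a scene graph; how PRCO actually encodes such relations is described in the paper and its repository, not here.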
