Abstract:
An unsupervised image captioning (UIC) model based on a generative adversarial (GA) method is proposed for image captioning tasks. A convolutional neural network serves as the encoder to extract image features, an adversarial text generation method serves as the decoder to generate captions, and YOLOv3 serves as an auxiliary network that enables captions to be generated without supervised image-caption pairs. Comparative experiments show that, relative to traditional models, the GA-based UIC model improves both training speed and caption quality as measured by text evaluation metrics such as BLEU, METEOR, ROUGE, and CIDEr. The proposed model thus offers a new solution for the image captioning task.