Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.
This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors and the training annotation words are used by machine learning techniques to attempt to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Subsequently, techniques were developed using machine translation to attempt to translate the textual vocabulary into the 'visual vocabulary', represented by clustered regions known as blobs. Later work has included classification approaches, relevance models, and other related methods.
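The blob-and-co-occurrence idea can be illustrated with a toy sketch: region feature vectors from training images are vector-quantized (here with a minimal k-means) into 'blobs', blob–word co-occurrence counts are collected, and a new image is annotated with the words most probable given its blobs. The function names and the deterministic k-means initialization below are illustrative assumptions, not taken from the cited papers.

```python
from collections import defaultdict

def nearest(v, centers):
    """Index of the center closest (squared Euclidean) to vector v."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centers[i])))

def kmeans(vectors, k, iters=20):
    """Toy k-means; initial centers are spread evenly across the data."""
    centers = [vectors[i * len(vectors) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        centers = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def train(images, k):
    """images: list of (region_vectors, annotation_words) pairs.
    Quantizes all regions into k blobs and counts blob-word co-occurrences."""
    all_regions = [v for regions, _ in images for v in regions]
    centers = kmeans(all_regions, k)
    counts = defaultdict(lambda: defaultdict(int))  # blob -> word -> count
    for regions, words in images:
        for v in regions:
            blob = nearest(v, centers)
            for w in words:
                counts[blob][w] += 1
    return centers, counts

def annotate(regions, centers, counts, top=1):
    """Score each word by accumulating P(word | blob) over the image's blobs."""
    scores = defaultdict(float)
    for v in regions:
        blob = nearest(v, centers)
        total = sum(counts[blob].values()) or 1
        for w, c in counts[blob].items():
            scores[w] += c / total
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

For example, trained on two synthetic images whose regions cluster around distinct points and carry the words "sky" and "grass" respectively, `annotate` returns the word associated with the blob nearest to a query region. Real systems replace the toy features with color/texture descriptors and the co-occurrence table with a statistical translation or relevance model.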
The advantage of automatic image annotation over content-based image retrieval (CBIR) is that queries can be specified more naturally by the user.[1] At present, CBIR generally requires users to search by image concepts such as color and texture or by supplying example queries. However, certain features in an example image may override the concept on which the user is actually focusing. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which is expensive and time-consuming, especially given the large and constantly growing image databases in existence.
"Archived copy" (PDF). i.yz.yamagata-u.ac.jp. Archived from the original (PDF) on 8 August 2014. Retrieved 13 January 2022.
Y Mori; H Takahashi & R Oka (1999). "Image-to-word transformation based on dividing and vector quantizing images with words". Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management. CiteSeerX 10.1.1.31.1704.
D Blei; A Ng & M Jordan (2003). "Latent Dirichlet allocation" (PDF). Journal of Machine Learning Research. 3: 993–1022. Archived from the original (PDF) on March 16, 2005.
R Maree; P Geurts; J Piater & L Wehenkel (2005). "Random Subwindows for Robust Image Classification". Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. pp. 1:34–30.
J Y Pan; H-J Yang; P Duygulu & C Faloutsos (2004). "Automatic Image Captioning" (PDF). Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME'04). Archived from the original (PDF) on 2004-12-09.
Changhu Wang; Feng Jing; Lei Zhang & Hong-Jiang Zhang (2007). "Content-Based Image Annotation Refinement". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07). doi:10.1109/CVPR.2007.383221.
Ilaria Bartolini & Paolo Ciaccia (2007). "Imagination: Exploiting Link Analysis for Accurate Image Annotation". Springer Adaptive Multimedia Retrieval. doi:10.1007/978-3-540-79860-6_3.
Emre Akbas & Fatos Y. Vural (2007). "Automatic Image Annotation by Ensemble of Visual Descriptors". Intl. Conf. on Computer Vision (CVPR) 2007, Workshop on Semantic Learning Applications in Multimedia. doi:10.1109/CVPR.2007.383484. hdl:11511/16027.
Ameesh Makadia; Vladimir Pavlovic & Sanjiv Kumar (2008). "A New Baseline for Image Annotation" (PDF). European Conference on Computer Vision (ECCV).