Topic Models for Image Retrieval on Large-Scale Databases

Eva Hörster

Dissertation, Department of Computer Science, University of Augsburg, July 2009
First reviewer: Professor Dr. Rainer Lienhart
Second reviewer: Professor Dr. Wolfgang Effelsberg
Published July 31, 2009 in Augsburg, Germany


With the explosion in the number of images in personal and on-line collections, efficient techniques for navigating, indexing, labeling and searching images are becoming increasingly important. In this work we rely on the image content as the main source of information to retrieve images. We study topic-model-based image representations in their various aspects and extend the current models. Starting from a bag-of-visual-words image description based on local image features, image representations are learned in an unsupervised fashion, and each image is modeled as a mixture of the topics/object parts depicted in it. Topic models thus allow us to automatically extract high-level image content descriptions, which in turn can be used to find similar images. Furthermore, the typically low-dimensional topic-model-based representation enables efficient and fast search, especially in very large databases.
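As a minimal sketch of the representation described above, the following toy example fits a latent Dirichlet allocation model with scikit-learn on invented bag-of-visual-words histograms; the vocabulary size, topic count and data are illustrative only, not the thesis's actual configuration:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy data: 6 "images", each a count histogram over a visual
# vocabulary of 8 visual words (bag-of-visual-words vectors).
rng = np.random.default_rng(0)
bow = rng.integers(1, 6, size=(6, 8))

# Fit an LDA topic model; each image is then represented as a
# low-dimensional mixture over topics instead of a word histogram.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(bow)

print(theta.shape)  # (6, 3): one topic distribution per image
```

Each row of `theta` is a probability distribution over the three topics, which is the compact, high-level description used for similarity search.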

In this thesis we present a complete image retrieval system based on topic models and evaluate the suitability of different types of topic models for the task of large-scale retrieval on real-world databases. Different similarity measures are evaluated in a retrieval-by-example task.
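The specific similarity measures compared are not listed here; as a hedged illustration, two common choices for comparing topic distributions are cosine similarity and the Jensen-Shannon divergence, sketched below on invented toy distributions:

```python
import numpy as np

# Two hypothetical topic distributions for a query image and a
# database image (each sums to 1); values are illustrative only.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])

def cosine_sim(a, b):
    # Angle-based similarity; 1.0 means identical direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def js_divergence(a, b):
    # Jensen-Shannon divergence: a symmetric, bounded variant of KL.
    m = 0.5 * (a + b)
    kl = lambda x, y: float(np.sum(x * np.log2(x / y)))
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)

print(cosine_sim(p, q))
print(js_divergence(p, q))
```

For ranking, images would be sorted by descending similarity (or ascending divergence) to the query's topic distribution.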

Next, we focus on the incorporation of different types of local image features into the topic models. For this, we first evaluate which types of feature detectors and descriptors are appropriate for modeling the images; then we propose and explore models that fuse multiple types of local features. All basic topic models require the quantization of the otherwise high-dimensional, continuous local feature vectors into a finite, discrete vocabulary to enable the bag-of-words image representation the topic models are built on. As it is not clear how to optimally quantize the high-dimensional features, we introduce several extensions to a basic topic model that model the visual vocabulary continuously, making the quantization step obsolete.
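The quantization step referred to above is commonly realized with k-means clustering. The sketch below, using scikit-learn on invented 2-D stand-in descriptors (real local features such as SIFT descriptors are 128-dimensional), shows how a visual vocabulary might be built and how one image's descriptors are mapped to a bag-of-visual-words histogram:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented local descriptors pooled from many training images;
# 2-D points stand in for high-dimensional real descriptors.
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(200, 2))

# Cluster the descriptor space into K visual words (the vocabulary).
K = 16
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)

# Map one image's descriptors to visual-word indices, then count
# occurrences to form its bag-of-visual-words histogram.
image_desc = rng.normal(size=(30, 2))
words = kmeans.predict(image_desc)
histogram = np.bincount(words, minlength=K)
print(histogram.sum())  # 30: one visual word per local descriptor
```

The continuous-vocabulary extensions mentioned above avoid exactly this hard assignment of each descriptor to a single cluster center.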

On-line image repositories of the Web 2.0 often store additional information about the images beyond their pixel values, so-called metadata, such as associated tags, date of creation, ownership and camera parameters. In this work we also investigate how to include such cues in our retrieval system. We present work in progress on (hierarchical) models that fuse features from multiple modalities.

Finally, we present an approach to find the most relevant, i.e., most representative, images in a large web-scale collection given a query term. Our unsupervised approach ranks highest those images whose content and various types of metadata yield the highest probability under the model we automatically build for this tag. Throughout this thesis, the suitability of all proposed models and approaches is demonstrated by user studies on real-world, large-scale databases in the context of image retrieval tasks. We use databases consisting of more than 240,000 images downloaded from the public Flickr repository.
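As a hedged illustration of ranking by model probability (the thesis's actual tag models are richer and also incorporate metadata), the toy sketch below estimates a smoothed multinomial over visual words from a tag's images and ranks the images by their length-normalized log-likelihood under it; all data and names are invented:

```python
import numpy as np

# Invented bag-of-visual-words histograms for three images
# associated with one tag (4-word toy vocabulary).
images = np.array([
    [5, 1, 0, 0],
    [4, 2, 1, 0],
    [0, 0, 3, 4],
])

# Build a simple per-tag model: pooled counts with add-one
# smoothing, normalized to multinomial word probabilities.
tag_model = images.sum(axis=0) + 1.0
tag_model = tag_model / tag_model.sum()

# Score each image by its log-likelihood under the tag model,
# normalized by image length, and rank most probable first.
log_lik = images @ np.log(tag_model)
ranking = np.argsort(-log_lik / images.sum(axis=1))
print(ranking)  # most representative image index first
```

Images whose word usage matches the tag's pooled statistics score highest, which is the intuition behind ranking by model probability.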


Thesis as PDF (university library)