ETIS Seminar: David Picard

Seminar title and speaker

Learning deep embeddings for cross-modal retrieval.

David Picard, MIDI team.

Date and venue

Tuesday, November 6, 2018, 11:00 a.m.

ENSEA, council room.


In this talk, I will review my recent work on deep learning for cross-modal retrieval. In cross-modal retrieval, we want to index documents from several modalities, such as images and texts, so that similar documents can be retrieved regardless of their modality and of the modality of the query. In my case, given a collection of recipes and pictures of meals, I am interested in retrieving all pictures of a specific recipe or, conversely, the recipes corresponding to a picture of a meal. The key idea is to learn a common latent space to which both images and texts are mapped. I will present some architectural choices for the text encoder and the image encoder, as well as the learning procedures used to train these encoders end to end. I will also highlight some properties of the common latent space. Finally, I will present some extensions that enable video indexing.
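To make the idea of a common latent space concrete, here is a minimal sketch, not the speaker's method. Each modality gets its own encoder into a shared space, embeddings are L2-normalized, and retrieval in either direction is a ranking by cosine similarity. Several parts are assumptions: the encoders are stand-in random linear projections (the hypothetical class `SharedSpace`) instead of deep networks, the inputs are precomputed feature vectors, and the loss is a bidirectional triplet hinge loss with a hypothetical `margin=0.2`, which is one common choice for this kind of training. The gradient-based end-to-end training itself is omitted.

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit Euclidean norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def linear(weights, features):
    """Apply a linear projection (a list of rows) to a feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def random_matrix(rows, cols, seed):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class SharedSpace:
    """Hypothetical two-branch model: one projection per modality into a common latent space.
    Random linear maps stand in for the deep image and text encoders."""
    def __init__(self, img_dim, txt_dim, latent_dim, seed=0):
        self.w_img = random_matrix(latent_dim, img_dim, seed)
        self.w_txt = random_matrix(latent_dim, txt_dim, seed + 1)

    def embed_image(self, feats):
        return l2_normalize(linear(self.w_img, feats))

    def embed_text(self, feats):
        return l2_normalize(linear(self.w_txt, feats))

def cosine(a, b):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_emb, gallery_embs, k):
    """Indices of the k gallery items most similar to the query, whatever their modality."""
    scores = [(cosine(query_emb, g), i) for i, g in enumerate(gallery_embs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

def bidirectional_triplet_loss(img_embs, txt_embs, margin=0.2):
    """Hinge loss (assumed, not from the talk): matching pair i should score at least
    `margin` higher than every mismatched pair, for image->text and text->image queries."""
    n = len(img_embs)
    loss = 0.0
    for i in range(n):
        pos = cosine(img_embs[i], txt_embs[i])
        for j in range(n):
            if j == i:
                continue
            loss += max(0.0, margin - pos + cosine(img_embs[i], txt_embs[j]))  # image query
            loss += max(0.0, margin - pos + cosine(img_embs[j], txt_embs[i]))  # text query
    return loss / n

if __name__ == "__main__":
    model = SharedSpace(img_dim=8, txt_dim=6, latent_dim=4)
    rng = random.Random(42)
    # Hypothetical precomputed features for 5 meal pictures and their 5 recipes.
    images = [model.embed_image([rng.random() for _ in range(8)]) for _ in range(5)]
    recipes = [model.embed_text([rng.random() for _ in range(6)]) for _ in range(5)]
    print(retrieve(images[0], recipes, k=3))  # recipes for a picture
    print(retrieve(recipes[0], images, k=3))  # pictures for a recipe
    print(bidirectional_triplet_loss(images, recipes))
```

Because both modalities are normalized into the same space, a single `retrieve` function serves both query directions. Training would adjust the two encoders so that the loss drives matching picture/recipe pairs together and mismatched pairs apart.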