Do cross modal systems leverage semantic relationships?

IRIS - Institutional Research Information System
IRIS è il sistema di gestione integrata dei dati della ricerca (persone, progetti, pubblicazioni, attività) adottato dall'Università degli Studi dell’Insubria.

IRInSubria - Institutional Repository Insubria
IRInSubria raccoglie, conserva, documenta e dissemina le informazioni sulla produzione scientifica dell'Università degli Studi dell’Insubria anche ai fini della valutazione della ricerca.

Current cross modal retrieval systems are evaluated using R@K measure which does not leverage semantic relationships rather strictly follows the manually marked image text query pairs. Therefore, current systems do not generalize well for the unseen data in the wild. To handle this, we propose a new measure SemanticMap to evaluate the performance of cross modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross modal retrieval system using a single stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images which enabled us to use single stream network for both text and images. To the best of our knowledge, our work is the first of its kind in terms of employing a single stream network for cross modal retrieval systems. The proposed system is evaluated on two publicly available datasets including MSCOCO and Flickr30K and has shown comparable results to the current state-of-the-art methods.

Do cross modal systems leverage semantic relationships?

NAWAZ, SHAH;Kamran Janjua M.;Gallo Ignazio;Mahmood Arif;Calefati Alessandro;Shafait Faisal

2019-01-01

Abstract

Current cross modal retrieval systems are evaluated using R@K measure which does not leverage semantic relationships rather strictly follows the manually marked image text query pairs. Therefore, current systems do not generalize well for the unseen data in the wild. To handle this, we propose a new measure SemanticMap to evaluate the performance of cross modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross modal retrieval system using a single stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images which enabled us to use single stream network for both text and images. To the best of our knowledge, our work is the first of its kind in terms of employing a single stream network for cross modal retrieval systems. The proposed system is evaluated on two publicly available datasets including MSCOCO and Flickr30K and has shown comparable results to the current state-of-the-art methods.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Titolo del volume
	
				ICCV 2019 Workshop on Cross-Modal Learning in Real World
			
	ISBN
	
				9781728150239
			
	Titolo del congresso
	
				17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
			
	Luogo del Congresso
	
				Seoul, Korea
			
	Data del Congresso
	
				October 27 - November 2, 2019
			
	Appare nelle tipologie:
	
				Relazione (in Volume)

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/2083806

Attenzione

L'Ateneo sottopone a validazione solo i file PDF allegati

Citazioni

ND

7

2

social impact