Please use this identifier to cite or link to this item:
https://hdl.handle.net/10316/103726
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Assunção, Gustavo | - |
dc.contributor.author | Gonçalves, Nuno | - |
dc.contributor.author | Menezes, Paulo | - |
dc.date.accessioned | 2022-11-23T11:48:12Z | - |
dc.date.available | 2022-11-23T11:48:12Z | - |
dc.date.issued | 2020-02-28 | - |
dc.identifier.issn | 2076-3417 | pt |
dc.identifier.uri | https://hdl.handle.net/10316/103726 | - |
dc.description.abstract | Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations. | pt |
dc.language.iso | eng | pt |
dc.publisher | MDPI AG | pt |
dc.relation | FCT - scholarship 2020.05620.BD | pt |
dc.relation | UIDB/00048/2020 | pt |
dc.rights | openAccess | pt |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt |
dc.subject | artificial neural networks | pt |
dc.subject | multi-modal perception | pt |
dc.subject | human–robot interaction | pt |
dc.title | Bio-Inspired Modality Fusion for Active Speaker Detection | pt |
dc.type | article | - |
degois.publication.firstPage | 3397 | pt |
degois.publication.issue | 8 | pt |
degois.publication.title | Applied Sciences (Switzerland) | pt |
dc.peerreviewed | yes | pt |
dc.identifier.doi | 10.3390/app11083397 | pt |
degois.publication.volume | 11 | pt |
dc.date.embargo | 2020-02-28 | * |
uc.date.periodoEmbargo | 0 | pt |
item.fulltext | With Fulltext | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.languageiso639-1 | en | - |
item.openairetype | article | - |
item.cerifentitytype | Publications | - |
item.grantfulltext | open | - |
crisitem.project.grantno | INSTITUTE OF SYSTEMS AND ROBOTICS - ISR - COIMBRA | - |
crisitem.author.researchunit | ISR - Institute of Systems and Robotics | - |
crisitem.author.parentresearchunit | University of Coimbra | - |
crisitem.author.orcid | 0000-0003-4015-4111 | - |
crisitem.author.orcid | 0000-0002-1854-049X | - |
crisitem.author.orcid | 0000-0002-4903-3554 | - |
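The abstract above describes a two-branch architecture in which specialized auditory and visual networks produce embeddings that a fusion layer, inspired by the superior colliculus, cross-maps to decide who is speaking. The following is a minimal sketch of that general pattern, assuming PyTorch; all layer names, sizes, and the outer-product cross-mapping are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a generic two-branch
# audio-visual network whose unimodal embeddings are combined by a
# fusion layer before active-speaker classification.
import torch
import torch.nn as nn


class TwoBranchFusionDetector(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, embed_dim=64):
        super().__init__()
        # Specialized branch for the auditory stream (e.g., spectrogram features).
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU())
        # Specialized branch for the visual stream (e.g., face-crop features).
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, embed_dim), nn.ReLU())
        # Fusion stage: cross-maps the two unimodal embeddings (here via an
        # outer product, a common multimodal-fusion choice) and scores
        # whether the observed person is the active speaker.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(embed_dim * embed_dim, 1),
        )

    def forward(self, audio_feat, visual_feat):
        a = self.audio_branch(audio_feat)           # (batch, embed_dim)
        v = self.visual_branch(visual_feat)         # (batch, embed_dim)
        cross = torch.einsum("bi,bj->bij", a, v)    # pairwise cross-mapping
        return torch.sigmoid(self.classifier(cross))  # speaking probability


if __name__ == "__main__":
    model = TwoBranchFusionDetector()
    audio = torch.randn(4, 128)    # dummy audio features
    visual = torch.randn(4, 512)   # dummy visual features
    print(model(audio, visual).shape)  # torch.Size([4, 1])
```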
Appears in Collections: | FCTUC Eng.Electrotécnica - Artigos em Revistas Internacionais; I&D ISR - Artigos em Revistas Internacionais
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Bioinspired-modality-fusion-for-active-speaker-detectionApplied-Sciences-Switzerland.pdf | | 3.44 MB | Adobe PDF | View/Open |
Page views: 99 (checked on Oct 16, 2024)
Downloads: 43 (checked on Oct 16, 2024)
This item is licensed under a Creative Commons License.