Integrating Vision and Language for Automatic Face Descriptions

Rodrigues, Diogo Manuel de Castro

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/86752

DC Field	Value	Language
dc.contributor.advisor	Araújo, Helder de Jesus	-
dc.contributor.author	Rodrigues, Diogo Manuel de Castro	-
dc.date.accessioned	2019-04-17T22:42:44Z	-
dc.date.available	2019-04-17T22:42:44Z	-
dc.date.issued	2018-09-24	-
dc.date.submitted	2019-04-17	-
dc.identifier.uri	https://hdl.handle.net/10316/86752	-
dc.description	Integrated Master's dissertation in Electrical and Computer Engineering, presented to the Faculty of Sciences and Technology	-
dc.description.abstract	Nesta dissertação, para criar um exemplo único de um sistema de face para texto e texto para face foram integrados visão por computador e processamento de linguagem natural. O propósito é fornecer uma solução que permita ajudar os seres humanos a realizar funções com maior qualidade e de forma mais rápida. Assim sendo pretende-se criar um sistema que possa ser usado, por exemplo, para descrever rostos para pessoas com deficiência visual ou para gerar rostos a partir de descrições para investigações criminais. No entanto trata-se apenas de uma versão preliminar, na medida em que o curto tempo disponível para a sua realização não permitiu alcançar a ambiciosa proposta. De forma a atingir este objectivo, foi criado um sistema com a capacidade de descrever textualmente imagens faciais e, por outro lado, gerar automaticamente imagens faciais a partir de descrições textuais. O sistema é dividido em duas partes, a primeira tem como função prever atributos das imagens faciais através de uma rede neuronal convolucional. Estes são utilizados como base para o modelo de geração de linguagem natural, gerando descrições textuais numa metodologia baseada em regras. A segunda parte usa uma técnica simples de extração de palavras chave para analisar o texto e identificar os atributos nessa descrição. Seguidamente, o sistema usa uma rede generativa adversarial para gerar uma imagem facial com o conjunto das características desejadas. Os atributos são usados como base no nosso método, uma vez que representam um identificador dominante que transmite características sobre um rosto com eficácia. Os resultados demonstraram, mais uma vez, que os métodos CNN e GAN são atualmente as melhores opções para tarefas de reconhecimento e geração de imagens, respectivamente. Esta conclusão está assente nos resultados convincentes. Por outro lado, os métodos de processamento de linguagem natural apesar de terem funcionado bem, de acordo com os objectivos, os seus resultados são menos notáveis, especialmente o modelo de geração de linguagem natural. Este trabalho propõe uma solução fiável e funcional para resolver este sistema complexo, no entanto é uma área que merece uma extensa investigação e desenvolvimento.	por
dc.description.abstract	In this dissertation, computer vision and Natural Language Processing (NLP) are integrated to create a unique example of a face-to-text and text-to-face system. The intention is to provide a solution that helps people perform their jobs with better quality and faster. The aim is to create a system that can be used, for example, to describe faces for visually impaired people or to generate faces from descriptions for criminal investigations. However, this is only a preliminary version, since the goal was too ambitious to be fully achieved in the time available. To achieve this goal, a system was created that can describe facial images textually and automatically generate face images from text descriptions. The system is divided into two sub-systems. The first predicts attributes from face images with a Convolutional Neural Network (CNN); these attributes then serve as the basis for the Natural Language Generation (NLG) model, which generates descriptions using a rule-based methodology. The second uses a simple keyword extraction technique to analyze the text and identify the attributes in that description. It then uses a conditional Generative Adversarial Network (GAN) to generate a facial image with the desired set of attributes. Attributes are used as the basis of the method because they are a dominant identifier that efficiently conveys characteristics of a face. The results demonstrate, once again, that CNN and GAN methods are presently the best options for recognition and generation tasks, respectively. This conclusion rests on their convincing results. The NLP methods, on the other hand, worked well for their purposes, but their results are less remarkable, especially those of the NLG model. This work proposes a reliable and functional solution for this complex system. Nevertheless, the area needs extensive further investigation and development.	eng
dc.language.iso	eng	-
dc.rights	openAccess	-
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/	-
dc.subject	Inteligência Artificial	por
dc.subject	Aprendizagem Profunda	por
dc.subject	Rede Neuronal Convolucional	por
dc.subject	Rede Adversarial Generativa	por
dc.subject	Processamento de Linguagem Natural	por
dc.subject	Artificial Intelligence	eng
dc.subject	Deep Learning	eng
dc.subject	Convolutional Neural Network	eng
dc.subject	Generative Adversarial Network	eng
dc.subject	Natural Language Processing	eng
dc.title	Integrating Vision and Language for Automatic Face Descriptions	eng
dc.title.alternative	Integrando Visão e Linguagem para Descrições Faciais Automáticas	por
dc.type	masterThesis	-
degois.publication.location	DEEC	-
degois.publication.title	Integrating Vision and Language for Automatic Face Descriptions	eng
dc.peerreviewed	yes	-
dc.identifier.tid	202219380	-
thesis.degree.discipline	Engenharia Electrotécnica e de Computadores	-
thesis.degree.grantor	Universidade de Coimbra	-
thesis.degree.level	1	-
thesis.degree.name	Mestrado Integrado em Engenharia Electrotécnica e de Computadores	-
uc.degree.grantorUnit	Faculdade de Ciências e Tecnologia - Departamento de Eng. Electrotécnica e de Computadores	-
uc.degree.grantorID	0500	-
uc.contributor.author	Rodrigues, Diogo Manuel de Castro::0000-0001-6671-8531	-
uc.degree.classification	17	-
uc.degree.presidentejuri	Batista, Jorge Manuel Moreira de Campos Pereira	-
uc.degree.elementojuri	Perdigão, Fernando Manuel dos Santos	-
uc.degree.elementojuri	Araújo, Hélder de Jesus	-
uc.contributor.advisor	Araújo, Hélder de Jesus	-
uc.controloAutoridade	Sim	-
item.grantfulltext	open	-
item.fulltext	Com Texto completo	-
item.openairetype	masterThesis	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
crisitem.advisor.researchunit	ISR - Institute of Systems and Robotics	-
crisitem.advisor.parentresearchunit	University of Coimbra	-
crisitem.advisor.orcid	0000-0002-9544-424X	-
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
Dissertação - Diogo Rodrigues.pdf		2.59 MB	Adobe PDF

Page view(s) 50

409

checked on Apr 23, 2024

Download(s) 50

424

checked on Apr 23, 2024

This item is licensed under a Creative Commons License
