Skeleton Fusion for Gestures Recognition in Augmented Reality Environments

Diogo, Miguel António de Figueiredo Moura

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/97920

DC Field	Value	Language
dc.contributor.advisor	Paulo, João Luís Ruivo Carvalho	-
dc.contributor.advisor	Peixoto, Paulo José Monteiro	-
dc.contributor.author	Diogo, Miguel António de Figueiredo Moura	-
dc.date.accessioned	2022-02-02T23:00:47Z	-
dc.date.available	2022-02-02T23:00:47Z	-
dc.date.issued	2021-11-18	-
dc.date.submitted	2022-02-02	-
dc.identifier.uri	https://hdl.handle.net/10316/97920	-
dc.description	Dissertação de Mestrado Integrado em Engenharia Electrotécnica e de Computadores apresentada à Faculdade de Ciências e Tecnologia	-
dc.description.abstract	Inteligência artificial (IA) é uma área da computação responsável por criar algoritmos capazes de realizar tarefas que requerem inteligência humana. Uma destas tarefas é o reconhecimento de gestos humanos, que tem como objectivo analisar os movimentos do corpo humano ao longo do tempo por forma a distinguir diferentes gestos. O reconhecimento de gestos implica a capacidade de sentir a pose desse humano ao longo do tempo, o que geralmente é feito com câmaras e recorrendo a outra área de IA chamada visão por computador. Esta dissertação propõe um pipeline que reconhece gestos humanos a partir de 4 câmaras Microsoft Kinect V2. O pipeline proposto pode ser dividido em 3 partes: fusão de skeleton data gerada por 4 câmaras RGB-D, codificação numa imagem da informação fundida e reconhecimento de gestos a partir dessas imagens através de algoritmos de aprendizagem de máquina. De cada câmara é obtida uma série temporal de posições 3D de juntas. Para obter posições tridimensionais, duas das coordenadas são calculadas por OpenPose, e a restante provém da informação de profundidade lida pelas câmaras. As quatro séries temporais são fundidas com um filtro de Kalman. Na segunda parte do pipeline, a série temporal é codificada numa imagem. Dois métodos diferentes são testados para a codificação da série temporal numa imagem: gramian angular fields e recurrence plots. Por último, uma rede neural convolucional (CNN) é usada para distinguir sequências de gestos codificadas nas imagens. O nosso pipeline conseguiu obter uma precisão de 87.8% no nosso dataset usando a codificação recurrence plot. Além disso, o nosso algoritmo de codificação de skeleton data em imagens e alimentação de uma CNN com essas imagens foi testado não só com um dataset nosso, mas também com outros 2 públicos.	por
dc.description.abstract	Artificial intelligence (AI) is a field of computer science concerned with creating algorithms capable of executing tasks that have traditionally required human intelligence. One of these tasks is human action recognition (HAR), whose purpose is to analyze human body movements over time and distinguish between different actions. HAR algorithms rely on the capacity to sense a human body's pose over time, which is generally done with cameras through another field of AI called computer vision. This thesis proposes a pipeline that recognizes human actions from four Microsoft Kinect V2 cameras. The proposed pipeline can be divided into three parts: the fusion of skeleton data obtained from four RGB-D cameras, the conversion of the fused data into an image, and action recognition from those images through machine learning algorithms. A time series of 3D joint positions is extracted from each of the four cameras. Two of the joint coordinates are computed by the OpenPose algorithm, and the remaining one comes from depth information measured by the cameras. The four time series are fused with a Kalman filter. In the second part of the pipeline, the fused time series is converted into an image. Two different methods are tested for this conversion: Gramian angular fields and recurrence plots. Finally, the image that encodes the skeleton data is fed into a convolutional neural network (CNN) to recognize the action sequence being performed. Our pipeline attains an accuracy of 87.8% on our dataset when using recurrence plots to encode the time series into an image. In addition, our algorithm for converting time series into images and feeding those images into a CNN was tested not only with our own dataset but also with two other public datasets.	eng
dc.language.iso	eng	-
dc.rights	openAccess	-
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	-
dc.subject	CNN	por
dc.subject	Reconhecimento de gestos humanos	por
dc.subject	Fusão de informação RGB-D	por
dc.subject	CNN	eng
dc.subject	Human gesture recognition	eng
dc.subject	RGB-D data fusion	eng
dc.title	Skeleton Fusion for Gestures Recognition in Augmented Reality Environments	eng
dc.title.alternative	Fusão de Esqueletos e Reconhecimento de Gestos	por
dc.type	masterThesis	-
degois.publication.location	DEEC	-
degois.publication.title	Skeleton Fusion for Gestures Recognition in Augmented Reality Environments	eng
dc.peerreviewed	yes	-
dc.identifier.tid	202920518	-
thesis.degree.discipline	Engenharia Electrotécnica e de Computadores	-
thesis.degree.grantor	Universidade de Coimbra	-
thesis.degree.level	1	-
thesis.degree.name	Mestrado Integrado em Engenharia Electrotécnica e de Computadores	-
uc.degree.grantorUnit	Faculdade de Ciências e Tecnologia - Departamento de Eng. Electrotécnica e de Computadores	-
uc.degree.grantorID	0500	-
uc.contributor.author	Diogo, Miguel António de Figueiredo Moura::0000-0002-1865-2125	-
uc.degree.classification	16	-
uc.degree.presidentejuri	Batista, Jorge Manuel Moreira de Campos Pereira	-
uc.degree.elementojuri	Barreto, João Pedro de Almeida	-
uc.degree.elementojuri	Peixoto, Paulo José Monteiro	-
uc.contributor.advisor	Paulo, João Luís Ruivo Carvalho	-
uc.contributor.advisor	Peixoto, Paulo José Monteiro::0000-0002-3680-564X	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.openairetype	masterThesis	-
item.cerifentitytype	Publications	-
item.grantfulltext	open	-
item.fulltext	Com Texto completo	-
item.languageiso639-1	en	-
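
The abstract states that the four per-camera time series are fused with a Kalman filter but gives no model details. The following is only a minimal sketch of the idea for a single joint coordinate, assuming a random-walk motion model and independent per-camera measurement noise; the function names, noise variances, and the use of `None` for a camera that does not see the joint are illustrative assumptions, not the thesis's implementation.

```python
def fuse_step(x, p, measurements, meas_vars, process_var):
    """One Kalman step for a scalar state, fusing several camera readings.

    x, p         : previous estimate and its variance
    measurements : one reading per camera (None if the joint was not seen)
    meas_vars    : assumed measurement noise variance per camera
    process_var  : assumed process noise variance (random-walk model)
    """
    # Predict: the state is assumed to stay put, so only uncertainty grows.
    p = p + process_var
    # Update: apply the cameras one after another (sequential update).
    for z, r in zip(measurements, meas_vars):
        if z is None:
            continue
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # correct the estimate with the innovation
        p = (1.0 - k) * p        # shrink the variance
    return x, p


def fuse_series(per_camera, meas_vars, process_var=1e-3):
    """Fuse equally long per-camera series into one series.

    Assumes at least one camera reports a value in the first frame.
    """
    first = [s[0] for s in per_camera if s[0] is not None]
    x, p = sum(first) / len(first), 1.0
    fused = []
    for frame in zip(*per_camera):
        x, p = fuse_step(x, p, frame, meas_vars, process_var)
        fused.append(x)
    return fused
```

In a full pipeline this would run once per joint and per coordinate (or with a vector state), and the per-camera variances could reflect how reliable each view is for that joint.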
Appears in Collections:	UC - Dissertações de Mestrado
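
The two time-series-to-image encodings named in the abstract, Gramian angular fields and recurrence plots, can be sketched as follows. This is a hedged illustration using the standard textbook definitions; the abstract does not say which Gramian variant (summation or difference), which distance, or which threshold the thesis uses, so the summation field, the Euclidean distance, and the `eps` parameter are assumptions.

```python
import math


def rescale(series, lo=-1.0, hi=1.0):
    """Min-max rescale a scalar series into [lo, hi]."""
    mn, mx = min(series), max(series)
    if mx == mn:
        return [0.0] * len(series)  # flat series: map to the midpoint
    return [lo + (hi - lo) * (v - mn) / (mx - mn) for v in series]


def gramian_angular_summation_field(series):
    """GASF: entry (i, j) is cos(phi_i + phi_j), with phi = arccos(x)."""
    x = rescale(series)
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    return [[math.cos(a + b) for b in phi] for a in phi]


def recurrence_plot(points, eps=None):
    """Recurrence plot of a sequence of points (tuples of coordinates).

    Entry (i, j) is the Euclidean distance between frames i and j, or,
    when eps is given, 1 if that distance is at most eps and 0 otherwise.
    """
    dist = [[math.dist(a, b) for b in points] for a in points]
    if eps is None:
        return dist
    return [[1 if d <= eps else 0 for d in row] for row in dist]
```

Either matrix can be treated as a single-channel image and passed to a CNN; for skeleton data, each point passed to `recurrence_plot` could be the concatenated joint coordinates of one frame.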

Files in This Item:

File	Description	Size	Format
Tese_versao_final.pdf		1.2 MB	Adobe PDF

This item is licensed under a Creative Commons License
