Skeleton-based Human Activity Performance Evaluation in Telerehabilitation.

Santos, Pedro Miguel Cera Ramos dos

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/113071

Title:	Skeleton-based Human Activity Performance Evaluation in Telerehabilitation.
Other Titles:	Avaliação Baseada na Análise do Esqueleto Humano do Desempenho num Contexto de Telerreabilitação
Authors:	Santos, Pedro Miguel Cera Ramos dos
Advisor:	Paulo, João Luís Ruivo Carvalho; Peixoto, Paulo José Monteiro
Keywords:	Machine Learning; Deep Learning; Action Recognition; Motion Prediction; Skeleton Based
Issue Date:	24-Jul-2023
Serial title, monograph or event:	Skeleton-based Human Activity Performance Evaluation in Telerehabilitation.
Place of publication or event:	DEEC/ISR
Abstract:	The field of machine learning has been the subject of significant advances and investment, both from an academic standpoint and in support of industry and services. New ideas and definitions in this area are developed and researched on a weekly basis. These advances arise largely from investment in deep learning. Deep learning models are based on artificial neural network architectures, which are themselves computational models inspired by the structure and function of the human brain. These models consist of multiple layers of artificial neurons that are "capable" of automatically learning and extracting hierarchical representations from input data.
With this in mind, this thesis aims to implement a system for evaluating human skeleton movement based on machine learning. The system assesses the capabilities of the state of the art in evaluating a virtual skeleton. For this purpose, the approach is divided into two different evaluation methods, knowing a priori that the movement tends to repeat and is a temporal sequence. The first method is a bidirectional recurrent neural network (LSTM) with self-attention mechanisms that distinguishes between well-executed and poorly executed exercises. The second method is a more ambitious proposal: a network that predicts movement given a contextual sequence. This method is based on the simple fact that a human skeleton and the connections between its joints are well encoded by graphs. Therefore, the network used is a variant that takes into account the spatio-temporal changes of a graph, enabling the evaluation of the temporal evolution of the joints and of their interaction. Graph Convolutional Neural Networks are therefore chosen, as they ensure the extraction of features and other graph representations, allowing the analysis of a wide range of graphs and tasks.
Various human motion datasets are used, including the well-known Human3.6M dataset [50] in 3D Cartesian and 3D Euler formats, the AMASS dataset [77] in 3D Cartesian format, and a non-public dataset provided by the company PROZIS. The first method, for action recognition, is one of the best in the literature, achieving an accuracy of 96% in classifying the correctness of exercise execution. The motion prediction methods generate reliable predictions when given good context, particularly for linear, repetitive actions such as walking or running. They can predict up to 1 second (30 frames at 30 Hz) without significant error accumulation.
Finally, these methods undergo a battery of tests and are compared with others in the literature, presenting very promising results and providing illustrative visual feedback on errors and motion correction.
Description:	Master's dissertation in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/113071
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado
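
The abstract's remark that a skeleton and its joint connections are well encoded by graphs can be illustrated with a small sketch. The five-joint skeleton, the joint names, the symmetric normalization, and the function names below are all assumptions made for illustration; this is a generic single graph-convolution step, not the architecture used in the dissertation.

```python
import math

# Hypothetical 5-joint toy skeleton (assumption; real datasets use more joints).
JOINTS = ["hip", "spine", "head", "left_hand", "right_hand"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # undirected joint connections


def normalized_adjacency(num_joints, bones):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list of floats."""
    # Start from the identity so every joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adj, features, weights):
    """One graph-convolution step: ReLU(adj @ features @ weights)."""
    out = matmul(matmul(adj, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# One frame of 3D Cartesian joint positions (made-up values).
pose = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0],
        [-0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

adj = normalized_adjacency(len(JOINTS), BONES)
mixed = graph_conv(adj, pose, identity)  # each joint now blends its neighbours
```

With identity weights, each output row is a degree-weighted blend of a joint and its direct neighbours; a trained layer would replace `identity` with learned weights, and a spatio-temporal variant would also mix information across frames.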