Building Machine Learning Microservices for the Data Science for Non-Programmers Platform

Pedroso, Artur Jorge de Carvalho

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/83540

Title:	Building Machine Learning Microservices for the Data Science for Non-Programmers Platform
Other Titles:	Construção de Microserviços de Machine Learning para a Plataforma Data Science for Non-Programmers
Authors:	Pedroso, Artur Jorge de Carvalho
Advisors:	Araújo, Filipe João Boavida Mendonça Machado de; Paiva, Rui Pedro Pinto de Carvalho e
Keywords:	Data science; data mining; machine learning; cloud computing; microservices
Issue Date:	10-Sep-2018
Serial title, monograph or event:	Building Machine Learning Microservices for the Data Science for Non-Programmers Platform
Place of publication or event:	DEI-FCTUC
Abstract:	With the emergence of Big Data, the scarcity of data scientists able to analyse the data being produced in different domains has become evident. To train new data scientists faster, applications that support data science practices, such as data mining and machine learning, without requiring programming skills could be of great help. Although such applications already exist, they still have limitations. Some fail to enforce good machine learning practices, especially for the assessment and selection of models; others require users to build complex workflows to apply machine learning processes correctly; and in general they do not guide the user in creating machine learning experiments. With these concerns in mind, this thesis presents a prototype of a cloud application that enables the creation of machine learning experiments, enforcing good machine learning practices while guiding users through the process. The application follows a microservices architecture, chosen mainly to increase flexibility in introducing and scaling machine learning algorithms in the system. Because a microservices architecture can comprise many services, Docker and Kubernetes were used to deploy and manage the system in the cloud, which simplifies this process. In general, the system can perform a wide variety of machine learning experiments; however, more complex experiments can cause the system to fail, and their execution requires further research. Preliminary usability tests were conducted with two groups of users to evaluate the envisioned concept for creating machine learning experiments, and a generally high level of user satisfaction was observed. To assess the computational performance of the current design, tests were run in a public cloud; the results were not satisfactory, mainly because of the lack of optimisations in the system so far.
Description:	Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/83540
Rights:	closedAccess
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format	Login
MasterThesis_ammend_final_FINAL_E_DESTA.pdf		2.78 MB	Adobe PDF	Request a copy

Page view(s):	576 (checked on Sep 10, 2024)
Download(s):	376 (checked on Sep 10, 2024)

This item is licensed under a Creative Commons License
