A reinforcement learning application to an assembly decision-making problem

Neves, Miguel António Silva

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/92192

DC Field	Value	Language
dc.contributor.advisor	Neto, Pedro Mariano Simões	-
dc.contributor.author	Neves, Miguel António Silva	-
dc.date.accessioned	2020-12-15T10:28:05Z	-
dc.date.available	2020-12-15T10:28:05Z	-
dc.date.issued	2020-10-07	-
dc.date.submitted	2020-12-15	-
dc.identifier.uri	https://hdl.handle.net/10316/92192	-
dc.description	Dissertação de Mestrado Integrado em Engenharia Mecânica apresentada à Faculdade de Ciências e Tecnologia	-
dc.description.abstract	Reinforcement learning é uma metodologia com grande potencial de aplicabilidade em problemas de tomada de decisões na manufatura devido à reduzida necessidade prévia de dados, isto é, o sistema aprende durante a real operação. Esta dissertação foca-se na implementação dum algoritmo de reinforcement learning num problema de tomada de decisões na montagem de um avião, pertencente ao dataset de objetos e benchmark de Yale-CMU-Berkeley, com o objetivo de identificar a eficácia da abordagem proposta na otimização dos tempos de montagem. Existem inúmeros algoritmos de reinforcement learning, tendo sido o algoritmo Q-Learning o escolhido para o trabalho desta dissertação. Este algoritmo baseia-se na aprendizagem duma matriz de Q-values, conhecida como Q-table, através de sucessivas interações com o ambiente de forma a determinar a state-action policy que maximiza as rewards acumuladas e formalizada como um Markov Decision Process (MDP). Esta implementação foi conseguida em três cenários distintos, com um nível de complexidade crescente. No primeiro cenário, o reinforcement learning agent apenas poderia distinguir entre sequencias de montagem possíveis ou impossíveis. Num segundo cenário os tempos médios de duração das ações foram adicionados com a consequência de diferentes sequências de montagem corresponderem a diferentes soluções com valores de rewards acumuladas. Este cenário permitiu uma primeira otimização dos parâmetros e rewards do algoritmo. Por fim, no terceiro cenário os tempos médios das ações foram medidos com as respetivas variações, o que tornou a distribuição de rewards acumuladas mais dispersas. Este cenário permitiu uma nova otimização dos parâmetros e rewards do algoritmo. O algoritmo implementado, após a sua otimização, apresentou resultados promissores ao aprender a sequência de montagem ótima 95.83% das vezes.	por
dc.description.abstract	Reinforcement learning is a methodology with great potential for manufacturing decision-making problems because it needs little prior training data, i.e., the system learns over time during actual operation. This dissertation focuses on the implementation of a reinforcement learning algorithm in an assembly decision-making problem for an airplane from the Yale-CMU-Berkeley Object and Benchmark Dataset, aiming to assess the effectiveness of the proposed approach in optimizing assembly time. There are numerous reinforcement learning algorithms; Q-Learning was the one chosen for this dissertation. This algorithm learns a matrix of Q-values (the Q-table) through successive interactions with the environment in order to find the state-action policy that maximizes the accumulated reward, with the problem formalized as a Markov Decision Process (MDP). The implementation was carried out in three scenarios of increasing complexity. In the first scenario, the reinforcement learning agent could only distinguish between feasible and infeasible assembly sequences. In the second scenario, the average durations of the actions were included, so that different assembly sequences corresponded to solutions with different accumulated rewards. This scenario allowed an initial optimization of the algorithm's parameters and rewards. Finally, in the third scenario, the average task durations were measured together with their variation, which made the distribution of accumulated rewards more dispersed. This last scenario allowed a further optimization of the algorithm's parameters and rewards. After optimization, the implemented algorithm achieved promising results, learning the optimal assembly sequence 95.83% of the time.	eng
dc.language.iso	eng	-
dc.rights	openAccess	-
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	-
dc.subject	Reinforcement learning	por
dc.subject	Q-Learning	por
dc.subject	Sequência de Montagem	por
dc.subject	Otimização	por
dc.subject	Reinforcement learning	eng
dc.subject	Q-Learning	eng
dc.subject	Assembly Sequence	eng
dc.subject	Optimization	eng
dc.title	A reinforcement learning application to an assembly decision-making problem	eng
dc.title.alternative	Aplicação de reinforcement learning num problema de tomada de decisão	por
dc.type	masterThesis	-
degois.publication.location	Departamento de Engenharia Mecânica	-
degois.publication.title	A reinforcement learning application to an assembly decision-making problem	eng
dc.peerreviewed	yes	-
dc.identifier.tid	202554236	-
thesis.degree.discipline	Engenharia Mecânica	-
thesis.degree.grantor	Universidade de Coimbra	-
thesis.degree.level	1	-
thesis.degree.name	Mestrado Integrado em Engenharia Mecânica	-
uc.degree.grantorUnit	Faculdade de Ciências e Tecnologia - Departamento de Engenharia Mecânica	-
uc.degree.grantorID	0500	-
uc.contributor.author	Neves, Miguel António Silva::0000-0002-6792-0042	-
uc.degree.classification	19	-
uc.degree.presidentejuri	Pinto, Telmo Miguel Pires	-
uc.degree.elementojuri	Simão, Miguel Ângelo Fernandes Castanheiro e	-
uc.degree.elementojuri	Vieira, Miguel Jorge	-
uc.degree.elementojuri	Neto, Pedro Mariano Simões	-
uc.contributor.advisor	Neto, Pedro Mariano Simões	-
item.fulltext	Com Texto completo	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.openairetype	masterThesis	-
item.grantfulltext	open	-
item.cerifentitytype	Publications	-
crisitem.advisor.researchunit	CEMMPRE - Centre for Mechanical Engineering, Materials and Processes	-
crisitem.advisor.orcid	0000-0003-2177-5078	-
Appears in Collections:	UC - Dissertações de Mestrado
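
The abstract describes tabular Q-Learning applied to an assembly-sequence problem. As an illustration only, the following is a minimal sketch of that idea on a toy three-part assembly. The parts, the precedence rule, the durations, the penalty value, and the hyperparameters are all hypothetical assumptions made for this example and are not taken from the dissertation; the reward is simply the negative duration of each action, so maximizing accumulated reward minimizes total assembly time.

```python
import random

# All values below are hypothetical and chosen only for illustration.
N_PARTS = 3
FULL = (1 << N_PARTS) - 1                 # bitmask with every part assembled
PREREQ = {0: 0b000, 1: 0b001, 2: 0b001}   # parts 1 and 2 require part 0 first
BASE_TIME = {0: 4.0, 1: 3.0, 2: 5.0}      # assumed average durations


def step(state, action):
    """Apply one assembly action; return (next_state, reward, done)."""
    already_placed = state & (1 << action)
    missing_prereq = (state & PREREQ[action]) != PREREQ[action]
    if already_placed or missing_prereq:
        # Infeasible action: large penalty and the episode ends.
        return state, -100.0, True
    duration = BASE_TIME[action]
    # Assumed interaction: part 2 is slower once part 1 obstructs access.
    if action == 2 and state & 0b010:
        duration += 4.0
    next_state = state | (1 << action)
    return next_state, -duration, next_state == FULL


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Learn a Q-table (states x actions) with epsilon-greedy Q-Learning."""
    rng = random.Random(seed)
    q = [[0.0] * N_PARTS for _ in range(FULL + 1)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_PARTS)          # explore
            else:
                action = max(range(N_PARTS), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-Learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_sequence(q):
    """Follow the learned greedy policy from the empty assembly."""
    state, sequence, done = 0, [], False
    while not done and len(sequence) < N_PARTS:
        action = max(range(N_PARTS), key=lambda a: q[state][a])
        sequence.append(action)
        state, _, done = step(state, action)
    return sequence


if __name__ == "__main__":
    print(greedy_sequence(train()))
```

Under these assumed durations, placing part 0, then part 2, then part 1 takes 12 time units, whereas placing part 1 before part 2 takes 16, so the learned greedy policy should prefer the former. Replacing the fixed durations with sampled ones would correspond loosely to the third scenario described in the abstract.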

Files in This Item:

File	Description	Size	Format
Miguel_Neves_2015241595.pdf		3.05 MB	Adobe PDF

This item is licensed under a Creative Commons License
