Neural Networks, DeepFloat & TensorFlow Lite; Post-Training Quantization Case Study

Dias, Simão Pedro das Neves Gonçalves

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/90159

Title:	Neural Networks, DeepFloat & TensorFlow Lite; Post-Training Quantization Case Study
Other Titles:	Redes Neuronais, DeepFloat & TensorFlow Lite; Caso de Estudo de Quantização Pós-Treino
Authors:	Dias, Simão Pedro das Neves Gonçalves
Advisor:	Fernandes, Gabriel Falcão Paiva
Keywords:	Machine Learning; Post-Training Quantization; DeepFloat; Systolic Array; Neural Networks
Issue Date:	20-Feb-2020
Serial title, monograph or event:	Neural Networks, DeepFloat & TensorFlow Lite; Post-Training Quantization Case Study
Place of publication or event:	DEEC
Abstract:	In recent years, Machine Learning (ML) has gone through a renaissance, driven by improvements in computing systems and computer memory. The internet also played an important role by providing access to, and aggregating, large amounts of data. As the technology evolves, optimizations to its processes are receiving more attention. Traditionally, machine learning models are demanding in both memory and computation during training and inference. One optimization technique used in ML targets the inference phase. Models are typically trained in 32 bits, but instead of performing inference in 32 bits (operations and storage), the model can be quantized to a format that uses fewer bits, a process called post-training quantization. Usually, the fewer bits that are stored and moved around in a computing system, the less energy is consumed and the faster computations are performed, resulting in a more efficient system for equivalent tasks. The goal of this study is to compare two 8-bit post-training quantization techniques using two different basic models, exploring both their potential and their caveats. Both models are trained to classify handwritten digits; the first focuses on Fully Connected layers, while the second focuses on Convolutional layers. One of the techniques examined adopts a novel numeric representation system, and this work also explores a model for understanding how that system accumulates error. In short, it is an attempt to understand which method provides a more efficient and practical solution.
Description:	Dissertation for the Integrated Master's degree in Electrical and Computer Engineering, presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/90159
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
Simão Dias Dissertação Revista Orientador.pdf		6.25 MB	Adobe PDF

Page view(s):	293 (checked on Oct 8, 2024)
Download(s):	334 (checked on Oct 8, 2024)

This item is licensed under a Creative Commons License
