Assessing the Fairness of Intelligent Systems

Valentim, Inês Filipa Rente

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/87310

DC Field	Value	Language
dc.contributor.advisor	Lourenço, Nuno António Marques	-
dc.contributor.advisor	Antunes, Nuno Manuel dos Santos	-
dc.contributor.author	Valentim, Inês Filipa Rente	-
dc.date.accessioned	2019-07-26T22:14:49Z	-
dc.date.available	2019-07-26T22:14:49Z	-
dc.date.issued	2019-07-08	-
dc.date.submitted	2019-07-26	-
dc.identifier.uri	https://hdl.handle.net/10316/87310	-
dc.description	Master's dissertation in Informatics Engineering (Mestrado em Engenharia Informática) presented to the Faculdade de Ciências e Tecnologia	-
dc.description.abstract	Atualmente, os sistemas de software baseados em modelos de Aprendizagem Computacional são ubíquos, sendo muitas vezes usados em cenários que afetam diretamente a vida das pessoas. Consequentemente, surgem diversas preocupações sociais e legais, nomeadamente que as decisões suportadas pelos resultados dos modelos possam levar ao tratamento menos favorável de alguns indivíduos, com base em atributos como raça, idade, ou sexo. Na realidade, a fairness é uma das propriedades que os sistemas devem possuir para que cumpram legislação atual, tal como o Regulamento Geral sobre a Proteção de Dados da UE. O objetivo principal deste trabalho é avaliar a fairness de sistemas baseados em modelos de Aprendizagem Computacional, em problemas de classificação. A preparação e o pré-processamento de dados são fulcrais em qualquer pipeline de Aprendizagem Computacional, sendo que era necessário estudar o seu efeito em termos de fairness. Nesta perspetiva, avaliámos o impacto do encoding de atributos categóricos, a remoção do atributo sensível dos dados de treino, e mecanismos de amostragem, como random undersampling e random oversampling. A influência do algoritmo de aprendizagem foi também tida em conta, sendo avaliadas Árvores de Decisão e Random Forests. Medimos a fairness em diferentes etapas do pipeline para compreender os fatores com maior impacto nesta propriedade. Os resultados mostram que fazer uma amostragem de acordo com o output esperado e optar por Random Forests em vez de Árvores de Decisão tende a ter efeitos negativos na fairness. Embora a remoção do atributo sensível dos dados de treino elimine a discriminação direta, os modelos são ainda assim capazes de explorar associações entre este atributo e os restantes, sendo que algumas vezes as classificações acabam mesmo por ser mais injustas que os próprios dados. Desta forma, é necessário que as organizações estejam cientes deste compromisso entre desempenho e fairness, avaliando-o de forma cuidada.	por
dc.description.abstract	Nowadays, software systems based on Machine Learning models are ubiquitous and are often used in scenarios that directly affect people's lives. Consequently, societal and legal concerns arise, namely that decisions supported by the models' outputs may lead to the unfair treatment of individuals based on attributes like race, age, or sex. In fact, fairness is one of the properties systems must have to comply with current legislation, such as the EU General Data Protection Regulation. The main objective of this work is to assess the fairness of software systems based on Machine Learning models in classification scenarios. Data preparation and pre-processing are key in any Machine Learning pipeline, and their effect on fairness needed to be studied in detail. Thus, we assessed the impact of the encoding of categorical features, the removal of the sensitive attribute from the training data, and sampling methods such as random undersampling and random oversampling. The influence of the learning algorithm was also considered, with an initial evaluation of Decision Trees and Random Forests. Fairness was measured at different stages of the pipeline to understand which procedures have the most impact on it. Our results show that sampling with respect to the true labels and opting for Random Forests over Decision Trees often have a negative effect on fairness. Although removing the sensitive attribute from the training data prevents direct discrimination, the models are often still able to exploit associations between this attribute and the remaining features, and the resulting classifications are sometimes even more unfair than the data. As a result, organisations must be aware of, and carefully assess, the trade-off between classification performance and fairness.	eng
dc.description.sponsorship	H2020	-
dc.language.iso	eng	-
dc.relation	info:eu-repo/grantAgreement/EC/H2020/777154/EU	-
dc.rights	openAccess	-
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	-
dc.subject	Aprendizagem Computacional	por
dc.subject	Discriminação	por
dc.subject	Fairness	por
dc.subject	Sistemas Inteligentes	por
dc.subject	Tomada de Decisão	por
dc.subject	Decision Making	eng
dc.subject	Discrimination	eng
dc.subject	Fairness	eng
dc.subject	Intelligent Systems	eng
dc.subject	Machine Learning	eng
dc.title	Assessing the Fairness of Intelligent Systems	eng
dc.title.alternative	Avaliação da Fairness de Sistemas Inteligentes	por
dc.type	masterThesis	-
degois.publication.location	DEI-FCTUC	-
degois.publication.title	Assessing the Fairness of Intelligent Systems	eng
dc.peerreviewed	yes	-
dc.identifier.tid	202267180	-
thesis.degree.discipline	Informática	-
thesis.degree.grantor	Universidade de Coimbra	-
thesis.degree.level	1	-
thesis.degree.name	Mestrado em Engenharia Informática	-
uc.degree.grantorUnit	Faculdade de Ciências e Tecnologia - Departamento de Engenharia Informática	-
uc.degree.grantorID	0500	-
uc.contributor.author	Valentim, Inês Filipa Rente::0000-0001-6018-1788	-
uc.degree.classification	19	-
uc.degree.presidentejuri	Curado, Marília Pascoal	-
uc.degree.elementojuri	Antunes, Nuno Manuel dos Santos	-
uc.degree.elementojuri	Abreu, Pedro Manuel Henriques da Cunha	-
uc.contributor.advisor	Lourenço, Nuno António Marques::0000-0002-2154-0642	-
uc.contributor.advisor	Antunes, Nuno Manuel dos Santos::0000-0002-6044-4012	-
item.openairetype	masterThesis	-
item.fulltext	With full text	-
item.languageiso639-1	en	-
item.grantfulltext	open	-
item.cerifentitytype	Publications	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
DISSERTATION.pdf		2.13 MB	Adobe PDF

Page view(s): 476 (checked on Jul 16, 2024)
Download(s): 890 (checked on Jul 16, 2024)

This item is licensed under a Creative Commons License
