Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/101173
DC Field | Value | Language
dc.contributor.author | Ferreira, Francisco | -
dc.contributor.author | Lourenço, Nuno | -
dc.contributor.author | Cabral, Bruno | -
dc.contributor.author | Fernandes, João Paulo | -
dc.date.accessioned | 2022-08-16T07:57:15Z | -
dc.date.available | 2022-08-16T07:57:15Z | -
dc.date.issued | 2021 | -
dc.identifier.issn | 2169-3536 | pt
dc.identifier.uri | https://hdl.handle.net/10316/101173 | -
dc.description.abstract | Nowadays, data is king and if treated and used properly it promises to give organizations a competitive edge over rivals by enabling them to develop and design Intelligent Systems to improve their services. However, they need to fully comply with not only ethical but also regulatory obligations, where, e.g., privacy (strictly) needs to be respected when using or sharing data, thus protecting both the interests of users and organizations. Fraud Detection systems are examples of such systems where Machine Learning algorithms leverage information to classify financial transactions as legitimate or illicit. The data used to create these solutions is usually highly structured and contains categorical and continuous features characterised by complex distributions. One of the main challenges of fraud detection is concerned with the scarcity of fraudulent instances which results in highly unbalanced datasets. Additionally, privacy is crucial, and it is usually forbidden, or not possible, to share the data of organizations and individuals for creating or improving models. In this paper we propose a framework for private data sharing based on synthetic data generation using Generative Adversarial Networks (GAN) that learns the specificities of financial transactions data and generates fictitious data that keeps the utility of the original datasets. Our proposal, called Duo-GAN, uses two GAN generators to handle the data imbalance problem, one generator for fraudulent instances and the other for legitimate instances. With this approach, we observed, at most, a 5% disparity in F1 scores between classifiers trained and tested with actual data and the ones trained with synthetic data and tested with actual data. | pt
dc.language.iso | eng | pt
dc.relation | FCT - UID/CEC/00326/2020 | pt
dc.relation | European Social Fund, through the Regional Operational Program Centro 2020; and in part by the Carnegie Mellon University (CMU)|Portugal Project autonomiC plAtform for MachinE Learning using anOnymized daTa (CAMELOT) under Grant POCI-01-0247-FEDER-045915 | pt
dc.rights | openAccess | pt
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt
dc.subject | Fraud detection | pt
dc.subject | generative adversarial networks | pt
dc.subject | privacy | pt
dc.subject | machine learning | pt
dc.subject | synthetic data generation | pt
dc.subject | tabular data | pt
dc.title | When Two are Better Than One: Synthesizing Heavily Unbalanced Data | pt
dc.type | article | -
degois.publication.firstPage | 150459 | pt
degois.publication.lastPage | 150469 | pt
degois.publication.title | IEEE Access | pt
dc.peerreviewed | yes | pt
dc.identifier.doi | 10.1109/ACCESS.2021.3126656 | pt
degois.publication.volume | 9 | pt
dc.date.embargo | 2021-01-01 | *
uc.date.periodoEmbargo | 0 | pt
item.languageiso639-1 | en | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.openairetype | article | -
item.fulltext | With full text | -
item.grantfulltext | open | -
item.cerifentitytype | Publications | -
crisitem.author.researchunit | CISUC - Centre for Informatics and Systems of the University of Coimbra | -
crisitem.author.researchunit | CISUC - Centre for Informatics and Systems of the University of Coimbra | -
crisitem.author.parentresearchunit | Faculty of Sciences and Technology | -
crisitem.author.parentresearchunit | Faculty of Sciences and Technology | -
crisitem.author.orcid | 0000-0001-6060-4971 | -
crisitem.author.orcid | 0000-0001-9699-1133 | -
Appears in Collections: I&D CISUC - Artigos em Revistas Internacionais
This item is licensed under a Creative Commons License.