Title: Using natural language processing to detect privacy violations in online contracts
Authors: Silva, Paulo; Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marilia
Issue Date: Mar-2020
Publisher: ACM
Project: info:eu-repo/grantAgreement/EC/H2020/786713/EU/Protection and control of Secured Information by means of a privacy enhanced Dashboard 
Place of publication or event: Proceedings of the 35th Annual ACM Symposium on Applied Computing
Abstract: As information systems deal with contracts and documents in essential services, there is a lack of mechanisms to help organizations in protecting the involved data subjects. In this paper, we evaluate the use of named entity recognition as a way to identify, monitor and validate personally identifiable information. In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated in a generic dataset. Then, the tools are applied in datasets built based on contracts that contain personally identifiable information. The results show that models' performance was highly positive in accurately classifying both the generic and the contracts' data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology.
DOI: 10.1145/3341105.3375774
Rights: openAccess
Appears in Collections: FCTUC Eng.Informática - Artigos em Revistas Internacionais

Files in This Item:
File: ACM_SAC_Paper__POSTER_.pdf
Size: 386.08 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.