A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D. G.; Cordeiro, Maria Natália D. S.; Gümüş, Zeynep H.; Costa, Joaquim; Bonvin, Alexandre M. J. J.; Moreira, Irina S.

doi:10.3390/ijms17081215

Use this identifier to cite this record: https://hdl.handle.net/10316/108631

Title:	A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces
Author:	Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D. G.; Cordeiro, Maria Natália D. S.; Gümüş, Zeynep H.; Costa, Joaquim; Bonvin, Alexandre M. J. J.; Moreira, Irina S.
Keywords:	protein-protein interfaces; hot-spots; machine learning; Solvent Accessible Surface Area (SASA); evolutionary sequence conservation
Date:	27-Jul-2016
Publisher:	MDPI
Project:	SFRH/BPD/97650/2013; UID/Multi/04349/2013; FCT Investigator program, IF/00578/2014; Marie Skłodowska-Curie Individual Fellowship MSCA-IF-2015 (MEMBRANEPROT 659826); UID/NEU/04539/2013; Center for Basic and Translational Research on Disorders of the Digestive System, Rockefeller University, through the generosity of the Leona M. and Harry B. Helmsley Charitable Trust and start-up funds of the Icahn School of Medicine at Mount Sinai
Title of journal, periodical, book or event:	International Journal of Molecular Sciences
Volume:	17
Issue:	8
Abstract:	Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural- and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We evaluated twenty-seven algorithms, ranging from a simple linear-based function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm, using a dataset pre-processed by normalization of the features and up-sampling of the minority class. On the independent test set, the method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82.
URI:	https://hdl.handle.net/10316/108631
ISSN:	1422-0067
DOI:	10.3390/ijms17081215
Rights:	openAccess
Appears in collections:	I&D CNC - Artigos em Revistas Internacionais
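
The abstract names two generic steps that are easy to illustrate: up-sampling of the minority class and evaluation by accuracy, F1-score, sensitivity and specificity. The block below is a minimal sketch of both, not the authors' pipeline. The function names `upsample_minority` and `binary_metrics`, the class labels "HS" and "NS", and the toy data are all assumptions made for illustration; the c-forest classifier itself is not reproduced here. Up-sampling is normally applied to the training split only, so the reported test-set metrics are computed on untouched data.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes are equal in size.

    Hypothetical helper: rows and labels are parallel lists.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original examples of this class form the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive="HS"):
    """Compute accuracy, sensitivity, specificity and F1 for a two-class problem.

    "HS" as the positive label is an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data: three null-spots ("NS") and two hot-spots ("HS").
    rows = [[0.1], [0.2], [0.3], [0.4], [0.5]]
    labels = ["NS", "NS", "NS", "HS", "HS"]
    balanced_rows, balanced_labels = upsample_minority(rows, labels)
    print(Counter(balanced_labels))
    print(binary_metrics(["HS", "HS", "NS", "NS", "NS"],
                         ["HS", "NS", "NS", "NS", "HS"]))
```

Sensitivity and specificity are reported separately because, with an imbalanced class distribution, accuracy alone can look high for a model that rarely predicts the minority class.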

Files in this record:

File	Description	Size	Format
A-machine-learning-approach-for-hotspot-detection-at-proteinprotein-interfacesInternational-Journal-of-Molecular-Sciences.pdf		527.34 kB	Adobe PDF
