Computational methodologies for predicting protein-protein Interactions

Castanheira, João Miguel Pereira Rebordão

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/86177

Title:	Computational methodologies for predicting protein-protein Interactions
Other Titles:	Metodologias computacionais para previsão de interações proteína-proteína
Authors:	Castanheira, João Miguel Pereira Rebordão
Advisor:	Arrais, Joel Perdiz; Pires, Paula Cristina Veríssimo
Keywords:	Protein-protein interaction; Performance improvement; Feature extraction; Peptide recognition modules; PyDPI
Issue Date:	26-Sep-2018
Serial title, monograph or event:	Computational methodologies for predicting protein-protein Interactions
Place of publication or event:	Departamento de Ciências da Vida, FCTUC
Abstract:	Because protein interactions are central to many cellular functions, it is important to be able to detect them. Computational methods can process large amounts of data quickly and are therefore widely used to predict protein-protein interactions. This thesis aims to develop, using new features, a method that improves the prediction of protein interactions on a random dataset. The research identified three new feature-extraction approaches: inhibitor databases, gene co-expression networks, and peptide recognition modules. Of the three, the last was pursued because it is the simplest and most practical, using in particular the SH3, SH2, PDZ, WW and LRR domains. To determine whether these domains are a good source of features that can be used with any dataset, they were evaluated on the detection of new interactions between proteins that do not contain them. Three strategies were developed in the course of this work. The first extracted features with the PyDPI software (using the AAC, CTD, Moranauto, QSO, SOCN and CT descriptors) and evaluated dataset performance per descriptor. The second used the same features but evaluated performance on datasets combining all descriptors. The third evaluated performance in the same way as the second, but with features created for the article "A Sequence-Based Mesh Classifier for the Prediction of Protein-Protein Interactions". The first two strategies were set aside because of incorrect methodology and results of little significance.
The third strategy, although carried out in a tightly controlled manner, also yielded results of little significance, so further study of this approach with new strategies and features is recommended.
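As a rough illustration of what a sequence descriptor of the kind named in the abstract computes, the sketch below assumes that AAC stands for amino acid composition, that is, the fraction of each of the 20 standard residues in a protein sequence. It does not use PyDPI and is not the thesis's implementation; the function names, the example sequences, and the choice to concatenate the two proteins' vectors into one pair feature vector are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(sequence.upper())
    # Count only standard residues so ambiguous codes (e.g. X) are ignored.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical pair encoding: concatenate both composition vectors
    into a single 40-value feature vector for a candidate interaction."""
    comp_a = amino_acid_composition(seq_a)
    comp_b = amino_acid_composition(seq_b)
    return [comp_a[aa] for aa in AMINO_ACIDS] + [comp_b[aa] for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequences, not taken from the thesis.
    features = pair_features("MKVLAAGIV", "ACDWWYPG")
    print(len(features))
```

A vector like this could be passed to any standard classifier; the other descriptors listed in the abstract encode different sequence properties and are not reproduced here.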
Description:	Master's dissertation in Biochemistry presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/86177
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
Documento.pdf		3.21 MB	Adobe PDF

This item is licensed under a Creative Commons License