Toward the Explainability of Drug-Target Interactions: End-to-End Deep Learning Architectures for Binding Affinity Prediction

Monteiro, Nelson Rodrigo Carvalho

Use this identifier to cite or link to this record: https://hdl.handle.net/10316/115538

Title:	Toward the Explainability of Drug-Target Interactions: End-to-End Deep Learning Architectures for Binding Affinity Prediction
Other titles:	Rumo à Explicabilidade das Interações Fármaco-Alvo: Arquiteturas de Aprendizagem Profunda de Ponta a Ponta para a Previsão da Afinidade de Ligação
Author:	Monteiro, Nelson Rodrigo Carvalho
Advisors:	Arrais, Joel Perdiz; Oliveira, José Luís
Keywords:	Binding Pocket; Deep Learning; Drug-Target Affinity; Drug-Target Interaction; Explainability
Date:	27-Jun-2024
Project:	info:eu-repo/grantAgreement/FCT/POR_CENTRO/2020.04741.BD/PT info:eu-repo/grantAgreement/UIDB/00326/2020
Title of journal, periodical, book, or event:	Toward the Explainability of Drug-Target Interactions: End-to-End Deep Learning Architectures for Binding Affinity Prediction
Place of publication or event:	Departamento de Engenharia Informática da Faculdade de Ciências e Tecnologia da Universidade de Coimbra
Abstract:	The identification of compounds that selectively bind to proteins continues to pose challenges in drug discovery. Thus, the proper assessment of target-specific compound selectivity and the accurate prediction of an unbiased Drug-Target Affinity (DTA) metric are pivotal to promoting the identification of Drug-Target Interactions (DTIs), the discovery of potential leads, and the understanding of the binding process. Although significant efforts have been made to increase the effectiveness of traditional approaches, these methods remain impractical for the vast array of compounds and proteins currently known. Hence, establishing effective computational strategies capable of using all available proteomics, chemical, and pharmacological data becomes decisive in the pursuit of new findings.

Despite the plethora of in silico solutions to overcome the challenges of traditional experiments, most studies still focus on binary classification, overlooking the importance of characterizing DTIs with unbiased binding strength values to properly distinguish primary interactions from those with off-targets. Moreover, several methods simplify the interaction mechanism, neglecting the multi-domain inter-dependency associated with the proteomics, chemical, and pharmacological spaces, and have yet to build explainability into the inner structure of the architectures or to provide potential explanations for the predictions, thus limiting the validity and understanding of the results. Furthermore, most DTA or DTI prediction studies have not yet given any special characterization to binding positions or actively integrated information regarding binding pockets during the learning process, leading to the estimation of potential DTIs based on redundant substructures.

This research tackles the challenge of DTA prediction by proposing and investigating novel Deep Learning (DL) architectures that leverage 1D raw sequential and structural data and focus on modeling the multi-domain representation space of DTIs. Furthermore, it aims to offer insights regarding DTIs and enhance prediction understanding by exploring the explainability of black-box models. This thesis comprises three main contributions.

The first contribution explores the reliability of Convolutional Neural Networks (CNNs) in the identification of relevant sequential and structural regions, specifically binding sites and evolutionary motifs, and the robustness of the deep representations extracted, by providing explanations for the model's decisions based on the identification of the input regions that contributed the most to the prediction. The results demonstrated the effectiveness of the deep representations extracted from CNNs in the prediction of binding affinity. Furthermore, CNNs were found to identify and extract features from regions relevant to the interaction without any a priori information, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction.

The second contribution is a Transformer-based architecture, DTITR, that exploits self-attention layers to capture the short- and long-term proteomics and chemical context dependencies between the sequential and structural units of the proteins and compounds, and cross-attention layers to exchange information and learn the pharmacological context associated with the interaction space. The results showed that DTITR is effective in predicting DTA, achieving superior performance compared to state-of-the-art baselines. The combination of multiple Transformer-Encoders was found to result in robust and discriminative aggregate representations of the proteins and compounds for binding affinity prediction, in which the addition of a Cross-Attention Transformer-Encoder was identified as important for improving the discriminative power of these representations. Moreover, DTITR inherently provides different levels of potential DTI and prediction understanding owing to the nature of its attention blocks.

The last contribution is a binding-region-guided Transformer-based architecture, TAG-DTA, that simultaneously predicts the 1D binding pocket and the binding affinity of DTI pairs, where the prediction of the 1D binding pocket guides and conditions the prediction of DTA. TAG-DTA combines multiple Transformer-Encoder blocks to capture and learn the proteomics, chemical, and pharmacological contexts. The predicted 1D binding pocket conditions the attention mechanism of the Transformer-Encoder used to learn the pharmacological space, in order to model the inter-dependency amongst binding-related positions. The results demonstrated that the 1D binding pocket prediction increases the discriminative power and robustness of the aggregate representation of the pharmacological space, improving the DTA prediction performance. Additionally, TAG-DTA provides increased DTI and prediction understanding due to the attention blocks and the prediction of the 1D binding pocket.
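As a rough illustration of how a predicted 1D binding pocket could condition a cross-attention step, the sketch below adds the log of a per-residue pocket probability to the attention scores before the softmax, so that compound tokens attend mostly to residues predicted to be in the pocket. This is not taken from the thesis: the function name `pocket_conditioned_cross_attention`, the log-probability bias, the `eps` parameter, and the toy inputs are all assumptions, and the learned query/key/value projections of a real Transformer-Encoder are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def pocket_conditioned_cross_attention(compound, protein, pocket_prob, eps=1e-9):
    """Hypothetical sketch: compound tokens (queries) attend over protein
    positions (keys/values); each key's score is biased by the log of its
    predicted pocket probability (an assumed conditioning scheme)."""
    d = len(compound[0])
    weights = []
    for q in compound:
        scores = [
            # scaled dot-product score plus the pocket bias
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(p + eps)
            for k, p in zip(protein, pocket_prob)
        ]
        weights.append(softmax(scores))
    # Weighted sum of protein vectors for each compound token.
    context = [
        [sum(w * k[j] for w, k in zip(row, protein)) for j in range(d)]
        for row in weights
    ]
    return context, weights


# Toy inputs (made up): 2 compound tokens, 3 protein positions, dimension 2.
compound = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pocket_prob = [0.9, 0.8, 0.0]  # third residue predicted outside the pocket

context, weights = pocket_conditioned_cross_attention(compound, protein, pocket_prob)
```

In this toy case the third protein position receives almost no attention from either compound token, even though its raw dot-product score is as high as any other. A hard mask (setting non-pocket scores to negative infinity) would be an alternative design; the soft bias used here keeps low-probability residues reachable.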
Description:	Doctoral thesis in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia
URI:	https://hdl.handle.net/10316/115538
Rights:	openAccess
Appears in collections:	UC - Teses de Doutoramento

Files in this record:

File	Size	Format
NelsonRCMonteiro.PhD.Thesis.pdf	28.73 MB	Adobe PDF

This record is protected by a Creative Commons License
