Transformer models for protein-guided drug compound generation: A comparison of amino acid sequences, pre-trained protein embeddings, SMILES, and SELFIES
Abstract
Drug discovery is a time-consuming and costly process that notoriously suffers from low success rates. Increased availability of chemically relevant data and advances in machine learning techniques offer potential solutions for aiding the drug development pipeline. This thesis explores the use of transformers in the conditional generation of potential drug compounds from protein context. Building on previous research, this work implements four transformer models that take protein information as input and generate potential binding compounds. Each model uses either SMILES or SELFIES string representations of compounds, and either amino acid sequences or pretrained ESM-2 protein embeddings as contextual input. The models are trained and compared on their ability to generate chemically feasible compounds that approximate the physicochemical properties of the training set and show binding potential specific to the contextual proteins. Using SELFIES increased compound validity and diversity, but the SELFIES models performed worse overall than their SMILES counterparts. Pretrained protein embeddings decreased validity but improved model performance despite no change to model structure or size. These results highlight the potential of transformer models paired with pretrained protein embeddings to enhance the drug discovery process by generating lead compounds from novel proteins without any fine-tuning or retraining.
Persons
Author (aut): Fossl, Dylan
Thesis advisor (ths): Jian, Fan
Degree committee member (dgc): Maurice, Sean
Degree committee member (dgc): Hamieh, Alia
DOI
https://doi.org/10.24124/2024/59584
Organizations
Degree granting institution (dgg): University of Northern British Columbia
Extent
1 online resource (xii, 108 pages)
Physical Description Note
PUBLISHED
Use and Reproduction
author
Language
English
MIME type
application/pdf
File size
2545762