Proteins are biomolecules composed of sequences of amino acids, and they carry out functions essential for life. Advances in DNA sequencing technology have made the sequences of millions of proteins available. Despite this, wide gaps persist between known protein sequences and known protein functions. What is known about protein function is classified into hierarchies constructed from experimental observations.
This project aims to hierarchically cluster proteins using the mathematical properties of their sequences. Proteins will be embedded as n-dimensional vectors and then clustered using different hierarchical clustering strategies. Mathematically constructed protein hierarchies may enable improved protein function prediction by grouping proteins with shared functions.
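As a rough illustration of the embed-then-cluster idea, the sketch below maps each sequence to a vector and builds a hierarchy by repeatedly merging the two closest clusters. The abstract does not say which embedding or linkage strategy the project uses, so everything here is an assumption made for illustration: the 20-dimensional amino acid frequency embedding, Euclidean distance, average linkage, the toy sequences, and the function names `embed`, `average_linkage` and `hierarchical_cluster`.

```python
from itertools import combinations
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def embed(sequence):
    """Assumed embedding: amino acid frequencies as a 20-dimensional vector."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def average_linkage(a, b, vectors):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hierarchical_cluster(vectors):
    """Naive agglomerative clustering; returns the merges in the order made."""
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical toy sequences, not real proteins.
sequences = {
    "p1": "AAAAAGGGGG",
    "p2": "AAAAGGGGGG",
    "p3": "KKKKKLLLLL",
    "p4": "KKKKLLLLLL",
}
names = list(sequences)
vectors = [embed(sequences[name]) for name in names]
merges = hierarchical_cluster(vectors)
for a, b in merges:
    print([names[i] for i in a], "+", [names[i] for i in b])
```

The merge list read from first to last is the hierarchy, from the tightest groupings up to the root. Swapping `average_linkage` for a single- or complete-linkage function is one way to compare clustering strategies on the same embedding.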
Susanna Grigson is an undergraduate student in the Flinders University High Achievers Program studying a combined degree in Mathematics, Molecular Biology and Biochemistry. She primarily focuses on bioinformatics, using her combined knowledge of biology and mathematics to develop creative approaches to understanding biological data. Susanna has a strong interest and extensive experience in research, having completed projects with CSIRO’s transformational bioinformatics group and the Institute of Mathematical Sciences at the University of Malaya. Upon completion of her honours in 2021, Susanna plans to undertake a PhD to improve our understanding of biological problems by combining mathematics and other disciplines.