The proteins of SARS-CoV-2




By Patrick Kowalski, MBS 2020, Geisinger Commonwealth School of Medicine
Mentor: William McLaughlin, PhD

The COVID-19 pandemic has had a profound economic, social, and political impact across the world. As a result, treating the disease and preventing its further spread have become priorities in the scientific community. SARS-CoV-2, the virus that causes COVID-19, is the focus of these efforts. Its spherical shape, covered in spiny projections known as spike proteins, has become a recognizable image across cultures and languages. Understanding the structure of these proteins and how they operate is an essential step toward finding safe treatments and potential vaccines.

Organizations such as Continuous Automated Model EvaluatiON (CAMEO) and the Critical Assessment of Protein Structure Prediction (CASP) are part of a bioinformatics community that has been striving to perfect the assessment of protein models. CAMEO is a continual community effort dedicated to evaluating protein models, and CASP is a biennial (every two years) competition that pits the brightest minds in bioinformatics against one another. The Geisinger Commonwealth School of Medicine is also involved in protein structure prediction through its ResiRole model. ResiRole aims to assess model accuracy through a "round-robin" of "head-to-head" comparisons, in which differences in scores distinguish the best model. In CASP, teams predict the structures of target proteins whose experimental structures have been determined but not yet released; each prediction is then compared against that experimental structure.

Determining the structures of different proteins unlocks many opportunities across the bioinformatics world. By understanding specific protein regions, researchers can study new biological pathways for potential breakthroughs in therapeutics or in the mechanisms of rare genetic conditions that were previously poorly understood. This is why finding the best predictive models, and knowing how to evaluate them, is so important. With the advent of COVID-19, however, the CASP teams' efforts are now directed toward identifying the best models for predicting the structures of SARS-CoV-2 proteins. The participants of CASP14 are actively identifying parts of the viral proteins and projecting three-dimensional models. Because SARS-CoV-2 is a new coronavirus, pinpointing the exact structures of its proteins has been challenging.
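The article does not describe how ResiRole scores models, so the following is only a minimal sketch of a generic round-robin, head-to-head comparison, not ResiRole's actual method. It assumes each model already has a numeric score for each target (higher meaning more accurate); the function name `head_to_head`, the model names, the target IDs, and the scores are all hypothetical toy values. Every pair of models is compared once, and whichever model scores higher on more shared targets is credited with a win.

```python
from itertools import combinations


def head_to_head(scores):
    """Rank models by the number of head-to-head matchups they win.

    scores maps model name -> {target id -> score}; a higher score is
    assumed (hypothetically) to mean a more accurate model.
    """
    wins = {name: 0 for name in scores}
    # Round-robin: every pair of models is compared exactly once.
    for a, b in combinations(scores, 2):
        shared = scores[a].keys() & scores[b].keys()
        a_better = sum(scores[a][t] > scores[b][t] for t in shared)
        b_better = sum(scores[b][t] > scores[a][t] for t in shared)
        # The model that is better on more shared targets wins the matchup;
        # an even split awards no win to either model.
        if a_better > b_better:
            wins[a] += 1
        elif b_better > a_better:
            wins[b] += 1
    # Sort by number of matchups won, best first.
    return sorted(wins.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-target scores (toy numbers, not CASP or ResiRole data).
scores = {
    "model_A": {"T1": 0.80, "T2": 0.65, "T3": 0.90},
    "model_B": {"T1": 0.75, "T2": 0.70, "T3": 0.85},
    "model_C": {"T1": 0.60, "T2": 0.55, "T3": 0.70},
}
print(head_to_head(scores))
# [('model_A', 2), ('model_B', 1), ('model_C', 0)]
```

Counting pairwise wins rather than averaging raw scores keeps one unusually easy or hard target from dominating the ranking; a real assessment would also need rules for ties and for whether a score difference is meaningful.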

SARS-CoV-2 shares some similarities with SARS-CoV, the virus that affected southern China in 2002. Both viruses have a spike protein that binds preferentially to the angiotensin-converting enzyme 2 (ACE2) receptor. This is notable because the ACE2 receptor is present in numerous human tissues, including the heart, kidneys, and liver. Because the virus can replicate in these cells, the lungs become vulnerable as it spreads, resulting in the respiratory conditions seen in COVID-19 patients. While the first SARS virus was more lethal, SARS-CoV-2 has been reported to bind ACE2 receptors with 10 to 20 times higher affinity than SARS-CoV. The tissues it binds to are also highly replicative, meaning it spreads more easily within the body than the first virus did. Building on these preliminary findings, investigators are seeking to manipulate or block the virus's binding as a therapeutic strategy.

Part of the CASP competition is to find the best method for estimation of model accuracy (EMA). In EMA, there are typically two approaches, which differ in whether the assessment method draws on a known experimental template to make its prediction. What makes this iteration of the competition unique is that the structure of SARS-CoV-2 is still being determined. This creates an interesting dynamic over which methodology will prove ideal, because each approach emphasizes different aspects of protein structure. As the competition progresses, it will offer greater insight into which elements of protein modeling are most crucial to success. Regardless of the approach, however, an emerging force appears to be reshaping the protein modeling world.
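EMA methods try to predict how close a model is to the true structure; in CASP that closeness is later measured against the experimental structure, and GDT_TS (Global Distance Test, total score), the average percentage of residues lying within 1, 2, 4, and 8 Å of their experimental positions, is one of the standard measures. The sketch below is a simplified version that assumes the model and experimental C-alpha coordinates are already superimposed and matched residue for residue. The official GDT calculation searches over many superpositions to maximize the count at each cutoff, so this version can only underestimate it. The function name `gdt_ts` and the coordinates are hypothetical toy values.

```python
import math


# Simplified GDT_TS sketch. Assumption: coordinates are already superimposed;
# the real GDT searches many superpositions, so this gives a lower bound.
def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return a GDT_TS-style score (0-100) for matched C-alpha coordinates."""
    if not model or len(model) != len(reference):
        raise ValueError("need non-empty coordinate lists of equal length")
    # Per-residue distance (in angstroms) between model and reference positions.
    dists = [math.dist(p, q) for p, q in zip(model, reference)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    # GDT_TS is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example (toy coordinates, not real data).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
print(gdt_ts(model, reference))  # 56.25
```

In this toy case the residue that sits 0.5 Å away counts toward all four cutoffs, while the one 9 Å away counts toward none, which is why GDT_TS rewards models that get most of the fold approximately right rather than penalizing a few badly placed residues heavily.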

Artificial intelligence (AI) in the form of deep learning has steadily improved estimation of model accuracy in every technique in which it has been implemented. CASP14 was meant to be a demonstration of the power of machine learning as part of an experimental competition, but it has instead become a necessary tool in stopping a global pandemic. CASP13's winner, DeepMind (part of Google), developed AlphaFold, and its entry into the protein folding scene continues to build on the use of AI in protein prediction. AlphaFold is not a template-based modeling technique; instead, it models target shapes from scratch, indicating that both approaches are viable. Academic groups do not currently have the same resources as Google, but adopting such premier technology should benefit them in the future.

While the world is currently focused on COVID-19, the technological improvements driven by these efforts will have profound impacts beyond this one virus. The potential of protein prediction extends to understanding rare genetic conditions and bioengineering bacteria for sustainable energy. Proteins play a powerful role in biology, and the ability to harness that power through an intimate understanding of their structure will produce extraordinary benefits.
