Title: Machine learning techniques for sequence-based prediction of virus–host interactions between SARS-CoV-2 and human proteins
Abstract:
According to the World Health Organization (WHO), coronavirus disease 2019 (COVID-19), caused by infection with a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is one of the most serious diseases of the current era. At the time of writing, it had infected over 15 million people in more than 200 countries and caused about 0.6 million deaths, placing enormous strain on healthcare systems across the world. The first cases of infection with the new coronavirus were reported in Wuhan, China, at the end of 2019, and its deadly effects now threaten the entire world, from Asia to Europe and the Americas. In addition to several accessory proteins, SARS-CoV-2 encodes four main structural proteins: the spike (S) glycoprotein, the small envelope (E) protein, the membrane (M) glycoprotein, and the nucleocapsid (N) protein. Understanding how these viral proteins interact with host cells in order to survive and replicate is critical for therapeutic development, and the genetic traits of SARS-CoV-2 must be fully characterized in order to combat the virus. SARS-CoV-2 is a single-stranded RNA virus with particles 65 to 125 nm in diameter and a genome of roughly 27–32 kb. Healthcare institutions worldwide are urgently pursuing a vaccine to stop the virus from spreading; meanwhile, infected patients are isolated and treated with conventional medicine as early as possible. One way viruses interact with their hosts is through protein–protein interactions (PPIs).
Identifying PPIs between viral and host proteins helps explain how viral proteins function, how the virus propagates, and how it causes disease. Experimental methods for detecting PPIs have been developed over the last few decades. However, high-throughput experimental screens have mostly been used to map intra-species PPIs, leaving inter-species interactomes relatively unexplored. Moreover, laboratory identification of PPIs is typically time-consuming and labor-intensive, and it is difficult to build comprehensive protein interactomes this way. Efficient computational methods for PPI prediction are therefore used to bridge the gap: they offer experimentally testable hypotheses and discard protein pairs with a low chance of interaction, reducing the number of PPI candidates to be considered. Computational techniques have previously been widely used to predict virus–host interactions. To predict PPIs between SARS-CoV-2 and human proteins, several machine learning models can be built, and their predictions can then be validated through biological experiments and Gene Ontology (GO) and KEGG pathway enrichment analyses. Antiviral drug discovery can also be aided by identifying repurposable drugs that target the predicted interactions.
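The abstract does not state which sequence features or models are used. The code below is a minimal sketch of the general idea of sequence-based PPI prediction and candidate filtering, assuming amino acid composition (AAC) features and a logistic-regression classifier trained by plain gradient descent; both are chosen here only for illustration and are not necessarily the methods of this work. Each virus–host pair is encoded as two concatenated 20-value AAC vectors, a classifier is trained on labelled pairs, and candidate pairs whose predicted interaction probability falls below a threshold are discarded. The helper names (`aac`, `pair_features`, `train_logreg`, `rank_candidates`), the 0.5 threshold, and all sequences are hypothetical; the toy strings are not real SARS-CoV-2 or human proteins. Only the Python standard library is used so the sketch is self-contained.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def aac(seq: str) -> List[float]:
    """Amino acid composition: the fraction of each standard residue in seq.
    Non-standard letters (e.g. X) are ignored."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

def pair_features(viral: str, host: str) -> List[float]:
    """Hypothetical encoding: a (viral, host) pair as two concatenated AAC vectors (40 values)."""
    return aac(viral) + aac(host)

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the encoded pair x interacts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_candidates(w: Sequence[float], b: float,
                    pairs: List[Tuple[str, str]], threshold: float = 0.5):
    """Score candidate (viral, host) pairs, drop those below threshold,
    and return the rest sorted from most to least likely to interact."""
    scored = [(predict_proba(w, b, pair_features(v, h)), v, h) for v, h in pairs]
    kept = [s for s in scored if s[0] >= threshold]
    return sorted(kept, key=lambda s: s[0], reverse=True)

# Hypothetical toy data: made-up strings, NOT real SARS-CoV-2 or human proteins.
VIRAL = "MFVFLVLLPLV"
train_pairs = [(VIRAL, "KKRKKRKKAK"), (VIRAL, "RKKRRKKGKR"),  # labelled interacting (1)
               (VIRAL, "AAAAGAAALA"), (VIRAL, "GAGAAGALAA")]  # labelled non-interacting (0)
labels = [1, 1, 0, 0]

w, b = train_logreg([pair_features(v, h) for v, h in train_pairs], labels)
candidates = [(VIRAL, "KRKKRKKRKK"), (VIRAL, "AAGAAGALAA")]
ranked = rank_candidates(w, b, candidates)
for prob, v, h in ranked:
    print(f"{h}: {prob:.3f}")
```

AAC ignores residue order, so it is only a starting point; real virus–host interactome studies would typically use richer sequence encodings and stronger classifiers, and any predicted pairs would still need the experimental and enrichment-based validation mentioned above.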