Introduction: The emergence of a novel coronavirus, SARS-CoV-2, the etiologic agent of coronavirus disease 2019 (COVID-19), has led to a pandemic of global concern. Considering the high morbidity and mortality worldwide, the World Health Organization declared the pandemic an unprecedented public health crisis on 11 March 2020. The virus is a positive-sense RNA virus and can show a high rate of mutation. The ongoing mutations in the structural proteins of the coronavirus drive viral evolution, enabling the virus to evade host immunity and rapidly acquire drug resistance. In the present study, we focused mainly on the prevalence of mutations in the four structural proteins, S (spike), E (envelope), M (membrane), and N (nucleocapsid), that are required for the assembly of a complete virion particle. Further, we estimated the antigenicity and allergenicity of these structural proteins to help design and develop a good candidate vaccine against SARS-CoV-2.

Methods: In the present in silico study, envelope protein was found to be highly antigenic, followed by the nucleocapsid, membrane, and spike proteins of SARS-CoV-2.

Results: In this study, we detected 987 mutations in 729 sequences submitted from Asia in October 2020, using China's first Wuhan isolate sequence as the reference. Among the four structural proteins, spike showed the highest number of mutations with 807 point mutations, followed by nucleocapsid with 151 mutations, while envelope showed 19 mutations and membrane only 10 point mutations.

Conclusion: Taken together, our study revealed that variations occurring in the structural proteins of SARS-CoV-2 might be altering viral structure and function, and that the envelope protein appears to be a promising vaccine candidate to curb coronavirus infections.


The human coronavirus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a positive-sense RNA virus. As the etiologic agent of coronavirus disease 2019 (COVID-19), the virus induces moderate to severe respiratory distress1. The pandemic originated from an animal market in Wuhan, China2. The ripple effect of this contagious viral disease has created a humanitarian health crisis and has become an enormous challenge to health systems across the globe.

SARS-CoV-2 is a member of the family Coronaviridae and the order Nidovirales. The virus is considered the third zoonotic coronavirus (after SARS-CoV and MERS-CoV) and originated from bats. However, it is the only one of the three to have shown pandemic potential3, 4, 5, 6. SARS-CoV-2, a betacoronavirus, is an enveloped, single-stranded, positive-sense, non-segmented, and genetically diverse RNA virus with one of the largest genomes among known RNA viruses (29,891 base pairs, encoding approximately 9,860 amino acids)2, 7, 8. The genome of SARS-CoV-2 encodes the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N), as well as non-structural proteins ranging from NSP1 to NSP16.

RNA viruses generally show a high rate of mutation, substantially higher than that of DNA viruses. Owing to the high rate of mutation shown by SARS-CoV-2 over a short period, the virus exhibits genomic variability that enables it to modulate its virulence in the host and subsequently evade host immunity9, 10.

In the present work, we detected 987 mutations in 729 sequences submitted from Asia in October 2020. Among the four structural proteins, spike showed the highest number of mutations with 807 point mutations, followed by nucleocapsid with 151 mutations. Envelope showed 19 mutations and membrane showed only 10 point mutations. The results of our study suggest that mutational analysis of this virus might be considered a new approach to help understand its genomic variability. In addition, using predictive immunoinformatics tools, we determined the antigenicity and allergenicity of the structural proteins of SARS-CoV-2 to support the development of efficacious antiviral therapeutics or vaccines against COVID-19.


Data mining

The full-length protein sequences of the SARS-CoV-2 structural proteins, i.e., envelope protein, nucleocapsid phosphoprotein, surface glycoprotein, and membrane glycoprotein, were retrieved from the NCBI Virus database, restricted to sequences submitted from Asia in October 2020. This yielded 729 structural protein sequences: 165 envelope proteins, 159 nucleocapsid phosphoproteins, 246 surface glycoproteins, and 159 membrane glycoproteins. Four reference sequences, for envelope protein (YP_009724392), nucleocapsid phosphoprotein (YP_009724397), surface glycoprotein (YP_009724390), and membrane glycoprotein (YP_009724393), were also retrieved for the mutational studies.

Multiple sequence alignment (MSA) and mutational identification

Multiple sequence alignment was performed using the Clustal Omega online platform (http://www.clustal.org/), which builds alignments from seeded guide trees and HMM profiles11. The envelope protein, nucleocapsid phosphoprotein, surface glycoprotein, and membrane glycoprotein sequences were aligned with their respective reference sequences. The aligned files were viewed using Jalview (https://www.jalview.org/) to identify the point mutations occurring in the different structural proteins with respect to the Wuhan reference isolate.
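The comparison against the reference can also be expressed programmatically. The following is a minimal sketch, not the procedure used in this study (where mutations were identified by inspecting alignments in Jalview). It assumes two rows taken from an alignment, with "-" marking gaps; the function name `point_mutations` and the toy sequences are hypothetical, and gaps and ambiguous residues ("X") are skipped rather than counted as substitutions.

```python
from typing import List


def point_mutations(aligned_ref: str, aligned_sample: str) -> List[str]:
    """Return substitutions (e.g. 'K2R') between two aligned protein sequences.

    Positions are numbered along the ungapped reference sequence.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, sample_aa in zip(aligned_ref, aligned_sample):
        if ref_aa == "-":
            # Insertion relative to the reference: no reference position to report.
            continue
        ref_pos += 1
        if sample_aa in ("-", "X"):
            # Deletion or ambiguous residue: not treated as a substitution here.
            continue
        if sample_aa != ref_aa:
            mutations.append(f"{ref_aa}{ref_pos}{sample_aa}")
    return mutations


# Toy aligned fragments (hypothetical, not real SARS-CoV-2 data).
reference = "MKT-AYLV"
isolate = "MRTGAY-I"
print(point_mutations(reference, isolate))
```

Numbering positions along the ungapped reference keeps mutation labels comparable across isolates even when the alignment introduces gap columns.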

Antigenicity and allergenicity evaluation

The VaxiJen v2.0 server was used to estimate the antigenicity of all four structural proteins and thereby assess their suitability for use in vaccine production. This online server predicts antigens from the auto cross-covariance (ACC) transformation of the peptide sequences submitted to it12. A good vaccine needs to be non-allergenic to the host. The allergenicity of these structural proteins was therefore evaluated using the AllerTOP server, which predicts allergenicity based on size, flexibility, and other parameters13.
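To illustrate what an ACC transformation does, the sketch below converts a protein sequence of any length into a fixed-length feature vector by averaging products of residue descriptors at a given lag. This is an assumption-laden illustration, not VaxiJen's implementation: the descriptor values in `TOY_DESCRIPTORS` are made up, the function name `acc_transform` is hypothetical, and the server's actual descriptors, lag range, and classifier are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical two-component descriptors for a few residues (illustration only;
# these are NOT the physicochemical descriptors used by VaxiJen).
TOY_DESCRIPTORS: Dict[str, Tuple[float, float]] = {
    "A": (0.1, -1.0),
    "G": (2.0, -4.0),
    "L": (-4.0, -1.0),
    "K": (2.5, 1.5),
}


def acc_transform(
    sequence: str,
    descriptors: Dict[str, Tuple[float, ...]],
    max_lag: int,
) -> List[float]:
    """Auto cross-covariance features for lags 1..max_lag.

    For descriptors j and k and lag l:
        ACC_jk(l) = sum_{i=1}^{n-l} z_j(i) * z_k(i+l) / (n - l)
    The output length is max_lag * d * d, independent of sequence length n.
    """
    n = len(sequence)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")

    z = [descriptors[aa] for aa in sequence]  # per-residue descriptor tuples
    n_desc = len(z[0])

    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short = acc_transform("AGLK", TOY_DESCRIPTORS, max_lag=2)
longer = acc_transform("AGLKKLGAAG", TOY_DESCRIPTORS, max_lag=2)
print(len(short), len(longer))  # both vectors have the same length
```

The point of the transformation is that proteins of different lengths (75 to 1273 amino acids for the four structural proteins) map to vectors of identical size, which a classifier can then compare directly.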

Figure 1. Total number of mutations occurring in the structural proteins. a. Surface glycoprotein, b. Envelope protein, c. Membrane glycoprotein, and d. Nucleocapsid phosphoprotein.

Mutational identification

A total of 729 structural protein sequences (spike glycoproteins, nucleocapsid phosphoproteins, envelope proteins, and membrane glycoproteins) submitted from Asian countries in October 2020 were retrieved from the NCBI Virus database, along with four reference sequences. The reference spike glycoprotein, nucleocapsid phosphoprotein, envelope protein, and membrane glycoprotein are 1273, 419, 75, and 222 amino acids long, respectively.

After alignment, the sequences were viewed in Jalview to detect mutations in the structural proteins of the Asian isolates relative to the Wuhan reference isolate. Amongst the 729 sequences released from Asia, a total of 987 point mutations were detected across all four structural proteins (Figure 1). Among the 311 mutants, spike showed the highest number of mutations with 807 point mutations (Table 3), followed by nucleocapsid with 151 mutations (Table 2), while envelope showed 19 mutations (Table 1) and membrane showed only 10 point mutations (Table 4).
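As a quick consistency check on the counts reported above, the per-protein totals can be summed and expressed as shares of all detected mutations. This is a simple arithmetic sketch using only the figures stated in the text; the variable names are illustrative, and the percentage shares are derived here rather than reported in the study.

```python
# Point mutations per structural protein, as reported in the text.
mutation_counts = {
    "spike": 807,
    "nucleocapsid": 151,
    "envelope": 19,
    "membrane": 10,
}

total = sum(mutation_counts.values())

# Share of each protein in the total number of detected point mutations.
shares = {
    protein: round(100 * count / total, 1)
    for protein, count in mutation_counts.items()
}

for protein, count in mutation_counts.items():
    print(f"{protein}: {count} mutations ({shares[protein]}% of {total})")
```

The four counts sum to the reported total of 987, with spike accounting for roughly four fifths of all point mutations.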


