Genetic variations in the Orf7a protein of SARS-CoV-2 and its possible role in vaccine development

coronavirus disease (COVID-19) that has been creating an unprecedented situation globally. Recurrent mutations in SARS-CoV-2 genomes affect vaccine design strategies. Orf7a is a 121-amino-acid type I transmembrane accessory protein encoded by the SARS-CoV-2 genome and plays a crucial role in virus-host interaction. The present study aimed to analyze the variations arising in Orf7a from multiple mutations and to assess its immunological potential as a therapeutic target to curb SARS-CoV-2 infections. Methods: A total of 16,161 Orf7a sequences reported from five continents between the onset of the disease and 13 June 2021 were compared to identify genetic variations in the protein. Results: A total of 470 point mutations were detected in the submitted sequences. The nature of the mutations (deleterious or neutral) was then determined. Furthermore, the physicochemical properties, antigenicity, allergenicity, toxicity, and stability of the Orf7a protein were estimated. Additionally, we identified three B-cell epitopes and performed their MHC cluster analysis. Conclusion: The recurrent mutations in Orf7a of SARS-CoV-2 provide a deeper understanding of its role in virus-host interactions. Our findings suggest that the predicted epitopes could be promising candidates for a vaccine against COVID-19.


INTRODUCTION
SARS-CoV-2 is responsible for the rapid emergence of the novel coronavirus disease, first reported at the wet seafood market of Wuhan, China, in December 2019 1,2. COVID-19 is a contagious disease that causes mild to severe respiratory illness, including multi-organ dysfunction, in infected individuals 3. SARS-CoV-2 is transmitted by inhalation of aerosols or by direct contact with droplets from an infected person. The incubation period of COVID-19 commonly ranges from 2 to 14 days 4. The World Health Organization declared COVID-19 a pandemic on 11 March 2020 (WHO, 2020). As of 17 July 2021, 190,561,846 confirmed cases of COVID-19, including 4,095,470 deaths, had been reported to WHO worldwide 5. Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded RNA viruses belonging to the family Coronaviridae. The genome is ~30 kb long and encodes polyproteins 9,860 amino acids in length 6. The SARS-CoV-2 genome encodes four main structural proteins (spike S, envelope E, membrane M, and nucleocapsid N), nine accessory open reading frames (Orf3a, Orf3b, Orf6, Orf7a, Orf7b, Orf8a, Orf8b and Orf9b) and several nonstructural proteins ranging from NSP1 to NSP16 7,8. Orf7a is a 121-amino-acid accessory protein of SARS-CoV-2 that plays an important role in virus-host interaction. ORF7a of SARS-CoV-2 encodes a type I transmembrane protein, which is located primarily in the Golgi apparatus but can also be found on the cell surface 9, 10. RNA viruses such as SARS-CoV-2 exhibit higher mutation rates than DNA viruses, which leads to genomic diversity. SARS-CoV-2 thus acquires genetic heterogeneity that modulates its virulence in the host and thereby facilitates evasion of host immunity 11-13.
A total of 470 point mutations were detected in 16,161 sequences submitted between the onset of the disease and 13 June 2021. Additionally, using predictive tools of computational biology, we attempted to design epitope-based vaccine candidates that can generate long-lasting B-cell immune responses against SARS-CoV-2 infections. This study also reports the physicochemical properties, antigenicity, allergenicity, and toxicity of the vaccine construct, together with an MHC cluster analysis, which indicated that the predicted epitopes may be potent vaccine candidates against COVID-19. The purpose of the present study was to analyze the variations arising in the Orf7a protein from multiple point mutations, which may alter its structure, and to assess its immunological role in designing epitope-based vaccine candidates against COVID-19. This in silico work requires further validation through in vitro and in vivo studies.

Data mining
The full-length protein sequences of Orf7a of SARS-CoV-2 submitted from five continents (Asia, Africa, Europe, Oceania, and South America) up to 13 June 2021 were downloaded from the NCBI Virus database. A total of 16,161 sequences had been released from these continents since the onset of the pandemic. For the mutation studies, a reference Orf7a sequence of the Wuhan-type virus (accession number QWZ15014) was also downloaded.

Multiple sequence alignment and identification of Orf7a mutants
The full-length Orf7a protein sequences were aligned using the Clustal Omega online platform, and the alignments were viewed in Jalview to detect mutations relative to the Wuhan-type reference sequence 14. The frequency of each mutation was calculated, and we checked whether different point mutations came from different continents. Non-synonymous amino acid variants were analyzed using the Protein Variation Effect Analyzer (PROVEAN v1.1.3) with a predicted-score cutoff of -2.50 15 to assess the effect of each mutation on the Orf7a protein.
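The comparison against the reference sequence can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used Clustal Omega, Jalview, and PROVEAN); the function names and the toy alignment are hypothetical, and the sketch assumes the sequences are already aligned, with `-` marking gaps and `X` marking ambiguous residues. The `is_deleterious` helper only applies the -2.5 cutoff to a score that PROVEAN would supply.

```python
from collections import Counter


def call_point_mutations(reference, aligned):
    """List substitutions such as 'T3I', numbered by 1-based reference position."""
    if len(reference) != len(aligned):
        raise ValueError("both sequences must come from the same alignment")
    mutations = []
    position = 0
    for ref_res, alt_res in zip(reference, aligned):
        if ref_res == "-":
            continue  # inserted column: no reference position to report
        position += 1
        if alt_res in ("-", "X") or alt_res == ref_res:
            continue  # deletion, ambiguous residue, or unchanged residue
        mutations.append(f"{ref_res}{position}{alt_res}")
    return mutations


def mutation_frequencies(reference, isolates):
    """Count how many isolates carry each substitution."""
    counts = Counter()
    for sequence in isolates:
        counts.update(call_point_mutations(reference, sequence))
    return counts


def is_deleterious(provean_score, cutoff=-2.5):
    """A variant is called deleterious when its score is at or below the cutoff."""
    return provean_score <= cutoff


# Hypothetical toy alignment, not real Orf7a data.
REFERENCE = "MKTAYIAK"
ISOLATES = ["MKTAYIAK", "MKIAYIAK", "MKIAYIAR", "MKTA-IAK"]
FREQUENCIES = mutation_frequencies(REFERENCE, ISOLATES)
```

Sorting `FREQUENCIES.most_common()` would give the ranking from which the most frequent substitutions are selected for further characterization.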

Estimation of physicochemical properties and hydropathy index of Orf7a protein
The physicochemical properties, including molecular weight, extinction coefficient, amino acid composition, instability index, estimated half-life, aliphatic index, and grand average of hydropathicity (GRAVY), were calculated using the ProtParam tool of the ExPASy online server. The ProtScale tool of ExPASy was used to prepare the hydropathy plot of the Orf7a protein 16.
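Two of these quantities have simple closed forms and can be sketched in a few lines of Python. The sketch assumes the Kyte-Doolittle hydropathy scale for GRAVY and for the sliding-window profile, and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The function names and the window size are illustrative choices, not settings taken from this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[res] for res in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / len(sequence)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each full window; one value per window position."""
    values = [KYTE_DOOLITTLE[res] for res in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

A positive GRAVY value indicates an overall hydrophobic protein, and plotting `hydropathy_profile` against window position gives the kind of curve shown in a hydropathy plot.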

Identification of linear B-cell epitopes
The Immune Epitope Database (IEDB) was used to predict linear B-cell epitopes in the Orf7a protein of SARS-CoV-2 17. The IEDB web server predicts epitopes from parameters such as flexibility, accessibility, hydrophilicity, turns, polarity, and antigenic propensity of the protein, using amino acid scales and hidden Markov models (HMMs).
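Scale-based predictors generally assign each residue a score and report contiguous stretches that stay above a threshold. The Python sketch below illustrates only that final step. It is a generic illustration, not the IEDB algorithm: the function name, the threshold, and the minimum length are hypothetical, and the per-residue scores are assumed to come from whichever prediction method is used.

```python
def segments_above_threshold(scores, threshold, min_len=5):
    """Return 1-based (start, end) runs whose scores are all >= threshold."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position  # a new run begins here
        else:
            if start is not None and position - start >= min_len:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments
```

Each returned pair can be mapped back onto the protein sequence to recover the candidate epitope peptide.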

MHC allele cluster analysis
The MHCcluster 2.0 online tool was used to analyze the MHC class I and MHC class II alleles that might interact with the epitopes and lead to immune responses. This server clusters alleles according to their predicted binding specificity and presents the relationships as a phylogenetic tree and a heat map 18.

Antigenicity and allergenicity evaluation
The antigenicity of the Orf7a protein was estimated using the VaxiJen v2.0 server, which predicts antigens from the auto cross-covariance (ACC) transformation of protein sequences 19. To determine whether the Orf7a protein was allergenic, the AllerTOP server was used, which evaluates protein allergenicity with the same ACC method, describing residues by hydrophobicity, size, flexibility, and other parameters 20.
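The ACC transformation turns a protein of any length into a fixed-length vector by averaging products of residue descriptors separated by a given lag. The Python sketch below is a minimal illustration under stated assumptions: each residue is already described by a tuple of numeric properties, the descriptor values in any example are hypothetical rather than the published scales, and the preprocessing and classifiers of the actual servers are not reproduced.

```python
def auto_cross_covariance(descriptors, max_lag):
    """ACC features from per-residue descriptor tuples.

    For each lag and each ordered pair of properties (j, k), the feature is
    the mean of descriptors[i][j] * descriptors[i + lag][k] over all i.
    The result has max_lag * n_props**2 entries whatever the sequence length.
    """
    n = len(descriptors)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and sequence length - 1")
    n_props = len(descriptors[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            for k in range(n_props):
                total = sum(descriptors[i][j] * descriptors[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features
```

Terms with j equal to k are the auto-covariances and the remaining terms are the cross-covariances; the fixed vector length is what allows sequences of different lengths to be compared by one model.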

Identification of Orf7a mutants and detection of non-synonymous mutants
A total of 16,161 full-length Orf7a protein sequences, each 121 amino acids long, had been submitted from five continents (Asia, Africa, Europe, Oceania, and South America) between the onset of the pandemic and 13 June 2021. These sequences were downloaded from the NCBI Virus database along with a reference sequence of the Wuhan-type virus. Multiple sequence alignment was performed to detect variations among the isolates, and the alignment was visualized using Jalview. Among the 470 point mutations detected, N43Y, T14I, V82A, S81L, and T39I occurred most frequently and were used for further characterization in this study (Figure 1). All five of these frequent mutations were predicted to be deleterious for the Orf7a protein at the PROVEAN cutoff score of -2.5 (Table 1).

Estimation of physicochemical properties and hydropathy index of Orf7a accessory protein
The estimated physicochemical properties showed that the Orf7a protein is 121 amino acids long, with a molecular weight of 13744.17 Da, an aliphatic index of 48.66, an instability index of 48.66, and a GRAVY score of 0.233 (Table 2). The structure of the Orf7a protein is shown in Figure 2. The hydropathy plot showed that the C-terminal end of the Orf7a protein is relatively more hydrophobic than the N-terminal end (Figure 3).

B-cell epitope prediction
A total of three linear B-cell epitopes were predicted for the 121-amino-acid Orf7a protein, as shown in Figure 4 and Table 3. These epitopes can induce antibody production and hence play a crucial role in humoral immunity.

Cluster analysis of MHC alleles
The cluster analysis of the MHC class I alleles is shown in Figure 5A&B and that of the class II alleles in Figure 5C&D, where the red zone denotes strong interaction of an HLA allele with the epitopes of the Orf7a protein and yellow denotes weak interaction. We analyzed the binding ability of all possible alleles with the Orf7a epitopes.

Assessment of antigenicity and allergenicity
To predict the antigenicity of the Orf7a protein, the VaxiJen v2.0 server was used. Antigenicity reflects the ability of a vaccine candidate to bind B-cell and T-cell receptors and thereby enhance the immune response. This analysis revealed the antigenic nature of the Orf7a protein, with an antigenicity score of 0.6441 at a threshold of 0.4. A good vaccine candidate must also be non-allergenic; the allergenicity and toxicity analysis indicated that the Orf7a protein is non-allergenic, making it a possible vaccine candidate.

DISCUSSION
The rapid spread of coronavirus disease started in China in late December 2019 and has become a serious threat to human health across the globe. Efficacious and safe antiviral therapeutics are therefore indispensable to curb COVID-19 infections. The novel coronavirus primarily causes pulmonary obstruction with multi-organ dysfunction in humans, manifesting as dyspnea (shortness of breath), sore throat, dry cough, and fever. Symptoms of COVID-19 may begin within two days of infection or may take 14 days or more to appear. Infected individuals may show symptoms or may remain asymptomatic. SARS-CoV-2 is an RNA virus and has an enormous capacity for high rates of mutation 21. Previous studies have shown that mutation plays a vital role in viral evolution and adaptation 22,23.
These traits are key determinants that allow viruses to survive in the dynamic host environment, escape the pre-existing immunity of the host, and quickly acquire drug resistance. Various factors are responsible for the rapid spread of SARS-CoV-2 infection, such as the fidelity of its RNA polymerase, population density, geographical region, poor health or hygiene, and environmental conditions 24. Mutational analysis of this contagious virus provides a better understanding of its epidemiology and pathogenesis and informs the design of suitable antiviral therapeutics against COVID-19. We detected 470 point mutations in 16,161 Orf7a protein sequences from around the world. RNA viruses, including SARS-CoV-2, accumulate genomic mutations through error-prone viral polymerases and thereby adapt better inside the host, which creates further hurdles in designing antiviral therapeutics against RNA viruses 25.
The main function of ORF7a is to bind BST-2 (bone marrow stromal antigen 2, also called CD317 or tetherin) and prevent its N-linked glycosylation, thereby blocking the tethering of SARS-CoV virions to the cytoplasmic membrane after they are released from the cell. Taylor JK et al. (2015) 26 reported that SARS-CoV ORF7a antagonizes the function of BST-2 and suggested that therapeutics designed to inhibit the interaction between BST-2 and ORF7a might inhibit virus growth both in vitro and in vivo.
Epitope-based vaccine design strategies that use immunoinformatics tools have recently gained much attention for various infectious diseases. Conventional methods of vaccine development are costly, time-consuming, and require extensive experimental work. The epitope-based approach, by contrast, uses several predictive bioinformatics tools and has proven highly advantageous over traditional vaccine development strategies. As earlier studies indicate, in silico vaccine development methods appear to be specific, readily establish an immunological correlation between host and pathogen, and can elicit long-lasting immunity 4, 27.
Previous studies have shown that epitope-based vaccine candidates might be a potential means to combat SARS-CoV-2 infections 25, 28. Therefore, to design an epitope-based vaccine candidate, we explored the antigenicity, allergenicity, physicochemical properties, toxicity, and stability of the Orf7a protein. In addition, we identified three B-cell epitopes and performed their MHC cluster analysis, which indicated that the predicted epitopes might be promising vaccine candidates against COVID-19 29,30.

CONCLUSIONS
The recurrent mutations in Orf7a of SARS-CoV-2 provide a deeper understanding of its role in virus-host interaction. Orf7a was chosen as the target for the vaccine construct because it is a type I transmembrane protein. Our study highlights the potential efficacy and durability of an epitope-based vaccine construct designed with predictive immunoinformatics tools; however, in vitro and in vivo studies are required to validate the designed vaccine candidates.

AUTHORS' CONTRIBUTIONS
NY and DKJ performed all the analyses; AK, MG, and KS performed the mutational study; NY and DKJ wrote the manuscript. All authors read and approved the final manuscript.

FUNDING
None.

AVAILABILITY OF DATA AND MATERIALS
Not applicable.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

CONSENT FOR PUBLICATION
Not applicable.