Biochemistry and computer-generated graph comparison of the structural and nonstructural proteins of Spanish-1918 Influenza, pandemic-2009, and bird flu viruses*

A potential emergence of deadly pandemic influenza viruses is unpredictable and most of them have emerged with no forewarning. The distinct epidemiological and pathological patterns of the Spanish (H1N1), pandemic-2009 (H1N1), and avian influenza (H5N1), known as bird flu, viruses may allow us to develop a ‘template’ for possible emergence of devastating pandemic strains. Here, we provide a detailed molecular dissection of the structural and nonstructural proteins of this triad of viruses. GenBank data for three representative strains were analyzed to determine the polymorphic amino acids, genetic distances, isoelectric points, hydrophobicity plot, and protein modeling of various proteins. We propose that the most devastating pandemic strains may have untruncated PB1-F2 protein with unique residues, highly cleavable HA, and a basic NS1. Any newly emerging strain should be compared with these three strains, so that resources can be directed appropriately.


BACKGROUND
We must be constantly alert to the potential emergence of deadly pandemic influenza viruses because most of them have emerged with no forewarning. The emergence of new strains will continue to challenge public health and pose problems for scientific communities (Neumann et al., 2009; Szewczyk et al., 2014). Extensive molecular and bioinformatics data on this virus from throughout the world are paramount in the development of new antiviral drugs and vaccines (Szewczyk et al., 2014). The distinct epidemiological patterns of the Spanish (H1N1), pandemic-2009 (H1N1), and bird flu (H5N1) viruses should allow us to develop a 'template' for possibly devastating pandemic strains. Spanish flu and pandemic-2009 flu are established human-to-human infections (Mena et al., 2016; Taubenberger & Morens, 2006), whereas the bird flu, the highly pathogenic avian influenza virus (HPAIV) subtype H5N1 (HPAIV-H5N1), has so far shown only limited, unsustained human-to-human transmission (Kandun et al., 2006; Peiris et al., 2007; Wang et al., 2008). These three strains have spread worldwide, although the bird flu has yet to arrive on the American continent, which remains theoretically possible (Kilpatrick et al., 2006).
The Spanish flu, which caused an estimated 20-50 million deaths around 1918 (Tumpey et al., 2005), is recognized as the most devastating epidemic in modern history (Trilla et al., 2008). The real toll may have been significantly higher (Johnson & Mueller, 2002). The global case fatality rate (CFR) of the Spanish flu is believed to have been around 2.5% (Taubenberger & Morens, 2006) or greater (Johnson & Mueller, 2002). The CFR of medically attended patients of pandemic-2009 is estimated to have been approximately 0.4% (0.2-0.6%), while the CFR among all infected persons was 0.025% (Simonsen et al., 2018), so its virulence is deemed to be mild (Nishiura, 2010). The CFR increases greatly if the denominator is restricted to patients with influenza-related critical illness. The mortality rate associated with critical illness in pandemic-2009 in various countries was reported to range from 15% in Australia to 61% in Southeast Asia (Duggal et al., 2016). Among the viral triad, the CFR of the bird flu is the highest. As of December 19, 2016, the cumulative number of confirmed human cases of avian influenza A (H5N1) reported to the World Health Organization in 2003-2016 was 856, of which 452 (52.8%) were fatal (www.who.int). However, the fatalities associated with pandemic-2009 and bird flu might have been much higher without modern medical care, as was the case around 1918.
The influenza viruses are part of a unique family of viruses, the Orthomyxoviridae, whose members have an eight-segment negative-strand genome (Lamb & Krug, 2001). Five segments each encode a single protein: polymerase basic 2 (PB2), polymerase acidic (PA), hemagglutinin (HA), nucleoprotein (NP), and neuraminidase (NA) (Lamb & Krug, 2001). The second largest segment encodes polymerase basic 1 (PB1) (Lamb & Krug, 2001) and an accessory peptide, PB1-F2 (Chanturiya et al., 2004; Chen et al., 2001). The two smallest segments are each spliced to produce mRNAs for two proteins, M1 and M2 (the seventh segment) and NS1 and NS2 (the eighth segment) (Lamb & Krug, 2001).
No direct molecular comparison of all the genes of this influenza virus triad has been made. Molecular modeling of NA has been reported, which established the susceptibility patterns of the viruses to anti-influenza drugs (Le et al., 2009). A phylogenetic analysis and an amino acid homology analysis of the polymerase complex genes of the Spanish flu, seasonal flu, and bird flu have been described, and concluded that the genes of the Spanish flu closely resemble those of the bird flu. Although the analysis of viral genomes alone is unlikely to clarify some critical issues (Taubenberger & Morens, 2006), such as the viral capacity for human-to-human transmission and the severity of clinical infection, a head-to-head molecular comparison of different lineages should pinpoint putative gene(s) or protein(s) that can be used as signals for the potential emergence of a catastrophic pandemic strain. Here, we provide a detailed molecular analysis of the structural and nonstructural proteins of this triad of viruses that are associated with their distinct transmissibility and severity.

MATERIALS AND METHODS
The nucleotide and deduced amino acid sequences of all the structural and nonstructural proteins (PB2, PB1, PB1-  Table 1. Sequences were aligned using ClustalW, and the polymorphic amino acids in each protein and the numbers of amino acid differences were determined with MEGA 6.0 (Tamura et al., 2013). Genetic distances were calculated with the Kimura 2-parameter model (Kimura, 1980b). The isoelectric points were calculated with Peptide Calculator (http://www.bachem.com). Hydrophobicity plots of each peptide were constructed with the Kyte and Doolittle method (Kyte & Doolittle, 1982), using the online software ProtScale (Gasteiger et al., 2005) (http://web.expasy.org/protscale/), and the results were superimposed using Adobe Creative Cloud 2015 (Adobe Systems Incorporated). Protein modeling was performed with the online resource PHYRE2 (http://www.sbg.bio.ic.ac.uk) (Kelley et al., 2015), and the protein models were visualized with RasWin 2.7.5.2 (www.rasmol.org).
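The Kimura 2-parameter distance used above can be illustrated with a minimal sketch. It is not the MEGA implementation; the function name `k2p_distance` and the pairwise-deletion handling of gaps are assumptions made for illustration. With P the proportion of transitional differences and Q the proportion of transversional differences between two aligned sequences, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two ALIGNED nucleotide sequences.

    Hypothetical helper for illustration; gaps and ambiguous bases are
    skipped (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a_seq = seq_a.upper().replace("U", "T")
    b_seq = seq_b.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(a_seq, b_seq):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the model to be applied.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For two identical sequences the distance is zero; a single transition in ten compared sites gives d = -1/2 ln(0.8), roughly 0.112.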

RESULTS
The genetic distances, numbers of amino acid differences, and isoelectric points of all structural and nonstructural proteins of the SF, PDM, and BF viruses are presented in Table 2. The lengths of all the corresponding proteins of the triad were equal, except those of PB1-F2, NS1, and HA. In SF and BF, PB1-F2 contains 90 residues, whereas in PDM, it contains only 11 residues. NS1 in PDM has lost 11 amino acids at the carboxyl terminus. The length of HA in SF and PDM is 566 amino acids, whereas in BF, it is 568 amino acids. The genetic distance between BF and SF is smaller than that between PDM and SF based on PB1 and PA, whereas the contrary is true when the distances are based on PB2, HA, NS1, or NS2. The genetic distances are almost equal between BF and SF and between PDM and SF when based on the other protein-coding segments. For every protein except HA, NS1, and NS2, BF shows fewer amino acid differences from SF than PDM does. The isoelectric points of all proteins are almost equal for the three viruses, except that HA of PDM and NS1 of SF are more basic than the corresponding proteins of the other viruses. The superimposed hydrophobicity plots of all proteins are given in Supplementary Material 1 (at https://ojs.ptbioch.edu.pl/). The plots of PB1-F2, HA, NS1, and NS2 of SF, PDM, and BF are presented in Fig. 1. The plots for PB2, PB1, PA, NP, NA, M1, and M2 were almost perfectly superimposed. A slight aberration was observed in HA, which is neutral in SF and PDM, but hydrophilic in BF at residues 300-400 (Fig. 1). PB1-F2 is also more hydrophilic in BF than in SF at positions 1-20 and 60-70, but more hydrophobic at positions 30-50. The NS1 and NS2 plots were not exactly superimposed, and the mismatch was most prominent in the BF plot (Fig. 1).
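The kind of hydrophobicity profile compared above can be sketched as a sliding-window average over the Kyte and Doolittle hydropathy scale. This is only an illustration of the method, not the ProtScale code: the helper names `hydropathy_profile` and `region_mean` are hypothetical, and the window size of 9 is an assumption, since the window used in the study is not stated.

```python
# Kyte & Doolittle (1982) hydropathy values; positive = hydrophobic.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9):
    """Return [(center_position, mean_hydropathy), ...] with 1-based positions.

    The window size is an assumption of this sketch.
    """
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD_SCALE[aa] for aa in seq]  # KeyError on non-standard residues
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


def region_mean(profile, first: int, last: int) -> float:
    """Mean hydropathy of window centers falling in [first, last] (1-based)."""
    values = [score for pos, score in profile if first <= pos <= last]
    if not values:
        raise ValueError("no window centers in the requested region")
    return sum(values) / len(values)
```

Calling `region_mean(hydropathy_profile(ha_sequence), 300, 400)` for each virus would give one number per strain for the region discussed above, where `ha_sequence` stands for an HA amino acid sequence supplied by the user. Because HA differs in length between strains, positions would need to come from an alignment before regions are compared directly.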
The results of protein modeling of PB1-F2, NS1, and NS2 are presented in Figs 2-4. PB1-F2 and NS2 are dominated by α-helices, with no β-sheets. NS1 consists of an α-helical membrane domain and a globular head consisting of an α-helix and a β-sheet. All of these viral proteins have homologous structures, with some minor differences. PB1-F2 consists of two α-helices. The amino-terminal helix is continuous in BF, whereas it is interrupted in SF. In contrast, the carboxyl-terminal helix of SF is continuous, whereas that of BF is interrupted by random coils. NS1 consists of a globular head formed by the amino terminus of the protein, and a transmembrane domain at the carboxyl terminus. NS2 consists of four α-helices. The first helix (from the amino terminus) is intact in BF, but interrupted in SF and PDM. The second (yellow in Fig. 4) is intact in SF and BF, but interrupted in PDM. The third (green to light blue in Fig. 4) is intact in all three viruses. The last helix at the carboxyl terminus (blue in Fig. 4) is intact in SF, but interrupted in PDM and BF.

DISCUSSION
It is generally believed that the pathogenicity and transmissibility of influenza viruses are polygenic or multifactorial (Swayne, 2011; Wright & Webster, 2001). Many gene segments of these viruses contribute to their capacity to cause severe outcomes in their host and to be readily transmitted between hosts (Maines et al., 2011). Whereas one viral protein might define a pathogenic characteristic, a virulent outcome may be possible only with contributions from the other proteins. The magnitude of an infection depends on those proteins acting in concert with the host and environmental factors (Qiu et al., 2014; Slingenbergh et al., 2004; Wu et al., 2015; Zhang et al., 2013). In other words, the genetic make-up of the virus must match the permissiveness of the host and an appropriate environment to generate fatal or contagious outcomes.
The full genomic sequence of the Spanish-1918 influenza virus, which was derived from the naturally preserved body of a person believed to have died in this severe pandemic (Taubenberger et al., 1997), and the massive number of influenza virus sequences determined in recent decades, provide data from which a 'template' for the emergence of modern disastrous pandemic strains can be generated. The triad strains SF, PDM, and BF are representative of different influenza viruses with distinct hallmarks. All of them have been disseminated globally, except BF, which so far affects only Asia, Europe, and Africa (Li et al., 2014). The SF and PDM viruses are readily transmitted between humans, whereas BF has no capacity for sustained human-to-human transmission (Kandun et al., 2006; Peiris et al., 2007; Wang et al., 2008). BF has the highest CFR, at >50% (www.who.int). Without modern medical resources, as was the case in 1918, its CFR might have been much higher. The Spanish-1918 influenza, responsible for the 'mother of pandemics', had a CFR of 2-3% (Taubenberger & Morens, 2006). Compared with those viruses, the pandemic-2009 virus is 'avirulent' or mild, with a CFR of only 0.05% (Nishiura, 2010). We analyzed the genetic distances between the triad viruses, the numbers of amino acid differences and the isoelectric points of all their structural and nonstructural proteins, as well as their hydrophobicity plots. We found that BF closely resembles SF, except in proteins HA and NS2. In those proteins, PDM is closer to SF than to BF. PB1-F2 contains 90 residues in SF and BF, both of which are the pathogenic members of the triad, but only 11 residues in PDM. The mild disease associated with PDM might be related to the loss of PB1-F2. Different hydrophobicity patterns might also be responsible for the differences between SF and BF. PB1-F2 is more hydrophilic at positions 1-20 and 60-70 in BF than in SF, but more hydrophobic at positions 30-50.
Table 2. Genetic distances, numbers of amino acid differences, and isoelectric points of all structural and nonstructural proteins of the Spanish-H1N1, pandemic-2009 H1N1, and bird flu-H5N1 viruses. Genetic distances were calculated with the Kimura 2-parameter model (Kimura, 1980). Isoelectric points were calculated with the Peptide Calculator (http://www.bachem.com). SF, Spanish-H1N1; PDM, pandemic-2009 H1N1; BF, bird flu-H5N1.
The obvious differences in the open reading frame (ORF) length of this peptide in SF, PDM, and BF suggest that PB1-F2 is responsible for the pathological outcomes of influenza viral infections. The truncation of PB1-F2 in PDM might have caused the low CFR of PDM in 2009. PB1-F2 is a small accessory protein encoded by an alternative ORF in the second largest segment of most influenza A virus genomes (Vidic et al., 2016) and is thought to contribute to viral pathogenicity and the severity of pandemic influenza (Chen et al., 2001).
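The ORF-length comparison described here can be sketched as counting codons from a start codon to the first in-frame stop. The helper names `orf_length` and `is_truncated` are hypothetical, and the 90-residue reference length is taken from the SF and BF values reported in this study. Locating the PB1-F2 start codon within segment 2 is left to the caller.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(nt_seq: str, start: int = 0) -> int:
    """Number of residues encoded from the ATG at `start` (0-based)
    up to, but not including, the first in-frame stop codon."""
    seq = nt_seq.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("ORF must begin with ATG at the given start")
    residues = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return residues
        residues += 1
    return residues  # no stop found: ORF runs to the end of the sequence


def is_truncated(length: int, full_length: int = 90) -> bool:
    """Flag a PB1-F2 ORF shorter than the full-length form.

    The 90-residue default follows the SF and BF lengths reported here.
    """
    return length < full_length
```

As a toy example (not a real viral sequence), `orf_length("ATG" + "GCT" * 10 + "TGA" + "GCT" * 5)` counts the 11 codons before the premature stop, which `is_truncated` would flag.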
The role of PB1-F2 in the pathogenesis of the influenza viruses is contentious. The function of this peptide is both strain- and host-specific (Deventhiran et al., 2015), and it is even expressed in a strain-specific way (Buehler et al., 2013). In various studies, this protein caused immune disruption through recruitment of neutrophils and inhibition of the natural killer cells (Vidy et al., 2016), immune cell apoptosis (Vidic et al., 2016), and inhibition of interferon (Varga et al., 2012). Its species-specific activity was demonstrated in attenuation of AIV-H5N1 (Leymarie et al., 2014) and in an experiment with swine isolates (Buehler et al., 2013). The posttranslational phosphorylation of Ser 35 (Mitzner et al., 2009) and Ser 66 (Conenello et al., 2007) of PB1-F2 may contribute to the strain-specific functions of this protein. Both SF and BF have serine at residue 35, whereas Ser 66 in SF is substituted with asparagine in BF.
The hydrophilic domain around the cleavage site of HA may be responsible for the highly pathogenic nature of BF. HA contains 566 amino acids in SF and PDM, but 568 amino acids are present in BF. The superimposition of the hydrophobicity plots of HA showed slight aberrations, with more hydrophilic amino acids in the region defined by residues 300-400 in BF than in the other viruses. This domain includes the cleavage site of HA.
HA is an important surface protein for receptor recognition and penetration of the virus into the host cell cytoplasm (Lamb & Krug, 2001). Cleavage of this protein is a prerequisite for viral infection and is therefore a crucial determinant of viral pathogenicity and tissue tropism (Steinhauer, 1999). The most highly pathogenic strains have polybasic amino acids at the cleavage site or no carbohydrate residues in the vicinity of the site (Wright & Webster, 2001). The length of the site also enhances its cleavability (Wright & Webster, 2001). The long tract of basic amino acids at the cleavage site of HA in HPAIVs is thought to facilitate expansion of the tissue tropism. Cleavage of the HA precursor to HA1 and HA2 is mediated by a ubiquitous endopeptidase, furin, located in the trans-Golgi network (Swayne, 2011). Infectious virions are released from infected cells with no requirement for extracellular activation. Moreover, the cleavability and pathogenicity of HA seem to be strain- and host-specific. Zhang and others (Zhang et al., 2012) have shown experimentally that a single substitution in the cleavage site of HA modulates the virulence of the H5N1 virus, and another report (Suguitan et al., 2012) has demonstrated that the contribution of the H5 multibasic site to the virulence of the HPAIV-H5N1 virus varies among mammalian hosts, being most significant in mice and ferrets and less remarkable in nonhuman primates. It is believed that the cleavability of HA correlates with the degree of virulence, when all other genetic characteristics are considered equal (Horimoto & Kawaoka, 1997). However, in the evolution of a low-pathogenic avian influenza virus into an HPAIV, a change in the cleavage site alone is not enough. The low-pathogenicity strains may already have a cryptic virulence potential (Bogs et al., 2010).
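A simple way to screen an HA sequence for the polybasic tract discussed above is to count the consecutive basic residues (K or R) immediately upstream of the cleavage point. The sketch below is illustrative only. It assumes that cleavage occurs directly before the first `GLF` motif (the start of the HA2 fusion peptide), which the caller should verify for a full-length sequence; the function name `basic_run_upstream` is hypothetical, and the fragments in the usage note are toy strings, not sequences analyzed in this study.

```python
BASIC_RESIDUES = frozenset("KR")


def basic_run_upstream(ha_seq: str, motif: str = "GLF") -> int:
    """Count consecutive K/R residues immediately before `motif`.

    Assumption of this sketch: the first occurrence of `motif` marks the
    HA1/HA2 cleavage point. A full-length HA could contain the motif
    elsewhere, so the position should be checked against an alignment.
    """
    seq = ha_seq.upper()
    site = seq.find(motif)
    if site == -1:
        raise ValueError("cleavage motif not found")
    run = 0
    i = site - 1
    while i >= 0 and seq[i] in BASIC_RESIDUES:
        run += 1
        i -= 1
    return run
```

For the toy fragment `"PQRERRRKKRGLFGAIAGF"` the function counts six basic residues before `GLF`, whereas for `"PSIQSRGLFGAIAGF"` it counts one. How long a run must be to count as highly cleavable is not defined here and would have to come from the literature cited above.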
NS1 and NS2 may be responsible for the human-to-human transmission of the influenza virus. In these proteins, the PDM virus resembles SF more closely than BF does. The SF and PDM strains have established human-to-human transmission, whereas BF is not yet capable of it. The hydrophobicity plots of NS1 and NS2 were incompletely superimposed, and the aberration was predominantly located in the BF plot (Fig. 1). NS1 and NS2 are encoded by the shortest segment of the influenza genome. NS1 is a nonstructural protein expressed in high abundance in the infected cells, whereas NS2 seems to be a structural protein that is a minor component of the virion (Lamb & Krug, 2001). NS1 is collinearly translated from the transcript of the segment, whereas NS2 is encoded by a spliced transcript (Lamb & Krug, 2001). NS1 is particularly linked to the 'cytokine storm' phenomenon that follows an influenza infection (Phung et al., 2011), contributing to its unusual immunopathogenesis (Na-Ek et al., 2017). Hyperinduction of proinflammatory cytokines is a hallmark of severe influenza infection (Liu et al., 2016). The presence of host cofactors determines the effect of this protein. NS1 induces a cytokine storm only in the presence of the myeloperoxidase system (Phung et al., 2011), but the mechanism of NS2 still requires clarification. NS1 of SF is remarkably basic, whereas NS1 of the other viruses is acidic (Table 2). This unique property of SF NS1 might have led to the devastating human impact of the SF pandemic shortly before the 1920s.
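Whether a protein such as NS1 is classified as basic or acidic depends on its isoelectric point, the pH at which its net charge is zero. The sketch below estimates it by bisection on a Henderson-Hasselbalch net-charge function. It is not the Bachem Peptide Calculator used in this study: the pKa values are one commonly used approximate set and are an assumption, so absolute values may differ from those in Table 2, and the function names are hypothetical.

```python
# Approximate pKa values (assumed; published pKa sets differ slightly).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}           # +1 when protonated
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # -1 when deprotonated
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def net_charge(protein: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))  # free carboxyl terminus
    for aa in protein.upper():
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(protein: str, tolerance: float = 1e-4) -> float:
    """Estimate pI by bisection; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(protein, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

A sequence rich in lysine or arginine yields an estimate above 7 (basic), and one rich in aspartate or glutamate yields an estimate below 7 (acidic), which is the distinction drawn above between NS1 of SF and NS1 of the other two viruses.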
We believe that the severe pathological event due to cytokine storm and the multi-organ infection due to the high cleavability of HA are more plausible explanations than the original antigenic sin phenomenon recently proposed by some scholars. Worobey and others (Worobey et al., 2014) propose that the very high mortality experienced during the SF pandemic was primarily due to previous exposure to another influenza subtype. This is said to represent the original antigenic sin (Gagnon et al., 2015). The finding of Choi and others (Choi et al., 2011) also does not support the antigenic sin hypothesis, in which antibody responses to the pandemic vaccine are reduced in individuals who had been previously vaccinated against another strain. Antigenic sin that leads to a cytokine storm is exemplified by dengue virus infection (Rothman, 2011; Ngono & Shresta, 2018). A rapid and elevated secondary antibody response is paramount in secondary dengue infection (Wahala & de Silva, 2011). Because of the variety of dengue strains, this otherwise ideal immune response is not neutralizing; instead, it leads to antibody-dependent enhancement and an immune-macrophage-complement cascade that causes plasma leakage (Ngono & Shresta, 2018).
Amino acid differences disturb secondary structures of PB1, HA, NS1, and NS2, which might alter the human-to-human transmissibility and human pathogenicity of this triad of viruses. Modeling of the PB1-F2, NS1, and NS2 proteins showed that these proteins of all three viruses have homologous structures, with some minor differences.
The findings of this study are merely hypothetical. To test the approach, we performed an additional analysis of two of the most recent (2019) seasonal influenza vaccine strains, namely A/Michigan/45/2015 (H1N1) and A/Kansas/14/2017 (H3N2) (https://www.who.int/influenza/vaccines/virus/recommendations/en/). The results are presented in Supplementary Material 2 and 3 at https://ojs.ptbioch.edu.pl/. A/Michigan/45/2015 (H1N1) resembles the genetic and biochemical patterns of pandemic-2009. However, A/Kansas/14/2017 (H3N2) possesses the most basic isoelectric points in PB1-F2, HA, NA, and NS1. The basic signature of NS1 of A/Kansas/14/2017 (H3N2), resembling that of the Spanish flu, is not surprising, as this subtype proved able to cause a pandemic in 1968, known as the Hong Kong flu (Kilbourne, 2006). The number of fatal cases was estimated to be more than one million (Ryu, 2017). The hydrophobicity plot of PB1-F2 of A/Kansas/14/2017 (H3N2), which overlays perfectly with neither SF nor BF (Supplementary Material 3 at https://ojs.ptbioch.edu.pl/), shows that this protein seems to be strain-specific, as discussed above. While the NS1 plots overlaid each other perfectly, the most aberrant hydrophobicity plot is shown for HA of BF, which is very hydrophilic at the cleavage site.
Reverse genetic experiments with combined gene segments are the only way to validate our hypotheses. Such experiments will be controversial and should be strictly regulated. A wide survey of the influenza virus genomes available in databases will offer indirect evidence to support our findings. The analysis conducted in this study was very simple and could be undertaken in many countries, so the capacity to rapidly predict the potential impact of an emerging strain is within their reach.
We conclude that the putative pathogenicity of an influenza virus lies in PB1-F2 and the cleavability of HA, whereas NS1 and NS2 (especially NS1) are responsible for the human permissiveness of the virus. The most devastating pandemic strains may have untruncated PB1-F2 proteins with unique residues, highly cleavable HA, and a basic NS1. Generation of such a strain with reverse genetics would provide proof of this model. However, this kind of experiment must be strictly regulated and may even be impossible to undertake. Any newly emerging strain should be compared with the triad of influenza viruses studied here to rapidly estimate its pathogenicity and human-to-human transmissibility.