Molecular dissection of the structural and nonstructural proteins of Spanish-1918 influenza, pandemic-2009, and bird flu viruses

The emergence of deadly pandemic influenza viruses is unpredictable, and most have emerged with no forewarning. The distinct epidemiological and pathological patterns of the Spanish (H1N1), pandemic-2009 (H1N1), and avian influenza (H5N1, known as bird flu) viruses may allow us to develop a ‘template’ for the possible emergence of devastating pandemic strains. Here, we provide a detailed molecular dissection of the structural and nonstructural proteins of this triad of viruses. GenBank data for three representative strains were analyzed to determine the polymorphic amino acids, genetic distances, and isoelectric points, and to generate hydrophobicity plots and protein models of the various proteins. We propose that the most devastating pandemic strains may have a full-length PB1-F2 protein with unique residues, a highly cleavable HA, and a basic NS1. Any newly emerging strain should be compared with these three strains, so that resources can be directed appropriately.


Background
We must be constantly alert to the potential emergence of deadly pandemic influenza viruses because most have emerged with no forewarning. The emergence of new strains will continue to challenge public health and pose problems for scientific communities 1 . The distinct epidemiological patterns of the Spanish (H1N1), pandemic-2009 (H1N1), and bird flu (H5N1) viruses should allow us to develop a 'template' for possibly devastating pandemic strains. Spanish flu and pandemic-2009 flu established human-to-human infections 2,3 , whereas bird flu, the highly pathogenic avian influenza virus (HPAIV) subtype H5N1 (HPAIV-H5N1), has so far shown only limited or unsustainable human-to-human transmission 4-6 . These three strains have spread worldwide, although bird flu has yet to arrive on the American continent 7 , which remains theoretically possible 8 . Spanish flu, which caused an estimated 20-50 million deaths around 1918 9 , is recognized as the most devastating epidemic in modern history 10 . However, the case fatality rate (CFR) of Spanish flu, believed to have been around 2.5% globally 3 , is not the highest recorded. The CFR of pandemic-2009 is estimated to have been approximately 0.05% of confirmed cases, so its virulence is deemed mild 11 . The CFR increases greatly if the denominator is influenza-related critical illness: the mortality rate associated with critical illness in pandemic-2009 was reported to range from 15% in Australia to 61% in Southeast Asia 12 .
Among the viral triad, the CFR of bird flu is the highest. As of December 19, 2016, the cumulative number of confirmed human cases of avian influenza A (H5N1) reported to the World Health Organization in 2003-2016 was 856, of which 452 (52.8%) were fatal (www.who.int). However, a direct comparison of these mortality rates is invalid because the availability of and access to medical interventions in 1918 differed significantly from those of today. The fatalities associated with pandemic-2009 and bird flu might have been much higher without modern medical modalities.
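The CFR arithmetic quoted above can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the only data used are the cumulative H5N1 counts given in the text.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return the case fatality rate as a percentage of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if deaths < 0 or deaths > cases:
        raise ValueError("deaths must lie between 0 and cases")
    return 100.0 * deaths / cases


# Cumulative confirmed H5N1 cases and deaths quoted in the text.
h5n1_cfr = case_fatality_rate(deaths=452, cases=856)
print(f"H5N1 CFR: {h5n1_cfr:.1f}%")
```

The result depends entirely on the denominator: restricting the cases to critically ill patients, as noted for pandemic-2009, raises the CFR without any change in the virus.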
The influenza viruses belong to the family Orthomyxoviridae, and the members studied here have an eight-segment negative-strand genome 13 . Five segments encode single proteins: (from the largest to smallest) polymerase basic 2 (PB2), polymerase acidic (PA), hemagglutinin (HA), nucleoprotein (NP), and neuraminidase (NA) 13 . The second largest segment encodes polymerase basic 1 (PB1) 13 and an accessory peptide, PB1-F2 15 . The two smallest segments are each spliced to produce the mRNAs for two proteins, M1 and M2 (the seventh segment) and NS1 and NS2 (the eighth segment) 13 .
No direct molecular comparison of all the genes of this influenza virus triad has been made. Molecular modeling of NA has been reported, which established the susceptibility patterns of the viruses to anti-influenza drugs 16 . A phylogenetic and amino acid homology analysis of the polymerase complex genes of Spanish flu, seasonal flu, and bird flu has been described, which concluded that the genes of Spanish flu closely resemble those of bird flu 17 . Although the analysis of viral genomes alone is unlikely to clarify some critical issues 3 , such as the viral capacity for human-to-human transmission and the severity of clinical infection, a head-to-head molecular comparison of different lineages should pinpoint the putative gene(s) or protein(s) that can be used as signals for the potential emergence of a catastrophic pandemic strain. Here, we provide a detailed molecular analysis of the structural and nonstructural proteins of this triad of viruses that are associated with their distinct transmissibility and severity.

Results
The genetic distances, numbers of amino acid differences, and isoelectric points of all the structural and nonstructural proteins of the Spanish flu (SF), pandemic-2009 (PDM), and bird flu (BF) viruses are presented in Table 2. The lengths of all the corresponding proteins of the triad are equal, except those of PB1-F2, NS1, and HA. In SF and BF, PB1-F2 contains 90 residues, whereas in PDM, it contains only 11 residues. NS1 in PDM has lost 11 amino acids at the carboxyl terminus. The length of HA in SF and PDM is 566 amino acids, whereas in BF, it is 568 amino acids. The genetic distance between BF and SF is smaller than that between PDM and SF when based on PB1 and PA, whereas the contrary is true when the distances are based on PB2, HA, NS1, or NS2. The genetic distances between BF and SF and between PDM and SF are almost equal when based on the other protein-coding segments. All BF proteins show fewer amino acid differences from SF than do the corresponding PDM proteins, except HA, NS1, and NS2. The isoelectric points of all proteins are almost equal in the three viruses, except that HA of PDM and NS1 of SF are more basic than the corresponding proteins in the other viruses.
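To make the two pairwise measures concrete, the sketch below counts amino acid differences between two aligned protein sequences and converts the count into a p-distance (the proportion of differing sites). The p-distance is an assumption chosen for simplicity; the distance model actually used for Table 2 is not stated in this section. The function names and the toy sequences are hypothetical and are not real influenza data.

```python
from typing import Tuple


def count_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count differing residues between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    Returns (number of differences, number of columns compared).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            diffs += 1
    return diffs, compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which the two sequences differ."""
    diffs, compared = count_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable columns")
    return diffs / compared


# Toy aligned fragments (hypothetical, for illustration only).
reference = "MKAILVVLLY"
variant = "MKAKLLVLLY"

print(count_differences(reference, variant))  # differences and sites compared
print(p_distance(reference, variant))
```

Because gapped columns are skipped, proteins of unequal length such as PB1-F2 or NS1 must be aligned first, and the truncated region contributes nothing to the count.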
The superimposed hydrophobicity plots of all the proteins are given in Supplementary Material 1. The plots of PB1-F2, HA, NS1, and NS2 of SF, PDM, and BF are presented in Figure 1. The plots for PB2, PB1, PA, NP, NA, MA1, and MA2 were almost perfectly superimposed. A slight aberration was observed in HA, which is neutral in SF and PDM, but hydrophilic in BF at residues 300-400 (Figure 1). PB1-F2 is also more hydrophilic in BF than in SF at positions 1-20 and 60-70, but more hydrophobic at positions 30-50. The NS1 and NS2 plots were not superimposed exactly, and the mismatch was most prominent in the BF plot (Figure 1).
[...] Figure 4) is intact in SF and BF, but interrupted in PDM. The third (green to light blue in Figure 4) is intact in all three viruses. The last helix at the carboxyl terminus (blue in Figure 4) is intact in SF, but interrupted in PDM and BF.
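The hydrophobicity plots described earlier in this section can be reproduced in outline with a sliding-window average over a per-residue hydropathy scale. The sketch below assumes the Kyte-Doolittle scale and a nine-residue window; the scale and window actually used for Figure 1 are not stated here, and the function name and toy sequence are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Return the mean hydropathy of each full window along the sequence.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequence (hypothetical): a hydrophobic stretch followed by a charged one.
toy = "AILVFMLIVAKDERKDENQK"
profile = hydropathy_profile(toy, window=9)
print(len(profile))           # one value per full window
print(profile[0] > profile[-1])  # the hydrophobic end scores higher
```

Superimposing the profiles of two aligned proteins, as in Figure 1, then amounts to plotting both lists against the window start position and looking for regions where the curves diverge.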

Discussion
It is generally believed that the pathogenicity and transmissibility of influenza viruses are polygenic or multifactorial 18,19 . We analyzed the genetic distances between the triad viruses, the numbers of amino acid differences and the isoelectric points of all their structural and nonstructural proteins, as well as their hydrophobicity plots. We found that BF closely resembles SF, except in proteins HA and NS2. In those proteins, PDM is closer to SF than is BF. We identified PB1-F2, HA, and NS1 as the factors putatively responsible for the distinct human-to-human transmissibility of SF, PDM, and BF, and as hallmarks of disease severity.
PB1-F2 seems to be responsible for pathogenicity. PB1-F2 contains 90 residues in SF and BF, but only 11 residues in PDM. Therefore, the mild disease associated with PDM might be related to the loss of PB1-F2. Different hydrophobicity patterns might also be responsible for the differences between SF and BF. PB1-F2 is more [...] only induces a cytokine storm in the presence of the myeloperoxidase system 39 , but the mechanism of NS2 still requires clarification. NS1 of SF is remarkably basic, whereas NS1 of the other viruses is acidic (Table 2). This unique property of SF NS1 might have contributed to the devastating humanitarian impact of the SF pandemic shortly before the 1920s.
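Whether a protein such as NS1 is basic or acidic follows from its isoelectric point (pI), the pH at which its net charge is zero. The sketch below estimates pI by bisection over the Henderson-Hasselbalch net charge. The pKa values are illustrative assumptions (published sets differ slightly, and the tool used for Table 2 is not stated here), and the function names and toy peptides are hypothetical.

```python
# Illustrative side-chain and terminal pKa values (an assumption;
# different published sets give slightly different pI estimates).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unmodified linear peptide at the given pH."""
    sequence = sequence.upper()
    charge = 1.0 / (1.0 + 10.0 ** (ph - N_TERM_PKA))   # amino terminus
    charge -= 1.0 / (1.0 + 10.0 ** (C_TERM_PKA - ph))  # carboxyl terminus
    for residue, pka in POSITIVE_PKA.items():
        charge += sequence.count(residue) / (1.0 + 10.0 ** (ph - pka))
    for residue, pka in NEGATIVE_PKA.items():
        charge -= sequence.count(residue) / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection on [0, 14].

    Net charge decreases monotonically with pH, so bisection converges.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0.0:
            low = mid   # still positively charged: pI lies above mid
        else:
            high = mid
    return (low + high) / 2.0


def is_basic(sequence: str) -> bool:
    """Classify a protein as basic when its estimated pI exceeds 7."""
    return isoelectric_point(sequence) > 7.0


# Toy peptides (hypothetical): lysine-rich versus aspartate-rich.
print(is_basic("MKKRKLKK"))
print(is_basic("MDDEDLDE"))
```

A handful of substitutions between charged and uncharged residues is enough to move a protein across pH 7 in this calculation, which is why a small number of amino acid differences can separate a basic NS1 from an acidic one.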

Amino acid differences disturb the secondary structures of PB1, NS1, and NS2, which might alter the human-to-human transmissibility and human pathogenicity of this triad of viruses. The modeling of the PB1-F2, NS1, and NS2 proteins showed that the proteins of all three viruses have homologous structures, with some minor differences.
The findings of this study are merely hypothetical. Reverse genetic experiments with combined gene segments are the only way to validate our hypotheses. Such experiments will be controversial and should be strictly regulated. A wide survey of the influenza virus genomes available in databases will offer indirect evidence to support our findings. The analysis conducted in this study is simple and could be undertaken in many countries, giving them the capacity to rapidly predict the potential impact of an emergent strain.
We conclude that the putative pathogenicity of an influenza virus lies in PB1-F2 and the cleavability of HA, whereas NS1 and NS2 (especially NS1) are responsible for the human permissiveness of the virus. The most devastating pandemic strains may have a full-length PB1-F2 protein with unique residues, a highly cleavable HA, and a basic NS1. The generation of such a strain with reverse genetics would provide proof of this model. However, this kind of experiment must be strictly regulated or may even be impossible. Any newly emerging strain should be compared with this triad of influenza viruses to rapidly estimate its pathogenicity and human-to-human transmissibility.
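The three proposed hallmarks can be written down as a simple checklist for a newly sequenced strain. This is only a sketch under stated assumptions: the class and function names are hypothetical, the 90-residue threshold is the full-length PB1-F2 reported in the Results, "basic" is taken to mean a pI above 7, HA cleavability is supplied by the caller because no computational test for it is defined here, and the "unique residues" criterion is not modeled.

```python
from dataclasses import dataclass
from typing import Dict

FULL_LENGTH_PB1_F2 = 90  # residues, as reported for SF and BF
NEUTRAL_PI = 7.0         # assumption: "basic" means pI above 7


@dataclass
class StrainFeatures:
    """Features of one strain; all values are supplied by the caller."""
    pb1_f2_length: int
    ha_highly_cleavable: bool
    ns1_pi: float


def hallmark_flags(features: StrainFeatures) -> Dict[str, bool]:
    """Report which of the three proposed hallmarks a strain shows."""
    return {
        "full_length_pb1_f2": features.pb1_f2_length >= FULL_LENGTH_PB1_F2,
        "highly_cleavable_ha": features.ha_highly_cleavable,
        "basic_ns1": features.ns1_pi > NEUTRAL_PI,
    }


def hallmark_count(features: StrainFeatures) -> int:
    """Number of hallmarks present (0 to 3)."""
    return sum(hallmark_flags(features).values())


# Hypothetical strain with a truncated PB1-F2 (11 residues, as in PDM);
# the cleavability flag and pI value are made up for illustration.
example = StrainFeatures(pb1_f2_length=11, ha_highly_cleavable=False, ns1_pi=6.0)
print(hallmark_flags(example))
print(hallmark_count(example))
```

A count of this kind is a triage aid rather than a prediction: it ranks strains for closer attention and carries the same hypothetical status as the model it encodes.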

Materials and Methods
The nucleotide and deduced amino acid sequences of all the structural and [...]
Figure caption: Cartoon peptide modeling of PB1-F2 of SF (left) and BF (right). Images are colored by inverted rainbow from the N- to the C-terminus. Protein modeling was performed with the online resource Phyre2 (http://www.sbg.bio.ic.ac.uk) 46 . Protein models were visualized with RasWin 2.7.5.2 (www.rasmol.org).