Sequence variation analysis of the E1 and E2 genes of human papillomavirus type 16 in cervical lesions in women from the south of Poland

The E1 and E2 genes of human papillomavirus encode early proteins whose sequences are conserved and whose regulatory functions are linked to the viral oncoproteins. The purpose of this study is to determine the HPV16 E1 and E2 mutations occurring in the female population of southern Poland in relation to the severity of cervical pathological changes. We also take into account the number of E1 and E2 mutations detected in each E6 gene variant (350G or 350T). This publication is one of the first from Central and Eastern Europe to address this topic. We identified 4 mutations in the E1 gene and 24 mutations in the E2 gene that have not been described previously. The C3409T mutation, which has been widely reported as associated with oncogenesis, occurred in three cases of squamous cell carcinoma. This mutation lies within nucleotides 3243-3539 of the E2 hinge region, and statistical analyses suggest a possible relationship between mutations in this area and oncogenesis. These associations may be relevant to oncogenesis; however, a study in a larger group of patients is needed to confirm them.


INTRODUCTION
According to current data, cervical cancer ranks fourth worldwide in both incidence and mortality among all cancer types (Bray et al., 2018). In Poland, cervical cancer has consistently been the seventh most common cancer in women and the ninth most frequent cause of cancer death (among women aged 15-44 it ranks fourth and third, respectively) (Bruni et al., 2019). An estimated 99.7% of cervical cancer cases are associated with infection by an oncogenic HPV type (Aref-Adib & Freeman-Wang, 2016). HPV16 is the type most commonly found in women with cervical cancer (present in 63.3% of cases), although coinfection with HPV18, the second most common type, has been reported in 26.9% of cases (Zampronha et al., 2013).
HPV16 belongs to the genus Alphapapillomavirus of the family Papillomaviridae. It is the HPV type most commonly detected not only in cervical cancer but also in head and neck cancers and in vulvar, penile and anal cancers (Alemany et al., 2015; Anic & Giuliano, 2011; Celebi et al., 2018; Pils et al., 2017). The HPV16 genome is an approximately 8-kb double-stranded DNA containing six early genes (E1, E2, E4-E7), two late genes (L1, L2) and the long control region (LCR), which controls transcription and replication of the viral DNA (de Sanjosé et al., 2018). The E1 and E2 proteins keep expression of the oncogenic E6 and E7 proteins at a low level, which is sufficient to complete the viral life cycle without disturbing epithelial cell differentiation (Doorbar, 2006). However, when the virus integrates into the host genome, the E1 or E2 genes are interrupted or deleted, resulting in increased expression of the viral oncoproteins and cell cycle dysregulation (Cricca et al., 2009; Schmidt et al., 2005). In the case of HPV16, this mechanism may work differently, as E1 mRNA levels have recently been shown to increase with the progression of pathological changes (Baedyananda et al., 2018).
According to the UniProt database (www.uniprot.org), the HPV16 E1 protein has four essential domains: the nuclear localization signal, the nuclear export signal, the SF3 helicase domain and the DNA-binding region. The HPV16 E2 protein, in turn, has three principal domains: the transactivation domain, the hinge region and the DNA-binding domain.
E6 is an oncogenic protein characterized by two zinc finger motifs (Liu et al., 2009). Its oncogenic potential derives mainly from inactivation of the p53 tumor suppressor and of proapoptotic Bcl-2 family proteins, including BAX (Vogt et al., 2006). The presence of specific mutations in the genes encoding E1, E2 and E6 is also associated with increased oncogenic potential of the virus (Bae et al., 2009; Hu et al., 2001; Kahla et al., 2014; Szostek et al., 2017; Tsakogiannis et al., 2014; Yao et al., 2019). Many publications show that HPV16 variants are not evenly distributed worldwide: some are characteristic of European populations, others are typical of Asian populations, and both differ from those found in African populations (Burk et al., 2009).
Based on single nucleotide polymorphisms, four main HPV16 lineages have been distinguished: European-Asian, African 1, African 2 and Asian-American (Burk et al., 2009). The European-Asian lineage comprises European and Asian sublineages (Cornet et al., 2012). The European sublineage, in turn, is divided into two variants: EUR-p (EUR-350T) and EUR-350G (Cornet et al., 2012). This division is based on a T>G substitution at genome position 350, within the E6 gene, which replaces leucine with valine at amino acid position 83 (L83V) (Zehbe et al., 2001). The HPV16 E6 350G variant is more common in women with persistent infections and high-grade cervical disease than the HPV16 E6 350T prototype (Gheit et al., 2011), but the increased risk of progression to cancer is probably not caused by this single nucleotide change alone (Szostek et al., 2017).
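The link between nucleotide 350 and amino acid 83 is plain codon arithmetic on genome coordinates. The sketch below is illustrative only: the ORF start positions (nt 104 for E6, the start consistent with the L83V numbering, and nt 865 for E1) are our assumptions about the K02718 annotation, and the helper name `codon_of` is hypothetical.

```python
# Hypothetical helper: locate the codon affected by a substitution at a genome position.
# ORF start coordinates are assumptions based on the HPV16 prototype genome (GenBank K02718).
E6_START = 104  # assumed first nucleotide of the E6 start codon used for L83V-style numbering
E1_START = 865  # assumed first nucleotide of the E1 start codon


def codon_of(genome_pos: int, orf_start: int) -> tuple[int, int]:
    """Return (amino acid number, position within the codon, 1-3) for a 1-based coordinate."""
    if genome_pos < orf_start:
        raise ValueError("position lies upstream of the ORF start")
    offset = genome_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    # T350G in E6: first position of codon 83, so TTA/TTG (Leu) becomes GTA/GTG (Val).
    print(codon_of(350, E6_START))   # (83, 1)
    # A1053C in E1: third position of codon 63, so GAA (Glu) becomes GAC (Asp), i.e. E63D.
    print(codon_of(1053, E1_START))  # (63, 3)
```

The helper does not check whether a position lies beyond the stop codon, and it assumes there are no indels shifting the reading frame.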
The purpose of this study is to determine the HPV16 E1 and E2 mutations in women living in southern Poland and their relation to the severity of cervical pathological changes, as no similar report from Central and Eastern Europe has been published yet (Sabol et al., 2012). Our aim was to search for previously unknown mutations in these genes and to assess whether E1 and E2 sequence variation differs between the 350T and 350G variants of the HPV16 E6 gene.

MATERIALS AND METHODS
The study was conducted on 22 women from southern Poland, aged 28-62 years (mean 43±12), with HPV16 infection confirmed by PCR and reverse hybridization (INNO-LiPA, Innogenetics, Belgium). Clinical material (cervical smears) was collected from women with one of two diagnoses: low-grade squamous intraepithelial lesions (LSIL, n=11) or squamous cervical carcinoma (SCC, n=11), FIGO stage I-III. All samples were collected at the National Research Institute of Oncology, Krakow Branch, Poland. The study was approved by the Ethics Committee of the Jagiellonian University (identification code: 1072.6120.29.2018).
All samples were collected before the start of treatment. The smears were placed in 2 ml of 0.9% NaCl solution and stored at -70°C until further processing.
Genomic DNA was isolated from cervical smears using the Genomic DNA Prep Plus kit (A&A Biotechnology, Poland).
Positive (SiHa cell line, which contains integrated HPV16 DNA) and negative (H2O) controls were included in each reaction.
The PCR products were enzymatically purified and sequenced by Genomed (Poland). The same primers were used for amplification and for sequencing of both strands (sense and antisense). The sequences obtained were compared with the prototype HPV16 sequence (GenBank K02718) using BLAST 2.0 and ChromasPro 1.5.
The HPV16 E6 gene variant was identified as part of a previously described study (Szostek et al., 2017).
The complete E1 and E2 ORF nucleotide sequences from the 22 patients were compared with the HPV16 prototype sequence (GenBank K02718). Samples were analyzed according to lesion grade and to E6 gene variant: prototype (350T, n=10) or 350G (n=12).
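Comparing a sample with the prototype amounts to listing the positions where the aligned sample differs from K02718 and writing each difference in the reference-position-alternate notation used in the Results (e.g. A1053C). A minimal sketch, assuming the sample has already been aligned to the reference without gaps (the study used BLAST and ChromasPro for that step); the function name and the sequences in the example are hypothetical, not real K02718 sequence.

```python
def call_substitutions(reference: str, sample: str, ref_start: int) -> list[str]:
    """List single-nucleotide differences as REF+position+ALT strings, e.g. 'A1053C'.

    Assumes the sample is already aligned to the reference without gaps,
    so both strings have equal length and reference[0] sits at ref_start.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for offset, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        if alt_base == "N":
            continue  # ambiguous base call: do not report it as a substitution
        if ref_base != alt_base:
            calls.append(f"{ref_base}{ref_start + offset}{alt_base}")
    return calls


if __name__ == "__main__":
    # Hypothetical 6-nt fragments, not real K02718 sequence.
    print(call_substitutions("GAAAGA", "GACAGA", ref_start=1051))  # ['A1053C']
```

Skipping `N` calls is a design choice for the sketch; in practice ambiguous positions would be resolved by inspecting both sequencing strands.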
Statistical analysis was performed using STATISTICA 13.3. Statistical significance was assessed with the Wald-Wolfowitz test, and p≤0.05 was considered statistically significant.
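For readers who want to see how such a test works, a sketch of the two-sample Wald-Wolfowitz runs test follows. It pools the two groups, sorts the values, counts runs of consecutive values from the same group, and compares that count with its expectation under the null hypothesis using the normal approximation; fewer runs than expected suggests the groups differ. This is a sketch under stated assumptions: it rejects ties between groups and applies no small-sample correction, so it need not reproduce STATISTICA's p-values, and the function name and example values are hypothetical.

```python
import math
from typing import Sequence


def wald_wolfowitz(x: Sequence[float], y: Sequence[float]) -> tuple[int, float, float]:
    """Two-sample Wald-Wolfowitz runs test using the large-sample normal approximation.

    Returns (number of runs, z statistic, one-sided p-value for 'too few runs').
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need non-empty samples with at least 3 values in total")
    if set(x) & set(y):
        # Runs are ill-defined when a value occurs in both groups; not handled here.
        raise ValueError("ties between groups are not handled in this sketch")

    # Pool, sort by value, and record which group each value came from.
    labels = [group for _, group in sorted([(v, 0) for v in x] + [(v, 1) for v in y])]
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF evaluated at z
    return runs, z, p


if __name__ == "__main__":
    # Hypothetical values: the groups do not overlap, giving the minimum of 2 runs.
    print(wald_wolfowitz([1, 2, 3], [4, 5, 6]))  # runs = 2, z ≈ -1.83, p ≈ 0.034
```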

RESULTS AND DISCUSSION
Sequence polymorphisms were found in the E1 gene in six samples and in the E2 gene in seven samples.
The analysis of E1 gene polymorphisms is presented in Table 1. Changes relative to the prototype sequence were found in 6/22 samples (27%), including 4/11 LSIL and 2/11 SCC samples (36% and 18%, respectively; p=0.016). Three mutations were detected in one SCC sample, two in one LSIL sample, and a single mutation in each of the remaining four samples (3 LSIL, 1 SCC). Substitutions were found at eight nucleotide positions between 1053 and 2456. Seven of them (A1053C, G1189T, G1222A, C1225T, G1345C, A1892C, G2034T) caused amino acid changes (E63D, R109I, R120K, E121D, S161T, Q343H, A391S, respectively), while the eighth (C2456T) was silent. Four of the missense mutations (G1189T, G1345C, A1892C and G2034T) are described here for the first time. The A1053C mutation was found in two LSIL samples and the C1225T, A1892C and G2034T mutations in single LSIL samples, while G1189T, G1222A and G1345C were found in SCC samples. Only two amino acid substitutions (R109I, Q343H) were non-conservative; the remaining ones were conservative. According to UniProt annotations, two of the missense mutations lie in specific E1 protein domains: G1189T in the nuclear export signal and A1892C in the DNA-binding region.
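The text does not state the criterion used to call a substitution conservative. One common convention, assumed here and not necessarily the authors' method, treats a substitution as conservative when its BLOSUM62 score is positive. The sketch parses the one-letter notation used above (e.g. E63D) and looks up a small excerpt of BLOSUM62 covering only the residue pairs in this paragraph; the full matrix can be loaded, for example, through Biopython's `Bio.Align.substitution_matrices` module. Under this assumed criterion the seven substitutions split as reported in the text, with R109I and Q343H non-conservative.

```python
import re

# Excerpt of BLOSUM62 scores for the residue pairs discussed above only (the matrix is symmetric).
BLOSUM62_EXCERPT = {
    frozenset("ED"): 2,
    frozenset("RI"): -3,
    frozenset("RK"): 2,
    frozenset("ST"): 1,
    frozenset("QH"): 0,
    frozenset("AS"): 1,
}

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def classify(change: str) -> str:
    """Label a one-letter amino acid substitution such as 'E63D'."""
    match = SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"unrecognised substitution: {change}")
    ref, _, alt = match.groups()
    score = BLOSUM62_EXCERPT[frozenset((ref, alt))]  # KeyError if the pair is not in the excerpt
    # Assumed criterion: a positive BLOSUM62 score counts as conservative.
    return "conservative" if score > 0 else "non-conservative"


if __name__ == "__main__":
    for change in ["E63D", "R109I", "R120K", "E121D", "S161T", "Q343H", "A391S"]:
        print(change, classify(change))
```

Other schemes, such as grouping residues by physicochemical class, can classify borderline pairs like A/S differently, so the criterion should be stated whenever such labels are reported.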
The number of nucleotide changes in the E2 gene also differed significantly between the E6 350T (13 changes) and 350G (29 changes) variants (p=0.03).
No mutations were found in the transactivation domain in the Eur-p variant, whereas in the Eur-350G variant mutations in this domain predominated. In three of four SCC samples, the nucleotide changes were confined to the hinge region (Table 2).
K. Sitarz and others. E1 and E2 HPV16 gene variation and cervical lesions

Notes to Tables 1 and 2: capital letters indicate variants with an amino acid change, while lowercase letters indicate silent mutations; the symbol "*" marks a mutation not previously described. LSIL, low-grade squamous intraepithelial lesions; SCC, squamous cervical cancer; aa, amino acid; NES, nuclear export signal; DBR, DNA-binding region; SF3 H, SF3 helicase. Eur-p denotes the standard nucleotide (thymine) at position 350 of the E6 sequence; Eur-350G denotes guanine at this position. Designations in the "Class/subclass" column indicate the nucleotides at the given positions of the E6 gene. PCR was performed three times for each sample, and sequencing was performed in both directions by a commercial sequencing service.

Number of mutations per sample, Table 1 (E1): 1, 1, 1, 3, 1, 2. Table 2 (E2): 13, 5, 9, 12, 1, 1, 1.

HPV16 is the most frequently detected oncogenic HPV type among women in Poland and worldwide. In this study, we examined polymorphisms of the E1 and E2 genes of this virus in women living in southern Poland. The E1 gene sequences were analyzed first. E1 is one of the key proteins in viral genome replication, as it has ATP-dependent helicase activity (McBride & Warburton, 2017). Several authors have studied the influence of E1 mutations on oncogenesis, and their results show a correlation of some of these mutations with the occurrence of HSIL and cervical cancer (Sabol et al., 2012; Tsakogiannis et al., 2014; Yao et al., 2019). An association with cervical oncogenesis has been reported for the following mutations: T933A, T1014G, A1668G, G2073A, G2160A, T2169C, T2189C, T2232C, G2337A, A2453T, C2454T, A2547G, A2587T and G2650A. None of these mutations was found in the samples tested in this study. However, the A1053C mutation, previously described in the literature as frequent but without information on its relationship to oncogenesis (Sabol et al., 2012), was found in two LSIL samples. In one SCC sample, three nucleotide changes each resulted in an amino acid substitution; one of these substitutions was non-conservative and is considered potentially important for protein function.

Next, we analyzed the E2 gene. Its best-known function is to control the levels of the E6 and E7 oncoproteins (Nishimura et al., 2000); E2 also participates in viral genome replication and post-transcriptional processing (Schwartz et al., 2013). E2 mutations have been described that directly cause a pro-oncogenic change in protein function, such as loss of control over E6 and E7 levels (Kahla et al., 2014), as have mutations associated with a poor prognosis after radiotherapy (Kahla et al., 2018). In the E2 gene, we found the C3409T mutation in three SCC samples; importantly, it was not found in any LSIL sample. This mutation has also been described by other authors (Casas et al., 1999; Swan et al., 2005; Tsakogiannis et al., 2012). It is associated with the European E6 350G variant, as well as with variants from outside Europe (Casas et al., 1999; Swan et al., 2005). There are also reports that this polymorphism is associated with a higher risk of developing high-grade dysplasia or neoplasia (Graham & Herrington, 2000). The C3409T mutation is located within the hinge region, and deletions in this region, particularly at nucleotides 3243 to 3539, have been reported to be associated with cervical cancer (Arias-Pulido et al., 2006). In our samples, mutations in this range were found in four SCC samples and one LSIL sample (p=0.049).
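Checking whether a reported substitution falls within the 3243-3539 interval is a simple coordinate test on the mutation notation used throughout. A minimal sketch, assuming the interval is inclusive at both ends; the helper names and sample labels are hypothetical, while the mutations in the example are taken from the text.

```python
import re

# Interval cited in the text (Arias-Pulido et al., 2006); treated here as inclusive (assumption).
HINGE_INTERVAL = (3243, 3539)


def position(mutation: str) -> int:
    """Extract the genome position from notation such as 'C3409T' or silent 'c2456t'."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mutation.upper())
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation}")
    return int(match.group(2))


def in_interval(mutation: str, interval: tuple[int, int] = HINGE_INTERVAL) -> bool:
    start, end = interval
    return start <= position(mutation) <= end


def samples_with_hits(samples: dict[str, list[str]]) -> list[str]:
    """Return the names of samples with at least one substitution inside the interval."""
    return [name for name, muts in samples.items() if any(in_interval(m) for m in muts)]


if __name__ == "__main__":
    # Hypothetical sample labels; the mutations are taken from the text.
    samples = {"SCC_1": ["C3409T"], "LSIL_1": ["A1053C", "G2034T"]}
    print(samples_with_hits(samples))  # ['SCC_1']
```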
We therefore conclude that both deletions and mutations in this region may affect the transformation of cervical epithelial cells. The E2 nucleotide changes described here for the first time were each detected in individual samples. They provide a good starting point for a larger follow-up study aimed at establishing whether they are related to oncogenesis.