Development of a 12-biomarkers-based prognostic model for pancreatic cancer using multi-omics integrated analysis

Pancreatic cancer is one of the most malignant tumors of the digestive system, with insidious, rapid onset and high mortality. The 5-year survival rate is only 10%. Therefore, in-depth exploration of the potential mechanism affecting the prognosis of pancreatic cancer, and search for biomarkers that can effectively predict the prognosis of pancreatic cancer are of practical clinical importance. The mRNA sequencing data, miRNA sequencing data, methylation data and SNP data of pancreatic cancer patients available in The Cancer Genome Atlas (TCGA) were used for analysis to identify biomarkers that significantly affect the prognosis for the patients. Finally, a prognostic prediction model was developed using principal component analysis (PCA) method. The genes that significantly affected the prognosis of pancreatic cancer were as follows: 5 DmiRNAs (hsa-mir-1179, hsa-mir-1224, hsa-mir-1251, hsa-mir-129-1 and hsa-mir-129-2), 6 DmRNAs&DMs&MethyCor database entries (MAPK8IP2, CPE, DPP6, MSI1, IL20RB and S100A2), and FMN2 gene from differential expressed mRNAs and differential single-nucleotide polymorphism (DmRNAs&DSNPs) database. Prognostic index (PI)=∑iwi xi – 0.717716. A patient was predicted as high/low risk if the PI was larger/smaller than 0.034045. Our study resulted in a comprehensive prognostic model for pancreatic cancer patients based on multi-omics analysis, which could offer better guidance for the clinical management of patients with early-stage pancreatic cancer.


INTRODUCTION
Pancreatic cancer is one of the most malignant digestive system tumors with insidious and rapid onset and high mortality. Due to the lack of specific clinical manifestations and clinical biomarkers at the early stage, most pancreatic cancer patients are diagnosed at an advanced stage, accompanied by distal metastasis and extensive local invasion, and the 5-year survival rate is only 10% (Siegel & Miller, 2018). Therefore, indepth exploration of the potential disease mechanisms and identification of biomarkers that can effectively predict the prognosis of pancreatic cancer is of practical clinical significance.
Numerous research proved that the biological behavior of pancreatic cancer is complex and involves various mechanisms, such as epigenetics, mutations, non-coding RNA regulation etc. Increasing evidence shows that methylation plays an important role in the tumorigenesis of pancreatic cancer. Arginine methylation of MDH1 inhibits the malignant progression of pancreatic cancer by inhibiting glutamate metabolism (Wang et al., 2016). Moreover, one study reported that the MLL1-H3K4me3 pathway mediates pancreatic cancer immune evasion by regulating PD-L1 expression (Minassian et al., 2017). It is wellknown that mutations of KRAS and P53 play an important role in the tumorigenesis of pancreatic cancer. In addition, mutation of PRSS1 was found to be involved in the malignancy of pancreatic cancer (Yi et al., 2016). The crucial role of miRNA in the tumorigenesis of pancreatic cancer has been gradually discovered, as a result of the continuous development of sequencing technology. The mir-135b-5p promotes migration and invasion of pancreatic cancer cells by targeting NR3C2 , and mir-138-5p inhibits autophagy of pancreatic cancer cells by regulating SIRT1 .
Therefore, it is a relatively limited approach to search for biomarkers that affect the prognosis of pancreatic cancer based on single-omics. In this study, multiple sets of data (mRNA, miRNA, methylation and single-nucleotide polymorphism (SNP)) for pancreatic cancer available in The Cancer Genome Atlas (TCGA) were downloaded for integrated analysis to identify more solid biomarkers.

Patients' datasets.
Center for Cancer Genomics (CCG) collected tumor tissues and normal tissues (frozen or paraffin-embedded) from patients who choose to participate. Raw data were obtained by high-throughput sequencing and normalized, with the following analysis of molecular and pathology data. Multi-omics data of pancreatic cancer were downloaded from TCGA database (https://www.cancer.gov/) and included 183 cases of mRNA data (RNASeq V2), 182 cases of miRNA data (IlluminaHiSeq miRNAseq), 195 cases of Illumina Human Methylation450 BeadArray, and 170 cases of SNP data. Samples were selected for the study according to the criteria: 1) pancreatic cancer patients were pathologically diagnosed, 2) patients records contained all four kinds of data and 3) the follow-up data were complete. Finally, 150 patients were selected for our study. All data analyzed in this study was obtained from pancreatic adenocarcinoma tissues. The median of 150 patients' overall We also analyzed the correlation between the survival time and additional radiation, pharmaceutical therapy and histological type using the chi-square test.
To further illustrate the underlying mechanism affecting the prognosis of pancreatic cancer patients we compared the multi-omics differences between the two cohorts. Data processing. Differential mRNA and miRNA expression was analyzed using DESeq2 (Anders & Huber, 2010) function (P<0.05, |logFC|>1). Each probe value for the methylation data was expressed as the β value (β=U/(M+U+1)), where M is the methylated probe signal strength and U is the unmethylated probe signal value. The limma package (Ritchie et al. 2015) was used for differential expression analysis (P<0.05, |logFC|>1), and MethylMix package (Gevaert 2015) was used to analyze the correlation between gene methylation level and mRNA expression value (Pearson correlation coefficient test, R>0.5, P<0.05). Chi-square test (P<0.05) was used to test the significance of differences in the frequency of gene mutation between the L/S cohorts.
Protein-protein interaction network construction. The mRNA interaction data came from the STRING database (https://string-db.org/; Szklarczyk et al., 2019) and the mRNA-miRNA interaction data was derived from DIANA tools (Reczko et al., 2012;Paraskevopoulou et al., 2013). This interaction information was imported into Cytoscape software (Shannon et al., 2003) to build an integrated interaction network.
Kaplan-Meier survival analysis and prognostic model construction. Kaplan-Meier survival analysis was performed for all significantly differentially expressed mRNA/miRNA to further screen for biomarkers which significantly correlated with OS of pancreatic cancer patients. Principal component analysis (PCA) method was used to construct the prediction model based on 12 genes, and the receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to test the performance of the classifier.

Differential expression analysis of mRNA and miRNA
Patients, 150, were divided into L/S cohorts according to the median survival time which was 15.85 months. DESeq2 package was used to identify the differentially expressed mRNA and miRNA between L and S cohort (P<0.05, |logFC|>1). 1,255 significantly differentially expressed mRNAs (DEmRNAs) and 33 miRNAs (DEmiRNAs) were found (Fig. 1).

Methylation analysis
The limma package was used to identify significantly differentially methylated genes in the methylation data, and the results revealed that there were 9,601 significantly differentially methylated genes (DMethGs) be- tween L and S cohort (P<0.05, |logFC|>1) ( Fig. 2A). Further, MethylMix package was used to analyze the relationship between methylation data and mRNA expression data, and revealed that the methylation degree of 551 genes was significantly correlated with their mRNA expression level (MethyCor) (P<0.05, |r| > 0.5; Fig. S1). Based on the intersection of DEmR-NAs, DMethGs and MethyCor datasets, we identified 15 common genes ( Fig. 2B) (Table 1). Therefore, we could infer that the significantly differential mRNA expression of these 15 genes is caused by their significantly differential methylation.

SNP Analysis
For SNP data, firstly, we calculated the mutation rate of all genes in the SNP dataset. The 20 highest mutation-frequency genes, including KRAS and TP53, were shown in Fig. 3A. Then, 31 genes (DSNPs) were chosen by comparing the mutation frequency between L and S cohort (Chi-square test, P<0.05). Combining the results from DmRNAs and DSNPs datasets revealed 4 genes (SLC8A3, C6orf118, RIMS2 and FMN2) not only with a significant difference in mutation frequency but significantly differentially expressed at mRNA level (Table 2).
We used principal component analysis to build a prognostic model based on the 12 genes. The weight coefficient of each gene is shown in Table 3. The prognostic formula was as follows: prognostic index (PI)=∑ i w i x i -0.717716 (w=weight, x=gene expression  High AUC for the prognostic model (AUC=0.683) and low P-value with the high hazard ratio (HR) in fitted Cox proportional hazards model (P=0.035) suggest that this can be a better prognostic model for pancreatic cancer patients (Fig. 5).

Functional enrichment analysis
The target genes of 5 miRNAs that significantly affected the prognosis of pancreatic cancer patients were used for querying DIANA tools (Reczko et al., 2012;Paraskevopoulou et al., 2013). 1,437 target genes were identified with combined score>0.7 as the criterion, and 46 of them were DmRNAs. Then, the interaction rela-  tionship of these 46 target genes was obtained using the STRING database (Szklarczyk et al., 2019). The results from DIANA and STRING were integrated to construct the PPI network graph using Cytoscape software (Shannon et al., 2003), providing a better visual understanding of the relationship between these miRs and target genes (Fig. 6A).
To further understand the function of the 12 biomarkers, GO function and KEGG pathway enrichment analysis was performed on 53 genes (46 target genes and 6 DmRNAs & DMs & MethyCor dataset entries and 1 entry from DmRNAs & DSNPs dataset) using DAVID website. The enrichment results were as follow: biologi-cal process (BP) contained ion transmembrane transport and excitatory postsynaptic potential; cellular component (CC) contained plasma membrane and integral component of membrane; molecular function (MF) contained mRNA binding and extracellular-glutamate-gated ion channel activity. The most significantly enriched KEGG pathways were Retrograde (Fig. 6B) endocannabinoid signaling and Serotonergic synapse.

Patients' treatment
150 patients selected for our study included 145 pancreas adenocarcinoma ductal and 5 pancreas colloid carcinoma. Patients had undergone additional radiation, pharmaceutical therapy or not, and the survival time of the patients was not correlated to the treatment (Table 4).

DISCUSSION
Pancreatic cancer is one of the most malignant tumors that seriously threaten human health. Despite the continuous medical progress, and the invention of new treatment methods and agents, the 5-year survival rate of pancreatic cancer patients is still inferior. Therefore, there is an urgent need to find new biomarkers that can effectively predict the prognosis and overall survival for pancreatic cancer patients.
Previous studies identified many genes involved in pancreatic cancer tumorigenesis through various mechanisms and significantly affecting the prognosis of pancreatic cancer. ZNF281 promotes growth and invasion of pancreatic cancer cells by activating Wnt/β-catenin signaling (Qian et al., 2017). SNX6 predicts poor prognosis and contributes to the metastasis of pancreatic cancer cells via activating epithelial-mesenchymal transition . It was reported that methylation of SULT1E1, IGF2BP3 and MAP4K4 is significantly correlated with prognosis of pancreatic cancer patients (Chen et al., 2019). Hideyuki Hayashi and others (Hayashi et al., 2017) found that the number of mutations in KRAS, CDKN2A, TP53, and SMAD4 can be used to predict the prognosis of pancreatic cancer (Hayashi et al., 2017). Furthermore, their study revealed that miRNA is also significantly correlated with the prognosis of pancreatic cancer (Shi et al., 2018).
However, all the above traditional studies based on single omics failed to substantially benefit and significantly improve OS for pancreatic cancer patients. Therefore, our study was different from the traditional approach, being an integrated multi-omics analysis and re-  sulted in identifying 12 effective biomarkers and building a prognostic model based on these 12 biomarkers. Mitogen-Activated Protein Kinase 8 Interacting Protein 2 (MAPK8IP2), also known as JNK-Interacting Protein 2 (JIP2), is thought to be involved in the regulation of the c-Jun amino-terminal kinase (JNK) signaling pathway (Yasuda et al., 1999). JNK signaling pathway plays an important role in tumorigenesis (Ma et al., 2017).
Liu and others (Liu et al., 2014) reported that Carboxypeptidase E (CPE), a member of the M14 family of metallocarboxypeptidases, can promote tumor proliferation in pancreatic cancer cells and mouse models (Liu et al., 2014), and the mRNA expression value of CPE in pancreatic cancer patients was higher than that in adjacent tissues according to TCGA database. Interestingly, our study found that higher CPE level predicts longer survival time of pancreatic cancer patients. Therefore, we speculate that the function of CPE in pancreatic cancer patients is not only pro-cancer or anti-cancer, and its complex mechanism of action needs further study to be fully elucidated. Classically, Dipeptidyl Peptidase Like 6 (DPP6) is described as a membrane protein thought to be involved in voltage-gated potassium channels function. It was found that DPP6 also participates in malignancy of esophageal adenocarcinoma (Xi & Zhang, 2017). MSI1 (Musashi RNA Binding Protein 1) was long used as a marker of stem cells (Okano et al., 2002), and was reported to be involved in tumorigenesis of several cancers (Kharas & Lengner, 2017). The function of Interleukin 20 Receptor Subunit Beta (IL20RB) is to form the heterodimeric receptor for Interleukin 20 (IL20) with Interleukin 20 Receptor Subunit Alpha (IL20RA). Besides, it was reported to be involved in the JAK-STAT pathway (Blumberg et al., 2001), which plays an important role in various tumors (Waldmann & Chen, 2017;Tiacci et al., 2018). Syed Haider et al. reported that IL20RB is correlated with shorter survival of pancreatic patients, which was highly consistent with our findings, indicating that IL20RB is an indicator of pancreatic cancer prognosis (Haider et al., 2014). S100 calcium-binding protein A2 (S100A2) is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. Many studies suggested that S100A2 plays an oncogenic role in pancreatic cancer (Ohuchida et al., 2007;Bachet et al., 2013;Ji et al., 2014). Furthermore, our study confirmed these earlier results and used it as an essential element in the prognostic model. Formin-2 (FMN2) is a member of the formin family, and it is mainly involved in the organization of actin cytoskeleton (Peng & Liou, 2012). Studies reported that FMN2 is involved in the malignancy of various tumors (Jin et al., 2017;Li et al., 2018).
The discovery of miRNA opened a brand-new area in the bioscience research. More and more functions of miRs have been discovered, especially in the various biological behaviors of tumors. miRNAs are detectable in biofluids including urine, cerebrospinal fluid and blood, which make them highly available and accurate biomarkers for the prediction of prognosis of cancer patients (Hayes et al., 2014). In this study, therefore, miRNAs were included in the exploration of biomarkers of pancreatic cancer. The results showed that 5 miRNAs (hsamir-1179, hsa-mir-1224, hsa-mir-1251, hsa-mir-129-1 and hsamir-129-2) significantly affected the prognosis of pancreatic cancer patients. The functional enrichment analysis showed that target genes were significantly enriched in ion transport and mRNA binding function.
With an increasing number of neoplasm research, ion transport was found to play an important role at every stage of the cancer disease. Moreover, ion transport is involved in every step of tumorigenesis, proliferation, invasion and metastasis (Djamgoz et al., 2014), and even plays an indispensable role in the process of destroying the cancer cells by the immune cells (Panyi et al., 2014).
Furthermore, we found that the target genes of these miRs are significantly enriched in mRNA binding, which means these target genes are heavily involved in each step of mRNA metabolism. mRNA metabolism is in turn vital for tumorigenesis.
In summary, our study found that gene expression patterns were significantly discriminated in pancreatic cancer patients with different survival time. We used multi-omics analysis to identify 7 mRNAs and 5 miRNAs as biomarkers, all of which were significantly correlated with the prognosis of pancreatic cancer patients. In addition, the previous studies confirmed that these biomarkers are involved in various malignant processes, such as tumorigenesis, proliferation or metastasis through different mechanisms, but their specific mechanism of action in pancreatic cancer still needs to be elucidated. Finally, the PCA method was used to construct the prognostic prediction model based on the 12 biomarkers. This model may provide guidance for the clinical management of patients with early-stage pancreatic cancer.