Identification of significant prognostic risk markers for pancreatic ductal adenocarcinoma: a bioinformatic analysis

Objective: This study aimed to identify novel prognostic biomarkers of pancreatic ductal adenocarcinoma (PDAC) using bioinformatics analyzes. Methods: Clinical information, microRNAs (miRNAs), and genes expression profile data from PDAC cases were down-loaded from the Cancer Genome Atlas (TCGA) data-base. The potential prognostic risk miRNAs and genes were screened using the Elastic Net Cox proportional risk regression hazards (EN-COX) model. The receiver operating characteristic (ROC) curve and the Kaplan-Meier (KM) curve were used to identify miRNAs and genes of significant prognostic risk. Furthermore, significant prognostic risk miRNAs were functional enrichment analyses based on their target genes. Furthermore, the survival analyzes of the hub genes were validated through OncoLnc. Results: Complete clinical records and expression data of 797 miRNAs and 19969 genes from 137 PDAC cases were obtained, of which 59 potential prognostic risk factors, including 54 genes and 5 miRNAs, were selected by EN-COX analyzes. A total of 17 significant prognostic risk markers were identified (all P <0.05), including 16 genes and 1 miRNA (miRNA-125a). The miRNA-125a target genes were found in the MiRWalk database and the function enrichment analyzes were performed in the the DA-VID website. Furthermore, according to data from the Oncomine and Human Protein Atlas (HPA) databases, the mRNA and protein level of frizzled class receptor 8 (FZD8) were overexpressed in pancreatic cancer tissues compared to the corresponding noncancer normal tissues ( P <0.001). However, both glutathione S-transferase mu 4 (GSTM4) and inducible T cell costim-ulator ligand (ICOSLG) were negatively regulated in tissues of pancreatic cancer tissues ( P <0.001). Finally, survival analysis was used to validate these factors by the OncoLnc database, and the results revealed that overexpression of ICOSLG was associated with a better prognosis ( P =0.025). Conclusions: This study showed that the expression levels of FZD8, GSTM4 and ICOSLG were significantly different between PDAC and non-tumor tissues, especially ICOSLG, which could be a prognostic indicator and therapeutic target for PDAC.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC), which represents more than 90% of pancreatic cancer and is the most common pathological type of pancreatic adenocarcinoma (PAAD), is one of the most concerned malignancies due to poor diagnosis and limited treatment (Parrasia et al., 2021). According to the statistics of America, 5700 cases were newly diagnosed with pancreatic cancer in 2020 and 447050 cases died from it (Siegel et al., 2020). Lacking obvious early symptom and specific tumor screening, PDAC is not diagnosed until the disease is at an advanced stage presenting local nerve and vascular invasion and metastasis to distant sites (Yang et al., 2021). In most patients, PAAD is not resectable after diagnosis. According to the National Institutes of Health statistics (NIH), the 5-year survival rate for patients with PAAD is only 10.8% based on cases of PAAD-associated mortality between 2011 and 2017 (https://seer.cancer.gov/statfacts/ html/pancreas.html). Therefore, the exploration of effective prognostic markers may provide new insights into the predictive outcome of PDAC.
It is well known that most malignancies are generated by random accumulation of massive genetic and epigenetic aberrations, and the prognostic and therapeutic implications associated with these aberrations are becoming more and more crucial (Hatziapostolou and Iliopoulos, 2011). Many previous studies have confirmed the involvement of abnormal gene expression in PDAC, and alteration of these gene expressions can act as indicators for diagnosis, treatment, and prognosis evaluation . For example, the mutation of the viral oncogene homolog gene of Kirsten rat sarcoma (KRAS) occurred in more than 90% of cases of PDAC (Jonckheere et al., 2017). Some studies revealed that the KRAS gene mutation was significantly associated with tumor stage, liver metastasis, and median survival time, and therefore it may be used for the detection of PDAC (Waters & Der, 2018, Fan et al., 2018. Pathogenic mutations in the breast cancer susceptibility gene (BRCA) were identified in 4.6% of a large cohort of clinic patients (Holter et al., 2015). Furthermore, mothers against decapentaplegic homolog 4 (SMAD4) deficiency also accelerated the development of PDAC, which works to block the progression of KRAS-initiated neoplasms (Yokose et al., 2020). According to a research, 78 % of pancreatic tumors have abnormalities at the G1/S checkpoint of the cell cycle (TP53, CNKN2A, TP53BP2 mutations) (Bailey et al., 2016). Chung et al. found an obvious correlation between the extracellular high mobility group 1 (HMGB1) and the tumor stage and prognosis of PDAC, therefore HMGB1 may serve as a marker in the diagnosis and evaluation of the prognosis (Chung et al., 2012). Although several prognostic risk factors have been identified, sensitivity and specificity are not satisfactory. It is still urgent to identify the best prognostic biomarkers of PDAC.
Recently, on the basis of the analysis of large integrated data and bioinformatics, key genes related to tumor development and prognosis could be identified with the widespread popularity of gene chips and the rapid development of high-throughput sequencing technology.
Potential key genes related to the pathogenesis of PDAC were selected by bioinformatics meta-analysis based on the Cancer Genome Atlas (TCGA) database (Ma et al., 2019). In addition, there are more and more online tools favored by researchers because of the powerful function, like easy-to-use, no need to install, to analyze the specific disease with someone else's official server.
This study aimed to investigate and identify significant prognostic risk markers for the diagnosis and treatment of PDAC by bioinformatic analyzes.

Data sources and preprocessing
In the present study, clinical data from 197 PAAD cases were downloaded from the TCGA database (https://cancergenome.nih.gov), which included clinical characteristics, genomic characterization, and highthroughput sequence data from cancer patients. Of these cases, 150 were PDAC and the rest were other pathological types. Among the 150 PDAC cases, 2 cases were excluded for the lack of accurate data of overall survival time. Finally, there were 148 PDAC cases that contained clinical data and survival time data. In addition, 183 cases of pancreatic cancer with microR-NAs (miRNA) and gene expression profile data were downloaded. After data integration, there were a total of 137 cases that contained both clinical data and genes expression profile data.
The expression values of miRNAs and genes were estimated by reads per kilobase of exon per million mapped reads (RPKM) (Wagner et al., 2012). Data from different samples were normalized by median.

Identification of potential prognostic risk miRNAs and genes
Elastic net (EN) is an ideal variable selection method that is used to deal with collinearity and effectively reduce dimension. Cox proportional regression hazards analysis is a semiparametric model designed to analyze survival data. Elastic Net Cox's proportional risk regression hazards (EN-COX) presents the advantages of the above method (Wu, 2012). In this study, EN-COX was performed to screen potential prognostic risk genes and miRNAs using the glmnet package in R studio (version: R 3.63.) (Engebretsen and Bohlin, 2019). The parameter was the minimum of λ.

Identification of prognostic risk markers
The receiver operating characteristic (ROC) curve was used to evaluate classifiers in bioinformatical ap-plications. The cases were divided into two groups according to the cut-off values. The cases with miRNAs or genes expression values below the cut-off value were ranked as low-expression groups, while the cases with expression values above the cut-off value were ranked as high-expression groups. Based on the data obtained from potential prognostic risk genes, miR-NAs and the corresponding expression profile data, the ROC curves were generated using the pROC package in R studio to determine the cut-off values for grouping the cases (Robin et al., 2011).
The Kaplan-Meier (KM) curve was used for univariate analysis of survival data. In this study, the KM curves were generated using the survival package in R studio (Therneau, 2012). Later, the log-rank test was used to determine whether there were significant differences between two groups (P<0.05 was considered a significant difference).

Function prediction of prognostic risk miRNAs
The target genes for the prognostic risk miRNAs obtained above were searched in the MiRWalk portal database (Sticht et al., 2018). The regulatory networks were then constructed between selected prognostic risk miRNAs and their target genes using Cytoscape (http://www.cytoscape.org/), which is an open source software used to visualize biological networks and integrate data (Doncheva et al., 2019). After that, the target genes mentioned above were performed functional enrichment analyses in DAVID Bioinformatics Resources database (Huang et al., 2009). The results were plotted as bubble diagrams using the ggplot2 package in R Studio. P<0.05 was set as the threshold.

Expression of prognostic risk markers in human pancreatic adenocarcinoma
Prognostic risk markers were assessed at the RNA level of human pancreatic adenocarcinoma samples by comparing with paracancer tissues through the Oncomine database (www.oncomine.org) (Rhodes et al., 2004), and their protein expression was detected in the HPA database (www.proteinatlas.org) (Uhlén et al., 2015). P<0.05 was considered to be significant difference.

Survival analyses of the prognostic risk markers
OncoLnc is a tool to interactively explore survival correlations and download clinical data coupled to expression data for mRNAs, miRNAs, or long noncoding RNAs (lncRNAs). OncoLnc contains survival data for 8,647 patients from 21 cancer studies performed by the TCGA database, together with RNA-SEQ expression for mRNAs and miRNAs from TCGA, and lncRNA expression from MiTranscriptome (Anaya, 2016). By means of OncoLnc, survival difference of the prognostic risk markers was validated once more.

Data preprocessing
After normalizing the original TCGA data, the expression values of 797 miRNAs and 19969 genes were obtained in 137 PDAC cases.

Identification of potential prognostic risk genes and miRNAs
According to the parameter of λ=0.107, 59 potential prognostic risk factors, including 5 miRNAs and 54 genes, were screened through EN-COX.

Identification of prognostic risk markers
According to the threshold of P<0.05 of the KM curve and AUC>0.6 of the ROC curves, 17 prognostic risk markers including 16 genes and 1 miRNA were considered significantly correlated with the prognostic risk of PDAC patients (Table 1).

The mRNA and protein levels of FZD8, GSTM4 and ICOSLG
In view of the Badea pancreas data set in the Oncomine database, the mRNA level of frizzled class receptor 8 (FZD8) was significantly higher in human PDAC specimens compared to paracancer specimens (P<0.001), and the para-cancer specimens (P<0.001), and inducible T-cell costimulator ligand (ICOSLG) was significantly down-regulated in pancreatic cancer tissues (P<0.001). In view of the logsdon pancreas data set, glutathione S-transferase mu 4 (GSTM4) revealed significantly lower expression in tumor tissues compared to the corresponding non-tumor tissues in mRNA level of mRNA (P<0.001).  The yellow rectangle nodes represented miRNA-125a-3p and miR-NA-125a-5p, and the circular nodes represented the target genes, while the green circular nodes represented the prognostic risk genes identified in this study.
Moreover, the protein expressions of FZD8, GSTM4 and ICOSLG demonstrated similar results in the immunohistochemical images from HPA database (Fig. 4).

Survival analyzes of prognostic risk markers
With the aid of the OncoLnc web server, the overexpression of ICOSLG was obviously associated with a better prognosis (P=0.0252). However, the expression of FZD8 and GSTM4 was significantly not related to the survival time of PDAC (P=0.0652 and 0.573, respectively) (Fig. 5).

DISCUSSION
As one of the most lethal cancers worldwide, the 5-year survival rate of PDAC patients is only 10.8% in the US. Due to poor diagnosis and high malignancy, the outcome and prognosis of PDAC still remain poor in recent decades. PDAC is usually diagnosed in an advanced stage with common symptoms including jaundice, pain, and weight loss. Most patients with PDAC cannot have surgery when diagnosed. As a consequence, it is neces-sary to explore new biomarkers to predict the outcome of PDAC patients. In this study, 17 prognostic risk markers were found, including 1 miRNA and 16 genes. The expressions of them were evaluated in pancreatic cancer at both the RNA and protein levels, respectively. FZD8, GSTM4, and ICOSLG revealed significant differential expression in tumor tissues compared to the corresponding non-tumor tissues. Moreover, ICOSLG was obviously associated with the prognosis of PDAC in the OncoLnc database.
FZD8 is one of the Frizzled receptors that belong to the Wnt ligand family. FZD8 activates the canonical Wnt/β-catenin signaling pathway that plays a vital role in the development and progression of multiple carcinomas. Li and others (Li et al., 2017) reported that FZD8 was robustly up-regulated in bone-metastastic prostate cancer cell lines and tissues, and a high expression level of FZD8 was significantly and positively associated with progression and bone metastasis. Chen's findings showed that FZD8 promotes gastric cancer invasion and metastasis via the β-catenin pathway (Chen et al., 2020). Wang and others (Wang et al., 2012) suggested that FZD8 may be used as a potential therapeutic target in The color represented P Value, and a range from red to blue indicated P Value (lowest to highest, respectively) The size of bubble showed genes number involved in BP, CC, MF and KEGG pathway. If the items that meet the criteria were more than 10, only the top 10 were displayed.
lung carcinoma due to its overexpressed expression in human lung cancer tissue and cell lines and suppression in cancer cell proliferation. This in turn reduced the Wnt signaling activity and enhanced the sensibility of lung cancer to chemotherapeutics. Additionally, some studies revealed a higher expression of FZD8 in breast cancer and colorectal cancer compared to their corresponding adjacent tissues (Jiang et al., 2015;Xu et al., 2016). It is well known that K-ras mutation occurs in 90% of pancreatic cancer cases. Wang et al. revealed that FZD8 was inhibited in K-ras mutant pancreatic cells. This further suppressed the non-canonical Wnt/Ca 2+ signaling, contributing strongly to its tumorigenic properties. Restoration of FZD8 expression in K-ras mutant pancreatic cells demonstrated a decrease in malignant transformation (Wang et al., 2015). Our study found that FZD8 was upregulated in pancreatic cancer and was associated with a better prognosis, which was consistent with most previous studies, implying that FZD8 mainly activated Wnt/β-catenin pathway in pancreatic cancer.
GSTM4 is a member of μ subfamily in Glutathione S-transferases (GSTs) family. GSTs are a group of detoxifying enzymes that catalyze glutathione in conjunction with oncogenes, drugs, toxic substances, and products of oxidative stress, with the aim of reducing their toxic combination with cell components. GSTs were divided into eight subfamilies which include α (GSTAs), κ (GSTKs), μ (GSTMs), ω (GSTOs), π (GSTPs), θ (GSTTs), ζ (GSTZs) and membrane-bound GSTs (MGSTs). It was reported that GSTM4 was implicated in tumorigenesis and resistant to chemotherapy. Zhuo et al. showed that GSTM4 was overexpressed in Ewing's sarcoma and promoted tumor development and chemoresistance by inhibiting cell apoptosis (Zhuo et al., 2014). Barros et al. discovered a higher level of GSTM4 expression in human MCF-1 breast cancer cells, which helped to maintain the reduction status of cytochrome C and suppressed cell apoptosis, resulting in the chemoresistance of MCF-1 cells (Barros et al., 2013).
However, a few studies demonstrated that downregulation of GSTM4 in breast cancer cell lines was related to their chemotherapeutic resistance (Watson et al., 2007). Furthermore, compared to well differentiated laryngeal cancer tissues, poor laryngeal cancer differentiation demonstrated reduced GSTM4 expression (Sedat et al., 2010). Unfortunately, there have been no reports in the literature on GSTM4 expression in pancreatic adenocarcinoma yet. The color represented P Value, and a range from red to blue indicated P Value (lowest to highest, respectively) The size of bubble showed genes number involved in BP, CC, MF and KEGG pathway. If the items that meet the criteria were more than 10, only the top 10 were displayed.
ICOSLG encodes inducible T-cell co-stimulator ligand (ICOSLG). This belongs to the B7 family and is identified as a new kind of molecule on the cell surface. It is usually expressed in antigen-presenting cells such as B lymphocytes, dendritic cells, and macrophagocytes. ICOSLG is the only ligand of the inducible co-stimulator (ICOS) that belonged to the CD28 superfamily. Physiologically, ICOS is highly expressed in activated T lymphocytes and regulatory T cells, and ICOS: The ICOSLG interaction promoted the activation and proliferation of T cells and the secretion of cytokines. Recent studies revealed that the ICOS/ICOSLG signaling pathway was involved in inflammatory responses, autoimmune diseases, and cancers (Merrill et al., 2013). Many studies reported that ICOSLG was highly expressed in several tumors such as breast cancer, melanoma, and acute myeloid leukemia. Its high expression was correlated with progression, immune escape, chemoresistance, and poor prognosis of cancers (Nam et al., 2015;Martin-Orozco et al., 2010). Scott et al. found that ICOSLG transcription was reduced in myeloma, leading to inhibition of its antitumor immunity (Scott et al., 2015). In this study, we found that ICOSLG is downregulated in PDAC at both RNA and protein levels, and high expression of ICOSLG was significantly associated with a better prognosis. Nevertheless, there is still a lack of other research on the functions of ICOSLG in pancreatic cancer.
When validated in the oncomine and HPA databases, miRNA-125a expression was not significantly different in PDAC or there was no information on the gene.
Bioinformatics is the field of science developed by the combination of biology and information technology. It is the computational techniques used for solving biological problems. Data problems such as representation (graphics), storage and retrieval (databases), analysis (statistics, artificial intelligence, optimization, etc.) and biology problems such as sequence analysis, structure or function prediction, data mining, etc. Along with the rapid development of the field, there are some issues proposed (Sethi & Behera, 2016). First, the bioinformation algorithm is too far from maturity: for example, whole exome sequencing combined with diseases is used for disease diagnosis or prediction, and the false positive rate is too high. Algorithms are messy, bioinformatics researchers disagree, and there are more than a dozen types of sequencing software alone. There is no unified standard for difference analysis, and the data forms in GEO chips are all different, so the results cannot represent accurately predicted phenotypes. Second, there are too many uncontrollable factors: from sampling to bioinformatic operation, each step in the process may result in different results if another person operates it. There are even dozens of specifications of sequencing instruments, without unified standards. Since this was a purely bioinformatic study that lacked clinical samples and data to verify our results, our study has certain limitations. Therefore, we will collect clinical data and conduct laboratory experiments in vivo and in vitro to testify to this result. Although their differential expressions were confirmed in the Oncomine and HPA databases, the specific mechanisms and pathways of FZD8, GSTM4 and ICOSLG in pancreatic cancer involved in tumor prognosis should be explored.

CONCLUSION
This study discovered that FZD8, GSTM4, and ICOSLG expression levels were significantly associated with PDAC by using bioinformatic analyses. Moreover, according to the data from the TCGA dataset, these 3 genes were related to the survival of PDAC patients. Furthermore, it revealed that ICOSLG was obviously associated with the prognosis of PDAC in the OncoLnc database. Therefore, ICOSLG was probably used as a prognostic indicator for PDAC.

Conflict of interest
The authors have no potential conflicts of interest in this work.