Interaction network of proteins associated with unfavorable prognosis in acute myeloid leukemia

Acute myeloid leukemia (AML) is a malignant disorder of hematopoietic stem and progenitor cells, characterized by accumulation of immature blasts in the bone marrow and peripheral blood of affected patients. Standard induction therapy leads to complete remission in approximately 50% to 75% of patients. In spite of favorable primary response rates, only 20% to 30% of patients enjoy long-term disease-free survival. Identifying proteins involved in prognosis is important for proposing biomarkers that can aid in the clinical management of the disease. The aim of this study was to construct a protein-protein interaction (PPI) network based on serum proteins associated with unfavorable prognosis of AML, and to analyze the biological pathways underlying molecular complexes in the network. Through a literature search, we identified 16 candidate serum proteins associated with unfavorable prognosis in AML (in terms of poor response to treatment, poor overall survival, short complete remission, and relapse): IL2RA, FTL, HSP90AA1, D2HGDH, PLAU, COL18A1, FGF19, SPP1, FGA, PF4, NME1, TNF, ANGPT2, B2M, CD274, LGALS3. The PPI network was constructed with Cytoscape using association networks from STRING and BioGRID, and Gene Ontology enrichment analysis was performed with the ClueGO plug-in. The central protein in the network was found to be PTPN11, which is involved in modulating the RAS-ERK, PI3K-AKT and JAK-STAT pathways, as well as in hematopoiesis and in the regulation of apoptotic genes. Therefore, dysregulation of this protein and/or of the proteins connected to it in the network leads to defective activation of these signaling pathways and to a reduction in apoptosis. Together, this could cause an increase in the frequency of leukemic cells and resistance to apoptosis in response to treatment.


INTRODUCTION
Acute myeloid leukemia (AML) is a malignant disorder of hematopoietic stem and progenitor cells, characterized by accumulation of immature blasts in the bone marrow and peripheral blood of affected patients. Response to chemotherapy in patients with AML is wide-ranging, and there are no adequate biomarkers to predict clinical outcome (Bienz et al., 2005; Lazarevic et al., 2015; Slovak et al., 2014). Standard induction therapy, based on cytarabine and anthracycline, leads to complete remission in approximately 50% to 75% of patients, depending on prognostic factors such as age or the presence of certain gene or chromosomal changes (Mroźek et al., 2012). In spite of favorable primary response rates, only approximately 20% to 30% of patients enjoy long-term disease-free survival. This heterogeneity is related to acquired mutations and to deregulated expression of genes and non-coding RNA (miRNA) (Liao et al., 2017; Walker & Marcucci, 2012). Genetic studies are clearly valuable, but when isolated from a context in which thousands of proteins mediate cellular function, this information cannot be interpreted properly and without bias. Protein-protein interaction (PPI) networks seek to characterize this flow of information within the cell and the organism in order to understand the functional relevance of expressed proteins (Končarević et al., 2014). Analysis of PPI networks can help to understand the mechanisms involved in disease states, and can orient research strategies toward biomarkers or therapeutic targets. Identifying proteins involved in response to treatment is important for proposing biomarkers that can aid in the clinical management of AML. The aim of this study was to construct a PPI network with key proteins identified in the literature as associated with chemotherapy resistance in AML, and to analyze the biological pathways underlying molecular complexes in the network.
This approach recognizes that many pathways are involved in the pathogenesis of AML, and thus a multi-marker strategy will almost certainly be necessary, as a single biomarker is unlikely to be sensitive and specific enough.

Seed proteins.
We systematically searched PubMed for proteomic studies that analyzed the prognosis of AML patients, with the criterion that blood or serum was used as the biological sample (Acute Myeloid Leukemia AND prognosis AND serum OR blood AND protein OR proteomics). Based on this criterion, and after manual curation, we identified 16 candidate proteins.
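The search string above does not show how AND and OR are grouped. The following minimal sketch builds one possible reading, in which each pair of alternatives forms an OR-group. The grouping and the helper name `build_query` are assumptions made for illustration, not details given in the text.

```python
def build_query(required, alternatives):
    """Join required terms and OR-groups with AND; each OR-group is parenthesized."""
    parts = [f'"{term}"' if " " in term else term for term in required]
    for group in alternatives:
        parts.append("(" + " OR ".join(group) + ")")
    return " AND ".join(parts)

# Assumed grouping: (serum OR blood) and (protein OR proteomics).
query = build_query(
    required=["Acute Myeloid Leukemia", "prognosis"],
    alternatives=[["serum", "blood"], ["protein", "proteomics"]],
)
print(query)
```

Writing the groups explicitly makes the intended precedence unambiguous, whatever default a search engine applies.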
Vol. 67, No 4/2020 475-483 https://doi.org/10.18388/abp.2020_5094
Construction of a protein-protein interaction network. The PPI network was constructed using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) web resource (Szklarczyk et al., 2017) and the Biological General Repository for Interaction Datasets (BioGRID) database (Chatr-Aryamontri et al., 2017). The confidence parameters for STRING were restricted in order to reduce the amount of data while maintaining the most reliable interactions. The active prediction methods taken into account for STRING predictions were: experiments, co-expression, neighborhood and databases; a confidence score > 0.4 (medium confidence); and no more than 20 interactors in the 1st shell and no more than 5 interactors in the 2nd shell. We did not consider the direction of each protein interaction, and duplicate edges and self-interactions were removed from the results.
Topological analysis of the protein interaction network. The network analysis includes three fundamental parameters that allow the nodes in a network to be evaluated: connectivity degree (k), betweenness centrality (BC) and closeness centrality (CC). The most basic characteristic of a node in a network is its degree (k), which represents the number of interactions (links) the node has with other nodes (Barabási & Oltvai, 2004). Nodes with a higher k value are called hubs and are therefore the principal agents in the interaction network, affecting the network's function and stability (Patil et al., 2010). The BC value is an indicator of a node's centrality in the network. It is the fraction of non-redundant shortest paths (SP) that pass through a node, and so measures how often the node is located on the shortest path between other nodes. The SP refers to the path with the smallest number of links between the selected nodes in a network (Raman, 2010).
Nodes with higher BC are called bottlenecks, indicating that a large number of SP in the network pass through them. The CC of a node is defined as the inverse of the average length of the SP to/from all the other nodes in the graph. The node with the highest CC value is usually the topological center of the network (Ran et al., 2013). In the present study, the network analyzer of Cytoscape 3.6.1 (Shannon et al., 2003; Assenov et al., 2008) was used to compute the properties of the whole network. In order to classify the hub and bottleneck proteins, we divided all of the proteins into four categories, as proposed in the literature (Yu et al., 2007): (1) nonhub-nonbottlenecks (small k and low BC); (2) hub-nonbottlenecks (large k but low BC); (3) nonhub-bottlenecks (small k but high BC); and (4) hub-bottlenecks (large k and high BC).
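The three parameters can be illustrated on a toy graph. This is a minimal pure-Python sketch under stated assumptions, not the Cytoscape implementation: the function names are hypothetical, BC is computed with Brandes' accumulation and scaled by (n-1)(n-2), and CC is averaged over the nodes reachable from each node. Cytoscape's exact normalization may differ.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets; duplicate edges and self-interactions are dropped."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def centralities(adj):
    """Return (degree, betweenness, closeness) dictionaries keyed by node."""
    n = len(adj)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    bc = {v: 0.0 for v in adj}
    cc = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w], preds[w] = 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Closeness: inverse of the average SP length to the reachable nodes.
        total = sum(dist.values())
        cc[s] = (len(dist) - 1) / total if total else 0.0
        # Brandes back-propagation of path dependencies for betweenness.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both ends; scale to the 0..1 range.
    scale = (n - 1) * (n - 2)
    if scale > 0:
        bc = {v: value / scale for v, value in bc.items()}
    return degree, bc, cc

# Toy edge list with a duplicate (B, A) and a self-interaction (C, C).
adj = build_graph([("A", "B"), ("B", "A"), ("B", "C"), ("C", "C")])
degree, bc, cc = centralities(adj)
print(degree["B"], bc["B"], cc["B"])
```

In the toy path A-B-C, node B lies on the only shortest path between A and C, so it has the highest values of all three parameters.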
Construction of the backbone network of the AML PPI network. In order to construct a backbone network, we selected the proteins of the giant network with BC values in the top 10%, excluding those not within the main network. In graph-theoretic terms, bottleneck proteins are nodes through which many SP pass, and they therefore control communication among the other nodes in the giant network (Yu et al., 2007). This selection approximates the shortest routes through the giant network that could be activated in chemotherapy resistance in AML patients.
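The selection step can be sketched as follows. The function names, the tie-breaking by node name, and the toy BC scores are assumptions made for illustration; the cutoff is given as an integer percentage so that the number of retained nodes is computed exactly.

```python
def largest_component(adj):
    """Node set of the largest connected component (the main network)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def backbone(adj, bc, percent=10):
    """Induced subgraph on the top-`percent` BC nodes that lie in the main network."""
    k = max(1, (len(bc) * percent + 99) // 100)  # ceiling of len * percent / 100
    ranked = sorted(bc, key=lambda node: (-bc[node], node))
    keep = set(ranked[:k]) & largest_component(adj)
    return {v: adj[v] & keep for v in keep}

# Toy network: main component A-B-C-D plus a detached pair E-F.
toy_adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
           "D": {"C"}, "E": {"F"}, "F": {"E"}}
# Hypothetical BC scores, chosen only to show the exclusion of node E.
toy_bc = {"A": 0.0, "B": 0.5, "C": 0.5, "D": 0.0, "E": 0.9, "F": 0.0}
toy_backbone = backbone(toy_adj, toy_bc, percent=50)
print(toy_backbone)
```

Node E ranks first in the toy scores but is dropped because it lies outside the main network, which mirrors the exclusion rule described above.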
Construction of a subnetwork consisting of all the shortest paths between the seed proteins. In order to construct a subnetwork in which the 16 seed proteins are connected directly or indirectly with the minimum number of connections, we found the SP between seed proteins using the PesCa 3.0.8 plug-in for Cytoscape (Scardoni et al., 2015). The subnetwork was constructed using the SP that interconnect seed proteins and contain fewer than 6 nodes, thus helping to determine the principal pathways and biological processes between seed proteins related to chemotherapy resistance in AML patients.
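A minimal sketch of this construction is given below. It does not reproduce PesCa. It enumerates every shortest path between each pair of seeds by breadth-first search, keeps the paths with fewer than 6 nodes, and returns the subgraph induced on the retained nodes. The function names, the toy labels and the choice of an induced subgraph are assumptions.

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, source, target):
    """Every shortest path from source to target, each as a list of nodes."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                preds[w].append(v)
    if target not in dist:
        return []  # the two nodes are in different components
    # Walk the predecessor lists back from the target to the source.
    paths, stack = [], [[target]]
    while stack:
        path = stack.pop()
        if path[-1] == source:
            paths.append(path[::-1])
        else:
            stack.extend(path + [p] for p in preds[path[-1]])
    return paths

def seed_subnetwork(adj, seeds, max_nodes=5):
    """Subgraph induced on nodes lying on seed-to-seed SP of at most max_nodes nodes."""
    keep = set()
    for a, b in combinations(sorted(seeds), 2):
        for path in all_shortest_paths(adj, a, b):
            if len(path) <= max_nodes:
                keep.update(path)
    return {v: adj[v] & keep for v in keep}

# Toy network: seeds S1 and S2 are linked through X or Y; seed S3 is isolated.
toy = {"S1": {"X", "Y"}, "X": {"S1", "S2"}, "Y": {"S1", "S2"},
       "S2": {"X", "Y", "Z"}, "Z": {"S2"}, "S3": set()}
sub = seed_subnetwork(toy, {"S1", "S2", "S3"})
print(sorted(sub))
```

Node Z is attached to a seed but lies on no seed-to-seed path, so it is left out, as is the unreachable seed S3.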

Giant network
The PPI network was constructed with 16 seed proteins associated with unfavorable prognosis in Acute Myeloid Leukemia, and was found to have 340 nodes connected by 2223 edges (Fig. 1 and Table 2). Each of the nodes represents a protein, while the edges between nodes represent interactions between proteins. As can be seen in Fig. 1, there is one main network and two smaller ones, with D2HGDH and FTL as seed proteins that are not connected to the main network.
The proteins in the network were classified into four categories according to their k and BC values, as described in the Methods section. This analysis revealed that 184 nodes were nonhub-nonbottleneck (low k and low BC), 122 nodes were hub-nonbottlenecks (high k and low BC), 13 nodes were nonhub-bottlenecks (low k and high BC), and 55 nodes were hub-bottlenecks (high k and high BC), the latter being of the most interest as they are the most central and well-connected nodes in the network.
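The four-way classification can be sketched as below. The text does not state the k and BC cutoffs, so `k_cut` and `bc_cut` are hypothetical parameters. The values for PTPN11 and UBC are those reported in this section; the nodes n1 to n3 are invented placeholders.

```python
from collections import Counter

def classify(degree, bc, k_cut, bc_cut):
    """Label each node as hub/nonhub and bottleneck/nonbottleneck."""
    labels = {}
    for node in degree:
        hub = "hub" if degree[node] >= k_cut else "nonhub"
        neck = "bottleneck" if bc[node] >= bc_cut else "nonbottleneck"
        labels[node] = f"{hub}-{neck}"
    return labels

# PTPN11 and UBC use the reported values; n1..n3 are hypothetical nodes.
degree = {"PTPN11": 37, "UBC": 44, "n1": 2, "n2": 12, "n3": 3}
bc = {"PTPN11": 0.186, "UBC": 0.130, "n1": 0.0, "n2": 0.001, "n3": 0.08}

# Hypothetical cutoffs, chosen only for this illustration.
labels = classify(degree, bc, k_cut=10, bc_cut=0.05)
counts = Counter(labels.values())
print(counts)
```

Because every node receives exactly one label, the four category counts always add up to the number of nodes in the network.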
In order to identify the most central node in the network, we compared these k and BC values and found two proteins of interest with the highest k and BC: PTPN11 (BC 0.186; k 37) and UBC (BC 0.130; k 44). As PTPN11 (also called Shp2) has the highest BC it was selected as the central node in the network.

Backbone network
We retrieved PTPN11, HSP90AA1, and 26 other proteins within the top 10% by degree (k) or BC, considered them the hubs or bottlenecks, and used them to constitute the backbone of the giant network (Fig. 2). Of the 28 nodes comprising the backbone network, 6 are original seed proteins (PLAU, FGA, COL18A1, HSP90AA1, LGALS3, PF4). PTPN11 had the highest BC in the network (Table 3), meaning that it is the main node controlling the flow of information through the network, followed by CDK1 and HSP90AA1, while UBC had the highest k. Gene Ontology (GO) analysis was performed on the backbone network to identify which GO terms (biological process and molecular function) were over- or under-represented in the network (Table 4).
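Over- and under-representation of a GO term is commonly scored with a hypergeometric test. The sketch below shows that calculation. It is an assumption that this matches the ClueGO settings used here, since the test is not specified in the text. The background size and annotation counts are hypothetical, and only the list size of 28 comes from the backbone network.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """Upper tail P(X >= k): chance of at least k annotated proteins in the list."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def depletion_p(N, K, n, k):
    """Lower tail P(X <= k): chance of at most k annotated proteins in the list."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(0, min(k, K, n) + 1)) / comb(N, n)

# N: background proteins, K: background proteins annotated with the term,
# n: proteins in the list (28, as in the backbone), k: annotated proteins in the list.
# N, K and k are hypothetical values for illustration.
p_over = enrichment_p(N=20000, K=300, n=28, k=6)
p_under = depletion_p(N=20000, K=300, n=28, k=6)
print(p_over, p_under)
```

A small upper-tail value indicates over-representation and a small lower-tail value indicates under-representation. In practice these p-values would also be corrected for multiple testing across GO terms.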

Shortest path network
The subnetwork of the shortest paths between the seed proteins was made up of 81 nodes and 409 edges. Fig. 3 shows that the 16 seed proteins are related to each other through intermediate nodes along short paths, which suggests that there are common signaling pathways between these proteins that could explain the biological context associated with an unfavorable prognosis in patients with AML. As in the backbone network, PTPN11 had the highest BC in the SP network (0.186) while UBC had the highest k (44), and both values are well above the average. Topological analysis of this network is summarized in Table 2.
The GO analysis was performed on the SP network to identify which GO terms (biological process and molecular function) were over or under-represented in the network (Table 5). In terms of signaling pathways, the main pathways represented were signaling by EGFR in cancer, MET activated PI3K/AKT signaling, adaptive immune system, signaling by receptor tyrosine kinases, cytokine signaling in Immune system, diseases of signal transduction, hemostasis, PI3K-Akt signaling pathway, MAPK family signaling cascades, signaling by Interleukins, pathways in cancer, proteoglycans in cancer, cell surface interactions at the vascular wall, platelet activation, signaling and aggregation, acute myeloid leukemia, and chronic myeloid leukemia.
The backbone and SP networks were analyzed with Reactome and KEGG Pathway databases, and the main pathways represented were signaling by EGFRvIII in cancer, diseases of signal transduction, cell surface interactions at the vascular wall, platelet activation, signaling and aggregation, MAPK family signaling cascades, proteoglycans in cancer, MET activates PI3K/AKT signaling, acute myeloid leukemia, and chronic myeloid leukemia (Table 6).

Importance of PTPN11
PTPN11 encodes Shp2, a non-receptor protein-tyrosine phosphatase that is involved in cytokine receptor and receptor tyrosine kinase signaling (Rehman et al., 2018). This protein is required for the complete activation of the RAS-ERK pathway in response to growth factors and cytokines, in addition to modulating the PI3K-AKT and JAK-STAT pathways (Grossmann et al., 2010). Mutations in PTPN11 occur in approximately 6.6% of patients with AML and lead to alterations in signaling pathways associated with cell differentiation and growth. As this protein plays critical roles in hematopoiesis and leukemogenesis, myeloid and erythroid differentiation is affected in embryonic stem cells that express mutated Ptpn11 (Qu et al., 1997). It has been reported that PTPN11 mutations cause an increase in the frequency of leukemic cells in both humans and murine models (Deng et al., 2018). Shp2 also regulates apoptotic genes, and it has been reported to increase the expression of Bcl2 and Mcl1. Patients with PTPN11 mutations therefore prove to be resistant to anti-Mcl1 drugs. Several in vitro and in vivo studies in the literature have shown that Ptpn11 is a potential target for cancer treatment, specifically when there is drug resistance. The potential for cancer treatment is illustrated by a study with transgenic mice carrying a doxycycline (Dox)-inducible PTP-defective Shp2 mutant; inhibiting Shp2 activity in these mice suppressed EGFR signaling and resulted in fewer and smaller hyperproliferative lesions (Schneeberger et al., 2015). In terms of drug resistance, a study published in 2015 (Prahallad et al., 2015) revealed that when Ptpn11 was knocked down, BRAF-mutant colon cancer cells that were previously resistant to treatment with selective BRAF inhibitors became sensitive to these drugs.
There are currently three Ptpn11 (Shp2) inhibitors that have been developed for patients with advanced solid tumors that have failed, are intolerant to (drug resistance), or are considered ineligible for standard treatments. These function in a similar manner, by binding to Shp2 and inhibiting its signaling, which in turn inhibits the Ras-MAPK pathway that is often hyperactivated in cancer cells. These are: JAB-3068 (Jacobio Pharmaceuticals Co.), RMC-4630 ([…] Inc. & Sanofi), and TNO155 (Novartis). For each of these there are two registered clinical trials: JAB-3068 (ClinicalTrials.gov identifier: NCT03565003 and NCT03518554), RMC-4630 (ClinicalTrials.gov identifier: NCT03634982 and NCT03989115), TNO155 (ClinicalTrials.gov identifier: NCT03114319 and NCT04000529). All of these are phase 1/2a clinical trials that are currently recruiting participants, with the aim of determining the maximum tolerated dose, as well as characterizing the safety, tolerability, and pharmacokinetic profile of these drugs.
Presently, there are no Ptpn11 (Shp2) inhibitors specifically aimed towards AML. However, there is a trial (ClinicalTrials.gov identifier: NCT03311815) sponsored by the PETHEMA Foundation, in which bone marrow and peripheral blood samples from 500 AML patients will be taken at diagnosis and at resistance or first and subsequent relapses. These samples will be analyzed by Next Generation Sequencing (NGS) in order to sequence 26 consensus genes recurrently mutated in AML (ASXL1, HADH, CBL, CEBPA, DNMT3A, EZH2, FLT3, GATA2, IDH1, IDH2, JAK2, KIT, KRAS, MPL, MLL, NPM1, NRAS, PTPN11, RUNX1, SETBP1, SF3B1, SRSF2, TET2, TP53, U2AF1, WT1). With these data, it will be possible to determine which gene mutations can be classified as driver or passenger mutations, and to establish a diagnostic platform for rapid molecular diagnosis of the disease. As samples will be taken at resistance and relapse, this will also provide information regarding which genes/proteins are associated with unfavorable prognosis, which could be useful for determining prognosis at the time of diagnosis, thus informing treatment options.

CONCLUSIONS
This in silico analysis revealed 28 proteins that could be considered potential biomarkers of poor prognosis in AML, with PTPN11 as the main node controlling the flow of information through the network.
One of the biggest challenges in biomarker research is that more often than not, a single biomarker is shared by several pathologies; so rather than a single protein biomarker, a panel of biomarkers is required in order to achieve the overall level of specificity needed. Therefore, this in silico approach is highly useful for informing which proteins could be included in such a panel, and which of these contribute significantly to the overall specificity and sensitivity.
It would be of great interest to perform a dependency analysis on this proposed panel of 28 proteins, in order to determine which nodes have a positive or negative influence on other nodes, thus identifying activators and inhibitors of the network. Perturbation experiments can also help identify which nodes are essential to the network, by eliminating them and observing how the behavior of the network changes. This optimization of the panel will ensure prognostic precision, while keeping costs down by avoiding unnecessary testing of biomarkers that do not significantly contribute. Wet bench research is enhanced when computational analysis is incorporated.
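An in silico counterpart of such a perturbation experiment can be sketched by deleting each node in turn and recording how much the main network shrinks. Reading "perturbation" as node deletion, and using the size of the largest connected component as the readout, are assumptions made for illustration, as are the function names and the toy network.

```python
def largest_component_size(adj):
    """Number of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        best = max(best, len(comp))
    return best

def knockout_impact(adj):
    """Drop in largest-component size caused by deleting each node in turn."""
    baseline = largest_component_size(adj)
    impact = {}
    for node in adj:
        # Remove the node and every edge that touches it.
        pruned = {v: nbrs - {node} for v, nbrs in adj.items() if v != node}
        impact[node] = baseline - largest_component_size(pruned)
    return impact

# Toy network: H connects A, B and C; A and B are also linked to each other.
toy = {"H": {"A", "B", "C"}, "A": {"H", "B"}, "B": {"H", "A"}, "C": {"H"}}
impact = knockout_impact(toy)
print(max(impact, key=impact.get), impact)
```

In the toy network, deleting H disconnects C from the rest, so H has the largest impact. Applied to a panel of candidate proteins, the same loop would rank nodes by how strongly the network depends on them.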