Identification of serum proteome components associated with progression of non-small cell lung cancer

The aim of the present study was to perform a comparative analysis of serum from patients with different stages of non-small cell lung cancer (NSCLC) using three complementary proteomic approaches, in order to identify proteome components associated with cancer progression. Serum samples were collected before any treatment from 200 patients with NSCLC, including 103 early stage, 64 locally advanced and 33 metastatic cancer samples, and from 200 donors without malignancy. The low-molecular-weight fraction of the serum proteome was profiled by MALDI-ToF mass spectrometry in all samples. Serum proteins were characterized using 2D-PAGE and LC-MS/MS approaches in a representative group of 30 donors. Several significant differences were detected between serum samples collected from patients with early stage cancer and patients with locally advanced cancer, as well as between patients with metastatic cancer and patients with local disease. Of note, serum components discriminating between samples from patients with early stage cancer and healthy persons were also detected. Overall, about 70 differentiating serum proteins were identified, including inflammatory and acute phase proteins already reported to be associated with the progression of lung cancer (serum amyloid A or haptoglobin). Several differentiating proteins, including apolipoprotein H and apolipoprotein A1, had not previously been associated with NSCLC. No significant differences in patterns of serum proteome components were detected between patients with adenocarcinoma and squamous cell carcinoma. In conclusion, we identified biomarker candidates with potential importance for molecular proteomic staging of NSCLC. Additionally, several serum proteome components showed potential applicability in the early detection of lung cancer.


INTRODUCTION
The clinical stage of cancer, assessed by the primary tumor, nodal and distant spread categories, remains the most important prognostic factor for human malignancies, including lung cancer. The currently used 7th edition of the TNM classification of lung cancer (Sobin et al., 2009) associates well with long-term survival in non-small cell lung cancer (NSCLC) (Coche et al., 2010). However, traditional staging based on the TNM classification still appears insufficient for planning systemic therapies, for example in high-risk patients who have undergone surgery. It has become generally accepted that clinical, anatomical and pathological criteria have to be supplemented with additional parameters reflecting individual features of a disease. The proposed approaches include volumetric classification, which might enable estimation of tumor volume and of the number of cancer cells on the basis of functional imaging (van Loon et al., 2011; Manenti et al., 2012). Another approach is based on the assessment of RNA expression profiles or proteomic profiles that differentiate high-risk patients from low-risk ones (Yanagisawa et al., 2007). These profiles appear promising, yet they have not been proven to provide sufficient information for clinical use (Subramanian & Simon, 2010). Finally, assessment of driver molecular aberrations in oncogenes such as EGFR, KRAS or ALK provides prognostic information in addition to predictive information (Pennell, 2010). Some of these markers are being used to define molecular subtypes in order to select patients for targeted therapies in an increasing proportion of NSCLC patients (Toyooka et al., 2011; Sos & Thomas, 2012).
Molecular markers assessed directly in cancer tissue allow the best characterization of a disease. However, analysis of cancer tissue material is not always feasible. Hence, assessment of molecular markers in surrogate material such as blood appears to be a highly attractive approach in cancer diagnostics (Liotta et al., 2003). There are several serum/plasma protein markers whose levels associate with lung cancer stage and prognosis. These include CEA, CA125, CYFRA21-1 and SCCAg, whose levels usually increase with advancing stage of cancer (Lu et al., 2010). Some other serum proteins potentially associated with progression of cancer (including serum amyloid A and haptoglobin) are involved in inflammation and acute phase processes (Dowling et al., 2012). Of note, serum proteome profiling by MALDI/SELDI mass spectrometry has been used for the identification of multi-peptide signatures discriminating patients with NSCLC from healthy donors or patients with other malignancies (Patz et al., 2007; Yildiz et al., 2007; Han et al., 2008; Ocak et al., 2009; Pietrowska et al., 2012). Similarly, serum proteome profiling revealed a proteome signature that allows classification of NSCLC patients into groups with good or poor outcome after treatment with EGFR inhibitors (Taguchi et al., 2007); this signature was the basis for the prognostic and predictive VeriStrat test (Carbone et al., 2012).
The aim of the present study was to perform a comparative analysis of the serum proteome of patients with different stages of NSCLC. The methodological approach was designed to systematically characterize and identify components whose abundance in blood is associated with progression of NSCLC. Such proteins would have the potential to serve as prognostic markers supporting the staging of lung cancer.

MATERIALS AND METHODS
Characteristics of the patient group. Two hundred patients with NSCLC were enrolled into this study: 103 patients with early stage cancer (stage IA, IB, IIA and IIB), 64 patients with locally advanced cancer (stage IIIA and IIIB) and 33 patients with metastatic cancer (stage IV). The histopathology of tumors included squamous cell carcinoma (99 patients), adenocarcinoma (78 patients) and not otherwise specified NSCLC; the pathological types were similarly distributed among groups with different stages of cancer. All patients were Caucasians (67% men), aged 38-86 years (median 65), mostly current or former smokers (96%). Two hundred donors without diagnosed malignancies were recruited as a control group during the lung cancer CT screening program. All control donors were Caucasians (68% men), aged 49-79 years (median 64), mostly current or former smokers (98% with at least 20 pack years). Table 1 shows more detailed information about the analyzed groups. Five mL of peripheral blood was collected from each donor; all cancer samples were collected before the start of any treatment. Blood was incubated for 30 min at room temperature to allow clotting, and then centrifuged at 1000 × g for 10 min to remove the clot; the serum was stored at -70°C. The study was approved by the appropriate Ethics Committee and all participants provided informed consent indicating their conscious and voluntary participation.
Profiling of the low-molecular-weight fraction of serum proteome. Directly before analysis, albumin and other large-molecular-weight proteins were removed from serum samples by centrifugation through a 50 kDa cut-off membrane, and mass spectra were registered as described in detail elsewhere (Pietrowska et al., 2009; Pietrowska et al., 2012). Briefly, samples were desalted and concentrated by binding to a C18 ZipTip microcolumn (Millipore) and eluted with 1 µl of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50% ACN/H2O and 0.1% TFA) directly onto the MALDI plate. Samples were analyzed using an UltrafleXtreme MALDI-ToF/ToF mass spectrometer (Bruker Daltonics); the analyzer worked in the linear mode, and positive ions were recorded in the mass range between 2 000 and 14 000 Da. The spectra, whose components reflected [M+H]+ peptide ions recorded at defined m/z values, were first preprocessed; this included alignment, detection and removal of outlier profiles by Dixon's Q test, averaging of technical repeats, baseline removal and normalization of the total ion current. In the preprocessed profiles the spectral components were detected using decomposition of mass spectra into their Gaussian components, as described in detail elsewhere (Pietrowska et al., 2009; Pietrowska et al., 2012). Briefly, the average spectrum was decomposed into a sum of Gaussian bell-shaped curves by using a variant of the expectation maximization algorithm and the Bayesian Information Criterion for model selection. The initial set of Gaussian components, defined by their mean values and standard deviations, was further processed to merge overlapping components (components homogeneous in variance and with mean values closer than 0.1% of the m/z value), and to remove components presumably representing the residual baseline (components with a coefficient of variation greater than 25%), which resulted in dimension reduction to 176 Gaussian components. The final 176 Gaussian components were used to compute features of the registered spectra (termed spectral components afterward) for all samples by convolution with Gaussian masks.
2D-PAGE analysis and identification of serum proteins. Serum samples (300 µg of proteins) were precipitated, dissolved in a rehydration buffer (7 M urea, 2 M thiourea, 2% CHAPS) and separated by isoelectric focusing using a linear gradient of pH 4-7; the second dimension was then performed on 12% SDS-polyacrylamide gels. Proteins were quantified using an optical scanner after staining with colloidal Coomassie Brilliant Blue. For each sample three technical replicates were performed; the ETTAN System (GE Healthcare) was used for protein separation. Protein spots were automatically matched across gels between samples, and the size of a protein spot was expressed as its relative volume. Protein spots were excised and digested in-gel with trypsin, and the trypsin-digested samples were analyzed using an UltrafleXtreme MALDI-ToF/ToF (Bruker Daltonics) mass spectrometer working in the reflectron mode in the 800-5 000 Da mass range. Protein identification was performed with the use of the Mascot engine.
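The feature computation described above for the MALDI-ToF profiles (normalization of the total ion current, followed by convolution with Gaussian masks) can be sketched as follows. This is only a minimal illustration and not the authors' implementation: the function names, the toy spectrum and the component parameters are hypothetical, and each mask is assumed to be a set of Gaussian weights normalized to sum to one over the m/z grid.

```python
import math

def normalize_tic(intensities):
    """Scale a spectrum so that its total ion current (sum of intensities) equals 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no signal")
    return [v / total for v in intensities]

def gaussian_mask(mz_grid, mean, sd):
    """Gaussian weights centred on `mean`, normalized to sum to 1 over the grid."""
    weights = [math.exp(-0.5 * ((mz - mean) / sd) ** 2) for mz in mz_grid]
    total = sum(weights)
    return [w / total for w in weights]

def spectral_features(mz_grid, intensities, components):
    """One feature per (mean, sd) component: mask-weighted sum of normalized intensity."""
    spectrum = normalize_tic(intensities)
    features = []
    for mean, sd in components:
        mask = gaussian_mask(mz_grid, mean, sd)
        features.append(sum(w * v for w, v in zip(mask, spectrum)))
    return features

# Toy example (hypothetical values): flat baseline with a single peak near m/z 3000.
mz_grid = [2000 + 10 * i for i in range(201)]  # 2000..4000 Da
intensities = [1.0 + 50.0 * math.exp(-0.5 * ((mz - 3000) / 15) ** 2) for mz in mz_grid]
components = [(3000.0, 15.0), (3500.0, 17.5)]  # hypothetical (mean, sd) pairs
features = spectral_features(mz_grid, intensities, components)
```

In this toy case the first feature, centred on the peak, is larger than the second one, which only samples the baseline.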
LC-MS/MS analysis of serum proteome components. Serum samples were reduced with 5 mM dithiothreitol for 5 min at 95°C, subsequently alkylated with 10 mM iodoacetamide for 20 min in darkness at room temperature, and then digested with trypsin at 37°C overnight. Tryptic digests (20 µg) were separated by a nanoflow HPLC system (EASY-nLC) on a 150 mm × 75 µm C18 column (Thermo Fisher Scientific Inc) at a flow rate of 300 nL/min. A linear 2-45% gradient of ACN applied over 400 minutes was used for separation. The eluates were analyzed online using an HCT-Ultra mass spectrometer (Bruker Daltonics). Mass data acquisition was performed in the mass range of 200-1 500 m/z using the standard-enhanced mode (8 100 m/z per second). Doubly and triply charged ions with absolute intensities greater than 20 000 were selected for fragmentation in the trap, and the resulting fragments were analyzed using the Ultra Scan mode (m/z range of 50-3 000 at 26 000 m/z per second). Protein identification was performed using the Mascot engine for searching against the Swiss-Prot human database; identification was considered significant when the protein score was above the 95% confidence limit.
Protein annotation. The Empirical Proteomic Ontology Knowledge Base (EPO-KB) (Lustgarten et al., 2008), which annotates registered m/z values to known peptides/proteins, was used for hypothetical identification of MALDI-ToF spectral components, assuming their mono-protonation and allowing for a 0.5% mass accuracy limit. Serum proteins were annotated to different functional groups using the PANTHER Classification System (Mi & Thomas, 2009).
Statistical analyses. For each component of the MALDI-ToF mass profiles the normality of distribution was assessed using the Lilliefors test, and then, depending on the type of distribution, either the Tukey-Kramer pairwise test or the Kruskal-Wallis pairwise test was applied to analyze the differences between the groups (the ANOVA test was used in the first step to search for differentiating components). Volumes of the 2D-gel protein spots were analyzed by the ANOVA test to detect differentiating components, and then differences between groups were analyzed using the Wilcoxon test. Rates of protein identification by LC-MS/MS were compared using the Fisher test. In general, p=0.05 was selected as the statistical significance threshold, except for MALDI profiling, where correction for multiple testing with false discovery rate (FDR) estimation was applied.
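The matching rule used for annotation (mono-protonated ions, 0.5% mass tolerance) can be illustrated with a short sketch. It does not query EPO-KB itself: the reference table, the peptide names and the masses below are hypothetical placeholders, and the function names are introduced here only for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def neutral_mass(mz):
    """Neutral mass M of a mono-protonated [M+H]+ ion observed at `mz`."""
    return mz - PROTON_MASS

def annotate(mz, reference, tolerance=0.005):
    """Return reference entries whose mass lies within `tolerance` (0.5%) of the ion,
    sorted from the closest to the most distant match."""
    mass = neutral_mass(mz)
    hits = [(name, ref) for name, ref in reference.items()
            if abs(mass - ref) / ref <= tolerance]
    return sorted(hits, key=lambda hit: abs(mass - hit[1]))

# Hypothetical reference masses (Da); not taken from EPO-KB.
reference = {"peptide_A": 4210.0, "peptide_B": 4300.0, "peptide_C": 9420.0}
matches = annotate(4211.0, reference)
```

With a 0.5% window, an ion near 4 200 Da tolerates a deviation of about 21 Da, so several candidates may match a single component; this is why such annotations are treated as hypothetical.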
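The text does not specify which FDR procedure was applied to the MALDI components. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, a common choice for this kind of correction; the p-values are made-up numbers for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five spectral components.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```

In this toy example four components pass an uncorrected p<0.05 threshold, but only two remain significant at FDR<0.05.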

Patterns of differences between the compared groups. We proposed several hypothetical patterns of cancer-stage associated changes of serum proteome features. The two general types of change were upregulation (U) or downregulation (D) of a specific protein in serum from cancer patients compared to healthy controls; more refined changes between sub-groups were then assumed. Figure 1 presents the major hypothetical patterns of serum proteome components, which correspond to potential differences between healthy controls and patients with different stages of cancer. We compared four groups: healthy controls (group 0), patients with early stage cancer (group 1; clinical stage from IA through IIB), locally advanced cancer (group 2; clinical stage IIIA and IIIB) and metastatic cancer (group 3; clinical stage IV). Detection of patterns was based on the significance of pairwise differences between the compared groups (i.e., 0 vs 1, 1 vs 2 and 2 vs 3); p<0.05 was selected as the level of statistical significance.
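The assignment of a component to one of the patterns can be sketched as a simple rule on the adjacent pairwise comparisons. This is a simplified reading of the scheme in Fig. 1 and an assumption on our part: a component is assigned to U1/D1, U2/D2 or U3/D3 only when exactly one adjacent comparison is significant, and the function name and input layout are hypothetical.

```python
def classify_pattern(means, significant):
    """
    means: mean abundance in groups 0..3 (controls, early, locally advanced, metastatic).
    significant: dict mapping adjacent group pairs (0, 1), (1, 2), (2, 3) to True/False.
    Returns 'U1', 'D1', 'U2', 'D2', 'U3', 'D3', or None when no single-step pattern fits.
    """
    steps = [pair for pair in [(0, 1), (1, 2), (2, 3)] if significant.get(pair, False)]
    if len(steps) != 1:
        return None  # no change, or a more complex multi-step pattern
    low, high = steps[0]
    direction = "U" if means[high] > means[low] else "D"
    # The pattern number is the first group in which the changed level appears.
    return f"{direction}{high}"

# Hypothetical components.
pattern_a = classify_pattern([1.0, 2.0, 2.1, 2.0], {(0, 1): True, (1, 2): False, (2, 3): False})
pattern_b = classify_pattern([5.0, 5.0, 3.0, 3.0], {(1, 2): True})
pattern_c = classify_pattern([1.0, 2.0, 3.0, 3.0], {(0, 1): True, (1, 2): True})
```

Components that change at more than one step (such as the third example) are left unclassified here; handling them would need additional rules that the text does not spell out.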

RESULTS
In this work we performed three types of proteomic analyses: (i) the low-molecular-weight fraction of serum (up to 15 000 Da) was characterized by MALDI-ToF mass profiling, (ii) the medium- and high-molecular-weight proteins were characterized by 2D-PAGE and then identified by fingerprinting of their tryptic fragments, (iii) the whole proteome components were characterized and identified by LC-MS/MS after digestion with trypsin ("shotgun proteomics"). Finally, the data resulting from all three approaches were combined to obtain a complete picture of cancer-stage related features of the serum proteome.
Mass profiles of the low-molecular-weight fraction of the serum proteome were characterized by MALDI-ToF spectrometry in the whole group of 200 cancer patients and 200 healthy donors. A typical mass spectrum recorded in the range from 2 000 to 14 000 Da is shown in Fig. 2A; 176 spectral components (peptide ions) were distinguished in this range, and the abundances of 67 components revealed statistically significant differences between the compared groups. The major differences in abundances of specific serum components were observed between healthy donors and the patients with early stage cancer (group 0 vs group 1), as well as between healthy donors and the patients with advanced cancer (0 vs 2+3); there were 58 and 48 serum components showing statistically significant differences (FDR<0.05) between these groups, respectively. Significant differences were also observed between patients with early stage cancer and advanced cancer (1 vs 2+3); there were 15 differentiating components between these groups. None of the serum components showed a statistically significant difference between patients with locally advanced cancer and metastatic cancer (2 vs 3). Of note, we did not observe statistically significant differences between serum samples from patients with squamous cell carcinoma and adenocarcinoma. Figure 2B shows the numbers and examples of serum components whose differences between the compared cancer-stage groups followed one of the patterns defined in Fig. 1. We found that the majority of differentiating components discriminated between healthy donors and all three groups of cancer patients, including patients with early stage cancer; 39 and 19 components were classified as representative of pattern U1 and D1, respectively. However, only a few serum components discriminated patients with advanced cancer (group 2+3) from healthy donors and patients with early stage cancer (group 0+1); there were 8 components following pattern U2 or D2. Interestingly, none of the serum components differentiated between patients with metastatic cancer and all other groups of donors (hypothetical patterns U3 and D3). Of note, more differentiating serum components had their abundance increased (U) than decreased (D) in cancer samples: 43 and 23 components, respectively.
In the next step of the study, 2D-PAGE profiles of serum proteins were compared between the groups. Serum samples were pooled before gel electrophoresis as follows: 6 samples from healthy donors and 8-9 samples from each cancer-stage group (including 5 samples of squamous cell carcinoma and 3-4 samples of adenocarcinoma, which were gel-analyzed separately); three technical replicates were run for each batch of pooled samples. Abundances of protein spots were assessed based on their in-gel volume, and then proteins with statistically different abundances were identified by peptide fingerprinting after their excision from the gel and trypsin digestion. Figure 3A shows a typical 2D-PAGE pattern of the analyzed serum samples, and Fig. 3B shows examples of serum proteins with different abundances in the compared groups of donors. Table 2 shows differentiating serum proteins whose levels corresponded to the patterns defined in Fig. 1. We found that abundances of 12 serum proteins were associated with cancer progression; 8 proteins showed increased and 4 proteins showed decreased levels in cancer samples. Among the proteins that discriminated between the groups of patients with different stages of cancer, three different haptoglobin (HPT) variants (GI: 296653, 78174390 and 47124562) and two serum amyloid A (SAA) variants (GI: 225986 and 247143) were found; all of them were upregulated in sera from patients with advanced cancer. No statistically significant differences were observed between serum samples from patients with the same stage of squamous cell carcinoma and adenocarcinoma.
Finally, the whole trypsin-digested serum proteome was analyzed using LC-MS/MS (the "shotgun" approach). Thirty serum samples were subjected to analysis: 6 samples from healthy donors and 7-9 samples from each cancer-progression group (with similar proportions of squamous cell carcinomas and adenocarcinomas). One could assume that the probability of identification of a given protein depends on its initial abundance in the serum sample when the automatic MS/MS mode is used for selection of peptides. Hence, the relative number of samples in which the protein was identified (i.e., the rate of identification) could be a measure of differences between the groups. Overall, in the analyzed samples we identified tryptic peptides from more than 1000 proteins; among them about 300 proteins appeared repeatedly in multiple samples (i.e., in at least half of the samples in at least one group), and could be used for differentiation of groups. Table 2 shows serum proteins whose relative identification rate allowed for their classification among the patterns defined in Fig. 1. We found that 32 identified proteins associated with cancer progression were upregulated (patterns U1-U3), while 31 identified proteins were downregulated (patterns D1-D3) in cancer samples. Different fragments of HPT, though generally upregulated in cancer samples, showed differential overrepresentation in samples from different stages of progression. Of note, two proteins, namely complement component C3 (C3) and transferrin (TRFE), showed more complex patterns, and their different fragments were either over- or underrepresented in different groups. In general, we detected 27 proteins whose relative identification rates discriminated between the samples from healthy controls and all groups of cancer patients (patterns U1/D1), and 24 proteins that differentiated the samples of early cancer and locally advanced cancer (patterns U2/D2). Additionally, 14 proteins that differentiated patients with metastatic cancer from other groups of donors were also detected (patterns U3/D3).
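The comparison of identification rates between two groups can be illustrated with a two-sided Fisher exact test on a 2x2 table (samples with and without the protein in each group). The sketch below is a self-contained implementation written for illustration; the counts are hypothetical and only mimic the group sizes used here (6 control samples and 7-9 samples per cancer group).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """
    Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]], where
    a, b = samples with / without the protein in group 1 and
    c, d = samples with / without the protein in group 2.
    Sums the probabilities of all tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x identifications in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical protein: identified in 0 of 6 control samples and 8 of 8 metastatic samples.
p_extreme = fisher_exact_two_sided(0, 6, 8, 0)
# Hypothetical protein with the same identification rate (50%) in both groups.
p_equal = fisher_exact_two_sided(3, 3, 4, 4)
```

With such small groups only large differences in identification rate reach significance, which is consistent with treating the identification rate as a coarse, semi-quantitative measure.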

DISCUSSION
Serum proteome features associated with progression of NSCLC were characterized here using a combination of three proteomic approaches. The major differences were observed between healthy donors and all groups of cancer patients (hypothetical patterns U1/D1). However, several important differences were also detected between serum samples collected from patients with early stage cancer (clinical stage IA through IIB) and patients with locally advanced cancer (clinical stage IIIA and IIIB); these proteome components contributed to hypothetical pattern U2/D2. A few proteins were also identified whose abundances differentiated patients with metastatic cancer (clinical stage IV) from patients with local disease (hypothetical pattern U3/D3). Thus, several hypothetical biomarker candidates were identified with potential applicability in molecular proteomic staging of NSCLC. Interestingly, several components whose abundances were significantly different between healthy donors and patients with early stage cancer were detected in the low-molecular-weight fraction of the serum proteome upon MALDI-ToF profiling. Of note, about 50% of patients in the analyzed early stage cancer group were diagnosed as a result of the CT screening program, without previous clinical symptoms. Hence, this fraction of the serum proteome showed high potential for the identification of biomarkers for early detection of NSCLC.
Among blood proteins characteristic of cancer patients are those reflecting the overall influence of a disease upon the organism, including factors involved in the immune response and inflammatory reactions. In general, chronic inflammatory reactions are frequently observed in cancer patients and their escalation putatively correlates with progression of a disease (Pierce et al., 2009). Indeed, several proteins involved in inflammation and the acute phase response showed increased levels in the blood of cancer patients. These include serum amyloid A, haptoglobin and component C3, whose changed levels were observed in patients with different types of malignancies, including lung cancer (Dowling et al., 2012). Several publications have reported that HPT and its glycan-modified derivatives are associated with progression of both NSCLC (Hoagland et al., 2007) and SCLC (Bharti et al., 2004). Similarly, an elevated serum level of SAA appears to be a general feature of progressive and metastatic cancer cases, including lung cancer (Malle et al., 2009; Cho et al., 2010). As expected, in this work we also observed an association of increased HPT and SAA serum levels with the stage of NSCLC: elevated levels of these proteins were observed in patients with advanced cancer. Another inflammation-related protein whose abundance in blood was significantly different between healthy donors and all groups of NSCLC patients, including patients with early stage cancer, was apolipoprotein H (APOH, beta-2-glycoprotein I). Importantly, an elevated level of APOH in serum samples from patients with early stage NSCLC was revealed by both 2D-PAGE and LC-MS/MS. This protein was reported to affect angiogenesis and endothelial cell growth (Beecken et al., 2010), and showed relevance for the development of hepatocellular carcinoma due to interference with NF-kB (Jing et al., 2010).
To reveal other functional groups of proteins associated with progression of NSCLC, seventy-two differentiating proteins identified either by 2D-PAGE or LC-MS/MS (including 37 proteins upregulated and 33 proteins downregulated in cancer samples) were annotated in the PANTHER pathway database (identified groups could partially overlap). In addition to defense and immunity factors (11 proteins), these groups included: enzyme modulators (13 proteins), nucleic acid binding proteins (10 proteins), signaling molecules (8 proteins), transfer/carrier proteins (7 proteins) as well as proteins involved in cell adhesion, extracellular matrix and cytoskeleton (23 proteins). Hence, besides the generally recognized inflammation and acute phase factors, proteins involved in other pathways and processes are potential candidates for biomarkers associated with the progression of NSCLC. Some of these proteins have already been reported to be associated with other types of cancer. In serum samples from patients with advanced cancer, both the 2D-PAGE and LC-MS/MS approaches revealed a reduced level of apolipoprotein A1 (APOA1), the major protein component of high density lipoprotein, which was also proposed as a potential serum cancer marker. It has been shown recently that APOA1 is a potent suppressor of tumor growth and metastasis due to its immunomodulatory role in the tumor microenvironment (Zamanian-Daryoush et al., 2013), which offers a functional association between its reduced level and progression of cancer. For several proteins whose association with cancer progression was revealed in this work, their cancer-related functions and biomarker potential remain to be characterized and validated in further studies.
In conclusion, we performed a complex proteomic analysis to identify serum proteome components associated with progression of NSCLC, which revealed proteins involved in inflammation and other (potentially) cancer-related processes. Among the identified serum proteins there were those previously reported to generally reflect progression of malignancy, including lung cancer (e.g. SAA, HPT or C3), or otherwise associated with different types of cancer (e.g. APOH or APOA1). We did not observe significant differences between histological types of NSCLC. Thus, the identified proteins appear to reflect the general influence of cancer progression on the patient's organism, with cancer-stage specificity rather than cancer-type specificity. Hence, such proteins are potentially good candidates for general markers of disease progression with potential applicability in molecular staging of different types of malignancies. Importantly, we found a number of proteins that could discriminate between samples from healthy persons and patients with early stage NSCLC, indicating their potential applicability for early detection of lung cancer.

Figure 1. Hypothetical patterns of changes in abundance of serum components in blood of cancer patients compared to healthy controls. Healthy donors (0), and patients with early stage (1), locally advanced (2) and metastatic (3) NSCLC. The following patterns were defined: U1/D1 - controls different from all cancer groups, U2/D2 - controls and early cancers different from advanced cancers, U3/D3 - metastatic cancers different from all other groups; U and D - abundances in cancer patients higher (upregulated) or lower (downregulated) as compared to healthy controls, respectively.

Figure 2. Mass profiles of the low-molecular-weight fraction of serum proteome. Panel A - Average MALDI-ToF spectrum of the serum proteome in the 2 000-14 000 Da range. Panel B - Serum components whose abundances were associated with progression of cancer. Each box shows: the number of components following a specific pattern of changes (U1-D3); examples of m/z components representative of each pattern (boxplots show minimum, lower quartile, median, upper quartile and maximum values for the compared groups; asterisks represent statistically significant differences, p<0.05); hypothetical annotation of some of the registered m/z components to the EPO-KB proteomic database.

Table 1. Characteristics of analyzed groups of donors
*SCC - squamous cell carcinoma, AC - adenocarcinoma, n.o.s. NSCLC - not otherwise specified NSCLC