Panel of serum metabolites discriminates cancer patients and healthy participants of lung cancer screening – a pilot study

Introduction. Blood biomarkers may support early diagnosis of lung cancer by enabling pre-selection of candidates for computed tomography screening or discrimination between benign and malignant screening-detected nodules. We aimed to identify features of the serum metabolome distinguishing individuals with early-detected lung cancer from healthy participants of a lung cancer screening program. Methods. Blood samples were collected in the course of a low-dose computed tomography screening program performed in the Gdansk district (Northern Poland). The analysis included 31 patients with screening-detected lung cancer and a matched group of 92 healthy controls. Gas chromatography coupled to mass spectrometry (GC/MS) was used to identify and quantify small metabolites present in serum. Results. Several metabolites detected in the sera had abundances that discriminated patients with lung cancer from controls. The majority of the differentiating components, including amino acids, carboxylic acids and tocopherols, were downregulated in cancer samples, whereas benzaldehyde was the only significantly upregulated compound. A classifier including nine serum metabolites separated cancer and control samples with 100% sensitivity and 95% specificity. Conclusions. A signature of serum metabolites discriminating between cancer patients and healthy participants of an early lung cancer screening program was identified using a GC/MS metabolomics approach. This signature, though not yet validated in an independent dataset, deserves further investigation in a larger cohort study.


INTRODUCTION
Lung cancer is the leading cause of cancer mortality, responsible for about one-fifth of cancer-related deaths worldwide. The majority of lung cancer cases are diagnosed at advanced stages and have a grim prognosis (average 5-year survival of about 10-15%). However, when the disease is detected at early stages, the prognosis is much better (average 5-year survival in the range of 65-85%). Thus, in addition to primary prevention (i.e., tobacco smoking control), screening for the early detection of lung cancer might be the major strategy to reduce lung cancer mortality (Hoffman et al., 2000; Jemal et al., 2010; Torre et al., 2012). Several diagnostic tools allowing early lung cancer detection have been investigated over the past decades, but none has found routine application in clinical practice. Nevertheless, low-dose computed tomography (LD-CT) screening in a high-risk group has been shown to reduce lung cancer-specific mortality by 20% compared with conventional chest X-ray examination (Aberle et al., 2011). Hence, LD-CT-based screening is now the most efficacious strategy for lung cancer screening, with the prospect of reducing cancer mortality worldwide. However, the relatively low positive predictive value and specificity of this test may lead to "over-diagnosis". In our own experience, around 75% of patients with screening-detected lung abnormalities underwent unnecessary diagnostic work-up, including around 25% of patients subjected to further invasive procedures (Rzyman et al., 2013). For these reasons, complementing CT-based screening with other tests that allow effective and reliable pre-selection of individuals for LD-CT examination, or better discrimination between benign and malignant nodules detected by LD-CT, appears critical for the practical application of this strategy (Priola et al., 2013; Rzyman et al., 2015). Blood is the most accessible source of biomarkers that could enhance early lung cancer detection or help differentiate lung nodules. Several blood components, including circulating tumor cells, circulating tumor DNA, microRNA, autoantibodies and specific serum/plasma proteins, have been analyzed in the search for such biomarkers (Hassanein et al., 2012; Hassanein et al., 2011; Sozzi et al., 2014), but none has yet been adopted in clinical practice.
The overall response of the human organism to pathological conditions is mirrored in different molecular fractions of body fluids, including the metabolome. In recent years, monitoring of cancer-related metabolites in blood has emerged as an approach to the detection and diagnosis of different malignancies (Spratlin et al., 2009). Several studies have demonstrated that profiling of serum or plasma samples by mass spectrometry (MS) or nuclear magnetic resonance (NMR) spectroscopy can reveal metabolites whose blood levels discriminate patients with lung cancer from healthy individuals or from patients with non-malignant lung diseases. Such differentiating compounds included phospholipids, carboxylic acids, amino acids, sugars and many other small metabolites (Jordan et al., 2010; Rocha et al., 2011; Hori et al., 2011; Guo et al., 2012; Wang et al., 2013; Deja et al., 2014; Liu et al., 2014; Chen et al., 2015). More recently, two relatively large studies using NMR-based analysis of the plasma or serum metabolome revealed the promising diagnostic potential of multicomponent lung cancer signatures built from different types of small metabolites (Puchades-Carrasco et al., 2016; Louis et al., 2016). Another study, using MS-based approaches, revealed a large set of metabolites whose serum levels discriminated lung cancer patients from matched controls and allowed multicomponent cancer classifiers to be built (Mazzone et al., 2016). However, the above-mentioned studies enrolled both early and advanced lung cancer cases, and no study has yet been performed using material obtained exclusively from high-risk subjects participating in LD-CT screening. Hence, the potential relevance of the proposed biomarkers for early detection of lung cancer remains to be verified. Here, we assessed the applicability of a GC/MS-based approach to identify a signature of serum metabolites discriminating between patients with screening-detected lung cancer and healthy participants of the LD-CT screening program.

MATERIALS AND METHODS
Study subjects.
Material for this study was collected in the course of the Pomeranian Lung Cancer Screening Program performed by Gdansk Medical University between 2008 and 2010. This program enrolled over 8,000 participants and offered LD-CT examination to current or former smokers aged 50 to 75 years with a smoking history of at least 20 pack-years. Blood samples were collected from about 3,600 participants. The study group comprised material from 31 participants who were finally diagnosed with lung cancer (i.e., 0.9% of the participants who provided blood samples) (Table 1). Each cancer case was accompanied by three controls with no detected malignancy, matched according to sex, age and smoking history, who were selected from participants of the LD-CT screening program (92 controls in total). The study was approved by the Ethics Committee of Gdansk Medical University (approval number NKEBN/42/2009), and each participant provided written informed consent to voluntary participation in the project and to the provision of blood samples for future research.
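The text names the matching variables but not the matching algorithm. One way a 3:1 control selection of this kind could be implemented is sketched below, assuming greedy nearest-neighbour matching without replacement within sex, with hypothetical tolerances on age (`max_age_gap`) and pack-years (`max_py_gap`). The `Participant` record, the tolerances and the ranking rule are illustrative assumptions, not the procedure actually used in the screening program:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str           # participant identifier (hypothetical)
    sex: str           # "F" or "M"
    age: int           # years at enrolment
    pack_years: float  # cumulative smoking exposure


def match_controls(cases, pool, k=3, max_age_gap=5, max_py_gap=10.0):
    """Greedy k:1 matching without replacement (tolerances are hypothetical).

    Controls must share the case's sex and lie within the age and pack-year
    tolerances; the closest candidates (age first, then pack-years) are taken.
    """
    used = set()
    matches = {}
    for case in cases:
        candidates = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= max_age_gap
            and abs(c.pack_years - case.pack_years) <= max_py_gap
        ]
        # Rank eligible controls by closeness in age, then in smoking history
        candidates.sort(key=lambda c: (abs(c.age - case.age),
                                       abs(c.pack_years - case.pack_years),
                                       c.pid))
        chosen = candidates[:k]
        used.update(c.pid for c in chosen)  # each control is used only once
        matches[case.pid] = [c.pid for c in chosen]
    return matches
```

In a greedy scheme like this, cases processed later draw from a smaller pool, so a case can receive fewer than `k` controls when eligible candidates run out.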
Sample preparation. Peripheral blood was collected into a 5 mL BD Vacutainer tube, incubated for 30 min at room temperature to allow clotting, and then centrifuged at 1,000 × g for 10 min to remove the clot. The serum was aliquoted and stored at -70°C until analysis. Serum (25 µl) was added to 200 µl of a MeOH:H2O mixture (1:1, v/v), vortexed for 20 min and centrifuged for 10 min at 18,000 × g, and supernatant 1 was collected into a new tube. The pellet was re-suspended in 200 µl of a CH2Cl2:MeOH mixture (3:1, v/v). The mixture was placed in an ultrasonic bath for 5 min, vortexed for 10 min and centrifuged for 10 min at 18,000 × g, and supernatant 2 was collected into a new tube. Both supernatant fractions were combined and evaporated in a vacuum concentrator.
GC/MS analysis. Dried extracts were derivatized immediately before GC/MS analysis. Each sample was mixed with 25 µl of methoxyamine hydrochloride in pyridine (20 mg/ml) and vortexed (950 rpm) for 90 min at 37°C; then 80 µl of N-methyl-N-(trimethylsilyl)trifluoroacetamide was added and the mixture was vortexed (950 rpm) for 30 min at 37°C. GC/MS analysis was performed with an Agilent 7890A gas chromatograph (Agilent Technologies) coupled to a Pegasus 4D GCxGC-TOFMS mass spectrometer (Leco). Compounds were separated on a DB-5 bonded-phase fused-silica capillary column (30 m length, 0.25 mm inner diameter, 0.25 µm film thickness) (J&W Scientific Co.). The GC oven temperature program was as follows: 2 min at 70°C, ramped at 8°C/min to 300°C, and held for 16 min at 300°C (total GC run time 46.75 min). Helium was used as the carrier gas at a flow rate of 1 ml/min. One microliter of each sample was injected in splitless mode. The initial injector temperature was 20°C for 0.1 min, after which it was raised to 350°C at 600°C/min. The septum purge flow rate was 3 ml/min, and the purge was turned on after 60 s. The transfer line and ion source temperatures were set to 250°C. In-source fragmentation was performed at an electron energy of 70 eV.
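The stated total run time follows directly from the oven program: a 2 min isothermal step, a (300 - 70)/8 = 28.75 min ramp, and a 16 min final hold. A small helper makes this arithmetic explicit; the function name and the tuple layout are illustrative only:

```python
def oven_program_minutes(initial_hold, ramps):
    """Total GC oven run time in minutes.

    initial_hold: minutes held at the starting temperature
    ramps: sequence of (start_temp, end_temp, rate_per_min, hold_min) tuples,
           temperatures in °C and rates in °C/min
    """
    total = initial_hold
    for start, end, rate, hold in ramps:
        total += abs(end - start) / rate  # time spent ramping
        total += hold                     # isothermal hold after the ramp
    return total


# Program from the text: 2 min at 70 °C, 8 °C/min to 300 °C, 16 min at 300 °C
run_time = oven_program_minutes(2, [(70, 300, 8, 16)])
print(run_time)  # 46.75
```

The same helper applied to the injector ramp (20°C to 350°C at 600°C/min) gives 0.55 min of heating.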
Analysis of spectra. Mass spectra were recorded in the m/z range of 35-650. All spectra were subjected to automatic peak detection, deconvolution, retention index calculation and library search using Leco ChromaTOF-GC software (v4.51.6.0). An alkane series mixture (C10 to C36) was used to correct retention times (Rt) and to determine the retention index (RI) of each compound. Automated identification of metabolites was based on the Replib, Mainlib and Fiehn libraries; the quality thresholds were a similarity index (SI) above 700 and a retention index deviation of ±10. Unique quantification masses were specified for each component, and the samples were reprocessed to obtain accurate peak areas for the deconvoluted components. The resulting profiles were normalized to the sum of chromatographic peak areas (total ion current, TIC, approach). All peaks identified as artifacts (column bleed, alkanes, plasticizers, derivatization reagents) and peaks corresponding to unidentified compounds were excluded from further analysis.
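ChromaTOF's internal algorithms are not described here. As a rough sketch, the linear retention index commonly used with alkane ladders in temperature-programmed GC (van den Dool and Kratz), the identification thresholds stated above (SI > 700, and an RI window of ±10 read here as a tolerance around the library RI, which is an assumption), and total-area normalization can be written as below. The function names and the example ladder are hypothetical:

```python
import bisect


def linear_retention_index(rt, alkane_rts):
    """Linear (van den Dool & Kratz) retention index for temperature-programmed GC.

    rt: retention time of the analyte
    alkane_rts: dict mapping n-alkane carbon number -> retention time
                (same time units as rt), e.g. a C10-C36 ladder
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    i = bisect.bisect_right(times, rt) - 1
    if i == len(times) - 1:  # analyte co-elutes with the last alkane
        return 100.0 * carbons[-1]
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    # Interpolate linearly between the bracketing alkanes
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


def accept_hit(similarity, ri_observed, ri_library, min_si=700, ri_window=10):
    """Library hit filter: SI above min_si and RI within ±ri_window units."""
    return similarity > min_si and abs(ri_observed - ri_library) <= ri_window


def tic_normalize(peak_areas):
    """Divide each peak area by the summed area of all retained peaks."""
    total = sum(peak_areas.values())
    return {name: area / total for name, area in peak_areas.items()}


ladder = {10: 8.0, 11: 9.5, 12: 11.0}  # hypothetical alkane retention times (min)
print(linear_retention_index(8.75, ladder))  # 1050.0
```

Using the carbon-number difference `big_n - n` keeps the formula valid if the ladder skips some alkanes; with consecutive alkanes it reduces to the familiar single-carbon interpolation.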
Statistical analysis. Quantitative spectral data were log-transformed, and missing values were imputed using the k-nearest-neighbor approach with the standardized Euclidean metric on a per-group basis (only metabolites present in more than 2/3 of the samples in each group were used for further analyses). For each compound, normality of the abundance distribution and homogeneity of variances were assessed using the Lilliefors test and Bartlett's test, respectively. The significance of differences between groups was then estimated using either the two-sample t-test (with a correction for unequal variances, if necessary) or the nonparametric Mann-Whitney U test; the Benjamini-Hochberg procedure was applied to the p-values for multiple testing correction (q-values). A multivariable logistic regression model was constructed to find a signature describing the relationship between metabolite abundances and patient status.
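The preprocessing chain described above (presence filter, log transformation, per-group k-nearest-neighbour imputation) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the value of k, averaging of the donor values, and the handling of missing coordinates in the standardized Euclidean distance are assumptions. In practice each group (cancer, control) would be passed to `knn_impute` separately:

```python
import math
import statistics


def keep_features(groups, min_fraction=2 / 3):
    """Indices of features detected in more than min_fraction of samples in every group.

    groups: list of groups; each group is a list of samples (rows), and each
            row is a list of abundances with None marking a missing value.
    """
    n_features = len(groups[0][0])
    return [
        j for j in range(n_features)
        if all(sum(row[j] is not None for row in g) / len(g) > min_fraction for g in groups)
    ]


def log_transform(rows):
    """Natural-log transform, leaving missing values as None."""
    return [[None if v is None else math.log(v) for v in row] for row in rows]


def knn_impute(rows, k=3):
    """Impute missing values within one group from its k nearest samples.

    Distance: Euclidean distance over features observed in both samples,
    each feature scaled by its within-group standard deviation
    (a standardized Euclidean metric). Imputed value: mean of the donors.
    """
    n_features = len(rows[0])
    scales = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        sd = statistics.stdev(observed) if len(observed) > 1 else 0.0
        scales.append(sd if sd > 0 else 1.0)  # avoid division by zero

    def distance(a, b):
        shared = [j for j in range(n_features) if a[j] is not None and b[j] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum(((a[j] - b[j]) / scales[j]) ** 2 for j in shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_features):
            if row[j] is None:
                # Nearest samples that actually observed feature j
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                imputed[i][j] = statistics.mean(value for _, value in donors)
    return imputed
```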
A stepwise procedure combined with the Bayesian Information Criterion (BIC), R2 and the p-value of the likelihood ratio test was used for model selection. The contribution of individual predictors was assessed using the Wald test. The optimal threshold for the discriminating function was found by maximizing Youden's index, i.e., the maximum vertical distance between the receiver operating characteristic (ROC) curve and the random-guess line. Standard classification performance indices (sensitivity, specificity, positive predictive value and negative predictive value) were calculated from the numbers of true and false positives and negatives.
Bioinformatics analysis. Metabolic pathways were identified using Metabolite Set Enrichment Analysis (MSEA; accessed in October 2016 at http://www.msea.ca/MSEA/faces/Home.jsp); the statistical significance of the resulting over-representation was estimated using the hypergeometric test.
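Youden's index J = sensitivity + specificity - 1 equals the vertical distance between an ROC point and the random-guess diagonal, so choosing the threshold that maximizes J amounts to a scan over candidate cut-offs. The threshold selection and performance indices described in the statistical analysis can be sketched as follows, assuming the classifier outputs one score per sample (for example, a fitted logistic-regression probability) and that higher scores indicate cancer. The stepwise BIC-based model selection is not reproduced, and all names are illustrative:

```python
def confusion_counts(scores, labels, threshold):
    """(TP, FN, FP, TN); label 1 = cancer, 0 = control; score >= threshold is positive."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        positive = score >= threshold
        if label == 1 and positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def performance(tp, fn, fp, tn):
    """Standard indices from the confusion counts (both classes assumed present)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def youden_threshold(scores, labels):
    """Return (threshold, J) for the observed score that maximizes Youden's J."""
    best = None
    for t in sorted(set(scores)):
        m = performance(*confusion_counts(scores, labels, t))
        j = m["sensitivity"] + m["specificity"] - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]  # hypothetical classifier outputs
labels = [1, 1, 1, 0, 0, 0]
print(youden_threshold(scores, labels))  # (0.4, 1.0)
```

Scanning only the observed scores is sufficient because sensitivity and specificity change only when the threshold crosses one of them.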

RESULTS
The GC/MS approach, a standard analytical tool in metabolomics studies (Spratlin et al., 2009), was used to characterize the profile of metabolites in serum samples collected from participants of the LD-CT screening program for early detection of lung cancer. The study included 31 patients with screening-detected lung cancer and 92 matched controls. Overall, 195 unique metabolites were identified in the analyzed samples. Of these, 102 compounds detected and quantified in more than two-thirds of the samples in each group were used for further analyses and for building and testing a cancer classifier (Supplementary Table S1 at www.actabp.pl). The inter-individual variability in levels of the detected metabolites was similar in both groups: the mean coefficient of variation was 0.65 and 0.83 in cancer and control samples, respectively. First, we searched for individual metabolites discriminating cancer patients from healthy controls. We found 17 metabolites that differed significantly between the groups (p<0.05), including 16 compounds with lower abundances in cancer than in control samples (fold change 0.56 to 0.82; Table 2). The only compound significantly upregulated in cancer samples (fold change 1.67) was benzaldehyde (Fig. 1A). However, only three compounds, namely benzaldehyde, isoleucine and glycolic acid, retained statistical significance after correction for multiple testing (q<0.05). Additionally, six metabolites showed at least 1.5-fold downregulation and one metabolite showed at least 1.5-fold upregulation in cancer samples (fold change <0.67 or >1.50, respectively), yet these differences did not reach statistical significance (p>0.05; Table 2).
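The fold changes and q-values reported above can be reproduced in outline as follows. Assumptions: fold change is taken as the ratio of mean abundances (cancer/control) on the original, untransformed scale, which the text does not specify; 0.67 and 1.50 serve as the cut-offs for a 1.5-fold decrease or increase; and the Benjamini-Hochberg step-up procedure is implemented from its textbook definition rather than taken from the authors' software:

```python
import statistics


def fold_change(cancer, control):
    """Cancer/control ratio of mean abundances (values < 1 mean lower in cancer)."""
    return statistics.mean(cancer) / statistics.mean(control)


def regulation(fc, down=0.67, up=1.50):
    """Label a fold change using 1.5-fold cut-offs."""
    if fc < down:
        return "down"
    if fc > up:
        return "up"
    return "within 1.5-fold"


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])  # hypothetical p-values
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

Note that a compound can pass p<0.05 yet fall "within 1.5-fold", as most of the significant compounds in Table 2 do (fold change 0.56 to 0.82).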
Metabolites whose serum abundances discriminated patients with early lung cancer from healthy donors were mostly carboxylic acids and amino acids (Table 2). To obtain systemic information about the potential functional importance of these differences, the differentiating compounds (p<0.05) were annotated to metabolic pathways using Metabolite Set Enrichment Analysis (Fig. 1B). This analysis identified "over-represented" pathways associated with metabolites discriminating between cancer and control samples (i.e., pathways associated with compounds that were more numerous than expected by chance). Notably, the primary pathways associated with compounds downregulated in cancer samples included those involved in protein metabolism; two pathways showed statistically significant over-representation: protein biosynthesis, and Val, Leu and Ile degradation.
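MSEA's over-representation analysis reduces to a one-sided hypergeometric test per pathway, together with a fold-enrichment ratio of the kind plotted in Fig. 1B. A minimal sketch using only the standard library; the reference-set size, the pathway sizes and the example numbers are hypothetical and do not correspond to the MSEA libraries or to the counts behind Fig. 1B:

```python
from math import comb


def overrepresentation(k, n, K, N):
    """Fold enrichment and one-sided hypergeometric p-value, P(X >= k).

    N: metabolites in the reference set (universe)
    K: metabolites in the universe annotated to the pathway
    n: metabolites in the query list (e.g. the discriminating compounds)
    k: query metabolites annotated to the pathway
    """
    fold_enrichment = (k / n) / (K / N)
    # Probability of drawing k or more pathway members in n draws without replacement
    p_value = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)
    return fold_enrichment, p_value


fe, p = overrepresentation(k=2, n=2, K=2, N=10)  # toy example
print(fe, p)  # fold enrichment 5, p = 1/45
```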
Finally, we built and tested a multicomponent signature that separated cancer and control samples. The optimal classifier included nine metabolites: benzaldehyde, hydroxypyruvic acid and urea (cancer-upregulated compounds), and glycolic acid, isoleucine, gluconic acid lactone, allyl laurate, phenylalanine and linolenic acid (cancer-downregulated compounds; Table 2). This classifier separated cancer and healthy samples with 100% sensitivity, 95% specificity, 86% positive predictive value and 100% negative predictive value. The ROC curve of this classifier is shown in Fig. 2. The classifier was not validated in an independent dataset, yet the high performance observed in the discovery set warrants its testing in further studies.
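The reported indices are mutually consistent. Assuming that 95% specificity corresponds to 87 of the 92 controls being classified correctly (87/92 ≈ 94.6%; this count is inferred from the percentages, not stated in the text), and that 100% sensitivity means all 31 cases were detected, the predictive values follow directly:

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{31}{31+0} = 100\% \\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{87}{87+5} \approx 94.6\% \approx 95\% \\
\text{PPV} &= \frac{TP}{TP+FP} = \frac{31}{31+5} \approx 86.1\% \\
\text{NPV} &= \frac{TN}{TN+FN} = \frac{87}{87+0} = 100\%
\end{aligned}
```

Because PPV depends on the case-to-control ratio (here 1:3), it would be considerably lower at the much smaller lung cancer prevalence of an unselected screening population.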

DISCUSSION
General features of cancer metabolism that could be mirrored in the blood metabolome include enhanced glycolysis and gluconeogenesis combined with a suppressed Krebs cycle and lipid catabolism. These features of cancer metabolism were also demonstrated in the blood of lung cancer patients (Rocha et al., 2011; Hori et al., 2011; Louis et al., 2016). This characteristic of the blood metabolome was accompanied by decreased levels of various amino acids, including Ala, Cys, Glu, His, Met, Pro, Thr, Trp, Tyr and Val (Rocha et al., 2011; Puchades-Carrasco et al., 2016; Mazzone et al., 2016). Metabolites specifically associated with glycolysis and gluconeogenesis did not discriminate the groups in the current study, possibly due to the inclusion of only low-stage, early-detected cancer patients. On the other hand, decreased levels of amino acids (Gly, Ile, Pro, Val) and carboxylic/fatty acids (and their derivatives) were compatible with the known characteristics of the cancer metabolome. Among compounds with decreased abundance in samples of early-detected lung cancer were vitamin E species (α-tocopherol and γ-tocopherol). The use of antioxidant vitamins in lung cancer chemoprevention has long been considered, but there is no evidence that increased intake of vitamin E is associated with reduced risk of lung cancer in smokers (Willett et al., 1984; Wright et al., 2000; Mahabir et al., 2008; Virtamo et al., 2014; Wu et al., 2015). Similarly, the plasma level of α-tocopherol does not seem to be associated with lung cancer risk (Comstock et al., 2008). However, α-tocopherol was downregulated while γ-tocopherol and δ-tocopherol were upregulated in the serum of lung cancer patients compared with healthy controls (Mazzone et al., 2016). Hence, the results of the latter study may suggest some association between vitamin E metabolism and the development of lung cancer. The general characteristics of the early lung cancer signature developed in the present study seem to be coherent with the known features of the lung cancer metabolome, yet only a small fraction of the specific compounds discriminating cancer and control samples matched metabolites reported in previous studies.
We found that benzaldehyde was the only compound showing significantly increased abundance in samples collected from screening-detected lung cancer cases. Benzaldehyde was previously found among volatile compounds detected in the exhaled breath of lung cancer patients (Bajtarevic et al., 2009). Importantly, this compound was specific for cancer patients (i.e., it was not detected in the breath of healthy volunteers), and its presence was not related to smoking behaviour. Moreover, a signature composed of different volatile compounds, including benzaldehyde, discriminated lung cancer patients from healthy controls with 50-80% sensitivity and 100% specificity. Hence, the increased level of benzaldehyde observed in the serum of lung cancer patients supports the relevance of this compound for lung cancer detection.
A unique feature of our metabolomics study is the use of material derived solely from screening-detected cancer cases and corresponding controls collected in the course of an LD-CT lung cancer screening program in a population of high-risk smokers. Owing to the relatively low number of study samples, our results should be compared with those of previous large-cohort studies that included a general population of lung cancer patients. A comprehensive study using MS-based approaches, which involved 94 cancer patients and 190 matched controls, was recently published (Mazzone et al., 2016). This study identified tocopherols among the essential components of signatures discriminating lung cancer patients from healthy controls, either in a general lung cancer population (downregulated α-tocopherol) or in the group of patients with squamous cell cancer (upregulated γ-tocopherol and δ-tocopherol). Moreover, all amino acids that showed reduced abundance in the sera of cancer patients in the current study (Gly, Ile, Phe, Pro, Val) were also downregulated in cancer in the former study. The results of two other large metabolomics studies based on NMR spectroscopy, involving 296 cancer cases vs. 114 controls (Puchades-Carrasco et al., 2016) and 357 cancer cases vs. 347 controls (Louis et al., 2016), were not confirmed in our study. This inconsistency could be attributed both to differences in analytical approaches and to different donor characteristics. We are aware of some limitations of the present study, which include a relatively low number of donors and its "snapshot" design (no follow-up of controls is available). Our results should therefore be considered exploratory.

CONCLUSIONS
In conclusion, we developed a signature based on a set of serum metabolites that discriminates cancer cases from matched healthy subjects in a unique series of early lung cancer screening participants. Remarkably, several components of this signature were associated with known features of cancer metabolism revealed in previous studies that included a general population of lung cancer patients. Hence, a further validation study is warranted to confirm the robustness of our data and to assess the potential clinical utility of the signature. Such a study has recently been initiated in conjunction with a large LD-CT screening program carried out by our group.

Figure 1. Metabolites discriminating early lung cancer patients from healthy controls. (Panel A) Serum level of benzaldehyde; boxplots show minimum, lower quartile, median, upper quartile and maximum values (abundance in arbitrary units). (Panel B) Metabolite set enrichment overview of discriminating compounds, showing the relative over-representation (fold enrichment) of pathways associated with cancer-downregulated metabolites and its statistical significance (only pathways showing fold enrichment ≥4 are presented).

Figure 2. ROC characteristic of the separator including nine serum metabolites discriminating the cancer and control samples.

Table 1. Characteristics of the donor groups.
N.A., not applicable

Table 2. Compounds with differential abundance between early lung cancer and control samples.
Compounds included in the cancer classifier are underlined; compounds with statistically significant (p<0.05) and non-significant (p≥0.05) differences are separated by horizontal lines; C.V., coefficient of variation; a.u., arbitrary units.