The classification accuracies of the other classifiers, such as SVM, on these five datasets are usually lower than those of the NBC method. This suggests that the performance of our network-based cancer classifier is very promising. For the leukemia and lung cancer datasets, SVM and RF classifiers provide the best mean performance, respectively.
NB and kNN classifiers show the worst mean performance for the leukemia and breast cancer datasets, respectively. These behaviors are generally independent of the number of genes used for classification (Supplementary Tables 4 and 5). Classifiers show different variation levels in their accuracies. Even for the four datasets (lung, NCI60, leukemia, and colon cancer) for which NBC does not have the smallest standard deviation, its standard deviation remains very small.
These observations suggest that the NBC method, while providing comparable or better accuracy than other methods, is also robust to the choice of feature selection method. Finally, we observe that as the number of genes used in the classification increases, the above observations for the classification methods are mostly conserved (see Supplementary Tables 4 and 5).
So far, we have discussed the average behavior of different feature selection or classification methods. Here, we focus on the specific combination of feature selection and classification methods. An appropriate combination of feature selection and classification methods leads to the best performance on different multi-class cancer datasets. However, our main results are mostly conserved as the number of genes used for classification increased Supplementary Tables 6 and 7. In the lung, colon, and leukemia cancer datasets, combinations of SVM classifier and IG feature selection method accuracy: Although, RF classifier does not achieve the best accuracy in any of the five datasets when we adopt SU as the feature selection method, this combination consistently achieves high accuracy levels.
Its rankings in the lung, breast, NCI60, leukemia, and colon cancer datasets are, respectively, 2, 3, 15, 2, and 3 (Table 2). Because of that, when we combined the rankings across datasets into an average ranking, we observed that the RF classifier combined with the SU feature selection method is the second best algorithm (Table 2).
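As a small worked illustration of the averaging step, the sketch below computes the mean rank of the RF and SU combination from the five per-dataset ranks quoted above. The helper name `average_rank` and the dictionary layout are assumptions made for this example; the text does not state how ties or weighting were handled, so an unweighted mean is assumed.

```python
from statistics import mean

# Per-dataset ranks of the RF + SU combination, as quoted in the text.
rf_su_ranks = {"lung": 2, "breast": 3, "NCI60": 15, "leukemia": 2, "colon": 3}

def average_rank(ranks_by_dataset):
    """Return the unweighted mean rank across datasets (hypothetical helper)."""
    return mean(ranks_by_dataset.values())

rf_su_average = average_rank(rf_su_ranks)
print(rf_su_average)
```

The single poor rank on NCI60 is offset by consistently high ranks on the other four datasets, which is why a combination that is never the best in any one dataset can still rank highly on average.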
These observations suggest that our new classifier is comparable to or better than the state-of-the-art classifiers, including SVM and RF. Furthermore, we observe that NBC works best when it is combined with SU as the feature selection method.
A summary of accuracy rankings for lung, breast, NCI60, leukemia, and colon cancer datasets for different feature selection and classification methods when the maximum number of allowable genes is set to In the first five tables, we report the best accuracy obtained by each combination of classifier and feature selection method, and their ranking. In these tables, entries containing a pair of numbers V: W indicate the following: V refers to the best classification accuracy and W refers to its ranking. We use bold face to highlight the top five highest accuracies. In the last table, we report the ranking of each combination of classifier and feature selection method based on the average accuracy obtained over all the five datasets.
Another important observation that follows from these results is that an incorrect combination of feature selection and classification methods may lead to poor performance.
Similarly, NB combined with PAM feature selection method shows the worst performance in the breast cancer dataset accuracy: In the lung cancer and leukemia datasets, C4. So far, in our experiments we have demonstrated that NBC yields better or similar accuracy as compared to state-of-the-art methods. Next, we focus further on the NBC method to understand its characteristics, strengths, and limitations.
Briefly, two parameters characterize the predictive models generated by NBC.
These are (i) the number of genes selected and (ii) the Pearson correlation threshold. These two parameters control the number of nodes and edges in the network models generated by NBC, respectively. We vary the values of these two parameters and report the accuracy of NBC for each parameter setting. More specifically, we vary the number of genes in the [ Figure 1 presents the results.
Accuracy dependency of the NBC method on the number of genes and gene-to-gene associations in the network. Heat maps depicting the accuracy levels for varying number of genes and gene-to-gene interaction density are shown. In the figure, columns refer to the cancer datasets: Similarly, rows correspond to the feature selection methods: The x-axis in each heat map refers to the Pearson correlation cutoff used to determine the gene-to-gene associations. The y-axis denotes the number of genes used in the NBC method.
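To make the role of the two parameters concrete, the following minimal sketch builds a co-expression network in which the selected genes are the nodes and the Pearson correlation cutoff decides the edges. The function names, the toy expression values, and the use of the absolute correlation value are assumptions for illustration; the text does not specify whether negative correlations are treated the same way as positive ones.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treated as uncorrelated (assumption)
    return cov / (sx * sy)

def build_network(expr, genes, threshold):
    """Connect two selected genes when |r| >= threshold (assumed rule)."""
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.add((g, h))
    return edges

# Toy expression profiles (invented values): gene -> expression across samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 5.9, 8.0],  # strongly correlated with g1
    "g3": [4.0, 1.0, 3.0, 2.0],  # only moderately correlated with g1 and g2
}
selected = ["g1", "g2", "g3"]  # stands in for the output of feature selection

loose = build_network(expr, selected, threshold=0.1)   # dense network
strict = build_network(expr, selected, threshold=0.9)  # sparse network
```

With the low cutoff all three gene pairs are connected, while the high cutoff keeps only the g1 and g2 association. This is the trade-off between spurious and missed edges that the heat maps explore.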
The dependency of NBC on these two variables is also governed by the underlying feature selection method and the dataset. The accuracy behavior of the NBC classifier also differs markedly across cancer datasets, possibly because of the different network structures discussed below. Despite the feature selection method and dataset dependencies, there are clear patterns with regard to the number of genes used in the classification and the correlation threshold.
For example, in general, the accuracy of the NBC method does not increase as the number of genes increases. We observe that our network-based classifier can predict at high accuracy levels while using only up to 75 genes in the lung, breast, and NCI60 cancer datasets. The other algorithms, in particular SVM, usually need more genes to reach similar accuracy levels in these three datasets (Supplementary Table 5). For example, in the breast cancer dataset, while NBC can reach the accuracy level of The leukemia cancer dataset was generally difficult for all the algorithms, and they needed higher gene numbers for high accuracy levels (Supplementary Table 5).
Because measuring gene expression levels is expensive, these observations suggest that our method is probably more relevant for biological applications, as it can function at high accuracy levels using a smaller number of genes than traditional classifiers. With regard to the Pearson correlation threshold, the NBC method shows nonlinear behavior. When the threshold is small, the network contains many false-positive connections that should not exist. When the threshold is large, we probably miss many gene-to-gene associations that should exist in the network.
For this reason, as the correlation threshold increases, the accuracy of the NBC method first increases and then decreases dramatically. Despite this general behavior, there is no single threshold level that works best for all the cancer datasets. The NBC method constructs a different and unique network for each cancer class and uses these networks, together with predictor functions constructed by linear regression, to predict expression levels for the selected genes in each sample.
In the next step, for each sample, it compares these class-specific predictions to actual gene expression levels in the sample. The method assigns the sample to the class that gives the minimum distance between the predicted and actual gene expression levels in the L2 norm. To see how distinctive our method is in separating different classes, we computed the prediction errors for inter- and intra-subclasses using each class-specific predictor function of the NBC method. More specifically, we computed the error using the relative L2 norm.
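The decision rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class-specific predictor functions are passed in as plain callables (here hypothetical lambdas standing in for the regression models), and the relative L2 error is assumed to be the norm of the prediction error divided by the norm of the actual expression vector, a common definition that the text does not spell out.

```python
from math import sqrt

def relative_l2_error(predicted, actual):
    """||predicted - actual||_2 / ||actual||_2 (assumed definition)."""
    num = sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))
    den = sqrt(sum(a ** 2 for a in actual))
    return num / den if den > 0 else float("inf")

def classify(sample, predictors):
    """Assign the sample to the class whose model reconstructs it best.

    `predictors` maps a class label to a function that takes the observed
    expression vector and returns that class's predicted vector.
    """
    errors = {label: relative_l2_error(f(sample), sample)
              for label, f in predictors.items()}
    return min(errors, key=errors.get), errors

# Hypothetical stand-ins for the class-specific regression models:
# class "A" expects gene 2 to be about twice gene 1, class "B" about half.
predictors = {
    "A": lambda s: [s[1] / 2.0, 2.0 * s[0]],
    "B": lambda s: [2.0 * s[1], s[0] / 2.0],
}
label, errors = classify([1.0, 2.1], predictors)
```

For a single sample, dividing by the norm of the actual vector rescales every class's error by the same factor, so the relative and absolute L2 distances select the same class. The relative form is mainly useful when errors are compared across samples or classes, as in the inter- and intra-class analysis.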
Figure 2 presents the results for all five cancer datasets we used. We make two important observations from these results. First, the prediction errors explain the classification accuracies of our method. As an example, for the NCI60 dataset the NBC classifier provides perfect accuracy levels, and in Figure 2 we observe that the models created by the NBC classifier yield the least prediction error for the samples in the same class (i.e., the diagonal entries have the lowest values).
However, the same cannot be observed for the breast cancer dataset, which provides the lowest classification accuracies out of the five cancer datasets (see Table 1). Second, our results suggest that models for different classes have different prediction errors. For example, for the breast cancer dataset, the model for class 1 produces significantly lower prediction error for the test samples in class 1 than for the samples in other classes.
However, the model for class 2 fails to predict the samples from its own class, since it gives lower prediction errors for other classes. Similar observations can be made for the model for class 3. These results suggest that the low classification accuracy for breast cancer (see Table 1 and Supplementary Tables 4 and 5) is caused by inaccurate predictions for cancer patients in classes 2 and 3.
Intra- and inter-class prediction errors for different cancer datasets. In each graph, the x-axis represents the class on which the model is built.
Next, we focus on one of the most fundamental characteristics of the network models constructed by the NBC method, namely the density of the resulting networks (i.e., the average number of gene-to-gene associations formed by the NBC method) for different cancer datasets.
Figure 3 plots the results for varying number of genes and Pearson correlation threshold values. We observe that network density depends on the number of genes and the correlation threshold. In general, as the number of genes increases and the correlation threshold decreases, the number of associations in the network increases. While this qualitative behavior is dataset independent, it shows slight quantitative differences. However, the breast and colon cancer datasets show significantly lower density levels.
Dependency of the network density on cancer datasets and feature selection methods. Heat maps depicting the network density levels for varying number of genes and Pearson correlation cutoffs are shown. Columns refer to the cancer datasets: Rows correspond to the feature selection methods:
For the leukemia, lung, and NCI60 cancer datasets, network densities are specific to feature selection methods. Similar behavior is observed for the NCI60 and lung cancer datasets.
Despite the vast amount of experimental and computational studies, we still have limited knowledge about the mechanisms of different cancer types.
In order to understand cancer-dependent changes in the correlation-based co-expression networks, here we give a brief analysis of the network measures for the networks created by the NBC method for different cancer classes in leukemia and NCI60 cancer datasets Figs. As suggested above, the best feature selection method for the NBC classifier is the SU feature selection method.
Because of that, in this section we focused on the association networks created by the genes selected by the SU feature selection method. Owing to sparse network structures, we omitted the lung, breast, and colon cancer datasets in this experiment see Figure 3. We compared the networks created for different cancer classes with respect to three network measures, namely degree, clustering coefficient, and closeness centrality distributions of the nodes of the network models generated by NBC. For both datasets, we have used the network, which leads to the best classification of the datasets if up to genes are used Table 1 and Fig.
For the NCI60 dataset, the best accuracy is achieved at 75 genes with a correlation threshold of 0.
Closeness centrality distributions of the networks in different cancer classes. In each graph, the x-axis represents the closeness centrality score and the y-axis represents the frequency.
Network degree distributions of the networks in different cancer classes. In each graph, the x-axis represents the degree and the y-axis represents the frequency.
Next we measured the clustering coefficient and closeness centrality values for each gene. We observed that in classes 3, 4, 6, 7, and 8, networks have very small clustering coefficients Fig. In regard to the closeness centrality, these five classes showed centrality scores less than or equal to 8 Fig. We observed slightly different behaviors in cancer classes 1, 2, and 5, probably because of the smaller number of isolated genes. In these classes, networks showed slightly more clustering between genes and higher centrality scores (9 to 21) Figs.
Clustering coefficient distributions of the networks in different cancer classes. In each graph, the x-axis represents the clustering coefficient score and the y-axis represents the frequency.
These observations suggest that in classes 1, 2, and 5, the expression levels of the genes are slightly more correlated. Because of that, we observed genes with high degree, clustering, and centrality scores.
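For readers less familiar with the three node-level measures compared here, the sketch below computes them on a small toy network. It uses the textbook definitions: degree as the number of direct connections, the local clustering coefficient, and closeness as the number of reachable nodes divided by the sum of shortest-path lengths. These are assumptions; in particular, the closeness scores reported in the text (for example, values up to 8 or between 9 and 21) suggest a different scaling whose exact definition is not given in this section.

```python
from collections import deque

def degree(adj, v):
    """Number of genes directly connected to v."""
    return len(adj[v])

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def closeness_centrality(adj, v):
    """Reachable nodes divided by total shortest-path length; 0 if isolated."""
    dist = {v: 0}
    queue = deque([v])
    while queue:  # breadth-first search over the unweighted network
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Toy network: a triangle g1-g2-g3, a pendant gene g4, and an isolated gene g5.
adj = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3"},
    "g5": set(),
}
for gene in adj:
    print(gene, degree(adj, gene),
          clustering_coefficient(adj, gene),
          closeness_centrality(adj, gene))
```

Isolated genes such as g5 score zero on all three measures, which is consistent with the observation that classes with many isolated genes show low clustering and centrality values.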
Clustering coefficient and centrality measures for each gene in this cancer type also show cancer class-dependent behavior. While in all seven classes the clustering coefficient distributions show Gaussian behavior, the variance of these distributions differs slightly. In three of the seven classes (classes 1, 2, and 3), non-isolated genes have clustering coefficients between 0. In contrast, in classes 4, 5, 6, and 7, genes have slightly higher clustering coefficients 0. In regard to closeness centrality, we observed a Gaussian-like distribution for closeness centrality scores in classes 1, 2, and 3 Fig.
In contrast to this, in classes 4, 5, 6, and 7, the frequency of the genes with high closeness centrality scores is larger. In these classes, we observe more central genes where centrality scores vary between 30 and This network analysis suggests that the NBC method not only achieves high classification accuracies for different cancer types, but also reveals potentially important insights about the topological differences among gene associations in various cancer types.
Network-based approaches for cancer classification have been proposed previously, where a combination of protein-protein interaction (PPI) networks and gene expression levels is used in sub-network identification for cancer classification. To reduce the dependency of the classification accuracy results on the genes used, we have employed five alternative feature selection methods to choose up to genes.
Our experimental results demonstrated that the choice of feature selection method does not have a very big impact on the classification accuracy. Our results also exhibited that our network-based classifier NBC method shows similar or better accuracy levels in comparison to those of the traditional classifiers such as SVM and RF.
We also showed that the correct combination of the feature selection and classification method is the key for successful cancer classification studies. In this regard, we observed that NBC and RF classifiers combined with SU feature selection method showed the best overall performances in five different cancer datasets. Our results also support that the selection of the best classifier and feature selection method is dataset specific. In a recent study, Staiger and colleagues 49 argued that the network-based cancer classification approaches do not really outperform the single-gene-based classifiers.
Below we explain potential reasons for this observation, the shortcomings of the earlier network-based classifiers, and how our method differs from them. The disappointing results observed in Staiger et al. The aforementioned differences between the NBC method and earlier network-based approaches are notable, and suggest that our method in contrast to earlier network-based methods is more suitable for cancer classification. Owing to high microarray costs, supervised cancer classification methods are still not employed in many cancer diagnoses.
In this sense, new classifiers that can accurately classify different cancer types using a small number of genes are needed. Detailed analysis of the NBC method showed that our method could reach high classification accuracy levels using usually less than genes. In contrast, the traditional classifiers generally require more genes than the NBC method to reach similar accuracy levels. This suggests that our new network-based classifier might be medically more relevant than the other traditional classifiers. Future work on the medical application of our method to the diagnosis of different cancer types is needed to confirm this strength of our new classifier.
In order to analyze the class-dependent topological differences in gene-to-gene associations in different cancer types, we have also analyzed the network measures (degree, clustering coefficient, and closeness centrality distributions) in leukemia and NCI60 cancer datasets. In-depth analysis of the networks suggested by the NBC method provided new insights into the class-to-class changes of gene-to-gene interactions in cancer. While in some cancer classes we observed scale-free behavior in the degree distributions of the genes, this scale-free behavior was lost in other cancer classes.
Similarly, clustering and centrality distributions of the genes show distinct behaviors in different cancer classes. These changes in the network properties suggest that different cancer classes will show distinct responses to similar drugs since their gene regulatory network topologies are different. They also suggest that the design of new cancer drugs should take into account the topological differences in the regulatory networks of different cancer classes.
Finally, our study indicates the need for new network-based classification algorithms and analysis techniques to decipher cancer mechanisms and find new therapeutic treatments for cancer.
This document, which includes Supplementary Tables 4 to 7, summarizes results if the maximum number of allowable genes is set to or The tables present fold cross-validation prediction accuracy and accuracy rankings for different feature selection and classification method combinations on lung, breast, NCI60, leukemia, and colon cancer datasets.
Computational Advances in Cancer Informatics B. JT Efird, Editor in Chief.
The authors confirm that the funder had no influence over the study design, content of the article, or selection of this journal. Author(s) disclose no potential conflicts of interest. This paper was subject to independent, expert peer review by a minimum of two blind peer reviewers. All editorial decisions were made by the independent academic editor.
All authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to use of any copyrighted material, compliance with ICMJE authorship and competing interests disclosure guidelines and, where applicable, compliance with legal and ethical guidelines on human and animal research participants. Developed the NBC method: Conceived and designed the experiments: Wrote the first draft of the manuscript: Contributed to the writing of the manuscript: Agreed with manuscript results and conclusions:
The selection of trees and their up-scaling can be inspired by studies that have successfully scaled-up water use from tree to stand level e. In the learning phase, for a given gene expression dataset of samples, first, we extract the most relevant features for classification of the training samples into their correct classes. The degree of a node (a gene in our case) is the number of connections it has to other genes.
Climate change is expected to drive important changes in tree physiology with manifold but not yet fully understood impacts on forest ecosystem function and services. In this paper, we strongly support the recent calls to focus experimental, observational, and modeling efforts on the tree level to improve our understanding of climate change impacts on forests Fatichi et al.
These signals inform us on changes in plant hydraulics and carbon metabolism in xylem and phloem tissues Steppe et al. From the reviewed methods to quantify real-time water and carbon dynamics within a tree stem Steppe et al. As science evolves, other monitoring equipment, such as acoustic emission sensors De Roo et al.
Sketch of the TreeWatch. Trees are typically sampled in natural forest ecosystems across the globe (dots represent fictitious sampling locations), but can also be city trees in urban settings. The internet-connected plant sensors send their data to the PhytoSense cloud service, which handles data storage, data analysis, data processing, running of process-based model simulations and calibrations, and sending out notifications.
This all happens in real-time, which enables TreeWatch. The unique approach of combining continuous tree measurements with process-based modeling lays the ground for the next-generation global maps displaying direct biological responses of the sampled trees; information which is currently lacking to bridge the gap with meteorology. Sap flow is measured with a sap flow sensor, which uses heat to sense water movement in the stem xylem and is typically expressed as sap flow rate in g h⁻¹; Smith and Allen, ; Steppe et al.
Point dendrometers or linear variable displacement transducers (LVDTs) measure variations in stem diameter (mm) at high temporal resolution (minute scale). The sensor signal simultaneously displays the integrated result of: Because of the tight coupling between tree hydraulics and radial stem growth and, hence, carbon metabolism, variations in stem diameter are the second vital component in our tree monitoring approach.
Current spatiotemporal knowledge of climate-forest dynamics is primarily based on simulations by dynamic global vegetation models (DGVMs). Therefore, Fatichi et al. As discussed previously Steppe et al. Because turgor in living tree cells is mainly built up during the night upon refilling of dehydrated tissues, growth processes mainly occur during the night, and are only optimal when tree water status is also optimal Daudet et al.
If we aspire to a better spatiotemporal description of water fluxes together with more realistic scenarios for future climate and the carbon cycle Friedlingstein et al. During the past few decades, implementation and application of process-based tree models have greatly advanced our knowledge on plant hydraulic functioning and growth Steppe et al.
To further our knowledge of climate change impacts on the forest scale, process-based tree models are likely to become increasingly important. In our approach, we advocate a combination of process-based tree modeling and continuous measurements at the tree level to better understand impacts of climate change on forests.
The history and the current state of the art of possible candidate process-based models have recently been reviewed De Swaef et al. Of particular interest for our approach is that such models feature essential hydraulic parameters (resistance and capacitance), and enable simulation of the vital but often difficult-to-measure variables described earlier (turgor, water potential), which all play an important role in hydraulic failure, tree mortality, and, therefore, long-term forest dynamics Fatichi et al.
Whereas continuous tree measurements, including sap flow and stem diameter variation, have been recognized as promising technology for monitoring tree hydraulics and carbon status Anderegg et al.
This is exactly what our approach is aiming at: All processing on the cloud service is performed automatically so that little or no user interaction is required. A wide range of transformations is available: More advanced transformations are also available to calculate sap flow rates in real-time, and to automatically remove disturbances from diameter variation signals. Once defined, transformations are automatically applied each time new data is received.
Besides transformations, PhytoSense also allows dynamic simulation models to be run in real-time. Although not required, models are typically first implemented in the plant modeling software PhytoSim 3 and then converted into optimized code, which can run on the PhytoSense platform. These models can be any set of algebraic and first order differential equations [see for instance Steppe et al. This lays the ground for novel stress detection approaches and ecophysiological warning systems, because daily estimates of the calibrated model parameters can now be displayed as time series in real-time from which important tree physiological behavior can be derived.
Finally, notifications can be generated when measured or simulated data is below or above a threshold value for a specified amount of time, when a sensor is offline for a specified amount of time, or when a model parameter exceeds the appropriate bounds. Any online data logger can use the API to send data to PhytoSense, and custom-built applications or websites can use the API to visualize the available data. This makes the data from the TreeWatch. We plan to gradually extend the network by adding trees across north-south trajectories in different populations in Europe, and in other continents, to profit from a wide climatic gradient going from low temperatures in the northern sites to warm and dry conditions in the southern sites, where tree responses are expected to be temperature- and drought-limited, respectively.
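The first notification rule mentioned above (a value staying beyond a threshold for a specified amount of time) can be sketched as follows. This is a hypothetical illustration only: the function name, the data layout, and the sample readings are invented, and nothing here reflects the actual PhytoSense API or its internal logic.

```python
def threshold_alerts(readings, threshold, min_duration, above=True):
    """Return (start, end) time spans where the value stays beyond
    `threshold` for at least `min_duration` time units.

    `readings` is a time-sorted list of (timestamp, value) pairs.
    """
    alerts, start, last = [], None, None
    for t, value in readings:
        beyond = value > threshold if above else value < threshold
        if beyond:
            if start is None:
                start = t  # a new excursion begins
            last = t
        else:
            if start is not None and last - start >= min_duration:
                alerts.append((start, last))
            start = None  # excursion ended
    if start is not None and last - start >= min_duration:
        alerts.append((start, last))  # excursion still open at end of data
    return alerts

# Invented sap flow readings: (minutes since start, sap flow rate in g per hour).
readings = [(0, 40), (10, 55), (20, 60), (30, 58), (40, 45), (50, 52), (60, 30)]
alerts = threshold_alerts(readings, threshold=50, min_duration=20)
print(alerts)
```

Requiring a minimum duration filters out single-sample spikes, such as the brief excursion at minute 50 in the toy data, so that only sustained departures trigger a notification.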
Trees will be sampled according to a stringent protocol taking into account various tree characteristics e. Sensors connected to data loggers with wireless data transfer and remote control accessibility are used to send the data to the PhytoSense cloud service. The harmonized data offered by TreeWatch.
Modeling will enable us to put the continuous measurements in a larger context by helping us understand the more general concepts underlying growth and tree hydraulic functioning. Continuous real-time model simulations of the much-needed turgor when aspiring growth modeling, but also dynamics in model parameters, including hydraulic resistance and capacitance, are only a few of the opportunities that will be at hand to perform an integrated survey of tree responses to changes in the regional climate.
These modeled features should be validated with ground-based data from fieldwork to increase confidence in the model, or to further improve it when discrepancies between modeled and measured data are observed. By visualizing hydraulic features, like hydraulic resistance, we will be the first to show changes in tree hydraulics and vulnerability to drought stress in real-time. The real-time aspect is a much-needed feature because now science relies on off-line, destructively collected vulnerability curves Choat et al. The results from TreeWatch.
Especially in DGVMs, the use of coarse scale observations and potentially incorrect mechanisms could mislead mitigation and adaptation plans of the future Hanson and Gunderson, But the use of TreeWatch. At present, a maple Acer pseudoplatanus L. But because of its intrinsic educational power, one of the long-term dissemination perspectives of TreeWatch. She directed the network toward simultaneous application of continuous tree measurements and process-based modeling. She established the first network in Belgium and added a city tree to demonstrate its potential.
She supervises the practical work, the analysis and data interpretation, and she coordinates the modeling activities within the TreeWatch.
DD developed the PhytoSense cloud service and supervises data acquisition, data processing and visualization. He also assists in modeling activities.