How can W4M be used to perform statistics, including exploratory data analysis, hypothesis testing, machine learning and feature selection?
By the end of the course, you will be able to:
- visualize the data (PCA, heatmap);
- perform statistical tests and apply corrections for multiple testing;
- build predictive models (PLS, Random Forest, SVM);
- select the variables that are significant for the predictive model.
Basic knowledge of biostatistics and multivariate data analysis.
Here, we describe how to analyze a ‘sample by variable’ table of intensities, such as the one generated during the previous ‘processing’ step. The objective is to explore the data (e.g. detect trends, clusters, or outliers), perform univariate hypothesis tests, build predictive models for the factor of interest (regression or classification), and select the significant variables (i.e. the molecular signature) so that the resulting models are robust and perform well.
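As an illustration of the exploration step, the sketch below computes the first principal component of a small samples-by-variables table (rows are samples, columns are variables) by power iteration on the covariance matrix. It is a minimal plain-Python sketch, not the W4M code: the function names and the example table are hypothetical, and no scaling is applied (in practice, variables are often scaled, for example to unit variance, before PCA).

```python
import math
import random


def center_columns(table):
    """Subtract the mean of each variable (column) from its values."""
    n_samples = len(table)
    n_vars = len(table[0])
    means = [sum(row[j] for row in table) / n_samples for j in range(n_vars)]
    return [[row[j] - means[j] for j in range(n_vars)] for row in table]


def covariance_matrix(centered):
    """Sample covariance matrix (n - 1 denominator) of column-centered data."""
    n_samples = len(centered)
    n_vars = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n_samples - 1) for j in range(n_vars)]
        for i in range(n_vars)
    ]


def first_component(table, n_iter=500, seed=0):
    """Hypothetical helper: PC1 loadings and sample scores via power iteration."""
    centered = center_columns(table)
    cov = covariance_matrix(centered)
    n_vars = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_vars)]  # random start vector
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(n_vars)) for i in range(n_vars)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all variables are constant; PCA is undefined")
        v = [x / norm for x in w]
    # The sign of an eigenvector is arbitrary: make the largest-magnitude loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    # Scores = projection of each centered sample onto the loading vector.
    scores = [sum(row[j] * v[j] for j in range(n_vars)) for row in centered]
    return v, scores


# Hypothetical 4 samples x 3 variables intensity table.
example_table = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
loadings, scores = first_component(example_table)
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 scores:", [round(s, 3) for s in scores])
```

In this toy table the first two variables rise together, so PC1 is dominated by them (the loading of the second variable is roughly twice that of the first), while the nearly constant third variable receives a loading close to zero. Plotting the scores of the first two components is what reveals trends, clusters or outliers among the samples.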
- Exploratory data analysis
- Hypothesis testing
- Multivariate predictive modeling
- Feature selection
W4M allows you to build comprehensive and reproducible workflows for data analysis. Diagnostics and correction methods are included to account for multiple testing and to avoid overfitting. The available modules can be applied to targeted or untargeted omics data.
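Because one hypothesis test is run per variable, some small p-values are expected by chance alone. Below is a minimal plain-Python sketch of one widely used correction, the Benjamini-Hochberg procedure for controlling the false discovery rate. The function name and the example p-values are hypothetical, and we do not claim this matches W4M's implementation or its default correction method.

```python
def benjamini_hochberg(p_values):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per variable.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

For these four p-values the adjusted values are about 0.02, 0.04, 0.04 and 0.02, so all four variables remain significant at a 5% threshold. A Bonferroni correction (multiplying each p-value by 4) would be stricter and keep only the variables with raw p-values 0.01 and 0.005.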
- Guitton et al. (2017). Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics. The International Journal of Biochemistry and Cell Biology. https://doi.org/10.1016/j.biocel.2017.07.002
- Rinaudo et al. (2016). Biosigner: a new method for the discovery of significant molecular signatures from omics data. Frontiers in Molecular Biosciences. https://doi.org/10.3389/fmolb.2016.00026
- Shared statistical history: W4M00001 (http://workflow4metabolomics.org/W4M00001)
- Shared statistical history: W4M00003 (http://workflow4metabolomics.org/W4M00003)
- Thévenot et al. (2015). Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.5b00354
- Van Belle et al. (2004). Biostatistics: a methodology for the health sciences. Wiley.
- Wehrens (2011). Chemometrics with R: multivariate data analysis in the natural sciences and life sciences. Springer.