“Improving Identification of Compounds in Metabolomic Studies through Correlation Statistics”
Pablo Hoijemberg, Ph.D., Princeton University
- 6:00 pm Dinner
- 7:00 pm Seminar
- CABM – Room 010 (Center for Advanced Biotechnology and Medicine)
- Rutgers Busch Campus
- 679 Hoes Lane West, Piscataway NJ 08854
$15 employed / $5 students, postdocs, retired, unemployed. No cost for seminar only.
Abstract: In an untargeted metabolomic study, the search for biomarker molecules serves to answer many questions, and answering them requires establishing the identity of the compounds of interest. NMR analysis of a biofluid normally detects several dozen metabolites in measurable quantities. The resulting spectra can contain a few hundred peaks, with overlapping signals, variable multiplicities and different peak widths. Compounds are normally identified with the aid of commercial software packages that contain their own databases, by literature search, and/or by querying public databases with lists of chemical shifts. 2D NMR spectra also help, by revealing correlations to 13C atoms or to other 1H atoms, or by allowing multiplets to be evaluated in a J-Resolved experiment. Given the amount of data collected for the multivariate data analysis, “statistical correlations” are attainable and are of great help in submitting a “better” query to a database. The most popular version so far was published almost a decade ago and named STOCSY, standing for Statistical TOtal Correlation SpectroscopY (Cloarec, O. et al., Anal. Chem., 2005, 77(5), 1282-1289, from Imperial College, UK). It is a statistical analysis of several experiments, not an NMR experiment per se, and it is based on the collinearity of the intensity variations of peaks belonging to the same compound across the set of spectra, variations that arise from the changes in composition among the samples. “Born” as a tool for analyzing homonuclear data (correlations of 1H to 1H), it has since been adapted and varied, leading to correlation analysis for biomarker identification on experiments with different nuclei, on diffusion-edited experiments and on cross-platform data (with mass spectrometry, for example), as well as for finding pathway connectivities. It can be applied to 1D and 2D data, as well as to “small size” data matrices, as in a “spectrum-to-spreadsheet” procedure (which I like to name Stick-STOCSY).
Information recovery can be improved by further statistical analysis of the (information-redundant) STOCSY data matrix. 13C spectra and 1D projections of 1H 2D J-Resolved spectra have also proved to be good experiments for STOCSY, because the tool suffers from the overlapped peaks that abound in some regions of the standard 1H spectrum of biofluids. Examples of its application to biological samples and synthetic mixtures will be shown (the tool is not exclusive to biological samples).
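The correlation idea behind STOCSY described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation or the published code: the function name `stocsy`, the choice of a single "driver" peak, and the synthetic two-compound data set are all assumptions made for illustration. It computes the Pearson correlation (and covariance) between the intensity at one chosen spectral point and every other point across a set of spectra, so that peaks of the same compound show a correlation close to 1.

```python
import numpy as np

def stocsy(spectra, driver_index):
    """Correlate every spectral point with a chosen driver point.

    spectra: (n_samples, n_points) array, one 1D spectrum per row.
    driver_index: column of the driver peak (hypothetical parameter name).
    Returns (corr, cov), each of length n_points.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-center each spectral point
    driver = Xc[:, driver_index]
    n_samples = X.shape[0]
    cov = Xc.T @ driver / (n_samples - 1)    # covariance with the driver
    std = Xc.std(axis=0, ddof=1)
    denom = std * std[driver_index]
    # Points that never vary (zero variance) are reported as 0 correlation.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return corr, cov

# Synthetic "stick" data set (an assumption, not real NMR data):
# compound A contributes to points 0 and 3, compound B to points 1 and 4,
# and point 2 is empty baseline.
rng = np.random.default_rng(0)
conc_a = rng.uniform(1.0, 5.0, size=20)
conc_b = rng.uniform(1.0, 5.0, size=20)
spectra = np.zeros((20, 5))
spectra[:, 0] = conc_a
spectra[:, 3] = 2.0 * conc_a
spectra[:, 1] = conc_b
spectra[:, 4] = 0.5 * conc_b

corr, cov = stocsy(spectra, driver_index=0)
print(np.round(corr, 3))
```

With point 0 as the driver, point 3 (same compound) correlates perfectly, while points 1 and 4 (the other compound) share a lower, identical correlation with the driver. In real spectra, noise and peak overlap lower these values, which is the limitation the abstract mentions.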