Abstract
Partial least squares (PLS) is a widely used algorithm in the field of chemometrics. In calibration studies, a PLS variant called orthogonal projection to latent structures (O-PLS) has been shown to successfully reduce the number of model components while maintaining good prediction accuracy, although no theoretical analysis exists demonstrating its applicability in this context. Using a discrete formulation of the linear mixture model known as Beer's law, we explicitly analyze O-PLS solution properties for calibration data. We find that, in the absence of noise and for large n, O-PLS solutions are simpler but just as accurate as PLS solutions for systems in which analyte and background concentrations are uncorrelated. However, the same is not true for the most general chemometric data in which correlations between the analyte and background concentrations are nonzero and pure profiles overlap. On the contrary, forcing the removal of orthogonal components may actually degrade interpretability of the model. This situation can also arise when the data are noisy and n is small, because O-PLS may identify and model the noise as orthogonal when it is statistically uncorrelated with the analytes. For the types of data arising from systems biology studies, in which the number of response variables may be much greater than the number of observations, we show that O-PLS is unlikely to discover orthogonal variation whether or not it exists. In this case, O-PLS and PLS solutions are the same.
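The noise-free, uncorrelated case described in the abstract can be illustrated with a small numerical sketch. This is not the paper's own code. It assumes a two-species Beer's-law mixture with hypothetical Gaussian pure profiles `a` (analyte) and `s` (background), concentrations made exactly uncorrelated in the sample, and the standard single-response O-PLS filtering step. All function and variable names are illustrative.

```python
import numpy as np

def opls_filter_one(X, y):
    """Remove one y-orthogonal component from X (single-response O-PLS step)."""
    w = X.T @ y
    w /= np.linalg.norm(w)            # predictive weight vector
    t = X @ w                         # predictive scores
    p = X.T @ t / (t @ t)             # predictive loadings
    w_o = p - (w @ p) * w             # part of the loading orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                     # orthogonal scores
    p_o = X.T @ t_o / (t_o @ t_o)     # orthogonal loadings
    return X - np.outer(t_o, p_o)     # filtered data

def pls1_one_component_fit(X, y):
    """Fitted y from a one-component PLS1 model."""
    w = X.T @ y
    w /= np.linalg.norm(w)
    t = X @ w
    q = (t @ y) / (t @ t)
    return q * t

rng = np.random.default_rng(0)
n = 50                                        # number of observations (assumed)
channels = np.linspace(0.0, 1.0, 40)          # hypothetical spectral axis
a = np.exp(-((channels - 0.4) / 0.1) ** 2)    # analyte pure profile (assumed)
s = np.exp(-((channels - 0.6) / 0.1) ** 2)    # overlapping background profile (assumed)

c = rng.normal(size=n)
c -= c.mean()                                 # centered analyte concentrations
b = rng.normal(size=n)
b -= b.mean()                                 # centered background concentrations
b -= (b @ c) / (c @ c) * c                    # force exactly zero sample correlation

# Noise-free Beer's-law mixture: each spectrum is a linear combination of pure profiles.
X = np.outer(c, a) + np.outer(b, s)
y = c

err_pls = np.linalg.norm(y - pls1_one_component_fit(X, y))
err_opls = np.linalg.norm(y - pls1_one_component_fit(opls_filter_one(X, y), y))
print("one-component PLS residual:", err_pls)
print("one orthogonal + one predictive component residual:", err_opls)
```

Under these assumptions the orthogonal component captures the background term, so a single predictive component fits the filtered data, whereas a single PLS component on the raw data leaves a residual because the profiles overlap. The sketch does not cover the correlated, noisy, or small-n cases that the abstract identifies as problematic.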
Original language | American English |
---|---|
Pages (from-to) | 514-525 |
Number of pages | 12 |
Journal | Journal of Chemometrics |
Volume | 25 |
Issue number | 9 |
State | Published - 2011 |
NREL Publication Number
- NREL/JA-5100-49498
Keywords
- Beer's law
- Mid-infrared (MIR) calibration
- O-PLS
- Partial least squares (PLS)
- Systems biology