Data Deluge: A deeper look at the underlying statistical models for high-throughput data.

Rocke and Durbin 2001

The additive-multiplicative model for microarray data was first presented by Rocke and Durbin in 2001 [A Model for Measurement Error for Gene Expression Arrays, D.M. Rocke and B. Durbin, J. Comput. Biol. 8, 557–569]. They based it on error models used in analytical chemistry for techniques such as gas chromatography and mass spectrometry. The model they give for general concentration-based assays is:

y = α + β μ exp(η) + ε

where y is the response, μ is the concentration, α is the mean response at zero concentration, β is a proportionality constant, and η and ε are independent, normally distributed random variables with mean 0 and standard deviations σ(η) and σ(ε) respectively.
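To make the model concrete, here is a minimal Python sketch that draws simulated responses for a single concentration, assuming NumPy is available. The function name `rocke_durbin_sample` and its parameter names are hypothetical, chosen only for illustration; η and ε are drawn independently, as in the model above.

```python
import numpy as np

def rocke_durbin_sample(mu, alpha, beta, sigma_eta, sigma_eps, size, rng=None):
    """Draw `size` simulated responses y = alpha + beta * mu * exp(eta) + eps.

    eta ~ N(0, sigma_eta^2) is the multiplicative error and
    eps ~ N(0, sigma_eps^2) is the additive error; both are drawn independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.normal(0.0, sigma_eta, size)   # multiplicative error on the log scale
    eps = rng.normal(0.0, sigma_eps, size)   # additive error, independent of concentration
    return alpha + beta * mu * np.exp(eta) + eps
```

For example, `rocke_durbin_sample(1000.0, 50.0, 1.0, 0.1, 20.0, 5, np.random.default_rng(0))` draws five replicate measurements of one feature; the parameter values are arbitrary.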

Consider how the terms ε and η show up in experimental data. The additive error ε dominates when concentrations are close to zero, because the term βμ exp(η) is then small; the multiplicative error η dominates at high concentrations, where βμ exp(η) is large compared with ε.
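The trade-off can be made explicit through the variance of y. Assuming η and ε are independent, Var(y) = β²μ²·Var(exp(η)) + σ(ε)², and because exp(η) is log-normal, Var(exp(η)) = exp(σ(η)²)·(exp(σ(η)²) − 1). The sketch below uses hypothetical helper names and computes both components, plus the concentration at which they are equal: below it the additive term dominates, above it the multiplicative term does.

```python
import math

def variance_components(mu, beta, sigma_eta, sigma_eps):
    """Return (multiplicative, additive) contributions to Var(y).

    Uses the log-normal variance Var(exp(eta)) = exp(s^2) * (exp(s^2) - 1)
    for eta ~ N(0, s^2), and assumes eta and eps are independent.
    """
    s2 = sigma_eta ** 2
    var_exp_eta = math.exp(s2) * (math.exp(s2) - 1.0)
    return (beta * mu) ** 2 * var_exp_eta, sigma_eps ** 2

def crossover_concentration(beta, sigma_eta, sigma_eps):
    """Concentration at which the two variance components are equal."""
    s2 = sigma_eta ** 2
    var_exp_eta = math.exp(s2) * (math.exp(s2) - 1.0)
    return sigma_eps / (beta * math.sqrt(var_exp_eta))
```

With β = 1, σ(η) = 0.1 and σ(ε) = 20, for instance, the crossover sits at roughly 200 intensity units.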

Concentrations are normally obtained from a calibration curve, which is used to determine β. For two-channel microarrays, however, or whenever the experiment is determining differences in expression, no absolute calibration is needed, so β can be dropped (set to 1, which absorbs it into μ).

This gives a three-parameter model, with parameters α, σ(η) and σ(ε):

y = α + μ exp(η) + ε

The parameters can then be estimated either from technical replicates or from negative controls. The additive error ε becomes the B term of the general microarray model.
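One simple way to use negative controls and replicates is sketched below, assuming the three-parameter model (β = 1) and NumPy; it is an illustration under those assumptions, not necessarily the estimator Rocke and Durbin used, and the function names are hypothetical. For a negative control μ = 0, so y = α + ε, and the sample mean and standard deviation estimate α and σ(ε). For a feature with very high μ the additive term is negligible, log(y − α) ≈ log(μ) + η, and the standard deviation of the logged, background-corrected replicates estimates σ(η).

```python
import numpy as np

def estimate_additive_params(neg_controls):
    """Estimate (alpha, sigma_eps) from negative controls.

    With mu = 0 the model reduces to y = alpha + eps, so the sample mean
    estimates alpha and the sample standard deviation estimates sigma_eps.
    """
    y = np.asarray(neg_controls, dtype=float)
    return y.mean(), y.std(ddof=1)

def estimate_sigma_eta(high_replicates, alpha):
    """Estimate sigma_eta from replicates of one high-intensity feature.

    Assumes mu is large enough that eps is negligible and y - alpha > 0, so
    log(y - alpha) ~ log(mu) + eta and its spread approximates sigma_eta.
    """
    y = np.asarray(high_replicates, dtype=float) - alpha
    if np.any(y <= 0):
        raise ValueError("replicates must exceed alpha; is this feature really high-intensity?")
    return np.log(y).std(ddof=1)
```

The background estimate from the negative controls is passed into `estimate_sigma_eta`, so the two steps are meant to be run in that order.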

Ideker et al. 2000

Ideker et al. (2000) combined α and μ [Testing for Differentially-Expressed Genes by Maximum-Likelihood Analysis of Microarray Data, T. Ideker, V. Thorsson, A.F. Siegel and L. Hood, J. Comput. Biol. 7, 805–817] by writing the model in the form:

y = μ + μ ε + δ

where μ is the mean intensity, and ε and δ are the multiplicative and additive errors respectively. In this form the μ terms combine, y = μ(1 + ε) + δ, and the separate α term of Rocke and Durbin disappears. The two forms agree to first order: for small η, exp(η) ≈ 1 + η, so μ exp(η) ≈ μ + μη. This is the most commonly used form of the error model.
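A quick numerical check of that first-order agreement is sketched below. It uses hypothetical helper names, sets β = 1, omits the α offset, and plugs the same value in for Rocke and Durbin's η and Ideker's multiplicative ε. The two forms then differ by μ(exp(η) − 1 − η), roughly μη²/2, which is negligible for small multiplicative errors but grows quickly for large ones.

```python
import math

def rocke_durbin_response(mu, eta, eps):
    """Rocke-Durbin form with beta = 1 and no alpha offset: mu * exp(eta) + eps."""
    return mu * math.exp(eta) + eps

def ideker_response(mu, eps_mult, delta):
    """Ideker et al. form: mu + mu * eps_mult + delta."""
    return mu + mu * eps_mult + delta

# Same error realisation in both forms: multiplicative error e_m, additive error 5.
mu = 1000.0
for e_m in (0.01, 0.1, 0.5):
    gap = rocke_durbin_response(mu, e_m, 5.0) - ideker_response(mu, e_m, 5.0)
    print(f"multiplicative error {e_m}: gap = {gap:.3f}")
```

The additive errors cancel in the gap, so only the difference between exp(η) and 1 + η remains.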

The multiplicative error ε can then be decomposed into different components.


Thursday, 14 October 2010
