Tuesday, 14 September 2010

Underlying Statistical Model

The intensity y of a single probe is given by:

Y = B + αS

Where B is the background noise composed of many different terms including optical effects and non-specific binding effects (this is important as this means it will be non-systematic and so normally distributed according to the CLT). Alpha is a gain factor and S is the specific binding. S will also be a random variable to account for probe effects and measurement error, and these errors are assumed to be multiplicative.

log(S) = θ + φ + ε

The variability of B and α between arrays means that even if S is the same it will not appear to be the same. This can be demonstrated by the use of spiked in amounts of a gene. There is attenuation bias - low concentrations are more strongly affected by background. This can be seen from the example SpikeInSubset from the affy library.


library("affy")
library("SpikeInSubset")
data(spikein133)
Index <- which(probeNames(spikein133)%in%colnames(pData(spikein133)))
pms <- pm(spikein133)[Index,]
genenames <- probeNames(spikein133)[Index]
nominal <- t(sapply(probeNames(spikein133)[Index],function(i) pData(spikein133)[,i]))
x <- as.vector(log2(nominal))
y <- as.vector(log2(pms))
avg <- tapply(y,x,mean,na.rm=TRUE)


This extracts the data subset we want to plot from the SpikeInSubset and scales it to log2. This is then plotted with:

plot(jitter(x),y,las=1,pch=".",
ylab="Log (Base 2) PM Intensity",
xlab="Nominal Log (Base 2) Concentration")
lines(as.numeric(names(avg)),avg,lwd=3,col="green")



The additive-multiplicative model is confirmed by plotting a large series of probe intensities for a given nominal concentration (1 picomol), against a normally distributed error. The resulting QQ-plot lies mostly on the diagonal confirming that the experimental intensities are normally distributed (this could have been done with a KS-test).

This is plotted with:
qqnorm(y[x==0],pch=".")

qqline(y[x==0],col="red")

No comments:

Post a Comment