Data Deluge: Clustering the Arrays

Thursday, 21 October 2010

Clustering the Arrays

If you are interested in biological variation between samples, you may want to cluster the arrays themselves rather than the genes. In a study of 54 tissue culture arrays from lung cancer cell lines, the arrays were clustered after filtering, using a Euclidean distance matrix computed on the log2 expression levels.

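library(Biobase)  # provides exprs()
# t() puts the arrays in rows, so dist() measures distances between arrays rather than genes.
# exprs() on gcrma()/rma() output is already on the log2 scale; wrap it in log2() only if your values are unlogged.
dgcrma1 <- dist(t(exprs(fLCgcrma1)), method = "euclidean")
hcgcrma1 <- hclust(dgcrma1, method = "average")  # average-linkage hierarchical clustering
plot(hcgcrma1)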

The cluster dendrograms from the different normalisation methods can then be compared and a consensus devised. In this case RMA performed slightly worse than FARMS and GCRMA: clusters that were conserved between FARMS and GCRMA were not fully reproduced in the RMA dendrogram. The factor that made the biggest difference to the ordering was the number of genes used in the clustering, so the dendrograms are very sensitive to the filtering step.
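One way to put a number on "how similar are two dendrograms" is the cophenetic correlation: compute the cophenetic distances of each tree with stats::cophenetic() and correlate them. This is a swapped-in technique, not necessarily what was used for the comparison above. The sketch below runs on simulated data (2000 genes x 54 arrays in three groups); methodA and methodB are hypothetical stand-ins for two normalisation methods, and top_variable() is a hypothetical variance filter standing in for whatever filtering step you actually use. The helper names cluster_arrays(), dendro_agreement() and top_variable() are illustrative only.

```r
# Hypothetical helper: cluster the arrays (columns of a genes x arrays matrix)
# with Euclidean distance and average linkage, as in the code above.
cluster_arrays <- function(m) {
  hclust(dist(t(m), method = "euclidean"), method = "average")
}

# Hypothetical helper: cophenetic correlation between two trees built on the
# same arrays in the same order. Values near 1 mean similar groupings.
dendro_agreement <- function(hc1, hc2) {
  cor(cophenetic(hc1), cophenetic(hc2))
}

# Hypothetical filter: keep the n genes with the highest variance across arrays.
top_variable <- function(m, n) {
  v <- apply(m, 1, var)
  m[order(v, decreasing = TRUE)[seq_len(n)], , drop = FALSE]
}

# Simulated log2-scale data: 2000 genes x 54 arrays in three groups of 18.
set.seed(1)
n_genes <- 2000
n_arrays <- 54
group <- rep(1:3, each = 18)
base <- matrix(rnorm(n_genes * n_arrays, mean = 8), n_genes, n_arrays)
base[1:100, group == 2] <- base[1:100, group == 2] + 2
base[101:200, group == 3] <- base[101:200, group == 3] + 2
colnames(base) <- paste0("array", seq_len(n_arrays))

# Assumed stand-ins for two normalisation methods: the same data plus
# method-specific noise of different sizes.
methodA <- base + matrix(rnorm(n_genes * n_arrays, sd = 0.1), n_genes, n_arrays)
methodB <- base + matrix(rnorm(n_genes * n_arrays, sd = 0.3), n_genes, n_arrays)

hcA <- cluster_arrays(methodA)
hcB <- cluster_arrays(methodB)
agree <- dendro_agreement(hcA, hcB)
print(agree)

# Sensitivity to filtering: cluster the same data with different gene counts
# and compare each tree with the one built on all genes.
sizes <- c(100, 500, 2000)
trees <- lapply(sizes, function(n) cluster_arrays(top_variable(methodA, n)))
sens <- sapply(trees, dendro_agreement, hc2 = trees[[length(trees)]])
print(round(sens, 3))
```

With real data you would pass the filtered exprs() matrices from each normalisation method in place of methodA and methodB; the last value of sens is always 1 because that tree is compared with itself.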
