Saturday, 23 October 2010

Finding the Clusters

When is a cluster real, and when is it an artefact of the analysis or noise in the data? Look for dendrograms that stay consistent between methods and across different levels of clustering. The more the results contradict each other between methods and between filtering cut-offs, the less confidence you can have in the clustering. Here are some ideas about what you should be looking at:
  1. What is the effect of the normalisation method? - compare rma with gcrma, for example
  2. What is the effect of the filter cut-off? - try gene subsets of 2000 and 500 genes
  3. Is the clustering affected by the distance measure? - compare Manhattan with Euclidean
  4. Is the clustering affected by the clustering algorithm? - compare agglomerative with divisive
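
As a rough illustration of checks 2 and 3, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example rather than part of the analysis above: the data are synthetic (200 genes, 6 samples in two groups), genes are filtered by variance, the cut-offs are 100 and 25 genes instead of 2000 and 500, the tree is built by average linkage, and agreement between two clusterings is scored with the Rand index (the fraction of sample pairs that both clusterings treat the same way). The helper names are hypothetical.

```python
import random
from itertools import combinations
from statistics import pvariance


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def top_variance_genes(matrix, n_genes):
    """Keep the n_genes rows (genes) with the highest variance across samples."""
    return sorted(matrix, key=pvariance, reverse=True)[:n_genes]


def cluster_samples(matrix, k, distance):
    """Average-linkage agglomerative clustering of the columns (samples) into k groups."""
    samples = list(zip(*matrix))  # one expression profile per sample
    clusters = [[i] for i in range(len(samples))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(distance(samples[i], samples[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters (a < b, so deleting b is safe).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree (same/different group)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Synthetic data (assumption): the first 50 genes separate the two sample groups.
random.seed(0)
groups = [0, 0, 0, 1, 1, 1]
matrix = [
    [random.gauss(0, 1) + (6.0 if g < 50 else 0.0) * grp for grp in groups]
    for g in range(200)
]

# Cluster under every combination of filter cut-off and distance measure.
labelings = {}
for n_genes in (100, 25):
    filtered = top_variance_genes(matrix, n_genes)
    for name, fn in (("manhattan", manhattan), ("euclidean", euclidean)):
        labelings[(n_genes, name)] = cluster_samples(filtered, 2, fn)

# Pairwise agreement: values near 1 mean the settings tell the same story.
for (key_a, lab_a), (key_b, lab_b) in combinations(labelings.items(), 2):
    print(key_a, key_b, round(rand_index(lab_a, lab_b), 3))
```

If the agreement drops when you change the cut-off or the distance measure, that is a sign the clusters may depend on the analysis choices rather than on the data. The same comparison could be extended to checks 1 and 4 by adding differently normalised matrices or a different clustering algorithm to the loop.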

