Clustering Exercises (Fri Nov 24 2006)
======================================

Follow-up course of Microarray Data Analysis
Nov 20-24 2006, PICB Shanghai
Angela Becker, Peter Serocka


1. Final view onto the data from the Manila course:

   Normalised expression data from slide 1 (named slide205809) are
   supplied in /picb/course/41-Manila-Expr-Data as .mev files.
   Load them into TIGR MeV and compare them in the main "heat map" view.

   As the data are supposed to be replicates, how would you judge the
   reproducibility? Does it make sense to apply statistics to measure
   significant differential expression?

2. Prepare gain-of-function analysis on Drosophila:

   Two time-series expression tables of 1000 genes (selected by highest
   occurring fold change) are provided in /picb/course/60-Drosophila/ .

   Load drosophila-gof.txt (TAB-delimited) into TIGR MeV.

   Use "Utility" -> "Search" to visualise Angela's results by defining
   three "clusters" based on the following words found in the annotation:

     i.   Angela-GOF (-> assign color red)
     ii.  Angela-LOF (-> assign color blue)
     iii. Angela-HR  (-> assign color yellow)

3. Hierarchical clustering

   a. Perform hierarchical clustering (HCL) of the genes with default
      parameters. How do Angela's groups of candidates (colored in red,
      blue, yellow in step 2) show up in the tree?

   b. Perform hierarchical clustering with different linkage types.
      Compare the results to the tree from step a.: are the candidate
      groups arranged differently?

4. K-Means clustering

   a. Perform k-means clustering (KMC) first with default parameters,
      then with "Pearson correlation" as the metric. In which way do the
      expression graphs of individual clusters reflect the different
      ways ("metrics") of measuring the similarity of expression
      profiles?

   b. Perform KMC with default parameters, but ask for 20 clusters.
      Find small clusters that contain candidate genes as well as
      non-candidate genes. Find out more (from Angela) on how candidate
      genes were discriminated from non-candidates.

5. Other clustering methods

   Try out self-organizing trees (SOTA) and self-organizing maps (SOM),
   starting with the default parameters.

6. Gain-of-function vs. loss-of-function

   Load drosophila-lof.txt into a NEW, separate MeV array view and
   color-label the annotated candidates as in step 2.

   Using the clustering method(s) of your choice, find non-candidates
   that are similar to the candidates and present your results to Angela.

7. Does it seem useful to combine both the GOF and LOF tables into a
   common table and perform clustering on it?
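
Note on exercise 4a: the small Python sketch below may help when thinking about what a "metric" does to expression profiles. It is not what TIGR MeV runs internally. It compares Euclidean distance with a Pearson-based distance (1 - r) on three made-up time series. The profiles gene_a, gene_b and gene_c and both function names are hypothetical and chosen only for illustration, and it is an assumption here that the KMC default is a Euclidean-type distance (check the KMC dialog).

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, up to 2 for opposite shape."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical log-ratio time series (illustration only, not course data).
gene_a = [0.0, 1.0, 2.0, 1.0, 0.0]   # peak in the middle of the series
gene_b = [0.0, 3.0, 6.0, 3.0, 0.0]   # same shape as gene_a, three times the amplitude
gene_c = [1.0, 1.1, 0.9, 1.0, 1.0]   # nearly flat, but numerically close to gene_a

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "euclidean:", round(euclidean_distance(gene_a, other), 3),
          "pearson:", round(pearson_distance(gene_a, other), 3))
```

With these toy values, Euclidean distance places gene_a nearer to the flat gene_c, while the Pearson-based distance places it nearer to gene_b, which has the same shape at a larger amplitude. Keep this in mind when comparing the cluster expression graphs of the two KMC runs.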