
Ronen Basri: Research


Analysis of Biological Data through clustering

 

Unsupervised clustering can provide a useful means for understanding complex data. In two studies we used clustering to reveal secondary structures in proteins and to analyze RNA expression data. In the first study we used data on folded proteins that included their backbone along with covalent interactions and hydrogen bonds. By seeking recurring network motifs in this data we managed to partition proteins into their parts and reveal clusters that correspond to α-helices, parallel and anti-parallel β-sheets, loops, and some hybrid, non-conventional structures. In the second study we exploited the bilinear nature of Singular Value Decomposition (SVD) and the symmetric construction of bi-stochastic matrices to construct spectral co-clustering algorithms. The objective of these algorithms was to reveal a “checkerboard” pattern in the normalized expression data, potentially indicating the presence of marker genes whose expression levels are either up- or down-regulated in patients with a particular type of cancer.
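To make the idea of the second study concrete, the following is a minimal sketch, not the published algorithm. It assumes a strictly positive genes-by-conditions matrix, rescales it with a Sinkhorn-style alternating normalization so that all row sums are equal and all column sums are equal, and then splits genes and conditions into two groups each by the sign of the second pair of singular vectors. The function names `bistochastize` and `cocluster_two_way` are hypothetical, and the sign split stands in for whatever clustering step one would apply to the singular vectors when more than two groups are expected.

```python
import numpy as np


def bistochastize(a, n_iter=500, tol=1e-10):
    """Alternately rescale rows and columns of a strictly positive matrix
    until all row sums are equal and all column sums are equal."""
    b = np.asarray(a, dtype=float).copy()
    if np.any(b <= 0):
        raise ValueError("this sketch assumes strictly positive entries")
    for _ in range(n_iter):
        b /= b.sum(axis=1, keepdims=True)  # every row now sums to 1
        b /= b.sum(axis=0, keepdims=True)  # every column now sums to 1
        row_sums = b.sum(axis=1)
        if row_sums.max() - row_sums.min() < tol:  # rows agree as well
            break
    return b


def cocluster_two_way(a):
    """Split rows and columns into two groups each (labels 0 and 1).

    After normalization the leading singular vectors are constant and carry
    no partition information, so the second pair is used. Because the SVD
    fixes the signs of each left/right pair jointly, rows labeled k and
    columns labeled k together mark the blocks with elevated values."""
    b = bistochastize(a)
    u, _, vt = np.linalg.svd(b, full_matrices=False)
    row_labels = (u[:, 1] > 0).astype(int)
    col_labels = (vt[1, :] > 0).astype(int)
    return row_labels, col_labels
```

Calling `cocluster_two_way(expression)` on a matrix whose rows are genes and whose columns are patients returns one label per gene and one per patient; sorting rows and columns by these labels should expose a 2×2 checkerboard if one is present. Which group receives label 0 and which receives label 1 is arbitrary, since the singular vectors are defined only up to a joint sign flip.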

What lies ahead? Many related problems can potentially be addressed using data analysis techniques (of course in concert with other, biological techniques). To mention a few, we would like to relate the structure of proteins with their function, e.g., by identifying their active sites. Also, better methods for understanding the role of genes in diseases are needed. Finally, RNA expression data appears to be very noisy; recent improvements in acquisition technologies as well as methods that combine expression data with data from other sources may improve our ability to reveal the underlying biological interactions despite the noise.

Our unsupervised approach for revealing secondary structures in proteins was published in
Barak Raveh, Ofer Rahat, Ronen Basri, and Gideon Schreiber, “Rediscovering Secondary Structures as Network Motifs – an Unsupervised Learning Approach,” Bioinformatics 23: e163-e169, 2007. Awarded best student paper in ECCB-06 and 9th Israeli Bioinformatics Symposium, 2006.

The spectral co-clustering algorithm was published in
Yuval Kluger, Ronen Basri, Joseph T. Chang, and Mark Gerstein, “Spectral biclustering of microarray data: coclustering genes and conditions,” Genome Research, 13(4), 703-716, 2003.

Revealing secondary structures in proteins. The left panel shows the pairwise-distance matrix produced by our clustering algorithm, with the detected clusters marked by red squares. The arrows indicate the known secondary structures that match those clusters. The other two panels show an input protein (left, cartoon drawn according to DSSP) labeled with our clustering (middle, labeling displayed as color overlay).

Protein structure according to DSSP. Annotation of protein (Raveh et al., Bioinformatics, 2007). Clustering of protein sub-structures (Raveh et al., Bioinformatics, 2007).