BlockCluster Project

Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, microarray analysis, and data mining. Many clustering procedures, such as hierarchical clustering, K-means, or self-organizing maps, aim to construct an optimal partition of objects or, sometimes, of variables. Other methods, called block clustering methods, consider the two sets simultaneously and organize the data into homogeneous blocks, as illustrated in the figure below with a toy binary dataset.
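To make the idea of homogeneous blocks concrete, here is a minimal Python sketch. It is for illustration only: the package described on this page is an R/C++ library, and this code does not reproduce its algorithms. The sketch assumes the row and column cluster labels are already known. It permutes a toy binary matrix so that rows and columns belonging to the same cluster are adjacent, then computes the proportion of ones in each block. The function names `reorder` and `block_densities`, the labels, and the toy data are all hypothetical. Finding the labels themselves is the job of a co-clustering algorithm and is not shown here.

```python
from typing import List

Matrix = List[List[int]]


def reorder(data: Matrix, row_labels: List[int], col_labels: List[int]) -> Matrix:
    """Permute rows and columns so that members of the same cluster are adjacent.

    sorted() is stable, so the original order is kept within each cluster.
    """
    row_order = sorted(range(len(data)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(data[0])), key=lambda j: col_labels[j])
    return [[data[i][j] for j in col_order] for i in row_order]


def block_densities(data: Matrix, row_labels: List[int], col_labels: List[int],
                    n_row_clusters: int, n_col_clusters: int) -> List[List[float]]:
    """Return the proportion of ones in each (row cluster, column cluster) block."""
    ones = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    cells = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            rc, cc = row_labels[i], col_labels[j]  # block this cell belongs to
            ones[rc][cc] += value
            cells[rc][cc] += 1
    # A block with no cells (an empty cluster) gets density 0.0.
    return [[ones[rc][cc] / cells[rc][cc] if cells[rc][cc] else 0.0
             for cc in range(n_col_clusters)]
            for rc in range(n_row_clusters)]


if __name__ == "__main__":
    # Toy binary data whose block structure is hidden by the row/column order.
    data = [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 1],
    ]
    row_labels = [0, 1, 0, 1]  # hypothetical cluster labels, assumed known
    col_labels = [0, 1, 0, 1]

    for row in reorder(data, row_labels, col_labels):
        print(row)
    print(block_densities(data, row_labels, col_labels, 2, 2))
```

Running the script prints the reordered rows `[1, 1, 0, 0]`, `[1, 0, 0, 0]`, `[0, 0, 1, 1]`, `[0, 0, 1, 1]` followed by the block densities `[[0.75, 0.0], [0.0, 1.0]]`. Densities close to 0 or 1 indicate homogeneous blocks, and the single noisy cell only lowers the first block to 0.75.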


To perform such clustering, we recently developed a package that performs co-clustering on binary, contingency, and continuous datasets. The core library is written in C++ and is currently distributed as an R package, which is available on CRAN. A short tutorial is available for download to help you get started, and the package can also be tried online.

Applications and Usage

In recent years, co-clustering has found numerous applications in a variety of scientific fields, including, but not limited to, the following:

  • Data mining
    • Document clustering
    • Collaborative filtering and recommendation systems
  • Information retrieval
    • Web usage statistics
    • Social tagging systems
  • Computer vision
    • Scene modeling
    • Image grouping
  • Biology
    • Gene expression data analysis

Because data come in a variety of formats (mostly binary, categorical, and continuous), it is very useful to have a single framework that can perform co-clustering on all of these data types. This is exactly what this new package provides. To arouse your curiosity, we provide examples below with two types of datasets. More details about these examples can be found in the tutorial mentioned above.

Image Segmentation

Document Clustering