2011年5月9日月曜日

Our SDM2011 paper about fast graph similarity search for massive graph databases


I gave a presentation of our paper at 2011 SIAM international conference on data mining (SDM2011). The paper is about a fast graph similarity search for massive graph databases.  Our method uses wavelet tree which is a memory efficient data structure on arrays [1]. Wavelet tree enables a fast array operations, such as range intersection and range minimum query. Recently, Shervashidze and Borgwardt proposed a method for representing a graph as bag-of-words, called  Weisfeiler-Lehman kernel [2]. First, our method represents input graphs as bag-of-words and then index graphs by wavelet tree. Graph similarity search for query graphs can efficiently be solved as semi conjunctive query on wavelet tree. Our method successfully applied 25 million molecular graphs in PubChem.
The paper and presentation slide are available as follows.

Yasuo Tabei and Koji Tsuda: Kernel-based Similarity Search in Massive Graph Databases with Wavelet Trees, Eleventh SIAM International Conference on Data Mining (SDM), Arizona, USA, 2011.  Link to the paper

A C++ implementation is available in the following website.
http://code.google.com/p/gwt/
References 
[1] R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 636–645, 2003.
[2]N. Shervashidze and K. M. Borgwardt. Fast subtree kernels on graphs. In Advances in Neural Information Processing Systems (NIPS), 2010.

2011年2月26日土曜日

LGM: A Novel Mining Algorithm from Linear Graphs

Our paper has been accepted at PAKDD2011, an international conference for data-mining. The paper is about a novel mining algorithm from linear graphs. This is joint work with Daisuke Okanohara from Preferred Infrastructure, Shuichi Hirose and Koji Tsuda from AIST. I've uploaded the paper in arxiv.org. 


LGM: Mining Frequent Subgraphs from Linear Graphs, Yasuo Tabei, Daisuke Okanohara, Shuichi Hirose, Koji Tsuda, The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011)link to the paper

Linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. In this paper, we designed an algorithm for mining all frequent subgraphs from linear graphs by applying the reverse search technique.


I've uploaded a power point slide in the following.