2011年5月9日月曜日

Our SDM2011 paper about fast graph similarity search for massive graph databases


I gave a presentation of our paper at 2011 SIAM international conference on data mining (SDM2011). The paper is about a fast graph similarity search for massive graph databases.  Our method uses wavelet tree which is a memory efficient data structure on arrays [1]. Wavelet tree enables a fast array operations, such as range intersection and range minimum query. Recently, Shervashidze and Borgwardt proposed a method for representing a graph as bag-of-words, called  Weisfeiler-Lehman kernel [2]. First, our method represents input graphs as bag-of-words and then index graphs by wavelet tree. Graph similarity search for query graphs can efficiently be solved as semi conjunctive query on wavelet tree. Our method successfully applied 25 million molecular graphs in PubChem.
The paper and presentation slide are available as follows.

Yasuo Tabei and Koji Tsuda: Kernel-based Similarity Search in Massive Graph Databases with Wavelet Trees, Eleventh SIAM International Conference on Data Mining (SDM), Arizona, USA, 2011.  Link to the paper

A C++ implementation is available in the following website.
http://code.google.com/p/gwt/
References 
[1] R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 636–645, 2003.
[2]N. Shervashidze and K. M. Borgwardt. Fast subtree kernels on graphs. In Advances in Neural Information Processing Systems (NIPS), 2010.