What is IndexedHBase?
As data-intensive problems evolve, many research projects need to analyze a targeted subset of the data efficiently, rather than the whole data set. IndexedHBase is a storage system that extends HBase with a customizable indexing framework to support fast queries and analysis of interesting data subsets. Because its architecture is based on YARN, IndexedHBase can be integrated with various parallel computing platforms, such as Hadoop MapReduce and Twister, to analyze query results efficiently.
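To make the idea of an index over an HBase table concrete, here is a minimal Python sketch that models tables as nested dictionaries (row key → column qualifier → value). It is not IndexedHBase's API. The table layout, the `build_index` and `query_subset` names, and the sample rows are hypothetical. They only show how an index table lets a query read just the matching rows instead of scanning the whole data table.

```python
# Hypothetical in-memory stand-in for an HBase table:
# row key -> {column qualifier -> value}

def build_index(data, column):
    """Build an index table keyed by the values of `column`.

    Each index row key is a column value; its qualifiers are the row keys
    of the data rows holding that value (cell values are left empty).
    """
    index = {}
    for row_key, columns in data.items():
        value = columns.get(column)
        if value is not None:
            index.setdefault(value, {})[row_key] = ""
    return index

def query_subset(data, index, value):
    """Fetch only the data rows whose indexed column equals `value`."""
    matching_keys = index.get(value, {})
    return {key: data[key] for key in sorted(matching_keys)}

# Usage with made-up sample rows
rows = {
    "t1": {"user": "alice", "text": "hello"},
    "t2": {"user": "bob", "text": "hi"},
    "t3": {"user": "alice", "text": "bye"},
}
user_index = build_index(rows, "user")
subset = query_subset(rows, user_index, "alice")
print(sorted(subset))  # ['t1', 't3']
```

The lookup cost depends on the size of the matching subset rather than on the size of the data table, which is the motivation for building indices when only a small part of the data is of interest.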
What can IndexedHBase do?
By building index structures that are customized for each specific application, IndexedHBase can evaluate queries significantly faster (by one to two orders of magnitude) than the indexing techniques provided by existing commercial NoSQL databases such as Riak.
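As an illustration of what "customized for the application" can mean, the following Python sketch stores a tweet's creation time directly in a hashtag index entry, so a query such as "tweets with a given hashtag created in a time window" can be answered from the index alone, without reading the data table. This is an assumed design for illustration only; the field names, function names, and sample data are hypothetical and do not describe IndexedHBase's actual index layout.

```python
# Hypothetical customized index: hashtag -> {tweet_id -> created_at}
# Embedding created_at in the index entry lets time-window queries
# be filtered without fetching the full tweet rows.

def build_hashtag_index(tweets):
    index = {}
    for tweet_id, tweet in tweets.items():
        for tag in tweet["hashtags"]:
            index.setdefault(tag, {})[tweet_id] = tweet["created_at"]
    return index

def tweets_in_window(index, tag, start, end):
    """IDs of tweets with `tag` created in [start, end), from the index only."""
    entries = index.get(tag, {})
    return sorted(tid for tid, ts in entries.items() if start <= ts < end)

# Usage with made-up sample data (timestamps are plain integers)
tweets = {
    "t1": {"hashtags": ["election"], "created_at": 100},
    "t2": {"hashtags": ["election", "vote"], "created_at": 250},
    "t3": {"hashtags": ["vote"], "created_at": 300},
}
hashtag_index = build_hashtag_index(tweets)
print(tweets_in_window(hashtag_index, "election", 200, 400))  # ['t2']
```

A generic secondary index on hashtags would return all matching tweet IDs and leave the time filter to a second pass over the data rows; folding the query-relevant field into the index is the kind of application-specific choice the paragraph above refers to.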
IndexedHBase has been used successfully in several applications, including Text Indexing (with ClueWeb09), LCIR Synonym Mining, and Social Data Analysis (with Twitter data sets). Please see our System Design and Publications pages for more details.
Funding and Leadership
This project is supported in part by National Science Foundation CAREER Grant OCI-1149432 and is supervised by Judy Qiu, Assistant Professor of Computer Science in the School of Informatics and Computing at Indiana University Bloomington.