Back to overview

On-disk storage techniques for semantic web data - are B-trees always the optimal solution?

Type of publication Peer-reviewed
Publikationsform Proceedings (peer-reviewed)
Publication date 2009
Author Weiss Cathrin, Bernstein Abraham,
Project The scalable triple store - towards web scale storage for RDF data
Show all

Proceedings (peer-reviewed)

Title of proceedings Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2
Place Washington, DC

Open Access

Type of Open Access Repository (Green Open Access)


Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple-stores the usage of B- trees is actually highly detrimental to query performance. Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments. As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.