Publication

Back to overview

On-disk storage techniques for semantic web data - are B-trees always the optimal solution?

Type of publication Peer-reviewed
Publikationsform Proceedings (peer-reviewed)
Publication date 2009
Author Weiss Cathrin, Bernstein Abraham,
Project The scalable triple store - towards web scale storage for RDF data
Show all

Proceedings (peer-reviewed)

Title of proceedings Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2
Place Washington, DC

Open Access

URL http://ceur-ws.org/Vol-517/ssws09-paper4.pdf
Type of Open Access Repository (Green Open Access)

Abstract

Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple-stores the usage of B- trees is actually highly detrimental to query performance. Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments. As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.
-