Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2011/04/01 11:27:51 UTC
Re: Different policy for concurrency access in TDB supporting a single
writer and multiple readers
On 30/03/11 22:46, Stephen Allen wrote:
> Andy,
> As an aside, I recall you mentioning that you had a BDB version of
> TDB, using that would seem to offer a fast, stable way of adding
> transactions to your B-trees. Out of curiosity, were there problems
> with using BDB?
https://github.com/afs/TDB-BDB
No problems as such but it just isn't very fast (non-transactionally).
There is no bulk loading advantage at all, and query performance was
slower but OK. That's before turning on transactions. As the data
scaled, the difference between TDB native and TDB-BDB became more
pronounced.
BDB-C and BDB-JE are about the same speed.
Given they were already slower, and that for TxTDB I want to retain
reader performance, BDB doesn't look like a good starting point.
It might be a good place for a version with different goals - less
emphasis on scale, more on high-frequency writes (and fewer reads), for
example a sensor data hub.
I don't know why they are slower but I speculate that the general
purpose design of both BDBs (e.g. fully variable length key and value,
node size, overhead in the tree blocks for all sorts of features not
used) means they are optimized for something else. BDB is designed for
high write concurrency - RDF datastores are for publishing (read
dominant). Sometimes these design objectives pull in different directions.
I used BDB to store the string table as well (lexical forms of nodes).
It was better to use a native string file.
Maybe it's a case of not using them to their best advantage.
tdbloader1 simply does the loading work in an order that is better than
adding triples one at a time, indexing as you go. It loads the primary
index, then builds the secondary indexes by copying from the primary.
That applies to BDB but it didn't help.
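As a rough illustration of that loading order (not the actual tdbloader1
code), here is a minimal Python sketch. It assumes triples are tuples of
integer node ids and models each index as a sorted list; the function
name and the SPO/POS/OSP choice are assumptions made for the example.

```python
def load_primary_then_secondary(triples):
    """Build the primary index first, then derive the secondaries from it.

    Illustrative only: each "index" is a sorted list of tuples, standing
    in for an on-disk B+Tree.
    """
    # Phase 1: load the primary (SPO) index, dropping duplicate triples.
    spo = sorted(set(triples))

    # Phase 2: build each secondary index by scanning the finished primary
    # and reordering the columns, rather than updating all three indexes
    # for every incoming triple.
    pos = sorted((p, o, s) for (s, p, o) in spo)
    osp = sorted((o, s, p) for (s, p, o) in spo)
    return spo, pos, osp


if __name__ == "__main__":
    data = [(3, 1, 2), (1, 2, 3), (1, 2, 4), (1, 2, 3)]
    spo, pos, osp = load_primary_then_secondary(data)
    print(spo)
    print(pos)
    print(osp)
```

The point of the two phases is that each index is written in one pass
instead of all three being touched for every triple.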
tdbloader2 uses Unix sort(1) to prepare the index data by sorting into
the order for each index, then writes the B+Trees directly to disk (from
the bottom up and very carefully).
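To make "from the bottom up" concrete, here is a small Python sketch of
the general technique, not the tdbloader2 implementation. It assumes the
keys are already sorted (Python's sorted() would stand in for sort(1)),
uses a made-up fanout parameter, and keeps the tree in memory as a list
of levels instead of writing disk blocks.

```python
from bisect import bisect_right


def build_bottom_up(sorted_keys, fanout=4):
    """Pack sorted keys into leaves, then build parent levels on top.

    Returns a list of levels, leaves first and root last. Each internal
    node holds the lowest key found under each of its children.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaf level: consecutive runs of the sorted input, filled left to right.
    leaves = [sorted_keys[i:i + fanout]
              for i in range(0, len(sorted_keys), fanout)] or [[]]
    levels = [leaves]
    mins = [leaf[0] for leaf in leaves if leaf]
    # Keep adding parent levels until a single root node remains.
    while len(levels[-1]) > 1:
        parents = [mins[i:i + fanout] for i in range(0, len(mins), fanout)]
        levels.append(parents)
        mins = [node[0] for node in parents]
    return levels


def contains(levels, key, fanout=4):
    """Descend from the root to one leaf; fanout must match the build."""
    node = 0
    for depth in range(len(levels) - 1, 0, -1):
        separators = levels[depth][node]
        child = max(bisect_right(separators, key) - 1, 0)
        node = node * fanout + child
    return key in levels[0][node]


if __name__ == "__main__":
    tree = build_bottom_up(list(range(20)), fanout=4)
    print(len(tree), contains(tree, 17), contains(tree, 20))
```

Because the input is sorted, no node is split or rebalanced during the
build; every node is written once, which is what makes this approach
attractive for bulk loading.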
Andy