Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2011/04/01 11:27:51 UTC

Re: Different policy for concurrency access in TDB supporting a single writer and multiple readers


On 30/03/11 22:46, Stephen Allen wrote:
> Andy,

> As an aside, I recall you mentioning that you had a BDB version of
> TDB, using that would seem to offer a fast, stable way of adding
> transactions to your B-trees.  Out of curiosity, were there problems
> with using BDB?

https://github.com/afs/TDB-BDB

No problems as such but it just isn't very fast (non-transactionally). 
There is no bulk loading advantage at all, and query performance was 
slower but OK.  That's before turning on transactions.  As the data 
scaled, the difference between TDB native and TDB-BDB became more 
pronounced.

BDB-C and BDB-JE are about the same speed.

Given they were already slower, and that for TxTDB I want to retain 
reader performance, BDB doesn't look like a good starting point.

It might be a good starting point for a version with different goals - 
less emphasis on scale, more on high-frequency writes (and fewer reads), 
for example a sensor data hub.

I don't know why they are slower but I speculate that the general 
purpose design of both BDBs (e.g. fully variable length keys and values, 
node size, overhead in the tree blocks for all sorts of features not 
used) means they are optimized for something else. BDB is designed for 
high write concurrency - RDF datastores are for publishing (read 
dominant).  Sometimes these design objectives pull in different directions.

I used BDB to store the string table as well (lexical forms of nodes). 
It was better to use a native string file.

Maybe it's a case of not using them to their best advantage.

tdbloader1 simply does the loading work in an order that is better than 
adding triples one at a time, indexing as you go.  It loads the primary 
index, then builds the secondary indexes by copying from the primary. 
That applies to BDB but it didn't help.
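
As a rough illustration of that ordering only (this is not TDB's code: 
LoaderSketch, Triple, loadPrimary and buildSecondary are made-up names, 
and an in-memory TreeSet stands in for an on-disk index, so it shows the 
order of work rather than any disk-locality benefit):

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch, not TDB classes. A triple is three node ids.
class LoaderSketch {
    static final class Triple {
        final long s, p, o;
        Triple(long s, long p, long o) { this.s = s; this.p = p; this.o = o; }
    }

    // One comparator per index order.
    static final Comparator<Triple> SPO = Comparator
            .comparingLong((Triple t) -> t.s)
            .thenComparingLong(t -> t.p)
            .thenComparingLong(t -> t.o);
    static final Comparator<Triple> POS = Comparator
            .comparingLong((Triple t) -> t.p)
            .thenComparingLong(t -> t.o)
            .thenComparingLong(t -> t.s);
    static final Comparator<Triple> OSP = Comparator
            .comparingLong((Triple t) -> t.o)
            .thenComparingLong(t -> t.s)
            .thenComparingLong(t -> t.p);

    // Phase 1: while reading input, only the primary index is updated.
    static TreeSet<Triple> loadPrimary(Iterable<Triple> input) {
        TreeSet<Triple> primary = new TreeSet<>(SPO);
        for (Triple t : input) {
            primary.add(t);
        }
        return primary;
    }

    // Phase 2: each secondary index is filled by one scan of the primary,
    // so only one secondary index is being written at any time.
    static TreeSet<Triple> buildSecondary(TreeSet<Triple> primary,
                                          Comparator<Triple> order) {
        TreeSet<Triple> index = new TreeSet<>(order);
        for (Triple t : primary) {
            index.add(t);
        }
        return index;
    }
}
```

The contrast is with inserting each triple into all three indexes as it 
arrives, which touches three different trees per triple.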

tdbloader2 uses Unix sort(1) to prepare the index data by sorting into 
the order for each index, then writes the B+Trees directly to disk (from 
the bottom up and very carefully).
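
A minimal sketch of the bottom-up idea, assuming the keys arrive already 
sorted (sort(1) did that part).  It is not tdbloader2's code: 
BottomUpBuild, Node, build and height are hypothetical names, keys are 
plain longs, nodes live in memory rather than in disk blocks, and each 
branch simply records the smallest key under each child instead of a 
real B+Tree's separator layout.  The last node on a level may also be 
left underfull, which a real builder has to deal with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bottom-up tree construction from sorted keys.
class BottomUpBuild {
    static final class Node {
        // Leaf: the stored keys. Branch: smallest key under each child.
        final List<Long> keys = new ArrayList<>();
        // Empty for leaves.
        final List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    static Node build(List<Long> sortedKeys, int fanout) {
        if (fanout < 2) {
            throw new IllegalArgumentException("fanout must be at least 2");
        }
        // Level 0: pack the sorted keys into full leaves, left to right.
        List<Node> level = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += fanout) {
            Node leaf = new Node();
            int end = Math.min(i + fanout, sortedKeys.size());
            leaf.keys.addAll(sortedKeys.subList(i, end));
            level.add(leaf);
        }
        if (level.isEmpty()) {
            level.add(new Node()); // empty input: a single empty leaf
        }
        // Build each parent level from the level below until one root remains.
        while (level.size() > 1) {
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += fanout) {
                Node parent = new Node();
                int end = Math.min(i + fanout, level.size());
                for (Node child : level.subList(i, end)) {
                    parent.children.add(child);
                    parent.keys.add(child.keys.get(0));
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    // Number of levels from the root down to a leaf, inclusive.
    static int height(Node n) {
        int h = 1;
        while (!n.isLeaf()) {
            n = n.children.get(0);
            h++;
        }
        return h;
    }
}
```

Because every node is written once, in order, there are no splits and no 
revisiting of blocks, which is where the gain over repeated inserts 
comes from.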

	Andy