Posted to dev@jena.apache.org by "A. Soroka" <aj...@virginia.edu> on 2015/09/26 19:31:48 UTC

Re: Timing tests for jena-624: doing better

I’ve committed the change to using separate triple and quad indexes (via DatasetGraphTriplesQuads). The improvement appears definite and significant: Andy’s numbers showed the current implementation getting about 5 times the load performance of the new implementation, while my numbers (below) show the current implementation at maybe 2.5 times the new one’s performance. Thanks for that advice, Andy!
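
To make the triples+quads split concrete, here is a minimal plain-Java sketch of the idea. It is not the code in DatasetGraphTriplesQuads or in the jena-624 branch: the class name, the string keys, and the particular index orderings are all hypothetical. The only figures taken from the thread are the counts (3 indexes for default-graph triples, 6 for named-graph quads).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: every add is written once per index ordering, so the
// number of orderings is a rough proxy for the per-add cost.
class SplitStoreSketch {
    // Orderings are illustrative assumptions, not Jena's actual choices.
    static final List<String> TRIPLE_ORDERS = Arrays.asList("SPO", "POS", "OSP");
    static final List<String> QUAD_ORDERS =
            Arrays.asList("GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG");

    final Set<String> tripleIndexEntries = new HashSet<>();
    final Set<String> quadIndexEntries = new HashSet<>();
    long indexWrites = 0;

    /** Adds a statement; a null graph name means the default graph. */
    void add(String g, String s, String p, String o) {
        List<String> orders = (g == null) ? TRIPLE_ORDERS : QUAD_ORDERS;
        Set<String> target = (g == null) ? tripleIndexEntries : quadIndexEntries;
        for (String order : orders) {
            target.add(order + ":" + key(order, g, s, p, o));
            indexWrites++;
        }
    }

    /** Builds the index key by arranging the terms in the given ordering. */
    static String key(String order, String g, String s, String p, String o) {
        StringBuilder sb = new StringBuilder();
        for (char c : order.toCharArray()) {
            switch (c) {
                case 'G': sb.append(g); break;
                case 'S': sb.append(s); break;
                case 'P': sb.append(p); break;
                default:  sb.append(o); break; // 'O'
            }
            sb.append('|');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SplitStoreSketch store = new SplitStoreSketch();
        store.add(null, "s1", "p1", "o1");   // default graph: triple indexes only
        System.out.println("index writes after one triple: " + store.indexWrites);
        store.add("g1", "s1", "p1", "o1");   // named graph: quad indexes only
        System.out.println("index writes after one more quad: " + store.indexWrites);
    }
}
```

In this toy model a default-graph triple costs half as many index writes as a quad, which is the reasoning behind the "2x faster" estimate for triple-only data such as the BSBM load.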

I’ll probably take a look next at moving to a more powerful library for persistent structures that might either perform better raw or offer finer control over tree creation as discussed above in this thread.
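
As a rough illustration of what "finer control over tree creation" could buy, the sketch below contrasts publishing a new version on every add with taking one private working copy at the start of a write transaction and mutating it in place until commit. Everything here is an assumption made for illustration: the class and method names are hypothetical, and a full copy of a HashSet stands in for the path copying a real persistent tree would do, so only the count of versions created is meaningful, not the cost of each one.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of two update policies over an immutable snapshot.
class CopyPolicySketch {
    // Published snapshot that readers see; never mutated after publication.
    private volatile Set<String> committed = Collections.emptySet();
    // Private to the single writer between begin() and commit().
    private Set<String> working;
    long copies = 0;

    /** Persistent-style add: every call creates and publishes a new version. */
    void addAutocommit(String item) {
        Set<String> next = new HashSet<>(committed);
        copies++;
        next.add(item);
        committed = Collections.unmodifiableSet(next);
    }

    /** Starts a write transaction: one independent copy is made here. */
    void begin() {
        working = new HashSet<>(committed);
        copies++;
    }

    /** Mutates the private copy in place; readers cannot see it yet. */
    void add(String item) {
        working.add(item);
    }

    /** Publishes the working copy as the new snapshot. */
    void commit() {
        committed = Collections.unmodifiableSet(working);
        working = null;
    }

    Set<String> snapshot() {
        return committed;
    }

    public static void main(String[] args) {
        CopyPolicySketch perAdd = new CopyPolicySketch();
        for (int i = 0; i < 1000; i++) perAdd.addAutocommit("t" + i);

        CopyPolicySketch batched = new CopyPolicySketch();
        batched.begin();
        for (int i = 0; i < 1000; i++) batched.add("t" + i);
        batched.commit();

        System.out.println("versions created, one per add: " + perAdd.copies);
        System.out.println("versions created, one per txn: " + batched.copies);
    }
}
```

The second policy is the shape of Andy's begin(W) suggestion and of Clojure's transients; whether a given persistent-collections library allows it is exactly the open question in this thread.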

On a related note, are there any Jena standard parts for query testing for this kind of situation? I know that BSBM has several sophisticated suites of tests defined, but are any of them considered particularly appropriate, or has anyone out there in dev-land built their own harness for BSBM or something else that I could “borrow”? {grin}

— 
A. Soroka
The University of Virginia Library

=== Data: /Users/ajs6f/Documents/jena/bsbm-1m.nt.gz ====
    Size: 1,000,312 (2.947s, 339,434 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn  (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn  (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 108.331s (184,676 tps)
==== DSG/mix/txn  (N=20)
==== DSG/mix/txn  (N=20) Time: 105.424s (189,769 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 283.680s (70,523 tps)
==== DSG/mem/txn  (N=20)
==== DSG/mem/txn  (N=20) Time: 224.501s (89,114 tps)

> On Sep 26, 2015, at 9:21 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> On 26/09/15 12:07, A. Soroka wrote:
>> Ooh! Those numbers are awful.
> 
> Early days. The general purpose dataset has no features.   And, of course, a concurrent read is completely blocked - that's a major issue for some usages.
> 
> Access performance, having update not block query, in a very reliable implementation is a valuable thing to have. And if it is described as a "complete temporal database", it is all a good thing.  Marketing.
> 
> The storage implementation is now a self-contained thing to look at. ... seems there is no shortage of options ... google quickly got me:
> 
> http://stackoverflow.com/questions/8575723/whats-a-good-persistent-collections-framework-for-use-in-java
> 
> and there are more.  Various data structures I have not heard of before.
> 
>> Per your point 2, it does create a new
>> tree per add/remove. And PCollections’ bulk operations are just loops
>> over the single-element operations, so trying to accumulate data and
>> use a single operation will create the same number of trees.
>> Unfortunately, PCollections does not have something like Clojure’s
>> transient operations [*], where under carefully-controlled conditions
>> a normally persistent structure can be mutated in place for celerity
>> of operation. I have no commitment to PCollections, and I can switch
>> and see what happens with Clojure and transiency. But I should first
>> go back over the code with a fine-toothed comb and make sure that
>> there isn’t a plain old mistake of some kind.
>> 
>> As far as the indexes, I’m not quite sure what you mean by
>> “triples+quads”. Do you mean a single map from graph name to  three
>> triple-covering indexes? Something like Map<Node, TripleIndex>, with
>> TripleIndex having within it three covering indexes for triples in
>> the way that current HexIndex has within it six covering indexes for
>> quads?
> 
> That's one way - I meant using the supporting framework in DatasetGraphTriplesQuads so
> 
> DatasetGraphQuads => DatasetGraphTriplesQuads
> 
> The default graph is handled separately from named graphs.
> 
> TDB uses this - there is a triple table (dft: 3 index) and a quads table (dft: 6 index)
> 
> 	Andy
> 
>> 
>> --- A. Soroka The University of Virginia Library
>> 
>> [*] http://clojure.org/transients
>> 
>>> On Sep 26, 2015, at 6:42 AM, Andy Seaborne <an...@apache.org>
>>> wrote:
>>> 
>>> Some thoughts:
>>> 
>>> 1/ If it were a triples+quads design (TripleTable, QuadTable) , not
>>> just quads, there would be 3 indexes not 6 for triples so 2x
>>> faster.
>>> 
>>> 2/ As autocommit and txn forms are nearly the same, I guess that
>>> every add(Quad) is causing a new pcollections tree for each index.
>>> 
>>> I don't know pcollections, but is it possible to use it so that an
>>> independent tree is created only at begin(W)? i.e. copy-to-root
>>> does not happen again on stuff already touched after begin(W).
>>> 
>>> Andy
>> 
> 


Re: Timing tests for jena-624: even a little better

Posted by "A. Soroka" <aj...@virginia.edu>.
Sorry for spamming the list a bit today, but before COB I wanted to offer some more figures on this effort. Using a port of Scala’s immutable collections [*] in a new branch [**], the new implementation is now seeing a little better than half the load performance of the “stock” impl (see the figures below my sig). Of course these figures are very rough, but hopefully they demonstrate motion in the right direction. I still intend to try out Clojure’s collections, but I think I’m a lot closer to a realistic level of performance. I hope to demonstrate something about the query performance here soon.

[*] https://github.com/andrewoma/dexx

[**] https://github.com/ajs6f/jena/tree/jena-624-dexx

Anyone who is interested in examining these branches should be aware that they are currently moving targets, with commits several times a day.

---
A. Soroka
The University of Virginia Library



Running org.apache.jena.sparql.core.mem.PerfTest
==== Data: /Users/ajs6f/Documents/jena/bsbm-1m.nt.gz ====
    Size: 1,000,312 (2.978s, 335,900 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn  (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn  (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 97.761s (204,644 tps)
==== DSG/mix/txn  (N=20)
==== DSG/mix/txn  (N=20) Time: 101.668s (196,780 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 211.971s (94,381 tps)
==== DSG/mem/txn  (N=20)
==== DSG/mem/txn  (N=20) Time: 151.359s (132,177 tps)
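
The throughput figures in these listings are consistent with tps = N × size / time, truncated to a whole number. That formula is inferred from the numbers, not taken from the PerfTest source. The short check below reproduces three of the rows above. Reading the DSG/mem rows as the new implementation and the DSG/mix rows as the stock one is also an assumption, based on how the prose describes the results.

```java
// Sanity check of the reported triples-per-second figures.
class TpsCheck {
    /** Triples per second for `runs` loads of `triplesPerRun` triples, truncated. */
    static long tps(int runs, long triplesPerRun, double seconds) {
        return (long) (runs * triplesPerRun / seconds);
    }

    public static void main(String[] args) {
        long size = 1_000_312L;                  // triples in bsbm-1m.nt.gz
        long mixAuto = tps(20, size, 97.761);    // DSG/mix/auto row
        long mixTxn  = tps(20, size, 101.668);   // DSG/mix/txn row
        long memAuto = tps(20, size, 211.971);   // DSG/mem/auto row
        long memTxn  = tps(20, size, 151.359);   // DSG/mem/txn row

        System.out.println("mix/auto tps: " + mixAuto);
        System.out.println("mem/auto tps: " + memAuto);
        // Ratio of mem to mix throughput, autocommit and transactional.
        System.out.printf("mem/mix (auto): %.2f%n", (double) memAuto / mixAuto);
        System.out.printf("mem/mix (txn):  %.2f%n", (double) memTxn / mixTxn);
    }
}
```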



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Okay, that would seem to me to be mostly about documentation/messaging, since Graph could be implemented by any fellow off the street.

After a few simple tests on freshly-generated BSBM data (different flavors of find()), the results are pretty much as one would expect. When queries tilt towards the iterative end of things (more wildcards in non-graph-name positions), the stock implementation wins out, usually by a factor of two or three. The iterative machinery in the new implementation is heavier (using the Streams API), so that’s not surprising. When queries tilt to direct retrieval (fewer wildcards), letting the new implementation really make use of its “INDEX ALL THE THINGS” maps, the new implementation wins, sometimes by a little, sometimes by a factor of several dozen. I’m eager to see what real-world use looks like!
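
The trade-off between direct retrieval and iteration can be sketched with a single nested-map index. This is an illustration under stated assumptions, not the jena-624 code: the class, the string terms, and the single subject/predicate/object ordering are hypothetical, and null is used as the wildcard. A concrete term costs one map probe, while a wildcard has to stream over every entry at that level.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch of find() over one nested-map index ordering.
class FindSketch {
    // subject -> predicate -> objects
    final Map<String, Map<String, Set<String>>> spo = new HashMap<>();

    void add(String s, String p, String o) {
        spo.computeIfAbsent(s, k -> new HashMap<>())
           .computeIfAbsent(p, k -> new HashSet<>())
           .add(o);
    }

    /** Concrete term: a single probe. Wildcard (null): iterate all candidates. */
    static Stream<String> select(Set<String> candidates, String term) {
        if (term == null) {
            return candidates.stream();
        }
        return candidates.contains(term) ? Stream.of(term) : Stream.<String>empty();
    }

    /** Finds matching triples; null in any position is a wildcard. */
    Stream<String[]> find(String s, String p, String o) {
        return select(spo.keySet(), s).flatMap(subj -> {
            Map<String, Set<String>> po = spo.get(subj);
            return select(po.keySet(), p).flatMap(pred ->
                    select(po.get(pred), o).map(obj -> new String[] { subj, pred, obj }));
        });
    }

    public static void main(String[] args) {
        FindSketch index = new FindSketch();
        index.add("s1", "p1", "o1");
        index.add("s1", "p1", "o2");
        index.add("s2", "p1", "o1");
        System.out.println("all wildcards:  " + index.find(null, null, null).count());
        System.out.println("s1 p1 ?:        " + index.find("s1", "p1", null).count());
        System.out.println("fully concrete: " + index.find("s1", "p1", "o1").count());
    }
}
```

With several covering orderings, a pattern can be routed to the index whose leading positions are concrete, which is where the direct-retrieval wins would come from; the stream plumbing is the overhead paid when most positions are wildcards.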

---
A. Soroka
The University of Virginia Library

> On Oct 6, 2015, at 6:05 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> On 05/10/15 20:57, A. Soroka wrote:
>>> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.
>> 
>> Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?
> 
> Yes - and also the fact that it'll materialize the triples, which, if it's some complicated backward-chained inference setup, might lead to a lot of work/space.  We just need to manage the integration/migration.
> 
> 	Andy
> 
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> On Oct 4, 2015, at 5:37 AM, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>> On 29/09/15 15:00, A. Soroka wrote:
>>>> On Sep 27, 2015, at 5:41 AM, Andy Seaborne <an...@apache.org> wrote
>>>>> I can't try out your new stuff for a few days due to not being near
>>>>> a suitable computer.
>>>> 
>>>> No problem. On my machine using Dexx, that port of the Scala types,
>>>> the branch shows improvement to within half of the stock performance.
>>> 
>>> Excellent. That's looking very good.  It does something, so it's going to cost something.
>>> 
>>> My figures below on same hardware as before - the txn/non-txn is making a difference now.
>>> 
>>> Licensing-wise, Dexx is MIT (with maybe some BSD-isms from Scala) which is no problem.
>>> 
>>>> I have tried now with some variations using the Clojure types (shown
>>>> after my sig) and didn’t see much difference, so I’ll leave that
>>>> question alone for the moment. I wasn’t able to use Clojure’s
>>>> transient (mutate-in-place-within-a-thread/transaction)
>>>> functionality, because Clojure transients do not afford iteration,
>>>> which is needed to support find(). It seems feasible to me that a
>>>> custom implementation with the ability to use mutate-in-place within
>>>> transactions might offer more improvement, but that’s a whole ‘nuther
>>>> kettle of fish.
>>>> 
>>>> I’ll spend some time soon moving on with the Dexx branch and trying
>>>> out some simple tests of the kind you’ve outlined below (and I’ll
>>>> include something that exercises property paths, which actually
>>>> happen to be very interesting for a few use cases in which I am
>>>> interested). I’m not sure how to engage real world use very
>>>> effectively. I can certainly spin up examples, but it seems like we
>>>> would want a broader set of users than just me to try it out, no?
>>>> {grin}
>>> 
>>> That would be ideal but it's not always easy to do.  Email to users@ possibly with a quite large notice saying people are affected.
>>> 
>>> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.
>>> 
>>> Discussion/proposal:
>>> 
>>> * Add this as DatasetFactory.createTxnMem(),
>>> * Add DatasetFactory.createGeneral()
>>> * ?? Deprecate DatasetFactory.createMem(),
>>>     referring to createTxnMem() and createGeneral()
>>> (other clearing up of DatasetFactory ...)
>>> * Release.
>>> 
>>> 
>>> 	Andy
>>> 
>>>> 
>>>> --- A. Soroka The University of Virginia Library
>>> 
>>> 2015-01-03:
>>> jena-624-dexx branch:
>>> 
>>> ==== Data: /home/afs/Datasets/BSBM/bsbm-1m.nt.gz ====
>>>     Size: 1,000,312 (3.253s, 307,504 tps)
>>> ==== DSG/mix/auto (warm N=3)
>>> ==== DSG/mix/txn  (warm N=3)
>>> ==== DSG/mem/auto (warm N=3)
>>> ==== DSG/mem/txn  (warm N=3)
>>> ==== DSG/mix/auto (N=20)
>>> ==== DSG/mix/auto (N=20) Time: 81.064s (246,795 tps)
>>> ==== DSG/mix/txn  (N=20)
>>> ==== DSG/mix/txn  (N=20) Time: 80.412s (248,796 tps)
>>> ==== DSG/mem/auto (N=20)
>>> ==== DSG/mem/auto (N=20) Time: 230.129s (86,934 tps)
>>> ==== DSG/mem/txn  (N=20)
>>> ==== DSG/mem/txn  (N=20) Time: 129.259s (154,776 tps)
>> 
> 


Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 05/10/15 20:57, A. Soroka wrote:
>> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.
>
> Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?

Yes - and also the fact that it'll materialize the triples, which, if it's
some complicated backward-chained inference setup, might lead to a lot of
work/space.  We just need to manage the integration/migration.
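
A toy model of the copy-on-add-graph semantic shows both effects: the dataset takes a one-time copy that includes whatever the graph infers at that moment, and later changes to the source graph do not reach the dataset. All types and names below are hypothetical stand-ins written for this sketch, with triples reduced to strings; none of this is the Jena Graph or dataset API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a dataset that copies graphs when they are added.
class CopyOnAddSketch {
    /** Stand-in for a graph whose contents may be computed on demand. */
    interface Graph {
        Set<String> triples();
    }

    /** Stored triples plus one toy "inference" rule applied on every read. */
    static class InferringGraph implements Graph {
        final Set<String> asserted = new HashSet<>();

        public Set<String> triples() {
            Set<String> out = new HashSet<>(asserted);
            for (String t : asserted) {
                out.add(t + " (inferred)");
            }
            return out;
        }
    }

    /** Dataset that copies a graph's current triples at the time it is added. */
    static class CopyingDataset {
        final Map<String, Set<String>> named = new HashMap<>();

        void addGraph(String name, Graph g) {
            // Reading the graph here materializes every inferred triple.
            named.put(name, new HashSet<>(g.triples()));
        }

        Set<String> get(String name) {
            return named.get(name);
        }
    }

    public static void main(String[] args) {
        InferringGraph g = new InferringGraph();
        g.asserted.add("a b c");

        CopyingDataset ds = new CopyingDataset();
        ds.addGraph("g1", g);
        g.asserted.add("d e f");   // later change to the source graph

        System.out.println("source graph now has: " + g.triples().size());
        System.out.println("dataset copy has:     " + ds.get("g1").size());
    }
}
```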

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.

Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?

---
A. Soroka
The University of Virginia Library



Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 07/10/15 21:03, A. Soroka wrote:
> Okay, I’ll do a bit of polish and commenting and get a PR in.
>
> I hadn’t even thought about the assembler system— well, it’s a chance to learn about another part of Jena! {grin}

Shouldn't be hard - there is DatasetAssembler for building the general
one.  It reuses the general "build a model" stuff, but DatasetMem needs
its own configuration setup (i.e. which files to load), which will
differ.

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Okay, I’ll do a bit of polish and commenting and get a PR in.

I hadn’t even thought about the assembler system— well, it’s a chance to learn about another part of Jena! {grin}

---
A. Soroka
The University of Virginia Library



Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 06/10/15 18:28, A. Soroka wrote:
> Andy— would it be appropriate at this time to issue a PR on this Dexx-based branch, so that other people can more easily comment on it?
>

Good idea.  I haven't looked in depth at changes across the rest of 
base/core/arq and a PR will make it easier to find those places.

There will also need to be an assembler at some time, including ways to 
initialize it from loading files.  The current ja:RDFDataset is 
definitely centred around the concept of building a dataset from models.

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Andy— would it be appropriate at this time to issue a PR on this Dexx-based branch, so that other people can more easily comment on it?

---
A. Soroka
The University of Virginia Library



Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 29/09/15 15:00, A. Soroka wrote:
> On Sep 27, 2015, at 5:41 AM, Andy Seaborne <an...@apache.org> wrote
>> I can't try out your new stuff for a few days due to not being near
>> a suitable computer.
>
> No problem. On my machine using Dexx, that port of the Scala types,
> the branch shows improvement to within half of the stock performance.

Excellent. That's looking very good.  It does something, so it's going 
to cost something.

My figures below on same hardware as before - the txn/non-txn is making 
a difference now.

Licensing-wise, Dexx is MIT (with maybe some BSD-isms from Scala) which 
is no problem.

> I have tried now with some variations using the Clojure types (shown
> after my sig) and didn’t see much difference, so I’ll leave that
> question alone for the moment. I wasn’t able to use Clojure’s
> transient (mutate-in-place-within-a-thread/transaction)
> functionality, because Clojure transients do not afford iteration,
> which is needed to support find(). It seems feasible to me that a
> custom implementation with the ability to use mutate-in-place within
> transactions might offer more improvement, but that’s a whole ‘nuther
> kettle of fish.
>
> I’ll spend some time soon moving on with the Dexx branch and trying
> out some simple tests of the kind you’ve outlined below (and I’ll
> include something that exercises property paths, which actually
> happen to be very interesting for a few use cases in which I am
> interested). I’m not sure how to engage real world use very
> effectively. I can certainly spin up examples, but it seems like we
> would want a broader set of users than just me to try it out, no?
> {grin}

That would be ideal but it's not always easy to do.  Email to users@ 
possibly with a quite large notice saying people are affected.

I think the problem areas are around adding inference graphs to general 
datasets, not the details of this new dataset implementation.

Discussion/proposal:

* Add this as DatasetFactory.createTxnMem(),
* Add DatasetFactory.createGeneral()
* ?? Deprecate DatasetFactory.createMem(),
      referring to createTxnMem() and createGeneral()
(other clearing up of DatasetFactory ...)
* Release.


	Andy

>
> --- A. Soroka The University of Virginia Library

2015-01-03:
jena-624-dexx branch:

==== Data: /home/afs/Datasets/BSBM/bsbm-1m.nt.gz ====
      Size: 1,000,312 (3.253s, 307,504 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn  (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn  (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 81.064s (246,795 tps)
==== DSG/mix/txn  (N=20)
==== DSG/mix/txn  (N=20) Time: 80.412s (248,796 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 230.129s (86,934 tps)
==== DSG/mem/txn  (N=20)
==== DSG/mem/txn  (N=20) Time: 129.259s (154,776 tps)
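
A note on reading these figures: the tps column appears consistent with N loads of the 1,000,312-triple file divided by the elapsed time, truncated to a whole number. The sketch below only reproduces that arithmetic. The class and method names are hypothetical and are not taken from PerfTest, and reading DSG/mix as the stock dataset and DSG/mem as the new one is an assumption based on the discussion.

```java
// Illustrative only: reproduces the arithmetic behind the "tps" column above.
// TpsSketch, tps() and ratio() are hypothetical names, not part of PerfTest.
final class TpsSketch {
    // Triples per second for `runs` loads of `triples` triples in `seconds`.
    static long tps(long triples, int runs, double seconds) {
        return (long) (triples * runs / seconds);   // cast truncates the fraction
    }

    // Throughput of one configuration relative to a baseline configuration.
    static double ratio(long tps, long baselineTps) {
        return (double) tps / baselineTps;
    }

    public static void main(String[] args) {
        long size = 1_000_312L;                     // triples in bsbm-1m.nt.gz
        long mixTxn = tps(size, 20, 80.412);        // assumed: stock dataset
        long memTxn = tps(size, 20, 129.259);       // assumed: new dataset
        System.out.printf("mix/txn %,d tps, mem/txn %,d tps, ratio %.2f%n",
                mixTxn, memTxn, ratio(memTxn, mixTxn));
    }
}
```

On these numbers mem/txn comes out at roughly 0.62 of mix/txn, while mem/auto is closer to 0.35 of mix/auto (86,934 / 246,795).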

Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
On Sep 27, 2015, at 5:41 AM, Andy Seaborne <an...@apache.org> wrote
> I can't try out your new stuff for a few days due to not being near a suitable computer.

No problem. On my machine using Dexx, that port of the Scala types, the branch shows improvement to within half of the stock performance. I have tried now with some variations using the Clojure types (shown after my sig) and didn’t see much difference, so I’ll leave that question alone for the moment. I wasn’t able to use Clojure’s transient (mutate-in-place-within-a-thread/transaction) functionality, because Clojure transients do not afford iteration, which is needed to support find(). It seems feasible to me that a custom implementation with the ability to use mutate-in-place within transactions might offer more improvement, but that’s a whole ‘nuther kettle of fish.
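
To make the mutate-in-place idea concrete, here is a minimal sketch, assuming a single writer and using plain JDK collections as a stand-in. It is not Jena, Dexx or Clojure code, and every name in it is hypothetical. The writer mutates a private working copy that can still be iterated (the property find() needs), and commit publishes an immutable snapshot for readers. A real persistent structure would share structure between versions rather than copy the whole set as this does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: strings stand in for triples, and copying a HashSet
// stands in for the structural sharing a persistent collection would give.
final class SnapshotStoreSketch {
    // Committed state: an immutable snapshot that readers may iterate freely.
    private final AtomicReference<Set<String>> committed =
            new AtomicReference<>(Collections.<String>emptySet());

    // Readers take the current snapshot; later commits never alter it.
    Set<String> snapshot() {
        return committed.get();
    }

    // A write transaction (single writer assumed) with a private working copy.
    final class WriteTxn {
        private final Set<String> working = new HashSet<>(committed.get());

        // Mutate in place; invisible to readers until commit().
        void add(String triple) {
            working.add(triple);
        }

        // The working copy can be iterated mid-transaction, which is what a
        // find()-style scan needs.
        long count(String prefix) {
            return working.stream().filter(t -> t.startsWith(prefix)).count();
        }

        // Publish a fresh immutable snapshot; old snapshots stay unchanged.
        void commit() {
            committed.set(Collections.unmodifiableSet(new HashSet<>(working)));
        }
    }

    WriteTxn begin() {
        return new WriteTxn();
    }
}
```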

I’ll spend some time soon moving on with the Dexx branch and trying out some simple tests of the kind you’ve outlined below (and I’ll include something that exercises property paths, which actually happen to be very interesting for a few use cases in which I am interested). I’m not sure how to engage real world use very effectively. I can certainly spin up examples, but it seems like we would want a broader set of users than just me to try it out, no? {grin}

---
A. Soroka
The University of Virginia Library

Clojure w/o transients 
Running org.apache.jena.sparql.core.mem.PerfTest
==== Data: /Users/ajs6f/Documents/jena/bsbm-1m.nt.gz ====
    Size: 1,000,312 (3.551s, 281,698 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn  (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn  (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 96.106s (208,168 tps)
==== DSG/mix/txn  (N=20)
==== DSG/mix/txn  (N=20) Time: 95.053s (210,474 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 221.693s (90,242 tps)
==== DSG/mem/txn  (N=20)
==== DSG/mem/txn  (N=20) Time: 168.189s (118,950 tps)

> 
> On 26/09/15 18:31, A. Soroka wrote:
>> On a related note, are there any Jena standard parts for query
>> testing for this kind of situation? I know that BSBM has several
>> sophisticated suites of tests defined, but are any of them considered
>> particularly appropriate, or has anyone out there in dev-land built
>> their own harness for BSBM or something else that I could “borrow”?
>> {grin}
> 
> Benchmarks like BSBM are looking at scale in a way that is different. BSBM is as much about the mem-storage boundary.
> 
> For the general purpose in-memory dataset, the need is for some lower level tests mainly to ensure nothing really bad, and easily addressable, is happening.
> 
> SPARQL execution is only lightly going to be influenced by dataset speed.  Complex queries do a lot of intermediate processing (e.g. sorting) and that's not to do with the base data.  One exception (isn't there always) is property paths.  The current implementation can hit the store at fine grain quite hard; the ideal is better algorithms for property paths but it also represents what code that directly uses the API might do.
> 
> In TDB, it would be better to compute in NodeIds but the current integration gets the Nodes IIRC.  [Hmm - there is a fairly obvious way to fix that ... different discussion.]
> 
> A few simple tests that come to mind are:
> 
> 1. count all triples - test end to end scan of the dataset
> 2. write the whole dataset to /dev/null.
> 3. same as above but for a graph, default or named.
> 
> 4. Some find() cases that are more important like find(G,S,?,?) find(G,?,P,O) [key look up] or find(G,?,P,?)
>  find(G,?,?,?) is covered by (3)
> 
> 5. and the non-G versions for a graph.
> *6. Union graph (if supported)
> 
> Given those, I think the next level of verification is real use, rather than specific (artificial) situations.  Of course, there are also mega-sized in-memory use cases (systems can deploy a lot of RAM these days).  Then GC and/or off heap memory starts getting fun.
> 
> 	Andy


Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
I can't try out your new stuff for a few days due to not being near a
suitable computer.

On 26/09/15 18:31, A. Soroka wrote:
> On a related note, are there any Jena standard parts for query
> testing for this kind of situation? I know that BSBM has several
> sophisticated suites of tests defined, but are any of them considered
> particularly appropriate, or has anyone out there in dev-land built
> their own harness for BSBM or something else that I could “borrow”?
> {grin}

Benchmarks like BSBM are looking at scale in a way that is different. 
BSBM is as much about the mem-storage boundary.

For the general purpose in-memory dataset, the need is for some lower 
level tests mainly to ensure nothing really bad, and easily addressable, 
is happening.

SPARQL execution is only lightly going to be influenced by dataset 
speed.  Complex queries do a lot of intermediate processing (e.g. 
sorting) and that's not to do with the base data.  One exception (isn't 
there always) is property paths.  The current implementation can hit the 
store at fine grain quite hard; the ideal is better algorithms for 
property paths but it also represents what code that directly uses the API 
might do.

In TDB, it would be better to compute in NodeIds but the current 
integration gets the Nodes IIRC.  [Hmm - there is a fairly obvious way 
to fix that ... different discussion.]

A few simple tests that come to mind are:

1. count all triples - test end to end scan of the dataset
2. write the whole dataset to /dev/null.
3. same as above but for a graph, default or named.

4. Some find() cases that are more important like find(G,S,?,?) 
find(G,?,P,O) [key look up] or find(G,?,P,?)
   find(G,?,?,?) is covered by (3)

5. and the non-G versions for a graph.
*6. Union graph (if supported)
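
The access patterns in the list above could be exercised with a harness shaped roughly like the following. This is a minimal sketch under stated assumptions: the store is a hypothetical list of string quads rather than a Jena DatasetGraph, null plays the role of "?", and the linear scan stands in for whatever indexes a real dataset would use, so the timings it prints say nothing about Jena itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical stand-in for a dataset: plain strings instead of Jena Nodes.
final class FindTimingSketch {
    static final class Quad {
        final String g, s, p, o;
        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
        }
    }

    private final List<Quad> quads = new ArrayList<>();

    void add(String g, String s, String p, String o) {
        quads.add(new Quad(g, s, p, o));
    }

    // null is the wildcard "?" in find(G,S,P,O); this is a linear scan.
    Stream<Quad> find(String g, String s, String p, String o) {
        return quads.stream().filter(q ->
                (g == null || g.equals(q.g)) && (s == null || s.equals(q.s))
             && (p == null || p.equals(q.p)) && (o == null || o.equals(q.o)));
    }

    // Runs one access pattern, prints the elapsed time, returns the match count.
    static long timed(String label, Supplier<Stream<Quad>> pattern) {
        long start = System.nanoTime();
        long n = pattern.get().count();
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + n + " quads in " + micros + " us");
        return n;
    }

    public static void main(String[] args) {
        FindTimingSketch ds = new FindTimingSketch();
        for (int i = 0; i < 100_000; i++) {
            ds.add("g" + (i % 2), "s" + (i % 1_000), "p" + (i % 10), "o" + i);
        }
        timed("count all      ", () -> ds.find(null, null, null, null));
        timed("find(G,?,?,?)  ", () -> ds.find("g0", null, null, null));
        timed("find(G,S,?,?)  ", () -> ds.find("g0", "s42", null, null));
        timed("find(G,?,P,O)  ", () -> ds.find("g0", null, "p2", "o42"));
        timed("find(G,?,P,?)  ", () -> ds.find("g0", null, "p2", null));
    }
}
```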

Given those, I think the next level of verification is real use, rather 
than specific (artificial) situations.  Of course, there are also 
mega-sized in-memory use cases (systems can deploy a lot of RAM these 
days).  Then GC and/or off heap memory starts getting fun.

	Andy