Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2015/10/04 11:37:49 UTC

Re: Timing tests for jena-624: doing better

On 29/09/15 15:00, A. Soroka wrote:
> On Sep 27, 2015, at 5:41 AM, Andy Seaborne <an...@apache.org> wrote
>> I can't try out your new stuff for a few days due to not being near
>> a suitable computer.
>
> No problem. On my machine using Dexx, that port of the Scala types,
> the branch shows improvement to within half of the stock performance.

Excellent. That's looking very good.  It does something, so it's going 
to cost something.

My figures are below, on the same hardware as before - the txn/non-txn 
distinction is making a difference now.

Licensing-wise, Dexx is MIT (with maybe some BSD-isms from Scala) which 
is no problem.
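
For anyone following along, the general idea behind the txn path - not 
the Dexx API and not the branch's actual code, just the bare 
persistent-snapshot pattern it relies on - looks roughly like this in 
plain Java:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicReference;

    // Illustrative only: the copy-on-write / persistent-snapshot idea,
    // not the Dexx API and not the jena-624 branch code.
    class SnapshotMap<K, V> {
        private final AtomicReference<Map<K, V>> current =
                new AtomicReference<>(Collections.<K, V>emptyMap());

        // A reader takes one immutable snapshot and uses it for the whole
        // transaction; later commits never disturb it.
        Map<K, V> beginRead() {
            return current.get();
        }

        // A writer builds a new version privately and publishes it
        // atomically at commit time.
        void commitPut(K key, V value) {
            Map<K, V> updated = new HashMap<>(current.get());
            updated.put(key, value);
            current.set(Collections.unmodifiableMap(updated));
        }
    }

A real persistent map (Dexx, or the Clojure/Scala types) shares 
structure between versions instead of copying the whole map, which is 
what keeps writes affordable.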

> I have tried now with some variations using the Clojure types (shown
> after my sig) and didn’t see much difference, so I’ll leave that
> question alone for the moment. I wasn’t able to use Clojure’s
> transient (mutate-in-place-within-a-thread/transaction)
> functionality, because Clojure transients do not afford iteration,
> which is needed to support find(). It seems feasible to me that a
> custom implementation with the ability to use mutate-in-place within
> transactions might offer more improvement, but that’s a whole ‘nuther
> kettle of fish.
>
> I’ll spend some time soon moving on with the Dexx branch and trying
> out some simple tests of the kind you’ve outlined below (and I’ll
> include something that exercises property paths, which actually
> happen to be very interesting for a few use cases in which I am
> interested). I’m not sure how to engage real world use very
> effectively. I can certainly spin up examples, but it seems like we
> would want a broader set of users than just me to try it out, no?
> {grin}

That would be ideal but it's not always easy to do.  An email to users@, 
possibly with a prominent notice saying how people are affected, would 
help.

I think the problem areas are around adding inference graphs to general 
datasets, not the details of this new dataset implementation.

Discussion/proposal (a usage sketch follows the list):

* Add this as DatasetFactory.createTxnMem(),
* Add DatasetFactory.createGeneral()
* ?? Deprecate DatasetFactory.createMem(),
      referring to createTxnMem() and createGeneral()
(other clearing up of DatasetFactory ...)
* Release.
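
To make that concrete, a small sketch of how the two entry points would 
be used, assuming they keep the names proposed above (the 
begin/commit/end calls are the existing Dataset transaction API):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.vocabulary.RDFS;

    public class DatasetFactorySketch {
        public static void main(String[] args) {
            // Proposed: the new fully transactional in-memory dataset.
            Dataset txnDs = DatasetFactory.createTxnMem();
            txnDs.begin(ReadWrite.WRITE);
            try {
                txnDs.getDefaultModel()
                     .createResource("http://example/s")
                     .addProperty(RDFS.label, "added inside a write transaction");
                txnDs.commit();
            } finally {
                txnDs.end();
            }

            // Proposed: the "general" dataset, which keeps whatever
            // Graph/Model implementations it is given (e.g. inference
            // models) rather than copying their contents.
            Dataset generalDs = DatasetFactory.createGeneral();
            generalDs.addNamedModel("http://example/g", ModelFactory.createDefaultModel());
        }
    }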


	Andy

>
> --- A. Soroka The University of Virginia Library

2015-10-03:
jena-624-dexx branch:

==== Data: /home/afs/Datasets/BSBM/bsbm-1m.nt.gz ====
      Size: 1,000,312 (3.253s, 307,504 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn  (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn  (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 81.064s (246,795 tps)
==== DSG/mix/txn  (N=20)
==== DSG/mix/txn  (N=20) Time: 80.412s (248,796 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 230.129s (86,934 tps)
==== DSG/mem/txn  (N=20)
==== DSG/mem/txn  (N=20) Time: 129.259s (154,776 tps)

Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Okay, that would seem to me to be mostly about documentation/messaging, since Graph could be implemented by any fellow off the street.

After a few simple tests on freshly-generated BSBM data (different flavors of find()), the results are pretty much as one would expect. When queries tilt towards the iterative end of things (more wildcards in non-graph-name positions), the stock implementation wins out, usually by a factor of two or three. The iterative machinery in the new implementation is heavier (using the Streams API), so that’s not surprising. When queries tilt to direct retrieval (fewer wildcards), letting the new implementation really make use of its “INDEX ALL THE THINGS” maps, the new implementation wins, sometimes by a little, sometimes by a factor of several dozen. I’m eager to see what real-world use looks like!
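
For reference, the two ends of that spectrum are roughly these find() 
shapes (plain DatasetGraph API; the factory-method name and the example 
URIs are just placeholders):

    import java.util.Iterator;

    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.NodeFactory;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphFactory;
    import org.apache.jena.sparql.core.Quad;

    public class FindShapes {
        public static void main(String[] args) {
            DatasetGraph dsg = DatasetGraphFactory.createTxnMem(); // name assumed from the proposal
            Node g = NodeFactory.createURI("http://example/g");
            Node s = NodeFactory.createURI("http://example/s");
            Node p = NodeFactory.createURI("http://example/p");

            // "Iterative" shape: wildcards outside the graph position, so
            // the answer has to be assembled by scanning - where the stock
            // implementation tends to win.
            Iterator<Quad> scan = dsg.find(g, Node.ANY, Node.ANY, Node.ANY);

            // "Direct retrieval" shape: a mostly-bound pattern that the new
            // implementation can answer straight from its maps.
            Iterator<Quad> lookup = dsg.find(g, s, p, Node.ANY);
        }
    }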

---
A. Soroka
The University of Virginia Library

> On Oct 6, 2015, at 6:05 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> On 05/10/15 20:57, A. Soroka wrote:
>>> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.
>> 
>> Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?
> 
> Yes - and also the fact that it'll materialize the triples, which, if it's some complicated backward-chained inference setup, might lead to a lot of work/space.  We just need to manage the integration/migration.
> 
> 	Andy


Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 05/10/15 20:57, A. Soroka wrote:
>> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.
>
> Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?

Yes - and also the fact that it'll materialize the triples, which, if 
it's some complicated backward-chained inference setup, might lead to a 
lot of work/space.  We just need to manage the integration/migration.
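
A rough sketch of the pattern being worried about, using the standard 
inference API (whether the last point bites depends on which dataset 
implementation is receiving the graph):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.vocabulary.RDF;
    import org.apache.jena.vocabulary.RDFS;

    public class InferenceIntoDataset {
        public static void main(String[] args) {
            // An RDFS inference model over some base data.
            Model schema = ModelFactory.createDefaultModel();
            schema.add(schema.createResource("http://example/Dog"),
                       RDFS.subClassOf,
                       schema.createResource("http://example/Animal"));
            Model base = ModelFactory.createDefaultModel();
            InfModel inf = ModelFactory.createRDFSModel(schema, base);

            // With the copy-on-add-graph semantic, adding the model copies
            // the triples derivable *now* - the dataset does not keep the
            // inference graph itself, and a backward-chaining setup gets
            // fully materialized at this point.
            Dataset ds = DatasetFactory.createTxnMem();   // name as proposed in this thread
            ds.addNamedModel("http://example/inferred", inf);

            // Later changes to the base data are therefore not visible
            // inside the dataset's copy.
            base.add(base.createResource("http://example/rex"),
                     RDF.type,
                     base.createResource("http://example/Dog"));
        }
    }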

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
> I think the problem areas are around adding inference graphs to general datasets, not the details of this new dataset implementation.

Just to be sure that I understand the issue here, is the problem that one could make a graph using an inferring implementation, add the graph to this kind of dataset, and expect the inference to function inside the dataset (which it won’t, because of the copy-on-add-graph semantic)?

---
A. Soroka
The University of Virginia Library



Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 07/10/15 21:03, A. Soroka wrote:
> Okay, I’ll do a bit of polish and commenting and get a PR in.
>
> I hadn’t even thought about the assembler system— well, it’s a chance to learn about another part of Jena! {grin}

Shouldn't be hard - there is DatasetAssembler for building the general 
one.  It reuses the general "build a model" machinery, but DatasetMem 
needs its own configuration setup (i.e. which files to load), which 
will be different.

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Okay, I’ll do a bit of polish and commenting and get a PR in.

I hadn’t even thought about the assembler system— well, it’s a chance to learn about another part of Jena! {grin}

---
A. Soroka
The University of Virginia Library

> On Oct 7, 2015, at 9:35 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> On 06/10/15 18:28, A. Soroka wrote:
>> Andy— would it be appropriate at this time to issue a PR on this Dexx-based branch, so that other people can more easily comment on it?
>> 
> 
> Good idea.  I haven't looked in depth at the changes across the rest of base/core/arq, and a PR will make it easier to find those places.
> 
> There will also need to be an assembler at some point, including ways to initialize it by loading files.  The current ja:RDFDataset is definitely centred around the concept of building a dataset from models.
> 
> 	Andy


Re: Timing tests for jena-624: doing better

Posted by Andy Seaborne <an...@apache.org>.
On 06/10/15 18:28, A. Soroka wrote:
> Andy— would it be appropriate at this time to issue a PR on this Dexx-based branch, so that other people can more easily comment on it?
>

Good idea.  I haven't looked in depth at the changes across the rest of 
base/core/arq, and a PR will make it easier to find those places.

There will also need to be an assembler at some point, including ways to 
initialize it by loading files.  The current ja:RDFDataset is 
definitely centred around the concept of building a dataset from models.

	Andy



Re: Timing tests for jena-624: doing better

Posted by "A. Soroka" <aj...@virginia.edu>.
Andy— would it be appropriate at this time to issue a PR on this Dexx-based branch, so that other people can more easily comment on it?

---
A. Soroka
The University of Virginia Library
