Posted to users@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2013/08/29 01:39:57 UTC

Impact on deprecation of BulkUpdateHandler on SDB

SDB currently implements its own BulkUpdateHandler, and I just made some 
tests that indicate that it is significantly faster than using 
GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that 
BulkUpdateHandler has been deprecated, and Model.add is already using 
GraphUtil.add, what call sequence are we supposed to use to retain the 
good performance of the BulkUpdateHandler? Could a method 
Graph.add(Iterable<Triple>) be added to allow graphs to optimize the 
behavior for specific Graph types?
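
One way to picture the proposed method is the sketch below. It is only an illustration under stated assumptions: SketchTriple, SketchGraph, MemorySketchGraph and BulkSketchGraph are hypothetical stand-ins, not Jena classes, and the "round trip" counter merely simulates the cost of talking to a store. The interface supplies a per-triple default, and a storage-backed graph overrides the bulk method with its own faster path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a triple; not Jena's Triple class.
final class SketchTriple {
    final String s, p, o;
    SketchTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

// Hypothetical stand-in for a graph interface; not Jena's Graph.
interface SketchGraph {
    void add(SketchTriple t);

    // Default bulk add: falls back to one add() call per triple.
    // A storage-backed graph can override this with a faster path.
    default void add(Iterable<SketchTriple> triples) {
        for (SketchTriple t : triples) {
            add(t);
        }
    }
}

// Plain in-memory graph: inherits the per-triple default.
class MemorySketchGraph implements SketchGraph {
    final List<SketchTriple> store = new ArrayList<>();
    @Override public void add(SketchTriple t) { store.add(t); }
}

// Graph with its own bulk path: one simulated round trip per bulk call.
class BulkSketchGraph implements SketchGraph {
    final List<SketchTriple> store = new ArrayList<>();
    int roundTrips = 0;

    @Override public void add(SketchTriple t) {
        store.add(t);
        roundTrips++; // each single add costs one simulated trip
    }

    @Override public void add(Iterable<SketchTriple> triples) {
        for (SketchTriple t : triples) {
            store.add(t);
        }
        roundTrips++; // the whole batch is written in one simulated trip
    }
}

class BulkAddDemo {
    public static void main(String[] args) {
        List<SketchTriple> triples = Arrays.asList(
            new SketchTriple("s", "p", "1"),
            new SketchTriple("s", "p", "2"),
            new SketchTriple("s", "p", "3"));
        BulkSketchGraph g = new BulkSketchGraph();
        g.add(triples);
        System.out.println(g.store.size() + " triples, " + g.roundTrips + " round trip(s)");
    }
}
```

Callers would use the same add(Iterable) call for every graph type; only graphs that can do better than the default need to override it.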

Thanks
Holger

Re: SPARQL Query to retrieve a particular RdfId

Posted by Charles Li <ch...@gmail.com>.

Never mind, I figured it out: the str() function.

Sorry for bothering!

Thanks!
- Charles
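
For readers landing on this message, a query using str() might look like the sketch below. It is a guess at what the fix looked like, not the query that was actually run: the class name StrFilterQuery is made up, the isIRI() guard is an addition, and the trailing ">" from the original search string is dropped on the assumption that str() returns the IRI text without angle brackets. The query is held in a plain Java string so the sketch runs without Jena on the classpath.

```java
// Hypothetical holder for the query text; executing it against the
// TDB-backed model is left out so the sketch has no Jena dependency.
class StrFilterQuery {
    // STR(?o) turns an IRI (or literal) into a plain string that
    // CONTAINS can search. The isIRI() guard is an added assumption
    // that only resource-valued objects are of interest.
    static final String QUERY =
          "SELECT ?s WHERE {\n"
        + "  ?s ?p ?o .\n"
        + "  FILTER(isIRI(?o) && CONTAINS(STR(?o), \"_{9F750A5B-F02E-4B64-8D78-D0F527ACB900}\"))\n"
        + "}";

    public static void main(String[] args) {
        System.out.println(QUERY);
    }
}
```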

On Aug 28, 2013, at 6:32 PM, Charles Li <ch...@gmail.com> wrote:

> I want to query a Jena model loaded by TDB from an RDF/XML file to get all subjects that have an object containing one particular RDFID. I tried the following SPARQL query, but it didn't return any results.
>  
> select ?s  WHERE
> {
>    ?s  ?p  ?o .
>    FILTER(CONTAINS(?o, "_{9F750A5B-F02E-4B64-8D78-D0F527ACB900}>"))
> }
>  
> I guess ?o is a Resource, and there should be some SPARQL function to apply to "?o" first to extract the RDFID string and then do the "CONTAINS" function, but I just couldn't find such a function.
>  
> Please help!
> 
> Thanks a lot!!
> - Charles
>

SPARQL Query to retrieve a particular RdfId

Posted by Charles Li <ch...@gmail.com>.

I want to query a Jena model loaded by TDB from an RDF/XML file to get all subjects that have an object containing one particular RDFID. I tried the following SPARQL query, but it didn't return any results.
 
select ?s  WHERE
{
   ?s  ?p  ?o .
   FILTER(CONTAINS(?o, "_{9F750A5B-F02E-4B64-8D78-D0F527ACB900}>"))
}
 
I guess ?o is a Resource, and there should be some SPARQL function to apply to "?o" first to extract the RDFID string and then do the "CONTAINS" function, but I just couldn't find such a function.
 
Please help!

Thanks a lot!!
- Charles

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Holger Knublauch <ho...@knublauch.com>.

Hi Andy,

before I contacted this list I did some background reading to try 
to figure out why the BulkUpdateHandler had been deprecated, but you know 
how difficult this can be when searching through mailing list archives. 
And there is just too much traffic to stay up to date on a daily basis. 
As constructive advice, it would have been good if the deprecation of 
BUH had been properly documented in the source code. Right now it 
just states "Bulk update operations are going to be removed" without 
hinting at a proper replacement. My original question was exactly about 
what will replace it, and it seems like the TransactionHandlers are now 
responsible for it.

I did some tests but at least for SDB this pattern does not seem to be 
working as it should.

The example operation is to insert 10k triples into an SDB:

         List<Triple> triples = new LinkedList<Triple>();
         for(int i = 0; i < 10000; i++) {
             triples.add(Triple.create(
                     OWL.Thing.asNode(),
                     RDFS.seeAlso.asNode(),
                     NodeFactory.createLiteral("" + i)));
         }

Wrapping the add with a TransactionHandler.begin/commit takes 40 seconds 
(sdb is a GraphSDB):

         {
             sdb.getTransactionHandler().begin();
             GraphUtil.add(sdb, triples);
             sdb.getTransactionHandler().commit();
             sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                     Node.ANY).toList();
         }

While the SDB-specific trick with the event manager takes 1 second:

         {
             sdb.getEventManager().notifyEvent(sdb, GraphEvents.startRead);
             GraphUtil.add(sdb, triples);
             sdb.getEventManager().notifyEvent(sdb, GraphEvents.finishRead);
             sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                     Node.ANY).toList();
         }

(The find at the end is there to verify that the triples are immediately 
available after write, and not delayed by some thread in the background).

The latter solution (above) uses the same call sequence as SDB's 
BulkUpdateHandler, which seems to use some background thread with a 
queue to do the actual writing:

             store.getLoader().startBulkUpdate();
             ...
             store.getLoader().flushTriples();

while the TransactionHandlerSDB does the following

             sqlConnection.setAutoCommit(false) ;
             ...
             sqlConnection.commit() ;
             sqlConnection.setAutoCommit(true) ;

So the two approaches are very different, with the current 
implementation of TransactionHandlerSDB in my tests much less efficient 
than the BulkUpdateHandler. Obviously I would prefer to call the 
BulkUpdateHandler mechanism until this has been resolved (or shown to be 
my mistake).

Further comment below...

On 9/4/2013 23:22, Andy Seaborne wrote:
> Current documentation:
>
> http://jena.apache.org/documentation/sdb/loading_data.html
>
> which includes:
>
>  model.notifyEvent(GraphEvents.startRead)
>  ... do add/remove operations ...
>  model.notifyEvent(GraphEvents.finishRead)

This design pattern looks weak to me and it looks weird to call a 
startRead to initiate a write. We furthermore cannot rely on some 
SDB-specific code as our code shall also work with other back-ends. So 
I'd much rather call begin/commit using transactions.

>>> TopQuadrant have already released recently.

Yes and this is why we are having a breather now to finally catch up 
with the new Jena version. Such low-level, risky changes should be done 
in the beginning of a life cycle.

>>> Currently, as SDB patches come in, I personally try to find time to 
>>> apply
>>> them.  I don't have an SDB test system and certainly do not have 
>>> setups of
>>> each of the databases with SDB adapters.
>>>
>>> Such time is my time - my employer doesn't use SDB.

I hear your frustration, and as always your contributions are greatly 
appreciated. Rest assured that I also have other things to do than 
tracking down and adjusting our code to changes between Jena versions. 
So far I have spent one week on this bulk update issue alone. We do have 
90 usages of the BulkUpdateHandler in our code, and who knows what other 
side effects the migration into Transactions will have. So what seems 
like a good simplification to the API from the perspective of the Jena 
developers also has downsides to users with a very large, six-year-old 
code base like ours.

Thanks
Holger

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/13 12:06, Claude Warren wrote:
> Is the recommended migration path to do the following instead of the bulk
> update:
>
> Start a transaction
> Insert each triple
> Commit transaction
>
> With the assumption that the underlying transaction implementation will
> batch the update to the storage layer?
>
> Claude

Current documentation:

http://jena.apache.org/documentation/sdb/loading_data.html

which includes:

  model.notifyEvent(GraphEvents.startRead)
  ... do add/remove operations ...
  model.notifyEvent(GraphEvents.finishRead)

but as I am trying to point out, transactions capture a logical unit of 
change in an application.

	Andy

>
>
>
> On Wed, Sep 4, 2013 at 10:14 AM, Andy Seaborne <an...@apache.org> wrote:
>
>> It has to be the application's responsibility to add transaction boundaries.
>>
>> * JDBC connections are often controlled by the environment, such as a connection pool.
>>
>> * The application may already be in a transaction.
>>
>> * Transactions are a logical grouping of actions.
>>    Model operations are not application logical boundaries.
>>    The application will want a group of model operations in
>>    one transaction.
>>
>> * Nested transactions are rarely supported.
>>
>> Aside from everything else, TopQuadrant should be using transactions.
>> Auto-commit could easily explain 40s.  We can't tell - it's closed source.
>>
>> TopQuadrant can call the SDB controls.  Jena is open source - see the code
>> starting at UpdateHandlerSDB.
>>
>> -- from GraphSDB --
>>
>> store.getLoader().startBulkUpdate() ;
>>    .... do stuff ...
>> store.getLoader().finishBulkUpdate() ;
>>
>> --
>>
>>>> As this is quite a
>>>> crucial issue for our upgrade right now,
>>
>> TopQuadrant have already released recently.
>> https://groups.google.com/d/msg/topbraid-users/El9BbRV6wk4/ac-602N2emoJ
>>
>> Re: SDB
>>
>> The position of SDB has been discussed:
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C519BA513.1030403%40apache.org%3E
>>
>>
>>>> we could gain time for a proper redesign
>> I'll detail the history of BulkUpdateHandler in a separate message.
>>
>> This message is relevant:
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C51A5678A.5070607@knublauch.com%3E
>>
>> where Holger writes:
>> [[ Subject: [DISCUSS] SDB future
>> Having to rely on commercial alternatives would affect
>> the overall price tag of our solutions, and having an open source
>> solution that seamlessly works with Jena is a great asset.
>>
>> I have brought up the topic of this thread in our management to see
>> whether we can allocate any resources to its future, but I cannot report
>> any decision at this stage.
>> ]]
>>
>> My suggestion is to add a Graph that does batching.
>>
>> In fact, my advocacy is that all additional functionality (events,
>> batching, read-only, security, ...) is done by graph-wrapping AKA
>> implementation inheritance.
>>
>> Transactions should be consolidated on the dataset interface.
>>
>> --
>>
>> Currently, as SDB patches come in, I personally try to find time to apply
>> them.  I don't have an SDB test system and certainly do not have setups of
>> each of the databases with SDB adapters.
>>
>> Such time is my time - my employer doesn't use SDB.
>>
>>          Andy
>>
>>
>>
>> On 04/09/13 01:32, Holger Knublauch wrote:
>>
>>> On 9/4/2013 3:15, Claude Warren wrote:
>>>
>>>> As I recall, the discussion around this topic dealt with the idea that you
>>>> could add each triple inside a transaction and, when the transaction
>>>> committed, the transaction code would do the bulk update if supported.
>>>>
>>>
>>> If this were the case, then the code in the GraphUtil helper functions
>>> should probably wrap the individual performUpdate calls with a
>>> transaction, but they don't.
>>>
>>>      However
>>>> I may be way off base here.  I have no objection to retaining the BUH.
>>>>
>>>
>>> I would greatly appreciate seeing this resolved before the final
>>> release. As suggested earlier, we could gain time for a proper redesign
>>> by avoiding the calls to the GraphUtil replacement functions, or
>>> changing those functions so that they call graph.getBulkUpdateHandler()
>>> for the time being, and possibly undeprecate BUH for now.
>>>
>>> Thanks,
>>> Holger
>>>
>>>
>>>> Claude
>>>>
>>>>
>>>> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch
>>>> <ho...@knublauch.com>wrote:
>>>>
>>>>   Hi group,
>>>>>
>>>>> I did not see any response to my question below, which is unusual for this
>>>>> list where responses are usually fast and competent. As this is quite a
>>>>> crucial issue for our upgrade right now, I would like to ask again, and
>>>>> rephrase my question. I understand SDB is rather unsupported, but the
>>>>> issue
>>>>> is really a question on the core API.
>>>>>
>>>>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>>>>> database such as Oracle RDF (the Jena adapter of which implements its
>>>>> own
>>>>> BUH right now). Granted, the class is not gone yet, but some existing
>>>>> API
>>>>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe
>>>>> this
>>>>> was premature (revision 1419595). My suggestion is to continue to
>>>>> delegate
>>>>> Model.add through the BulkUpdateHandler for the upcoming release
>>>>> until the
>>>>> interface has been truly removed/replaced with something else. BUH
>>>>> does not
>>>>> represent much implementation overhead for Graph implementers,
>>>>> because they
>>>>> can simply use the default implementation. The current implementation is
>>>>> too inefficient for our product.
>>>>>
>>>>> If there is a cleaner mechanism to get the same performance, then I'd be
>>>>> happy to hear about it.
>>>>>
>>>>> Thanks
>>>>> Holger
>>>>>
>>>>>
>>>>>
>>>>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>>>>
>>>>>   SDB currently implements its own BulkUpdateHandler, and I just made
>>>>>> some
>>>>>> tests that indicate that it is significantly faster than using
>>>>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>>>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>>>>> GraphUtil.add, what call sequence are we supposed to use to retain
>>>>>> the good
>>>>>> performance of the BulkUpdateHandler? Could a method
>>>>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>>>>> behavior for specific Graph types?
>>>>>>
>>>>>> Thanks
>>>>>> Holger
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
>

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Claude Warren <cl...@xenei.com>.

Is the recommended migration path to do the following instead of the bulk
update:

Start a transaction
Insert each triple
Commit transaction

With the assumption that the underlying transaction implementation will
batch the update to the storage layer?
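
The three steps above can be sketched as follows. This is a minimal illustration, not Jena code: SketchTx and SketchStore are hypothetical stand-ins that model only the begin/commit/abort call pattern, and the store simply assumes that staged adds are written together at commit, which is exactly the behaviour being asked about here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a transaction handler with begin/commit/abort;
// it models the call pattern only and is not Jena's TransactionHandler.
interface SketchTx {
    void begin();
    void commit();
    void abort();
}

// In-memory store: adds are staged during a transaction and
// become visible only on commit.
class SketchStore implements SketchTx {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean active = false;

    @Override public void begin() {
        if (active) throw new IllegalStateException("nested transactions not supported");
        active = true;
    }

    void add(String triple) {
        if (!active) throw new IllegalStateException("no active transaction");
        pending.add(triple);
    }

    @Override public void commit() {
        committed.addAll(pending); // assumed: one write for the whole batch
        pending.clear();
        active = false;
    }

    @Override public void abort() {
        pending.clear(); // discard everything staged since begin()
        active = false;
    }
}

class TxDemo {
    // The application decides where the unit of change starts and ends.
    static void addAll(SketchStore store, List<String> triples) {
        store.begin();
        try {
            for (String t : triples) {
                store.add(t);
            }
            store.commit();
        } catch (RuntimeException e) {
            store.abort();
            throw e;
        }
    }

    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        addAll(store, Arrays.asList("t1", "t2", "t3"));
        System.out.println(store.committed.size() + " triples committed");
    }
}
```

Whether a real back-end turns the per-triple adds into a single bulk write at commit depends on its transaction implementation; the sketch only shows where the application would place the boundaries.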

Claude



On Wed, Sep 4, 2013 at 10:14 AM, Andy Seaborne <an...@apache.org> wrote:

> It has to be the application's responsibility to add transaction boundaries.
>
> * JDBC connections are often controlled by the environment, such as a connection pool.
>
> * The application may already be in a transaction.
>
> * Transactions are a logical grouping of actions.
>   Model operations are not application logical boundaries.
>   The application will want a group of model operations in
>   one transaction.
>
> * Nested transactions are rarely supported.
>
> Aside from everything else, TopQuadrant should be using transactions.
> Auto-commit could easily explain 40s.  We can't tell - it's closed source.
>
> TopQuadrant can call the SDB controls.  Jena is open source - see the code
> starting at UpdateHandlerSDB.
>
> -- from GraphSDB --
>
> store.getLoader().startBulkUpdate() ;
>   .... do stuff ...
> store.getLoader().finishBulkUpdate() ;
>
> --
>
> >> As this is quite a
> >> crucial issue for our upgrade right now,
>
> TopQuadrant have already released recently.
> https://groups.google.com/d/msg/topbraid-users/El9BbRV6wk4/ac-602N2emoJ
>
> Re: SDB
>
> The position of SDB has been discussed:
> http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C519BA513.1030403%40apache.org%3E
>
>
> >> we could gain time for a proper redesign
> I'll detail the history of BulkUpdateHandler in a separate message.
>
> This message is relevant:
> http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C51A5678A.5070607@knublauch.com%3E
>
> where Holger writes:
> [[ Subject: [DISCUSS] SDB future
> Having to rely on commercial alternatives would affect
> the overall price tag of our solutions, and having an open source
> solution that seamlessly works with Jena is a great asset.
>
> I have brought up the topic of this thread in our management to see
> whether we can allocate any resources to its future, but I cannot report
> any decision at this stage.
> ]]
>
> My suggestion is to add a Graph that does batching.
>
> In fact, my advocacy is that all additional functionality (events,
> batching, read-only, security, ...) is done by graph-wrapping AKA
> implementation inheritance.
>
> Transactions should be consolidated on the dataset interface.
>
> --
>
> Currently, as SDB patches come in, I personally try to find time to apply
> them.  I don't have an SDB test system and certainly do not have setups of
> each of the databases with SDB adapters.
>
> Such time is my time - my employer doesn't use SDB.
>
>         Andy
>
>
>
> On 04/09/13 01:32, Holger Knublauch wrote:
>
>> On 9/4/2013 3:15, Claude Warren wrote:
>>
>>> As I recall, the discussion around this topic dealt with the idea that you
>>> could add each triple inside a transaction and, when the transaction
>>> committed, the transaction code would do the bulk update if supported.
>>>
>>
>> If this were the case, then the code in the GraphUtil helper functions
>> should probably wrap the individual performUpdate calls with a
>> transaction, but they don't.
>>
>>     However
>>> I may be way off base here.  I have no objection to retaining the BUH.
>>>
>>
>> I would greatly appreciate seeing this resolved before the final
>> release. As suggested earlier, we could gain time for a proper redesign
>> by avoiding the calls to the GraphUtil replacement functions, or
>> changing those functions so that they call graph.getBulkUpdateHandler()
>> for the time being, and possibly undeprecate BUH for now.
>>
>> Thanks,
>> Holger
>>
>>
>>> Claude
>>>
>>>
>>> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch
>>> <ho...@knublauch.com>wrote:
>>>
>>>  Hi group,
>>>>
>>>> I did not see any response to my question below, which is unusual for this
>>>> list where responses are usually fast and competent. As this is quite a
>>>> crucial issue for our upgrade right now, I would like to ask again, and
>>>> rephrase my question. I understand SDB is rather unsupported, but the
>>>> issue
>>>> is really a question on the core API.
>>>>
>>>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>>>> database such as Oracle RDF (the Jena adapter of which implements its
>>>> own
>>>> BUH right now). Granted, the class is not gone yet, but some existing
>>>> API
>>>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe
>>>> this
>>>> was premature (revision 1419595). My suggestion is to continue to
>>>> delegate
>>>> Model.add through the BulkUpdateHandler for the upcoming release
>>>> until the
>>>> interface has been truly removed/replaced with something else. BUH
>>>> does not
>>>> represent much implementation overhead for Graph implementers,
>>>> because they
>>>> can simply use the default implementation. The current implementation is
>>>> too inefficient for our product.
>>>>
>>>> If there is a cleaner mechanism to get the same performance, then I'd be
>>>> happy to hear about it.
>>>>
>>>> Thanks
>>>> Holger
>>>>
>>>>
>>>>
>>>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>>>
>>>>  SDB currently implements its own BulkUpdateHandler, and I just made
>>>>> some
>>>>> tests that indicate that it is significantly faster than using
>>>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>>>> GraphUtil.add, what call sequence are we supposed to use to retain
>>>>> the good
>>>>> performance of the BulkUpdateHandler? Could a method
>>>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>>>> behavior for specific Graph types?
>>>>>
>>>>> Thanks
>>>>> Holger
>>>>>
>>>>>
>>>>>
>>>
>>
>


-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Andy Seaborne <an...@apache.org>.

It has to be the application's responsibility to add transaction boundaries.

* JDBC connections are often controlled by the environment, such as a connection pool.

* The application may already be in a transaction.

* Transactions are a logical grouping of actions.
   Model operations are not application logical boundaries.
   The application will want a group of model operations in
   one transaction.

* Nested transactions are rarely supported.

Aside from everything else, TopQuadrant should be using transactions. 
Auto-commit could easily explain 40s.  We can't tell - it's closed source.

TopQuadrant can call the SDB controls.  Jena is open source - see the 
code starting at UpdateHandlerSDB.

-- from GraphSDB --

store.getLoader().startBulkUpdate() ;
   .... do stuff ...
store.getLoader().finishBulkUpdate() ;

-- 

 >> As this is quite a
 >> crucial issue for our upgrade right now,

TopQuadrant have already released recently.
https://groups.google.com/d/msg/topbraid-users/El9BbRV6wk4/ac-602N2emoJ

Re: SDB

The position of SDB has been discussed:
http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C519BA513.1030403%40apache.org%3E

 >> we could gain time for a proper redesign
I'll detail the history of BulkUpdateHandler in a separate message.

This message is relevant:
http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C51A5678A.5070607@knublauch.com%3E

where Holger writes:
[[ Subject: [DISCUSS] SDB future
Having to rely on commercial alternatives would affect
the overall price tag of our solutions, and having an open source
solution that seamlessly works with Jena is a great asset.

I have brought up the topic of this thread in our management to see
whether we can allocate any resources to its future, but I cannot report 
any decision at this stage.
]]

My suggestion is to add a Graph that does batching.

In fact, my advocacy is that all additional functionality (events, 
batching, read-only, security, ...) is done by graph-wrapping AKA 
implementation inheritance.
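
As a rough illustration of the batching-by-wrapping idea, the sketch below buffers adds and forwards them to the backing store in chunks. All names in it (TripleSink, CountingStore, BatchingSink) are hypothetical and triples are plain strings; it is not how Jena or SDB implements batching, only a picture of the wrapper shape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal sink for triples (strings here); not a Jena interface.
interface TripleSink {
    void add(String triple);
}

// Stand-in for a store that records how many batch writes reach it.
class CountingStore {
    final List<String> triples = new ArrayList<>();
    int writes = 0;

    void writeBatch(List<String> batch) {
        triples.addAll(batch);
        writes++;
    }
}

// Wrapper that buffers adds and forwards them in chunks of batchSize.
class BatchingSink implements TripleSink {
    private final CountingStore store;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingSink(CountingStore store, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.store = store;
        this.batchSize = batchSize;
    }

    @Override public void add(String triple) {
        buffer.add(triple);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Must be called at the end so a partial batch is not lost.
    void flush() {
        if (buffer.isEmpty()) return;
        store.writeBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        BatchingSink sink = new BatchingSink(store, 1000);
        for (int i = 0; i < 10000; i++) {
            sink.add("triple-" + i);
        }
        sink.flush();
        System.out.println(store.triples.size() + " triples in " + store.writes + " writes");
    }
}
```

The caller keeps using a single add method while the wrapper decides when to hit the store. The explicit flush is the weak point of this design: forgetting it leaves the last partial batch unwritten.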

Transactions should be consolidated on the dataset interface.

--

Currently, as SDB patches come in, I personally try to find time to 
apply them.  I don't have an SDB test system and certainly do not have 
setups of each of the databases with SDB adapters.

Such time is my time - my employer doesn't use SDB.

	Andy

On 04/09/13 01:32, Holger Knublauch wrote:
> On 9/4/2013 3:15, Claude Warren wrote:
>> As I recall, the discussion around this topic dealt with the idea that you
>> could add each triple inside a transaction and, when the transaction
>> committed, the transaction code would do the bulk update if supported.
>
> If this were the case, then the code in the GraphUtil helper functions
> should probably wrap the individual performUpdate calls with a
> transaction, but they don't.
>
>>    However
>> I may be way off base here.  I have no objection to retaining the BUH.
>
> I would greatly appreciate seeing this resolved before the final
> release. As suggested earlier, we could gain time for a proper redesign
> by avoiding the calls to the GraphUtil replacement functions, or
> changing those functions so that they call graph.getBulkUpdateHandler()
> for the time being, and possibly undeprecate BUH for now.
>
> Thanks,
> Holger
>
>>
>> Claude
>>
>>
>> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch
>> <ho...@knublauch.com>wrote:
>>
>>> Hi group,
>>>
>>> I did not see any response to my question below, which is unusual for this
>>> list where responses are usually fast and competent. As this is quite a
>>> crucial issue for our upgrade right now, I would like to ask again, and
>>> rephrase my question. I understand SDB is rather unsupported, but the
>>> issue
>>> is really a question on the core API.
>>>
>>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>>> database such as Oracle RDF (the Jena adapter of which implements its
>>> own
>>> BUH right now). Granted, the class is not gone yet, but some existing
>>> API
>>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe
>>> this
>>> was premature (revision 1419595). My suggestion is to continue to
>>> delegate
>>> Model.add through the BulkUpdateHandler for the upcoming release
>>> until the
>>> interface has been truly removed/replaced with something else. BUH
>>> does not
>>> represent much implementation overhead for Graph implementers,
>>> because they
>>> can simply use the default implementation. The current implementation is
>>> too inefficient for our product.
>>>
>>> If there is a cleaner mechanism to get the same performance, then I'd be
>>> happy to hear about it.
>>>
>>> Thanks
>>> Holger
>>>
>>>
>>>
>>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>>
>>>> SDB currently implements its own BulkUpdateHandler, and I just made
>>>> some
>>>> tests that indicate that it is significantly faster than using
>>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>>> GraphUtil.add, what call sequence are we supposed to use to retain
>>>> the good
>>>> performance of the BulkUpdateHandler? Could a method
>>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>>> behavior for specific Graph types?
>>>>
>>>> Thanks
>>>> Holger
>>>>
>>>>
>>
>

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Claude Warren <cl...@xenei.com>.

I opened https://issues.apache.org/jira/browse/JENA-528 for this.  Please
add comments there, vote it up and perhaps watch it.

Claude


On Wed, Sep 4, 2013 at 1:32 AM, Holger Knublauch <ho...@knublauch.com>wrote:

> On 9/4/2013 3:15, Claude Warren wrote:
>
>> As I recall, the discussion around this topic dealt with the idea that you
>> could add each triple inside a transaction and, when the transaction
>> committed, the transaction code would do the bulk update if supported.
>>
>
> If this were the case, then the code in the GraphUtil helper functions
> should probably wrap the individual performUpdate calls with a transaction,
> but they don't.
>
>
>     However
>> I may be way off base here.  I have no objection to retaining the BUH.
>>
>
> I would greatly appreciate seeing this resolved before the final release.
> As suggested earlier, we could gain time for a proper redesign by avoiding
> the calls to the GraphUtil replacement functions, or changing those
> functions so that they call graph.getBulkUpdateHandler() for the time
> being, and possibly undeprecate BUH for now.
>
> Thanks,
> Holger
>
>
>
>> Claude
>>
>>
>> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch <holger@knublauch.com
>> >wrote:
>>
>>  Hi group,
>>>
>>> I did not see any response to my question below, which is unusual for this
>>> list where responses are usually fast and competent. As this is quite a
>>> crucial issue for our upgrade right now, I would like to ask again, and
>>> rephrase my question. I understand SDB is rather unsupported, but the
>>> issue
>>> is really a question on the core API.
>>>
>>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>>> database such as Oracle RDF (the Jena adapter of which implements its own
>>> BUH right now). Granted, the class is not gone yet, but some existing API
>>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe
>>> this
>>> was premature (revision 1419595). My suggestion is to continue to
>>> delegate
>>> Model.add through the BulkUpdateHandler for the upcoming release until
>>> the
>>> interface has been truly removed/replaced with something else. BUH does
>>> not
>>> represent much implementation overhead for Graph implementers, because
>>> they
>>> can simply use the default implementation. The current implementation is
>>> too inefficient for our product.
>>>
>>> If there is a cleaner mechanism to get the same performance, then I'd be
>>> happy to hear about it.
>>>
>>> Thanks
>>> Holger
>>>
>>>
>>>
>>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>>
>>>  SDB currently implements its own BulkUpdateHandler, and I just made some
>>>> tests that indicate that it is significantly faster than using
>>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>>> GraphUtil.add, what call sequence are we supposed to use to retain the
>>>> good
>>>> performance of the BulkUpdateHandler? Could a method
>>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>>> behavior for specific Graph types?
>>>>
>>>> Thanks
>>>> Holger
>>>>
>>>>
>>>>
>>
>



Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Holger Knublauch <ho...@knublauch.com>.

On 9/4/2013 3:15, Claude Warren wrote:
> As I recall, the discussion around this topic dealt with the idea that you
> could add each triple inside a transaction and, when the transaction
> committed, the transaction code would do the bulk update if supported.

If this were the case, then the code in the GraphUtil helper functions 
should probably wrap the individual performUpdate calls with a 
transaction, but they don't.

>    However
> I may be way off base here.  I have no objection to retaining the BUH.

I would greatly appreciate seeing this resolved before the final 
release. As suggested earlier, we could gain time for a proper redesign 
by avoiding the calls to the GraphUtil replacement functions, or 
changing those functions so that they call graph.getBulkUpdateHandler() 
for the time being, and possibly undeprecate BUH for now.

Thanks,
Holger

>
> Claude
>
>
> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch <ho...@knublauch.com>wrote:
>
>> Hi group,
>>
>> I did not see any response to my question below, which is unusual for this
>> list where responses are usually fast and competent. As this is quite a
>> crucial issue for our upgrade right now, I would like to ask again, and
>> rephrase my question. I understand SDB is rather unsupported, but the issue
>> is really a question on the core API.
>>
>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>> database such as Oracle RDF (the Jena adapter of which implements its own
>> BUH right now). Granted, the class is not gone yet, but some existing API
>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe this
>> was premature (revision 1419595). My suggestion is to continue to delegate
>> Model.add through the BulkUpdateHandler for the upcoming release until the
>> interface has been truly removed/replaced with something else. BUH does not
>> represent much implementation overhead for Graph implementers, because they
>> can simply use the default implementation. The current implementation is
>> too inefficient for our product.
>>
>> If there is a cleaner mechanism to get the same performance, then I'd be
>> happy to hear about it.
>>
>> Thanks
>> Holger
>>
>>
>>
>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>
>>> SDB currently implements its own BulkUpdateHandler, and I just made some
>>> tests that indicate that it is significantly faster than using
>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>> GraphUtil.add, what call sequence are we supposed to use to retain the good
>>> performance of the BulkUpdateHandler? Could a method
>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>> behavior for specific Graph types?
>>>
>>> Thanks
>>> Holger
>>>
>>>
>

Re: History of BulkUpdateHandler changes

Posted by Alan Wu <al...@oracle.com>.

Hi Andy,

Yes. GraphOracleSem and DatasetGraphOracleSem allow addition/deletion of triples and quads, respectively.
Users can choose to commit or rollback transactions at any time. Without committing, one session can of course
see its own changes. But other sessions won't.

We will use http://jena.apache.org in our document, if we haven't already done so :)

Cheers,

Zhe

On 9/4/2013 2:48 PM, Andy Seaborne wrote:
> On 04/09/13 17:02, Alan Wu wrote:
>> Hi Andy,
>>
>> FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with Top
>> Quadrant's tools and
>> customer's applications.
>>
>> Thanks,
>>
>> Zhe Wu
>> Oracle Spatial and Graph
>
> Zhe,
>
> Thanks for the information.  It's beginning to look like transaction boundaries are a significant factor, for example, deleting and adding triples as a single ACID action.  How's that handled? Via oracle.spatial.rdf.client.jena.GraphOracleSem?
>
>     Andy
>
> PS Hopefully the document will change http://jena.sourceforge.net/ to http://jena.apache.org/ as well :-)  Your lawyers can advise on the legal side.
>
>>
>> On 9/4/2013 2:15 AM, Andy Seaborne wrote:
>>> The Jena project works in public.  The history of the discussions for
>>> BulkUpdateHandler and SDB are in various public archives.
>>>
>>> I would like to see acknowledgement of prior discussions and the
>>> intentions behind the changes.
>>>
>>> We made the graph-level bulk update handler change at 2.10.0 and we've
>>> had 2.10.1 since then.
>>>
>>> There was a message on the users list Nov 2012
>>>
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>>
>>>
>>> and the dev list a year ago:
>>>
>>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>>
>>>
>>> Oracle are aware of the changes:
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>>
>>>
>>> Oracle do not track Jena versions.
>>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>>
>>> I do know that the complexities arising in Jena lead to costs for
>>> storage implementers.  I want to reduce those costs in the long term.
>>>
>>>     Andy
>>
>

Re: History of BulkUpdateHandler changes

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/13 17:02, Alan Wu wrote:
> Hi Andy,
>
> FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with Top
> Quadrant's tools and
> customer's applications.
>
> Thanks,
>
> Zhe Wu
> Oracle Spatial and Graph

Zhe,

Thanks for the information.  It's beginning to look like transaction 
boundaries are a significant factor, for example, deleting and adding 
triples as a single ACID action.  How's that handled? Via 
oracle.spatial.rdf.client.jena.GraphOracleSem?

	Andy

PS Hopefully the document will change http://jena.sourceforge.net/ to 
http://jena.apache.org/ as well :-)  Your lawyers can advise on the 
legal side.

>
> On 9/4/2013 2:15 AM, Andy Seaborne wrote:
>> The Jena project works in public.  The history of the discussions for
>> BulkUpdateHandler and SDB are in various public archives.
>>
>> I would like to see acknowledgement of prior discussions and the
>> intentions behind the changes.
>>
>> We made the graph-level bulk update handler change at 2.10.0 and we've
>> had 2.10.1 since then.
>>
>> There was a message on the users list Nov 2012
>>
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>
>>
>> and the dev list a year ago:
>>
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>
>>
>> Oracle are aware of the changes:
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>
>>
>> Oracle do not track Jena versions.
>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>
>> I do know that the complexities arising in Jena lead to costs for
>> storage implementers.  I want to reduce those costs in the long term.
>>
>>     Andy
>

Re: History of BulkUpdateHandler changes

Posted by Alan Wu <al...@oracle.com>.

Hi Andy,

FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with Top Quadrant's tools and
customer's applications.

Thanks,

Zhe Wu
Oracle Spatial and Graph

On 9/4/2013 2:15 AM, Andy Seaborne wrote:
> The Jena project works in public.  The history of the discussions for BulkUpdateHandler and SDB are in various public archives.
>
> I would like to see acknowledgement of prior discussions and the intentions behind the changes.
>
> We made the graph-level bulk update handler change at 2.10.0 and we've had 2.10.1 since then.
>
> There was a message on the users list Nov 2012
>
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>
> and the dev list a year ago:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>
> Oracle are aware of the changes:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>
> Oracle do not track Jena versions.
> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>
> I do know that the complexities arising in Jena lead to costs for storage implementers.  I want to reduce those costs in the long term.
>
>     Andy

Re: History of BulkUpdateHandler changes

Posted by Holger Knublauch <ho...@knublauch.com>.

Hi Claude,

yes this may work, but I don't think SDB's TransactionHandler does this right now. I am not (at all) familiar with SDB, but my observations as described in a parallel thread indicate that the SDB BulkUpdateHandler has the best performance. My concern is that the work needed to bring SDB up to speed outweighs the potential benefits of simplifying the API.

Thanks,
Holger


On Sep 13, 2013, at 4:29 PM, Claude Warren wrote:

> Holger,
> 
> Would it not make sense for the TransactionHandler to track all the updates
> and deletes that occur within a transaction and submit them to the
> underlying db in blocks while calling the listener methods on the graph at
> commit?  Does this provide the path you are looking for to keep bulk
> updates?
> 
> Claude
> 
> 
> On Thu, Sep 5, 2013 at 5:50 AM, Holger Knublauch <ho...@knublauch.com> wrote:
> 
>> Hi Andy,
>> 
>> thanks for pointing at the old discussions. Reading through them, I notice
>> that TopQuadrant should have responded earlier. I don't know whether I
>> actually noticed this email, or whether I didn't understand the
>> implications at the time, or whether tracking the low level details of Jena
>> was outside of my responsibility at the time. In either case it was an
>> oversight and I would like to give my input, albeit late.
>> 
>> On 9/4/2013 19:15, Andy Seaborne wrote:
>> 
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>> 
>> 
>>> "[the BulkUpdateHandler] is not used"
>> 
>> This is not correct as SDB and OracleRDF are using it, possibly others.
>> 
>> 
>> 
>> and the dev list a year ago:
>>> 
>>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>> 
>> 
>> Remove BulkUpdateHandler interface
>>> 
>>     Migrate its few useful operations to Graph.
>> 
>> 
>> Yes, migrating the useful operations to Graph would IMHO have made sense,
>> but this has not happened yet - instead the suggestion is to use
>> transactions.
>> 
>> 
>>> UpdateHandlerSDB / A few of its operations are useful but most turn
>> into nothing but loops to call add(Triple)/delete(Triple).
>> 
>> The SDB implementation is very useful and makes significant performance
>> differences. I assume likewise for Oracle.
>> 
>> 
>> 
>>> Oracle are aware of the changes:
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>> 
>> 
>> Zhe responded that BUH is used, but judging from the archive, the
>> discussion seems to have ended without a proper conclusion.
>> 
>> 
>> 
>>> Oracle do not track Jena versions.
>>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>> 
>>> I do know that the complexities arising in Jena lead to costs for storage
>>> implementers.  I want to reduce those costs in the long term.
>>> 
>> 
>> The latter argument is IMHO very weak. There are probably less than 10
>> Jena Graph database implementations (SDB, TDB, Oracle etc). They already
>> have BUH implementations. Even if 10 more Graph implementations are added,
>> it would mean that those 10 developers need to add approximately three
>> lines of code:
>> 
>>    public BulkUpdateHandler getBulkUpdateHandler() {
>>        return new SimpleBulkUpdateHandler(this);
>>    }
>> 
>> OTOH by removing BulkUpdateHandler, you will see every user of this API
>> affected, certainly more than 10. The overhead of adjusting SDB alone seems
>> to far outweigh the cost savings (unless my previous observations about SDB
>> were incorrect).
>> 
>> BTW I do agree that the number of event listener methods should be greatly
>> reduced. Maybe only have notifyAddTriple and notifyAddIterable (taking an
>> Iterable instead of a List). I am not 100% sure that only having
>> notifyAddTriple would be sufficient for our use cases, so I'd rather see
>> one form of bulk event preserved and Iterable seems the most generic one.
>> 
>> Thanks,
>> Holger
>> 
>> 
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren

Re: History of BulkUpdateHandler changes

Posted by Claude Warren <cl...@xenei.com>.

Holger,

Would it not make sense for the TransactionHandler to track all the updates
and deletes that occur within a transaction and submit them to the
underlying db in blocks while calling the listener methods on the graph at
commit?  Does this provide the path you are looking for to keep bulk
updates?
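
A minimal sketch of that idea, for illustration only. It assumes a hypothetical BufferingTransaction class (not part of Jena or SDB) that collects additions during a transaction and, on commit, hands them to the store in fixed-size blocks before notifying a listener. The blockWriter and listener callbacks are stand-ins for the database write and the graph listener methods; deletes would follow the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not a Jena class: buffers additions made inside a
// transaction and writes them to the store in blocks at commit time.
class BufferingTransaction<T> {
    private final List<T> pending = new ArrayList<>();
    private final Consumer<List<T>> blockWriter; // stands in for one bulk write to the database
    private final Consumer<T> listener;          // stands in for a per-triple graph listener
    private final int blockSize;

    BufferingTransaction(Consumer<List<T>> blockWriter, Consumer<T> listener, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be at least 1");
        }
        this.blockWriter = blockWriter;
        this.listener = listener;
        this.blockSize = blockSize;
    }

    // Nothing reaches the store until commit() is called.
    void add(T item) {
        pending.add(item);
    }

    // Writes the buffered items block by block, then notifies the listener
    // for each item of the block that was just written.
    void commit() {
        for (int start = 0; start < pending.size(); start += blockSize) {
            int end = Math.min(start + blockSize, pending.size());
            List<T> block = new ArrayList<>(pending.subList(start, end));
            blockWriter.accept(block);
            block.forEach(listener);
        }
        pending.clear();
    }

    // Discards buffered additions without touching the store.
    void abort() {
        pending.clear();
    }
}
```

With a block size of 2 and three buffered additions, commit() would issue two bulk writes (of two items and one item) instead of three single writes. Whether this matches the performance of a store-specific BulkUpdateHandler depends on the store and is not something the sketch can show.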

Claude


On Thu, Sep 5, 2013 at 5:50 AM, Holger Knublauch <ho...@knublauch.com> wrote:

> Hi Andy,
>
> thanks for pointing at the old discussions. Reading through them, I notice
> that TopQuadrant should have responded earlier. I don't know whether I
> actually noticed this email, or whether I didn't understand the
> implications at the time, or whether tracking the low level details of Jena
> was outside of my responsibility at the time. In either case it was an
> oversight and I would like to give my input, albeit late.
>
> On 9/4/2013 19:15, Andy Seaborne wrote:
>
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>
>
> > "[the BulkUpdateHandler] is not used"
>
> This is not correct as SDB and OracleRDF are using it, possibly others.
>
>
>
>  and the dev list a year ago:
>>
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>
>
>  Remove BulkUpdateHandler interface
>>
>      Migrate its few useful operations to Graph.
>
>
> Yes, migrating the useful operations to Graph would IMHO have made sense,
> but this has not happened yet - instead the suggestion is to use
> transactions.
>
>
> > UpdateHandlerSDB / A few of its operations are useful but most turn
> into nothing but loops to call add(Triple)/delete(Triple).
>
> The SDB implementation is very useful and makes significant performance
> differences. I assume likewise for Oracle.
>
>
>
>> Oracle are aware of the changes:
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>
>
> Zhe responded that BUH is used, but judging from the archive, the
> discussion seems to have ended without a proper conclusion.
>
>
>
>> Oracle do not track Jena versions.
>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>
>> I do know that the complexities arising in Jena lead to costs for storage
>> implementers.  I want to reduce those costs in the long term.
>>
>
> The latter argument is IMHO very weak. There are probably less than 10
> Jena Graph database implementations (SDB, TDB, Oracle etc). They already
> have BUH implementations. Even if 10 more Graph implementations are added,
> it would mean that those 10 developers need to add approximately three
> lines of code:
>
>     public BulkUpdateHandler getBulkUpdateHandler() {
>         return new SimpleBulkUpdateHandler(this);
>     }
>
> OTOH by removing BulkUpdateHandler, you will see every user of this API
> affected, certainly more than 10. The overhead of adjusting SDB alone seems
> to far outweigh the cost savings (unless my previous observations about SDB
> were incorrect).
>
> BTW I do agree that the number of event listener methods should be greatly
> reduced. Maybe only have notifyAddTriple and notifyAddIterable (taking an
> Iterable instead of a List). I am not 100% sure that only having
> notifyAddTriple would be sufficient for our use cases, so I'd rather see
> one form of bulk event preserved and Iterable seems the most generic one.
>
> Thanks,
> Holger
>
>


-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: History of BulkUpdateHandler changes

Posted by Holger Knublauch <ho...@knublauch.com>.

Hi Andy,

thanks for pointing at the old discussions. Reading through them, I 
notice that TopQuadrant should have responded earlier. I don't know 
whether I actually noticed this email, or whether I didn't understand 
the implications at the time, or whether tracking the low level details 
of Jena was outside of my responsibility at the time. In either case it 
was an oversight and I would like to give my input, albeit late.

On 9/4/2013 19:15, Andy Seaborne wrote:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E 
>

 > "[the BulkUpdateHandler] is not used"

This is not correct as SDB and OracleRDF are using it, possibly others.

> and the dev list a year ago:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E 
>

> Remove BulkUpdateHandler interface
>     Migrate its few useful operations to Graph.

Yes, migrating the useful operations to Graph would IMHO have made 
sense, but this has not happened yet - instead the suggestion is to use 
transactions.

> UpdateHandlerSDB / A few of its operations are useful but most turn 
> into nothing but loops to call add(Triple)/delete(Triple).

The SDB implementation is very useful and makes significant performance 
differences. I assume likewise for Oracle.

>
> Oracle are aware of the changes:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E 
>

Zhe responded that BUH is used, but judging from the archive, the 
discussion seems to have ended without a proper conclusion.

>
> Oracle do not track Jena versions.
> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>
> I do know that the complexities arising in Jena lead to costs for 
> storage implementers.  I want to reduce those costs in the long term.

The latter argument is IMHO very weak. There are probably fewer than 10 
Jena Graph database implementations (SDB, TDB, Oracle etc). They already 
have BUH implementations. Even if 10 more Graph implementations are 
added, it would mean that those 10 developers need to add approximately 
three lines of code:

     public BulkUpdateHandler getBulkUpdateHandler() {
         return new SimpleBulkUpdateHandler(this);
     }

OTOH by removing BulkUpdateHandler, you will see every user of this API 
affected, certainly more than 10. The overhead of adjusting SDB alone 
seems to far outweigh the cost savings (unless my previous observations 
about SDB were incorrect).

BTW I do agree that the number of event listener methods should be 
greatly reduced. Maybe only have notifyAddTriple and notifyAddIterable 
(taking an Iterable instead of a List). I am not 100% sure that only 
having notifyAddTriple would be sufficient for our use cases, so I'd 
rather see one form of bulk event preserved and Iterable seems the most 
generic one.
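
A minimal sketch of what such a reduced listener could look like. The method names notifyAddTriple and notifyAddIterable come from the suggestion above; the simplified signatures (no Graph argument, a generic element type) and the CountingListener class are assumptions made for illustration and are not Jena's GraphListener API.

```java
// Hypothetical reduced listener, NOT Jena's GraphListener: one single-item
// event and one bulk event that takes an Iterable.
interface AddListener<T> {
    void notifyAddTriple(T triple);

    // Listeners that do not care about batches inherit this fallback,
    // which turns a bulk event into per-item events.
    default void notifyAddIterable(Iterable<? extends T> triples) {
        for (T t : triples) {
            notifyAddTriple(t);
        }
    }
}

// Example listener that reacts once per batch instead of once per item.
class CountingListener<T> implements AddListener<T> {
    int singleEvents = 0; // calls to notifyAddTriple
    int bulkEvents = 0;   // calls to notifyAddIterable
    int total = 0;        // items seen through either path

    @Override
    public void notifyAddTriple(T triple) {
        singleEvents++;
        total++;
    }

    @Override
    public void notifyAddIterable(Iterable<? extends T> triples) {
        bulkEvents++;
        for (T ignored : triples) {
            total++;
        }
    }
}
```

The default method keeps simple listeners down to one method, while a listener that needs to know about batch boundaries can override the bulk event.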

Thanks,
Holger

History of BulkUpdateHandler changes

Posted by Andy Seaborne <an...@apache.org>.

The Jena project works in public.  The history of the discussions for 
BulkUpdateHandler and SDB are in various public archives.

I would like to see acknowledgement of prior discussions and the 
intentions behind the changes.

We made the graph-level bulk update handler change at 2.10.0 and we've 
had 2.10.1 since then.

There was a message on the users list Nov 2012

http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E

and the dev list a year ago:

http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E

Oracle are aware of the changes:
http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E

Oracle do not track Jena versions.
Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm

I do know that the complexities arising in Jena lead to costs for 
storage implementers.  I want to reduce those costs in the long term.

	Andy

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Claude Warren <cl...@xenei.com>.

As I recall, the discussion around this topic dealt with the idea that you
could add each triple inside a transaction and, when the transaction
committed, the transaction code would do the bulk update if supported.  However
I may be way off base here.  I have no objection to retaining the BUH.

Claude


On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch <ho...@knublauch.com> wrote:

> Hi group,
>
> I did not see any response to my question below, which is unusual for this
> list, where responses are usually fast and competent. As this is quite a
> crucial issue for our upgrade right now, I would like to ask again, and
> rephrase my question. I understand SDB is rather unsupported, but the issue
> is really a question on the core API.
>
> Deprecating the BulkUpdateHandler will not only affect SDB but any other
> database such as Oracle RDF (the Jena adapter of which implements its own
> BUH right now). Granted, the class is not gone yet, but some existing API
> calls (Model.add) already bypass the BulkUpdateHandler, and I believe this
> was premature (revision 1419595). My suggestion is to continue to delegate
> Model.add through the BulkUpdateHandler for the upcoming release until the
> interface has been truly removed/replaced with something else. BUH does not
> represent much implementation overhead for Graph implementers, because they
> can simply use the default implementation. The current implementation is
> too inefficient for our product.
>
> If there is a cleaner mechanism to get the same performance, then I'd be
> happy to hear about it.
>
> Thanks
> Holger
>
>
>
> On 8/29/2013 9:39, Holger Knublauch wrote:
>
>> SDB currently implements its own BulkUpdateHandler, and I just made some
>> tests that indicate that it is significantly faster than using
>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>> BulkUpdateHandler has been deprecated, and Model.add is already using
>> GraphUtil.add, what call sequence are we supposed to use to retain the good
>> performance of the BulkUpdateHandler? Could a method
>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>> behavior for specific Graph types?
>>
>> Thanks
>> Holger
>>
>>
>


-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Impact on deprecation of BulkUpdateHandler on SDB

Posted by Holger Knublauch <ho...@knublauch.com>.

Hi group,

I did not see any response to my question below, which is unusual for this 
list, where responses are usually fast and competent. As this is quite a 
crucial issue for our upgrade right now, I would like to ask again, and 
rephrase my question. I understand SDB is rather unsupported, but the 
issue is really a question on the core API.

Deprecating the BulkUpdateHandler will not only affect SDB but any other 
database such as Oracle RDF (the Jena adapter of which implements its 
own BUH right now). Granted, the class is not gone yet, but some 
existing API calls (Model.add) already bypass the BulkUpdateHandler, and 
I believe this was premature (revision 1419595). My suggestion is to 
continue to delegate Model.add through the BulkUpdateHandler for the 
upcoming release until the interface has been truly removed/replaced 
with something else. BUH does not represent much implementation overhead 
for Graph implementers, because they can simply use the default 
implementation. The current implementation is too inefficient for our 
product.

If there is a cleaner mechanism to get the same performance, then I'd be 
happy to hear about it.

Thanks
Holger

On 8/29/2013 9:39, Holger Knublauch wrote:
> SDB currently implements its own BulkUpdateHandler, and I just made 
> some tests that indicate that it is significantly faster than using 
> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that 
> BulkUpdateHandler has been deprecated, and Model.add is already using 
> GraphUtil.add, what call sequence are we supposed to use to retain the 
> good performance of the BulkUpdateHandler? Could a method 
> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the 
> behavior for specific Graph types?
>
> Thanks
> Holger
>
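
A minimal sketch of the Graph.add(Iterable<Triple>) idea raised in the quoted question. All types here (SketchTriple, SketchGraph, LoopingGraph, BatchingGraph) are hypothetical stand-ins, not Jena classes, and the roundTrips counter only simulates the cost of talking to a database. The point is that a default bulk method lets simple graphs do nothing extra while a database-backed graph can override it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical triple type for the sketch; not Jena's Triple.
final class SketchTriple {
    final String subject, predicate, object;

    SketchTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical graph interface; not Jena's Graph.
interface SketchGraph {
    void add(SketchTriple triple);

    // Bulk entry point: the default simply loops over add(triple).
    default void add(Iterable<SketchTriple> triples) {
        for (SketchTriple t : triples) {
            add(t);
        }
    }
}

// Uses the inherited loop: one simulated round trip per triple.
class LoopingGraph implements SketchGraph {
    final List<SketchTriple> stored = new ArrayList<>();
    int roundTrips = 0;

    @Override
    public void add(SketchTriple triple) {
        roundTrips++;
        stored.add(triple);
    }
}

// Overrides the bulk method: one simulated round trip per batch.
class BatchingGraph implements SketchGraph {
    final List<SketchTriple> stored = new ArrayList<>();
    int roundTrips = 0;

    @Override
    public void add(SketchTriple triple) {
        roundTrips++;
        stored.add(triple);
    }

    @Override
    public void add(Iterable<SketchTriple> triples) {
        roundTrips++;
        for (SketchTriple t : triples) {
            stored.add(t);
        }
    }
}
```

Callers use the same add(batch) call on either graph; only the implementation decides whether the batch is sent as one operation or many.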