Posted to users@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2013/08/29 01:39:57 UTC
Impact on deprecation of BulkUpdateHandler on SDB
SDB currently implements its own BulkUpdateHandler, and I just ran some
tests that indicate it is significantly faster than using
GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
BulkUpdateHandler has been deprecated, and Model.add already uses
GraphUtil.add, what call sequence are we supposed to use to retain the
good performance of the BulkUpdateHandler? Could a method
Graph.add(Iterable<Triple>) be added so that graphs can optimize the
behavior for specific Graph types?
Thanks
Holger
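Editor's sketch of the Graph.add(Iterable<Triple>) idea from the message above. This is not Jena's API: BulkGraph, SimpleGraph, BulkAwareGraph and the writes counter are hypothetical names, and the type parameter T only stands in for Triple. It shows one way a generic per-triple fallback and a back-end-specific bulk path could coexist behind a single method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph; T plays the role of Triple.
interface BulkGraph<T> {
    void add(T triple);

    // Generic fallback: one add per element. Back-ends may override this.
    default void addAll(Iterable<T> triples) {
        for (T t : triples) {
            add(t);
        }
    }
}

// Plain in-memory graph that relies on the default loop.
class SimpleGraph<T> implements BulkGraph<T> {
    final List<T> stored = new ArrayList<>();
    int writes = 0; // number of simulated round trips to the store

    @Override
    public void add(T triple) {
        stored.add(triple);
        writes++;
    }
}

// Back-end that overrides addAll to perform a single bulk write.
class BulkAwareGraph<T> extends SimpleGraph<T> {
    @Override
    public void addAll(Iterable<T> triples) {
        for (T t : triples) {
            stored.add(t);
        }
        writes++; // the whole batch counts as one write
    }
}
```

Callers would use addAll in both cases; only the back-end decides whether the batch is written in one step.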
Re: SPARQL Query to retrieve a particular RdfId
Posted by Charles Li <ch...@gmail.com>.
Never mind, I figured it out: the str() function.
Sorry for bothering!
Thanks!
- Charles
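Editor's sketch of the fix mentioned above. In SPARQL 1.1, CONTAINS is defined on string literals, so applying it directly to an IRI raises a type error and the FILTER rejects the row; STR(?o) turns the IRI into its string form first. The helper class RdfIdQuery and its method forId are hypothetical, and the example assumes the trailing ">" is dropped from the search text, since angle brackets are query syntax and not part of the IRI string.

```java
// Builds the query with STR(?o) so that CONTAINS receives a string, not an IRI.
final class RdfIdQuery {
    private RdfIdQuery() {}

    static String forId(String rdfId) {
        // Escape backslashes and double quotes for a SPARQL string literal.
        String escaped = rdfId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "SELECT ?s WHERE {\n"
             + "  ?s ?p ?o .\n"
             + "  FILTER(CONTAINS(STR(?o), \"" + escaped + "\"))\n"
             + "}";
    }
}
```

For the ID in this thread, RdfIdQuery.forId("_{9F750A5B-F02E-4B64-8D78-D0F527ACB900}") yields a query string that can be handed to whatever query API the application already uses.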
SPARQL Query to retrieve a particular RdfId
Posted by Charles Li <ch...@gmail.com>.
I want to query a Jena model, loaded through TDB from an RDF/XML file, to get all subjects that have an object containing one particular RDF ID. I tried the following SPARQL query, but it returned no results.
    select ?s WHERE
    {
      ?s ?p ?o .
      FILTER(CONTAINS(?o, "_{9F750A5B-F02E-4B64-8D78-D0F527ACB900}>"))
    }
I guess ?o is a Resource, and there should be some SPARQL function to apply to ?o first to extract the RDF ID string before calling CONTAINS, but I couldn't find such a function.
Please help!
Thanks a lot!!
- Charles
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Holger Knublauch <ho...@knublauch.com>.
Hi Andy,
before I contacted this list I did some background reading to try to
figure out why the BulkUpdateHandler had been deprecated, but you know
how difficult this can be when searching through mailing list archives.
And there is just too much traffic to stay up to date on a daily basis.
As a constructive suggestion, it would have been good if the deprecation
of BUH had been properly documented in the source code. Right now it
just states "Bulk update operations are going to be removed" without
hinting at a proper replacement. My original question was exactly about
what will replace it, and it seems that the TransactionHandlers are now
responsible for it.
I did some tests, but at least for SDB this pattern does not seem to
work as it should.
The example operation is to insert 10k triples into an SDB:
    List<Triple> triples = new LinkedList<Triple>();
    for (int i = 0; i < 10000; i++) {
        triples.add(Triple.create(OWL.Thing.asNode(),
                RDFS.seeAlso.asNode(),
                NodeFactory.createLiteral("" + i)));
    }
Wrapping the add with a TransactionHandler.begin/commit takes 40 seconds
(sdb is a GraphSDB):
    {
        sdb.getTransactionHandler().begin();
        GraphUtil.add(sdb, triples);
        sdb.getTransactionHandler().commit();
        sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                Node.ANY).toList();
    }
While the SDB-specific trick with the event manager takes 1 second:
    {
        sdb.getEventManager().notifyEvent(sdb, GraphEvents.startRead);
        GraphUtil.add(sdb, triples);
        sdb.getEventManager().notifyEvent(sdb, GraphEvents.finishRead);
        sdb.find(OWL.Thing.asNode(), RDFS.seeAlso.asNode(),
                Node.ANY).toList();
    }
(The find at the end is there to verify that the triples are immediately
available after write, and not delayed by some thread in the background).
The latter solution (above) uses the same call sequence as SDB's
BulkUpdateHandler, which seems to use some background thread with a
queue to do the actual writing:
    store.getLoader().startBulkUpdate();
    ...
    store.getLoader().flushTriples();
while the TransactionHandlerSDB does the following:
    sqlConnection.setAutoCommit(false) ;
    ...
    sqlConnection.commit() ;
    sqlConnection.setAutoCommit(true) ;
So the two approaches are very different, with the current
implementation of TransactionHandlerSDB in my tests much less efficient
than the BulkUpdateHandler. Obviously I would prefer to call the
BulkUpdateHandler mechanism until this has been resolved (or shown to be
my mistake).
Further comment below...
On 9/4/2013 23:22, Andy Seaborne wrote:
> Current documentation:
>
> http://jena.apache.org/documentation/sdb/loading_data.html
>
> which includes:
>
> model.notifyEvent(GraphEvents.startRead)
> ... do add/remove operations ...
> model.notifyEvent(GraphEvents.finishRead)
This design pattern looks weak to me, and it looks odd to call
startRead to initiate a write. Furthermore, we cannot rely on
SDB-specific code, because our code must also work with other back-ends.
So I'd much rather call begin/commit using transactions.
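Editor's sketch of a back-end-neutral begin/commit wrapper of the kind preferred above. TxHandler, Tx and RecordingTx are hypothetical names, not Jena classes; begin and commit mirror the calls shown earlier in this message, and abort is an assumed rollback hook. The point is only that a failed write is rolled back and never left half-committed.

```java
// Hypothetical minimal transaction interface (not Jena's TransactionHandler).
interface TxHandler {
    void begin();
    void commit();
    void abort();
}

final class Tx {
    private Tx() {}

    // Runs the work inside one transaction and rolls back if it throws.
    static void run(TxHandler tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.abort();
            throw e;
        }
    }
}

// Recording handler, used only to make the call order visible.
class RecordingTx implements TxHandler {
    final StringBuilder log = new StringBuilder();

    @Override public void begin()  { log.append("begin;"); }
    @Override public void commit() { log.append("commit;"); }
    @Override public void abort()  { log.append("abort;"); }
}
```

With such a helper, application code states the logical unit of change once and stays independent of SDB-specific event calls.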
>>> TopQuadrant have already released recently.
Yes and this is why we are having a breather now to finally catch up
with the new Jena version. Such low-level, risky changes should be done
in the beginning of a life cycle.
>>> Currently, as SDB patches come in, I personally try to find time to
>>> apply
>>> them. I don't have an SDB test system and certainly do not have
>>> setups of
>>> each of the databases with SDB adapters.
>>>
>>> Such time is my time - my employer doesn't use SDB.
I hear your frustration, and as always your contributions are greatly
appreciated. Rest assured that I also have other things to do than
tracking down and adjusting our code to changes between Jena versions.
So far I have spent one week on this bulk update issue alone. We do have
90 usages of the BulkUpdateHandler in our code, and who knows what other
side effects the migration into Transactions will have. So what seems
like a good simplification to the API from the perspective of the Jena
developers also has downsides to users with a very large, six-year-old
code base like ours.
Thanks
Holger
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/13 12:06, Claude Warren wrote:
> Is the recommended migration path to do the following instead of the bulk
> update:
>
> Start a transaction
> Insert each triple
> Commit transaction
>
> With the assumption that the underlying transaction implementation will
> batch the update to the storage layer?
>
> Claude
Current documentation:
http://jena.apache.org/documentation/sdb/loading_data.html
which includes:
model.notifyEvent(GraphEvents.startRead)
... do add/remove operations ...
model.notifyEvent(GraphEvents.finishRead)
but as I am trying to point out, transactions capture a logical unit of
change in an application.
Andy
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Claude Warren <cl...@xenei.com>.
Is the recommended migration path to do the following instead of the bulk
update:
Start a transaction
Insert each triple
Commit transaction
With the assumption that the underlying transaction implementation will
batch the update to the storage layer?
Claude
--
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Andy Seaborne <an...@apache.org>.
It has to be the application's responsibility to add transaction boundaries.
* JDBC connections are often controlled by the environment, such as pooling.
* The application may already be in a transaction.
* Transactions are a logical grouping of actions.
  Model operations are not application logical boundaries.
  The application will want a group of model operations in
  one transaction.
* Nested transactions are rarely supported.
Aside from everything else, TopQuadrant should be using transactions.
Auto-commit could easily explain 40s. We can't tell - it's closed source.
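Editor's sketch of the auto-commit point made above. CountingStore and AutoCommitDemo are hypothetical toy classes, not SDB or JDBC code, and nothing here measures a real database. The sketch only counts how many commits each style would issue for the same number of inserts, which is the cost the auto-commit hypothesis refers to.

```java
// Toy store that counts commits; it stands in for a JDBC-backed graph.
class CountingStore {
    private boolean autoCommit = true;
    int rows = 0;
    int commits = 0;

    void setAutoCommit(boolean on) { autoCommit = on; }

    void insert(String row) {
        rows++;
        if (autoCommit) {
            commits++; // each statement is its own transaction
        }
    }

    void commit() { commits++; }
}

final class AutoCommitDemo {
    private AutoCommitDemo() {}

    // Inserts n rows with auto-commit left on.
    static CountingStore perStatement(int n) {
        CountingStore s = new CountingStore();
        for (int i = 0; i < n; i++) {
            s.insert("t" + i);
        }
        return s;
    }

    // Inserts n rows inside one explicit transaction.
    static CountingStore oneTransaction(int n) {
        CountingStore s = new CountingStore();
        s.setAutoCommit(false);
        for (int i = 0; i < n; i++) {
            s.insert("t" + i);
        }
        s.commit();
        s.setAutoCommit(true);
        return s;
    }
}
```

For 10k inserts the first style issues one commit per row and the second issues a single commit. Whether that accounts for the timings reported in this thread is not something the sketch can show.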
TopQuadrant can call the SDB controls. Jena is open source - see the
code starting at UpdateHandlerSDB.
-- from GraphSDB --
    store.getLoader().startBulkUpdate() ;
    .... do stuff ...
    store.getLoader().finishBulkUpdate() ;
--
>> As this is quite a
>> crucial issue for our upgrade right now,
TopQuadrant have already released recently.
https://groups.google.com/d/msg/topbraid-users/El9BbRV6wk4/ac-602N2emoJ
Re: SDB
The position of SDB has been discussed:
http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C519BA513.1030403%40apache.org%3E
>> we could gain time for a proper redesign
I'll detail the history of BulkUpdateHandler in a separate message.
This message is relevant:
http://mail-archives.apache.org/mod_mbox/jena-dev/201305.mbox/%3C51A5678A.5070607@knublauch.com%3E
where Holger writes:
[[ Subject: [DISCUSS] SDB future
Having to rely on commercial alternatives would affect
the overall price tag of our solutions, and having an open source
solution that seamlessly works with Jena is a great asset.
I have brought up the topic of this thread in our management to see
whether we can allocate any resources to its future, but I cannot report
any decision at this stage.
]]
My suggestion is to add a Graph that does batching.
In fact, my advocacy is that all additional functionality (events,
batching, read-only, security, ...) is done by graph-wrapping AKA
implementation inheritance.
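Editor's sketch of the batching-by-wrapping suggestion above. TripleSink and BatchingSink are hypothetical names, not Jena types; writeBatch models one round trip to the underlying store, and T stands in for Triple. The wrapper buffers individual adds and forwards them in fixed-size batches, so the wrapped store never sees single-triple writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink; writeBatch models one round trip to the store.
interface TripleSink<T> {
    void writeBatch(List<T> batch);
}

// Wrapper that buffers adds and forwards them in fixed-size batches.
class BatchingSink<T> {
    private final TripleSink<T> target;
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();

    BatchingSink(TripleSink<T> target, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.target = target;
        this.batchSize = batchSize;
    }

    void add(T triple) {
        buffer.add(triple);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered triples; call before reading or committing.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        target.writeBatch(new ArrayList<>(buffer)); // pass a copy, then reset
        buffer.clear();
    }
}
```

The trade-off is that buffered triples are not visible in the store until flush is called, so a real wrapper would have to flush before any read on the same graph.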
Transactions should be consolidated on the dataset interface.
--
Currently, as SDB patches come in, I personally try to find time to
apply them. I don't have an SDB test system and certainly do not have
setups of each of the databases with SDB adapters.
Such time is my time - my employer doesn't use SDB.
Andy
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Claude Warren <cl...@xenei.com>.
I opened https://issues.apache.org/jira/browse/JENA-528 for this. Please
add comments there, vote it up and perhaps watch it.
Claude
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Holger Knublauch <ho...@knublauch.com>.
On 9/4/2013 3:15, Claude Warren wrote:
> As I recall, the discussion around this topic dealt with the idea that you
> could add each triple inside a transaction and, when the transaction
> committed, the transaction code would do the bulk update if supported.
If this were the case, then the code in the GraphUtil helper functions
should probably wrap the individual performUpdate calls with a
transaction, but they don't.
> However
> I may be way off base here. I have no objection to retaining the BUH.
I would greatly appreciate seeing this resolved before the final
release. As suggested earlier, we could gain time for a proper redesign
by avoiding the calls to the GraphUtil replacement functions, or
changing those functions so that they call graph.getBulkUpdateHandler()
for the time being, and possibly undeprecate BUH for now.
Thanks,
Holger
>
> Claude
>
>
> On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch <ho...@knublauch.com>wrote:
>
>> Hi group,
>>
>> I did not see any response to my question below, which is unusual for this
>> list, where responses are usually fast and competent. As this is quite a
>> crucial issue for our upgrade right now, I would like to ask again, and
>> rephrase my question. I understand SDB is rather unsupported, but the issue
>> is really a question on the core API.
>>
>> Deprecating the BulkUpdateHandler will not only affect SDB but any other
>> database such as Oracle RDF (the Jena adapter of which implements its own
>> BUH right now). Granted, the class is not gone yet, but some existing API
>> calls (Model.add) already bypass the BulkUpdateHandler, and I believe this
>> was premature (revision 1419595). My suggestion is to continue to delegate
>> Model.add through the BulkUpdateHandler for the upcoming release until the
>> interface has been truly removed/replaced with something else. BUH does not
>> represent much implementation overhead for Graph implementers, because they
>> can simply use the default implementation. The current implementation is
>> too inefficient for our product.
>>
>> If there is a cleaner mechanism to get the same performance, then I'd be
>> happy to hear about it.
>>
>> Thanks
>> Holger
>>
>>
>>
>> On 8/29/2013 9:39, Holger Knublauch wrote:
>>
>>> SDB currently implements its own BulkUpdateHandler, and I just made some
>>> tests that indicate that it is significantly faster than using
>>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>>> BulkUpdateHandler has been deprecated, and Model.add is already using
>>> GraphUtil.add, what call sequence are we supposed to use to retain the good
>>> performance of the BulkUpdateHandler? Could a method
>>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>>> behavior for specific Graph types?
>>>
>>> Thanks
>>> Holger
>>>
>>>
>
Re: History of BulkUpdateHandler changes
Posted by Alan Wu <al...@oracle.com>.
Hi Andy,
Yes. GraphOracleSem and DatasetGraphOracleSem allow addition/deletion of triples and quads, respectively.
Users can choose to commit or rollback transactions at any time. Without committing, one session can of course
see its own changes. But other sessions won't.
We will use http://jena.apache.org in our document, if we haven't already done so :)
Cheers,
Zhe
On 9/4/2013 2:48 PM, Andy Seaborne wrote:
> On 04/09/13 17:02, Alan Wu wrote:
>> Hi Andy,
>>
>> FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with Top
>> Quadrant's tools and
>> customer's applications.
>>
>> Thanks,
>>
>> Zhe Wu
>> Oracle Spatial and Graph
>
> Zhe,
>
> Thanks for the information. It's beginning to look like transaction boundaries are a significant factor, for example, deleting and adding triples as a single ACID action. How's that handled? Via oracle.spatial.rdf.client.jena.GraphOracleSem?
>
> Andy
>
> PS Hopefully the document will change http://jena.sourceforge.net/ to http://jena.apache.org/ as well :-) Your lawyers can advise on the legal side.
>
>>
>> On 9/4/2013 2:15 AM, Andy Seaborne wrote:
>>> The Jena project works in public. The history of the discussions for
>>> BulkUpdateHandler and SDB are in various public archives.
>>>
>>> I would like to see acknowledgement of prior discussions and the
>>> intentions behind the changes.
>>>
>>> We made the graph-level bulk update handler change at 2.10.0 and we've
>>> had 2.10.1 since then.
>>>
>>> There was a message on the users list Nov 2012
>>>
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>>
>>>
>>> and the dev list a year ago:
>>>
>>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>>
>>>
>>> Oracle are aware of the changes:
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>>
>>>
>>> Oracle do not track Jena versions.
>>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>>
>>> I do know that the complexities arising in Jena lead to costs for
>>> storage implementers. I want to reduce those costs in the long term.
>>>
>>> Andy
>>
>
Re: History of BulkUpdateHandler changes
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/13 17:02, Alan Wu wrote:
> Hi Andy,
>
> FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with
> TopQuadrant's tools and customers' applications.
>
> Thanks,
>
> Zhe Wu
> Oracle Spatial and Graph
Zhe,
Thanks for the information. It's beginning to look like transaction
boundaries are a significant factor, for example, deleting and adding
triples as a single ACID action. How's that handled? Via
oracle.spatial.rdf.client.jena.GraphOracleSem?
Andy
PS Hopefully the document will change http://jena.sourceforge.net/ to
http://jena.apache.org/ as well :-) Your lawyers can advise on the
legal side.
>
> On 9/4/2013 2:15 AM, Andy Seaborne wrote:
>> The Jena project works in public. The history of the discussions for
>> BulkUpdateHandler and SDB are in various public archives.
>>
>> I would like to see acknowledgement of prior discussions and the
>> intentions behind the changes.
>>
>> We made the graph-level bulk update handler change at 2.10.0 and we've
>> had 2.10.1 since then.
>>
>> There was a message on the users list Nov 2012
>>
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>
>>
>> and the dev list a year ago:
>>
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>
>>
>> Oracle are aware of the changes:
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>
>>
>> Oracle do not track Jena versions.
>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>
>> I do know that the complexities arising in Jena lead to costs for
>> storage implementers. I want to reduce those costs in the long term.
>>
>> Andy
>
Re: History of BulkUpdateHandler changes
Posted by Alan Wu <al...@oracle.com>.
Hi Andy,
FYI, Oracle has recently moved to Apache Jena 2.7.2 to align with TopQuadrant's tools and
customers' applications.
Thanks,
Zhe Wu
Oracle Spatial and Graph
On 9/4/2013 2:15 AM, Andy Seaborne wrote:
> The Jena project works in public. The history of the discussions for BulkUpdateHandler and SDB are in various public archives.
>
> I would like to see acknowledgement of prior discussions and the intentions behind the changes.
>
> We made the graph-level bulk update handler change at 2.10.0 and we've had 2.10.1 since then.
>
> There was a message on the users list Nov 2012
>
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>
> and the dev list a year ago:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>
> Oracle are aware of the changes:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>
> Oracle do not track Jena versions.
> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>
> I do know that the complexities arising in Jena lead to costs for storage implementers. I want to reduce those costs in the long term.
>
> Andy
Re: History of BulkUpdateHandler changes
Posted by Holger Knublauch <ho...@knublauch.com>.
Hi Claude,
yes this may work, but I don't think SDB's TransactionHandler does this right now. I am not (at all) familiar with SDB, but my observations as described in a parallel thread indicate that the SDB BulkUpdateHandler has the best performance. My concern is that the work needed to bring SDB up to speed outweighs the potential benefits of simplifying the API.
Thanks,
Holger
On Sep 13, 2013, at 4:29 PM, Claude Warren wrote:
> Holger,
>
> Would it not make sense for the TransactionHandler to track all the updates
> and deletes that occur within a transaction and submit them to the
> underlying db in blocks while calling the listener methods on the graph at
> commit? Does this provide the path you are looking for to keep bulk
> updates?
>
> Claude
>
>
> On Thu, Sep 5, 2013 at 5:50 AM, Holger Knublauch <ho...@knublauch.com> wrote:
>
>> Hi Andy,
>>
>> thanks for pointing at the old discussions. Reading through them, I notice
>> that TopQuadrant should have responded earlier. I don't know whether I
>> actually noticed this email, or whether I didn't understand the
>> implications at the time, or whether tracking the low level details of Jena
>> was outside of my responsibility at the time. In either case it was an
>> oversight and I would like to give my input, albeit late.
>>
>> On 9/4/2013 19:15, Andy Seaborne wrote:
>>
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>>
>>> "[the BulkUpdateHandler] is not used"
>>
>> This is not correct as SDB and OracleRDF are using it, possibly others.
>>
>>
>>
>>> and the dev list a year ago:
>>>
>>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>>
>>> Remove BulkUpdateHandler interface
>>> Migrate its few useful operations to Graph.
>>
>>
>> Yes, migrating the useful operations to Graph would IMHO have made sense,
>> but this has not happened yet - instead the suggestion is to use
>> transactions.
>>
>>
>>> UpdateHandlerSDB / A few of its operations are useful but most turn
>>> into nothing but loops to call add(Triple)/delete(Triple).
>>
>> The SDB implementation is very useful and makes significant performance
>> differences. I assume likewise for Oracle.
>>
>>
>>
>>> Oracle are aware of the changes:
>>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>>
>>
>> Zhe responded that BUH is used, but judging from the archive, the
>> discussion seems to have ended without a proper conclusion.
>>
>>
>>
>>> Oracle do not track Jena versions.
>>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>>
>>> I do know that the complexities arising in Jena lead to costs for storage
>>> implementers. I want to reduce those costs in the long term.
>>>
>>
>> The latter argument is IMHO very weak. There are probably less than 10
>> Jena Graph database implementations (SDB, TDB, Oracle etc). They already
>> have BUH implementations. Even if 10 more Graph implementations are added,
>> it would mean that those 10 developers need to add approximately three
>> lines of code:
>>
>>     public BulkUpdateHandler getBulkUpdateHandler() {
>>         return new SimpleBulkUpdateHandler(this);
>>     }
>>
>> OTOH by removing BulkUpdateHandler, you will see every user of this API
>> affected, certainly more than 10. The overhead of adjusting SDB alone seems
>> to far outweigh the cost savings (unless my previous observations about SDB
>> were incorrect).
>>
>> BTW I do agree that the number of event listener methods should be greatly
>> reduced. Maybe only have notifyAddTriple and notifyAddIterable (taking an
>> Iterable instead of a List). I am not 100% sure that only having
>> notifyAddTriple would be sufficient for our use cases, so I'd rather see
>> one form of bulk event preserved and Iterable seems the most generic one.
>>
>> Thanks,
>> Holger
>>
>>
>
>
> --
> I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
Re: History of BulkUpdateHandler changes
Posted by Claude Warren <cl...@xenei.com>.
Holger,
Would it not make sense for the TransactionHandler to track all the updates
and deletes that occur within a transaction and submit them to the
underlying db in blocks while calling the listener methods on the graph at
commit? Does this provide the path you are looking for to keep bulk
updates?
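A minimal sketch of that idea, under stated assumptions: `BufferingTransaction` is a hypothetical class, not Jena's TransactionHandler, triples are plain strings, the two callbacks stand in for the database round trip and the graph listeners, and only additions are handled (deletions would need a second buffer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffering transaction; none of these names are Jena API. */
final class BufferingTransaction {
    private final int blockSize;
    private final Consumer<List<String>> bulkInsert; // stands in for one round trip to the database
    private final Consumer<List<String>> listener;   // stands in for notifying graph listeners
    private final List<String> pending = new ArrayList<>();

    BufferingTransaction(int blockSize,
                         Consumer<List<String>> bulkInsert,
                         Consumer<List<String>> listener) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        this.bulkInsert = bulkInsert;
        this.listener = listener;
    }

    /** Nothing is sent to the database yet; the triple is only recorded. */
    void add(String triple) {
        pending.add(triple);
    }

    /** Discards everything recorded since the last commit. */
    void abort() {
        pending.clear();
    }

    /** Sends the recorded triples in blocks, then notifies listeners once. */
    void commit() {
        for (int from = 0; from < pending.size(); from += blockSize) {
            int to = Math.min(from + blockSize, pending.size());
            bulkInsert.accept(new ArrayList<>(pending.subList(from, to)));
        }
        listener.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a block size of 3 and seven buffered triples, commit issues three bulk calls (3, 3 and 1 triples) and a single listener notification. The trade-off is that callers adding triples one at a time see no database activity, and no listener events, until commit.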
Claude
On Thu, Sep 5, 2013 at 5:50 AM, Holger Knublauch <ho...@knublauch.com> wrote:
> Hi Andy,
>
> thanks for pointing at the old discussions. Reading through them, I notice
> that TopQuadrant should have responded earlier. I don't know whether I
> actually noticed this email, or whether I didn't understand the
> implications at the time, or whether tracking the low level details of Jena
> was outside of my responsibility at the time. In either case it was an
> oversight and I would like to give my input, albeit late.
>
> On 9/4/2013 19:15, Andy Seaborne wrote:
>
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>>
>> "[the BulkUpdateHandler] is not used"
>
> This is not correct as SDB and OracleRDF are using it, possibly others.
>
>
>
>> and the dev list a year ago:
>>
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>>
>> Remove BulkUpdateHandler interface
>> Migrate its few useful operations to Graph.
>
>
> Yes, migrating the useful operations to Graph would IMHO have made sense,
> but this has not happened yet - instead the suggestion is to use
> transactions.
>
>
>> UpdateHandlerSDB / A few of its operations are useful but most turn
>> into nothing but loops to call add(Triple)/delete(Triple).
>
> The SDB implementation is very useful and makes significant performance
> differences. I assume likewise for Oracle.
>
>
>
>> Oracle are aware of the changes:
>> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>>
>
> Zhe responded that BUH is used, but judging from the archive, the
> discussion seems to have ended without a proper conclusion.
>
>
>
>> Oracle do not track Jena versions.
>> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
>> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>>
>> I do know that the complexities arising in Jena lead to costs for storage
>> implementers. I want to reduce those costs in the long term.
>>
>
> The latter argument is IMHO very weak. There are probably less than 10
> Jena Graph database implementations (SDB, TDB, Oracle etc). They already
> have BUH implementations. Even if 10 more Graph implementations are added,
> it would mean that those 10 developers need to add approximately three
> lines of code:
>
>     public BulkUpdateHandler getBulkUpdateHandler() {
>         return new SimpleBulkUpdateHandler(this);
>     }
>
> OTOH by removing BulkUpdateHandler, you will see every user of this API
> affected, certainly more than 10. The overhead of adjusting SDB alone seems
> to far outweigh the cost savings (unless my previous observations about SDB
> were incorrect).
>
> BTW I do agree that the number of event listener methods should be greatly
> reduced. Maybe only have notifyAddTriple and notifyAddIterable (taking an
> Iterable instead of a List). I am not 100% sure that only having
> notifyAddTriple would be sufficient for our use cases, so I'd rather see
> one form of bulk event preserved and Iterable seems the most generic one.
>
> Thanks,
> Holger
>
>
--
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
Re: History of BulkUpdateHandler changes
Posted by Holger Knublauch <ho...@knublauch.com>.
Hi Andy,
thanks for pointing at the old discussions. Reading through them, I
notice that TopQuadrant should have responded earlier. I don't know
whether I actually noticed this email, or whether I didn't understand
the implications at the time, or whether tracking the low level details
of Jena was outside of my responsibility at the time. In any case it
was an oversight and I would like to give my input, albeit late.
On 9/4/2013 19:15, Andy Seaborne wrote:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
>
> "[the BulkUpdateHandler] is not used"
This is not correct as SDB and OracleRDF are using it, possibly others.
> and the dev list a year ago:
>
> http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
>
> Remove BulkUpdateHandler interface
> Migrate its few useful operations to Graph.
Yes, migrating the useful operations to Graph would IMHO have made
sense, but this has not happened yet - instead the suggestion is to use
transactions.
> UpdateHandlerSDB / A few of its operations are useful but most turn
> into nothing but loops to call add(Triple)/delete(Triple).
The SDB implementation is very useful and makes significant performance
differences. I assume likewise for Oracle.
>
> Oracle are aware of the changes:
> http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
>
Zhe responded that BUH is used, but judging from the archive, the
discussion seems to have ended without a proper conclusion.
>
> Oracle do not track Jena versions.
> Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
> http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
>
> I do know that the complexities arising in Jena lead to costs for
> storage implementers. I want to reduce those costs in the long term.
The latter argument is IMHO very weak. There are probably fewer than 10
Jena Graph database implementations (SDB, TDB, Oracle etc). They already
have BUH implementations. Even if 10 more Graph implementations are
added, it would mean that those 10 developers need to add approximately
three lines of code:
    public BulkUpdateHandler getBulkUpdateHandler() {
        return new SimpleBulkUpdateHandler(this);
    }
OTOH by removing BulkUpdateHandler, you will see every user of this API
affected, certainly more than 10. The overhead of adjusting SDB alone
seems to far outweigh the cost savings (unless my previous observations
about SDB were incorrect).
BTW I do agree that the number of event listener methods should be
greatly reduced. Maybe only have notifyAddTriple and notifyAddIterable
(taking an Iterable instead of a List). I am not 100% sure that only
having notifyAddTriple would be sufficient for our use cases, so I'd
rather see one form of bulk event preserved and Iterable seems the most
generic one.
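To make the reduced listener shape concrete, here is a small sketch. The interface below is hypothetical and simplified: it borrows the method names from the paragraph above but uses plain strings for triples and omits the graph argument, so it is not Jena's GraphListener. It also assumes Java 8 default methods, which postdate this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-method listener; not Jena's GraphListener. */
interface AddListener {
    void notifyAddTriple(String triple);

    /** Bulk event; by default it degrades to one single-triple event per element. */
    default void notifyAddIterable(Iterable<String> triples) {
        for (String t : triples) {
            notifyAddTriple(t);
        }
    }
}

/** Listener that only implements the single-triple event and inherits the bulk default. */
final class RecordingListener implements AddListener {
    final List<String> seen = new ArrayList<>();

    @Override
    public void notifyAddTriple(String triple) {
        seen.add(triple);
    }
}

/** Listener that reacts once per bulk event, for example to refresh a view a single time. */
final class BatchCountingListener implements AddListener {
    int singleEvents = 0;
    int bulkEvents = 0;

    @Override
    public void notifyAddTriple(String triple) {
        singleEvents++;
    }

    @Override
    public void notifyAddIterable(Iterable<String> triples) {
        bulkEvents++;
    }
}
```

Listeners that do not care about batching implement one method, while listeners that do can tell "three triples arrived together" apart from "three separate adds", which is the distinction a single notifyAddTriple would lose.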
Thanks,
Holger
History of BulkUpdateHandler changes
Posted by Andy Seaborne <an...@apache.org>.
The Jena project works in public. The history of the discussions for
BulkUpdateHandler and SDB are in various public archives.
I would like to see acknowledgement of prior discussions and the
intentions behind the changes.
We made the graph-level bulk update handler change at 2.10.0 and we've
had 2.10.1 since then.
There was a message on the users list Nov 2012
http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B660D4.6070306%40apache.org%3E
and the dev list a year ago:
http://mail-archives.apache.org/mod_mbox/jena-dev/201209.mbox/%3C5044E9F3.8060705%40apache.org%3E
Oracle are aware of the changes:
http://mail-archives.apache.org/mod_mbox/jena-users/201211.mbox/%3C50B688D8.9040600%40oracle.com%3E
Oracle do not track Jena versions.
Oracle (at least 11g) is for Jena 2.6.2 (2009-10-16)
http://docs.oracle.com/cd/E18283_01/appdev.112/e11828/sem_jena.htm
I do know that the complexities arising in Jena lead to costs for
storage implementers. I want to reduce those costs in the long term.
Andy
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Claude Warren <cl...@xenei.com>.
As I recall, the discussion around this topic dealt with the idea that you
could add each triple inside a transaction and, when the transaction
committed, the transaction code would do the bulk update if supported.
However, I may be way off base here. I have no objection to retaining the BUH.
Claude
On Tue, Sep 3, 2013 at 12:17 AM, Holger Knublauch <ho...@knublauch.com> wrote:
> Hi group,
>
> I did not see any response to my question below, which is unusual for this
> list where responses are usually fast and competent. As this is quite a
> crucial issue for our upgrade right now, I would like to ask again, and
> rephrase my question. I understand SDB is rather unsupported, but the issue
> is really a question on the core API.
>
> Deprecating the BulkUpdateHandler will not only affect SDB but any other
> database such as Oracle RDF (the Jena adapter of which implements its own
> BUH right now). Granted, the class is not gone yet, but some existing API
> calls (Model.add) already bypass the BulkUpdateHandler, and I believe this
> was premature (revision 1419595). My suggestion is to continue to delegate
> Model.add through the BulkUpdateHandler for the upcoming release until the
> interface has been truly removed/replaced with something else. BUH does not
> represent much implementation overhead for Graph implementers, because they
> can simply use the default implementation. The current implementation is
> too inefficient for our product.
>
> If there is a cleaner mechanism to get the same performance, then I'd be
> happy to hear about it.
>
> Thanks
> Holger
>
>
>
> On 8/29/2013 9:39, Holger Knublauch wrote:
>
>> SDB currently implements its own BulkUpdateHandler, and I just made some
>> tests that indicate that it is significantly faster than using
>> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
>> BulkUpdateHandler has been deprecated, and Model.add is already using
>> GraphUtil.add, what call sequence are we supposed to use to retain the good
>> performance of the BulkUpdateHandler? Could a method
>> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
>> behavior for specific Graph types?
>>
>> Thanks
>> Holger
>>
>>
>
--
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
Re: Impact on deprecation of BulkUpdateHandler on SDB
Posted by Holger Knublauch <ho...@knublauch.com>.
Hi group,
I did not see any response to my question below, which is unusual for this
list where responses are usually fast and competent. As this is quite a
crucial issue for our upgrade right now, I would like to ask again, and
rephrase my question. I understand SDB is rather unsupported, but the
issue is really a question on the core API.
Deprecating the BulkUpdateHandler will not only affect SDB but any other
database such as Oracle RDF (the Jena adapter of which implements its
own BUH right now). Granted, the class is not gone yet, but some
existing API calls (Model.add) already bypass the BulkUpdateHandler, and
I believe this was premature (revision 1419595). My suggestion is to
continue to delegate Model.add through the BulkUpdateHandler for the
upcoming release until the interface has been truly removed/replaced
with something else. BUH does not represent much implementation overhead
for Graph implementers, because they can simply use the default
implementation. The current implementation is too inefficient for our
product.
If there is a cleaner mechanism to get the same performance, then I'd be
happy to hear about it.
Thanks
Holger
On 8/29/2013 9:39, Holger Knublauch wrote:
> SDB currently implements its own BulkUpdateHandler, and I just made
> some tests that indicate that it is significantly faster than using
> GraphUtil.add (2 seconds versus 40 seconds for 10k triples). Now that
> BulkUpdateHandler has been deprecated, and Model.add is already using
> GraphUtil.add, what call sequence are we supposed to use to retain the
> good performance of the BulkUpdateHandler? Could a method
> Graph.add(Iterable<Triple>) be added to allow graphs to optimize the
> behavior for specific Graph types?
>
> Thanks
> Holger
>
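For illustration, the bulk method asked about in the quoted message could take roughly the following shape. This is a sketch under assumptions: `SketchGraph` and both implementations are hypothetical stand-ins rather than Jena's Graph or SDB classes, triples are plain strings, a "round trip" is just a counter, and the default method relies on Java 8.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal graph interface; not Jena's Graph. */
interface SketchGraph {
    void add(String triple);

    /** Bulk entry point: the default simply falls back to per-triple adds. */
    default void add(Iterable<String> triples) {
        for (String t : triples) {
            add(t);
        }
    }
}

/** A store that inherits the default, so every triple costs one simulated round trip. */
final class RoundTripGraph implements SketchGraph {
    final List<String> stored = new ArrayList<>();
    int roundTrips = 0;

    @Override
    public void add(String triple) {
        roundTrips++;
        stored.add(triple);
    }
}

/** The same store, overriding the bulk method to send a whole batch in one trip. */
final class BulkAwareGraph implements SketchGraph {
    final List<String> stored = new ArrayList<>();
    int roundTrips = 0;

    @Override
    public void add(String triple) {
        roundTrips++;
        stored.add(triple);
    }

    @Override
    public void add(Iterable<String> triples) {
        roundTrips++; // one simulated batch statement for the whole Iterable
        for (String t : triples) {
            stored.add(t);
        }
    }
}
```

Adding 10,000 triples through the bulk method costs 10,000 simulated round trips on the first implementation and one on the second, with identical stored contents. The counters only illustrate where a store-specific override can save work; they say nothing about the actual timings measured against SDB.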