Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2012/09/03 19:33:39 UTC
Evolution: BulkUpdateHandler / Reification / QueryHandler
As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
propose we
Remove BulkUpdateHandler interface
Migrate its few useful operations to Graph.
Start to provide reification with "standard" only.
graph.QueryHandler is only used to support reification.
== BulkUpdateHandler
The two implementations I know of are
SimpleBulkUpdateHandler
UpdateHandlerSDB
A few of its operations are useful but most turn into nothing but loops
to call add(Triple)/delete(Triple).
Event handling records each kind of operation but, as far as I can see,
this becomes individual calls to "addedStatement"/"removedStatement"
at the Model level, i.e. the difference between adding by array, list or
iterator gets lost.
The useful operations are:
add(Graph)
delete(Graph)
removeAll()
remove(s,p,o)
and the slightly bizarre:
add(Graph, withReifications)
delete(Graph, withReifications)
(see below about reification)
and the less useful (because they don't relate to the way the storage
might properly batch changes - the provider shouldn't decide the batch
boundaries) which turn into add(Triple)/delete(Triple)
add(Triple [])
add( List<Triple>)
add( Iterator<Triple>)
delete(Triple [])
delete( List<Triple>)
delete( Iterator<Triple>)
The only calls to these "add" operations are from ARP, which batches its
changes into units of 1000, but not a whole parser run. As
SimpleBulkUpdateHandler turns these into single calls, nothing is gained.
My proposal is that the useful operations are moved to Graph and the code
for the withReifications forms migrates to the only callers, in ModelCom.
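As a rough illustration of the proposal, here is a minimal sketch of bulk operations living directly on a graph interface, with plain-loop defaults that a store can override. All the types here (SketchTriple, SketchGraph, MemGraph, FastClearGraph) are hypothetical stand-ins invented for this example; they are not Jena's Graph or Triple, and the real signatures may well differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a triple (not Jena's Triple class).
final class SketchTriple {
    final String s, p, o;

    SketchTriple(String s, String p, String o) {
        this.s = s; this.p = p; this.o = o;
    }

    // A null argument acts as a wildcard for that position.
    boolean matches(String ms, String mp, String mo) {
        return (ms == null || ms.equals(s))
            && (mp == null || mp.equals(p))
            && (mo == null || mo.equals(o));
    }

    @Override public boolean equals(Object x) {
        if (!(x instanceof SketchTriple)) return false;
        SketchTriple t = (SketchTriple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override public int hashCode() { return Objects.hash(s, p, o); }
}

// Hypothetical graph interface carrying the "useful" operations directly.
interface SketchGraph {
    void add(SketchTriple t);
    void delete(SketchTriple t);
    // Returns a snapshot copy, so the graph may be modified while looping over it.
    List<SketchTriple> find(String s, String p, String o);

    // Defaults are plain loops; a store with a cheaper route overrides them.
    default void add(SketchGraph other) {
        for (SketchTriple t : other.find(null, null, null)) add(t);
    }
    default void delete(SketchGraph other) {
        for (SketchTriple t : other.find(null, null, null)) delete(t);
    }
    default void remove(String s, String p, String o) {
        for (SketchTriple t : find(s, p, o)) delete(t);
    }
    default void removeAll() { remove(null, null, null); }
}

class MemGraph implements SketchGraph {
    protected final Set<SketchTriple> triples = new LinkedHashSet<>();

    @Override public void add(SketchTriple t) { triples.add(t); }
    @Override public void delete(SketchTriple t) { triples.remove(t); }
    @Override public List<SketchTriple> find(String s, String p, String o) {
        List<SketchTriple> out = new ArrayList<>();
        for (SketchTriple t : triples) if (t.matches(s, p, o)) out.add(t);
        return out;
    }
    int size() { return triples.size(); }
}

// A store that can clear itself in one step overrides the default loop.
class FastClearGraph extends MemGraph {
    int bulkClears = 0;
    @Override public void removeAll() { triples.clear(); bulkClears++; }
}

final class GraphOpsDemo {
    public static void main(String[] args) {
        MemGraph source = new MemGraph();
        source.add(new SketchTriple("s1", "p", "o1"));
        source.add(new SketchTriple("s2", "p", "o2"));

        FastClearGraph target = new FastClearGraph();
        target.add(source);               // graph-level add, no handler object
        target.remove("s1", null, null);  // pattern removal
        System.out.println("after remove: " + target.size());
        target.removeAll();               // uses the overriding implementation
        System.out.println("after removeAll: " + target.size());
    }
}
```

The point of the sketch is only that a caller talks to the graph itself, and an implementation that can do better than a loop overrides the one method that matters to it.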
UpdateHandlerSDB:
This only uses the UpdateHandler interface to wrap the calls in
start/finish bulk update to implicitly increase the scope of bulk
updates. But it isn't
== Reification
The intent is to only support the default standard eventually.
Standard can be provided by code, with no retained state (partial
reifications). TDB and SDB do not support anything except "standard".
This leads to ....
(graph.)QueryHandler:
Its main use is with reification. I think we can remove it when
reification is replaced by a straight code implementation.
Andy
See also JENA-189
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/12 07:29, Claude Warren wrote:
> +1
>
> I have to agree that this is a nice simplification of the jena complexity.
> It would be nice to know why they were created in the first place, just to
> ensure that those issues are accounted for. However, I don't see any
> reason not to do this and several reasons to proceed.
>
> Claude
Good question.
What I want to do is simplify and reduce the Graph layer.
Graph/Triple/Node is a key abstraction for extension both downwards
(storage, inference) and upwards (Model, query, client).
I can give my personal, looking-back perspective, bearing in mind that I
wasn't there right at the beginning of the Model API.
And we learn - sometimes things looked to be the right thing at the time
but don't always turn out as expected either because a design didn't
work out (internal) or the world has gone in a different direction
(external).
These features here aren't used or are used so little that they create
complexity for an extension and for maintenance with very little benefit.
BulkUpdateHandler falls into the internal category. Batching changes
was obviously important right from the very first database-backed
storage layer (before even RDB) because making changes in a batch can be
cheaper than making them one at a time (e.g. a JDBC commit around a batch
is much cheaper than a commit for every triple).
BulkUpdateHandler does not meet the needs for that:
1/ The batch size is driven from the client but the correct size is a
matter for the storage if batching matters at all.
2/ It complicates each application to manage the batching when it could
be done once in the graph implementation if it matters. For a library
function, like a parser, knowing the right batching is hard and probably
messes up its API.
Streaming + storage-side internal batching is better.
So keep the operations that have some practical use, for example, adding
Graph.removeAll, and don't put it off to one side. It can still be
overridden.
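To make "streaming + storage-side internal batching" concrete, here is a minimal sketch in which the caller streams single adds and the store alone decides where the commit boundaries fall. BatchingStore and its methods are hypothetical names made up for this example, and the "commit" is only a counter standing in for something like a JDBC commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store: the caller streams single adds, the store picks the batch size.
final class BatchingStore {
    private final int batchSize;  // chosen by the storage layer, not by the caller
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int commits = 0;

    BatchingStore(int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
    }

    // Called once per triple; the caller never sees the batching.
    void add(String triple) {
        pending.add(triple);
        if (pending.size() >= batchSize) flush();
    }

    // Called at the end of the stream to commit whatever is left over.
    void finish() {
        if (!pending.isEmpty()) flush();
    }

    // Stand-in for one expensive commit covering a whole batch.
    private void flush() {
        committed.addAll(pending);
        pending.clear();
        commits++;
    }

    int commitCount() { return commits; }
    int committedSize() { return committed.size(); }
}

final class BatchingDemo {
    public static void main(String[] args) {
        BatchingStore store = new BatchingStore(1000);
        for (int i = 0; i < 2500; i++) {
            store.add("<s" + i + "> <p> <o> .");
        }
        store.finish();
        System.out.println(store.committedSize() + " triples in "
            + store.commitCount() + " commits");
    }
}
```

A parser feeding such a store needs no batching logic of its own, which is the argument made in points 1/ and 2/ above.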
Reification:
Semweb has moved on and reification is not important - quoting one
triple leaves the issue of grouping of quoted triples together and often
fact-units come in the form of more than one triple. Named graphs are
playing the role for quoted facts - named graphs postdate reification.
The number of uses of it outside "standard" is very low. "standard" can
be done in code over a store of triples; the other modes "minimal" and
"convenient" need some state to be kept.
http://jena.apache.org/documentation/notes/reification.html#reification-styles
(most of the rest of the documentation remains - the Model API is only
affected in that there is only one style).
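A minimal sketch of why "standard" needs no retained state: the reified triple for a node can be computed on demand from the ordinary rdf:subject / rdf:predicate / rdf:object / rdf:type triples already in the store. ReifTriple and StandardReification are hypothetical names for this example, not Jena classes, and requiring all four triples before reporting a reification is an assumption of the sketch rather than a statement of Jena's exact rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical plain triple of strings; not Jena's Triple class.
final class ReifTriple {
    final String s, p, o;
    ReifTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    @Override public String toString() { return "(" + s + " " + p + " " + o + ")"; }
}

final class StandardReification {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String TYPE = RDF + "type";
    static final String STATEMENT = RDF + "Statement";
    static final String SUBJECT = RDF + "subject";
    static final String PREDICATE = RDF + "predicate";
    static final String OBJECT = RDF + "object";

    // All distinct values of property prop on node, read straight from the triples.
    private static List<String> values(List<ReifTriple> graph, String node, String prop) {
        List<String> out = new ArrayList<>();
        for (ReifTriple t : graph) {
            if (t.s.equals(node) && t.p.equals(prop) && !out.contains(t.o)) out.add(t.o);
        }
        return out;
    }

    // The triple reified by node, or empty if the reification is partial or ambiguous.
    // Assumption of this sketch: all four triples must be present.
    static Optional<ReifTriple> reifiedBy(List<ReifTriple> graph, String node) {
        List<String> s = values(graph, node, SUBJECT);
        List<String> p = values(graph, node, PREDICATE);
        List<String> o = values(graph, node, OBJECT);
        boolean typed = values(graph, node, TYPE).contains(STATEMENT);
        if (!typed || s.size() != 1 || p.size() != 1 || o.size() != 1) {
            return Optional.empty();
        }
        return Optional.of(new ReifTriple(s.get(0), p.get(0), o.get(0)));
    }
}

final class ReificationDemo {
    public static void main(String[] args) {
        List<ReifTriple> graph = new ArrayList<>();
        graph.add(new ReifTriple("r", StandardReification.TYPE, StandardReification.STATEMENT));
        graph.add(new ReifTriple("r", StandardReification.SUBJECT, "s"));
        graph.add(new ReifTriple("r", StandardReification.PREDICATE, "p"));
        graph.add(new ReifTriple("r", StandardReification.OBJECT, "o"));
        System.out.println(StandardReification.reifiedBy(graph, "r"));
    }
}
```

Nothing is stored beyond the triples themselves, so a persistent layer has no extra tables or caches to keep consistent; a partial reification is simply a node for which the lookup comes back empty.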
Keeping the state is an implementation cost and complexity especially
for persistent storage layers. Quite a lot of effort for the RDB layer
went into reification.
So maintain the interface at the Model level - make Graph simpler.
graph.QueryHandler (qQH):
Once upon a time there was RDQL, and an RDQL query is, in SPARQL terms,
a basic graph pattern, a filter and a projection and nothing else. qQH
does that. SPARQL is a bit more complicated. qQH isn't the right
building block for SPARQL - its execution API doesn't extend well into
a larger framework so we have ended up with some duplication.
So remove it. It all goes to making graph simpler - and Graph is a key
abstraction for extension.
Andy
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Claude Warren <cl...@xenei.com>.
+1
I have to agree that this is a nice simplification of the jena complexity.
It would be nice to know why they were created in the first place, just to
ensure that those issues are accounted for. However, I don't see any
reason not to do this and several reasons to proceed.
Claude
--
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
Identity: https://www.identify.nu/user.php?claude@xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
Mario - thanks for the details.
graph.QueryHandler is not related to SPARQL execution (and it changed in
May in an incompatible way so we have a fairly good idea no one external
is using it).
Andy
On 06/09/12 10:37, Mario Ds Briggs wrote:
> We do implement a number of the interfaces since we support using DB2 via
> the JENA API,
> the standard XXXFactory and related ones for query execution and then
> resultset handling
> the Dataset related ones - Dataset, DatasetGraph, Transactional,
> GraphStore
>
>>>
> Presumably, you don't do anything in special support of reification as
> it's handled
> <<
> Yes.
>
> Mario
>
>
>
> From: Andy Seaborne <an...@apache.org>
> To: dev@jena.apache.org
> Date: 09/05/2012 07:22 PM
> Subject: Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
> Sent by: Andy Seaborne <an...@gmail.com>
>
>
>
> On 05/09/12 12:37, Mario Ds Briggs wrote:
>> Andy,
>>
>> In DB2 we extend GraphBase and then override some of the methods... so
> just
>> clarifying
>
> That's useful to know. Do you hook into Jena in other ways as well?
>
>>>>
>> My proposal is that the useful operations are moved to Graph, the code
>> for the withReifications forms migrate to the only callers in ModelCom.
>> <<
>>
>> Today when Model.add(Model) is called by end user, the code flows to
>> ModelCom.add(Model)
>> ModelCom.add(Model,boolean)
>> BulkUpdateHandler().add(Graph, boolean)
>>
>> So you are saying that ModelCom will now call Graph.add(Graph) and so as
>> long as one overrides the new Graph.add(Graph) method, ModelCom would
>> invoke it.
>
> Yes - the functionality of Model.add(Model) remains the same but the
> Graph operation migrates.
>
> Presumably, you don't do anything in special support of reification as
> it's handled (if there is anything to do - there isn't in the default
> 'Standard' mode) in GraphBase.
>
> Andy
>
>>
>> thanks
>> Mario
>
>
>
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Mario Ds Briggs <ma...@in.ibm.com>.
We do implement a number of the interfaces since we support using DB2 via
the JENA API,
the standard XXXFactory and related ones for query execution and then
resultset handling
the Dataset related ones - Dataset, DatasetGraph, Transactional,
GraphStore
>>
Presumably, you don't do anything in special support of reification as
it's handled
<<
Yes.
Mario
From: Andy Seaborne <an...@apache.org>
To: dev@jena.apache.org
Date: 09/05/2012 07:22 PM
Subject: Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Sent by: Andy Seaborne <an...@gmail.com>
On 05/09/12 12:37, Mario Ds Briggs wrote:
> Andy,
>
> In DB2 we extend GraphBase and then override some of the methods... so
just
> clarifying
That's useful to know. Do you hook into Jena in other ways as well?
>>>
> My proposal is that the useful operations are moved to Graph, the code
> for the withReifications forms migrate to the only callers in ModelCom.
> <<
>
> Today when Model.add(Model) is called by end user, the code flows to
> ModelCom.add(Model)
> ModelCom.add(Model,boolean)
> BulkUpdateHandler().add(Graph, boolean)
>
> So you are saying that ModelCom will now call Graph.add(Graph) and so as
> long as one overrides the new Graph.add(Graph) method, ModelCom would
> invoke it.
Yes - the functionality of Model.add(Model) remains the same but the
Graph operation migrates.
Presumably, you don't do anything in special support of reification as
it's handled (if there is anything to do - there isn't in the default
'Standard' mode) in GraphBase.
Andy
>
> thanks
> Mario
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
On 05/09/12 12:37, Mario Ds Briggs wrote:
> Andy,
>
> In DB2 we extend GraphBase and then override some of the methods... so just
> clarifying
That's useful to know. Do you hook into Jena in other ways as well?
>>>
> My proposal is that the useful operations are moved to Graph, the code
> for the withReifications forms migrate to the only callers in ModelCom.
> <<
>
> Today when Model.add(Model) is called by end user, the code flows to
> ModelCom.add(Model)
> ModelCom.add(Model,boolean)
> BulkUpdateHandler().add(Graph, boolean)
>
> So you are saying that ModelCom will now call Graph.add(Graph) and so as
> long as one overrides the new Graph.add(Graph) method, ModelCom would
> invoke it.
Yes - the functionality of Model.add(Model) remains the same but the
Graph operation migrates.
Presumably, you don't do anything in special support of reification as
it's handled (if there is anything to do - there isn't in the default
'Standard' mode) in GraphBase.
Andy
>
> thanks
> Mario
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Mario Ds Briggs <ma...@in.ibm.com>.
Andy,
In DB2 we extend GraphBase and then override some of the methods... so just
clarifying
>>
My proposal is that the useful operations are moved to Graph, the code
for the withReifications forms migrate to the only callers in ModelCom.
<<
Today when Model.add(Model) is called by end user, the code flows to
ModelCom.add(Model)
ModelCom.add(Model,boolean)
BulkUpdateHandler().add(Graph, boolean)
So you are saying that ModelCom will now call Graph.add(Graph) and so as
long as one overrides the new Graph.add(Graph) method, ModelCom would
invoke it.
thanks
Mario
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Mario Ds Briggs <ma...@in.ibm.com>.
I forgot to mention that, but anyway you got to it, it seems. Our bulk
update handler implementation extended SimpleBulkUpdateHandler and overrode
methods OTHER than the list below (because I guess we didn't want to have
the below methods in our code that simply routed back to the ones we
overrode). The only other remnant is the manager.notifyAddXXX(...) call we
make at the end before returning. I am not sure what the latter did.
Mario
From: Andy Seaborne <an...@apache.org>
To: dev@jena.apache.org
Date: 09/06/2012 07:13 PM
Subject: Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Sent by: Andy Seaborne <an...@gmail.com>
I'll put some deprecations into the codebase on BulkUpdateHandler for
> add(Triple [])
> add( List<Triple>)
> add( Iterator<Triple>)
> delete(Triple [])
> delete( List<Triple>)
> delete( Iterator<Triple>)
(see also SimpleBulkUpdateHandler ... which isn't so simple because of
events)
and we can see what needs changing and how much.
A first pass didn't look too bad at all - the event handling needs
checking and I'm not convinced that the current complexity adds
anything, or if it is even used by anything other than the test suite.
e.g. StatementListener converts all the different calls into multiple
calls of addedStatement or removedStatement.
Andy
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
I'll put some deprecations into the codebase on BulkUpdateHandler for
> add(Triple [])
> add( List<Triple>)
> add( Iterator<Triple>)
> delete(Triple [])
> delete( List<Triple>)
> delete( Iterator<Triple>)
(see also SimpleBulkUpdateHandler ... which isn't so simple because of
events)
and we can see what needs changing and how much.
A first pass didn't look too bad at all - the event handling needs
checking and I'm not convinced that the current complexity adds
anything, or if it is even used by anything other than the test suite.
e.g. StatementListener converts all the different calls into multiple
calls of addedStatement or removedStatement.
Andy
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Stephen Allen <sa...@apache.org>.
On Wed, Sep 5, 2012 at 3:25 AM, Andy Seaborne <an...@apache.org> wrote:
> On 04/09/12 19:13, Stephen Allen wrote:
>>
>> On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>> On 04/09/12 08:30, Dave Reynolds wrote:
>>>>
>>>>
>>>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>>>
>>>>>
>>>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like
>>>>> to
>>>>> propose we
>>>>>
>>>>> Remove BulkUpdateHandler interface
>>>>> Migrate it's few useful operation to Graph.
>>>>>
>>>>> Start to provide reification with "standard" only.
>>>>> graph.QueryHandler only used to support reification.
>>>>
>>>>
>>
>> +1 on both of these. They are annoying to code for, and as you say,
>> force batch triple changes into the client code where it doesn't
>> really belong.
>>
>> How about removing TransactionHandler as well?
>
>
> One step at a time!
>
> TransactionHandler is not quite the right abstraction. It's begin() does not
> indicate read or write intentions and this is reflected in the Model
> transactional interface.
>
> It might be possible to add promotable transactions in TDB but noting
> everywhere a update can occur so read->write if necessary is not trivial.
>
> If another transaction has committed, and the reader is looking at the old
> DB state, then it's not possible (no locks on parts of the DB). So at the
> point of promotion, the transaction may abort. Not a nice programming
> paradigm. It's a side effect of having triples - what has been updated
> where is not coupled to the application data model. Triple level locking
> struck me as going to be very expensive for not a lot of benefit.
>
> My preference is to change Model.begin (i.e. Model implements Transactional
> after transactional moved to somewhere general).
Agreed. I think having the user specify READ or WRITE at transaction
begin() is the right way to go. Promoting transactions is fraught
with all the issues you brought up.
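For illustration, a minimal sketch of declaring READ or WRITE at begin(), with no promotion from one to the other. TxnMode and SketchTxnStore are hypothetical names invented for this example and this is not Jena's transaction API; the single int stands in for the database state.

```java
// Hypothetical names throughout; this is not Jena's transaction API.
enum TxnMode { READ, WRITE }

final class SketchTxnStore {
    private int committed = 0;    // last committed state
    private int working = 0;      // state seen inside the current transaction
    private TxnMode mode = null;  // null means no transaction is active

    // The caller states its intention up front.
    void begin(TxnMode m) {
        if (mode != null) throw new IllegalStateException("transaction already active");
        mode = m;
        working = committed;
    }

    int read() {
        if (mode == null) throw new IllegalStateException("no transaction");
        return working;
    }

    void write(int value) {
        // No promotion: a READ transaction can never turn into a WRITE one.
        if (mode != TxnMode.WRITE) throw new IllegalStateException("not a WRITE transaction");
        working = value;
    }

    void commit() {
        if (mode == null) throw new IllegalStateException("no transaction");
        if (mode == TxnMode.WRITE) committed = working;
        mode = null;
    }

    // Discards any uncommitted change and ends the transaction.
    void abort() {
        mode = null;
    }
}

final class TxnDemo {
    public static void main(String[] args) {
        SketchTxnStore store = new SketchTxnStore();

        store.begin(TxnMode.WRITE);
        store.write(42);
        store.commit();

        store.begin(TxnMode.READ);
        System.out.println("read sees " + store.read());
        try {
            store.write(7);  // rejected immediately instead of aborting later
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        store.abort();
    }
}
```

Because the intention is known at begin(), an attempted update inside a reader fails at the call site rather than at some later promotion point.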
>
> But that will cause certain users "some issues" :-)
>
>
>> Also make Dataset extend Transactional instead of copying the methods?
>
>
> That's merely to isolate Dataset from the internal/graph level Transactional
> interface - they ended up the same operations.
>
> From experience, using abstractions at Graph and at Model (DatasetGraph and
> Dataset) levels can lead to problems renaming things later. So it has
> started as separate - deciding to merge later is possible.
>
>
>>>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>>>> assumption that a store *might* optimize the updates but at least some
>>>> of those cases are, or could be made, add(Graph) calls.
>>>>
>>>> Presumably this would this be a normal deprecate-then-remove-later
>>>> cycle?
>>>
>>>
>>>
>>> The Model operations remain -
>>>
>>> Model.add(Statement[])
>>> Model.add(StmtIterator iter)
>>> Model.add(List<Statement> statements)
>>>
>>> We could also consider simplification at the Model level- that wasn't on
>>> my
>>> list.
>>>
>>> It's interesting if you're using Graph level in an application - I'd like
>>> to
>>> promote the Graph "SPI" as a more formally API after the cleaning up.
>>>
>>
>> I tend to use the Graph SPI in my application as the objects are
>> immutable.
>
>
> I switch to which ever is most convenient ... but this will all help in
> making the SPI more formally a public API. It's not far off that today.
>
>
>>> Is that code still in use?
>>>
>>> At the SPI level (Graph API), there is less of a contract on migration.
>>>
>>> The reasoner does use the bulk update handler - but that's in-codebase so
>>> that will just be cleaned up as part of the change. (Elsewhere in the
>>> main
>>> code base most calls to getBulkUpdateHandler are ... implementations of
>>> getBulkUpdateHandler over another graph!)
>>>
>>> TDB has a bulk update handler to do removeAll and remove(s,p,o) without
>>> the
>>> problems isomorphic to CCME.
>>>
>>> SDB has a bulk update handler but the implementation of removeAll is in
>>> the
>>> graph anyway.
>>>
>>> In executing on this, I'd do at least a local deprecation cycle to track
>>> down and migrate the current code without needing a local big bang.
>>>
>>> Then at least a minor number version change to 2.10.0 would be good when
>>> the
>>> change happens.
>>>
>>> Observation on the deprecate-remove cycle: we know people don't upgrade
>>> incrementally because they don't need to.
>>>
>>> Another issue on the release pipeline is that some users are not testing
>>> the
>>> development builds, only checking after a release. That's bad for them
>>> and
>>> less than helpful for us. I don't want the effect of this to be making
>>> work
>>> for the project. We make the deliverables in exactly the form we release
>>> in
>>> every night so the only difference is location.
>>>
>>> Andy
>>>
>>>>
>>>> Dave
>>>>
>>>
>
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/12 19:13, Stephen Allen wrote:
> On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
>> On 04/09/12 08:30, Dave Reynolds wrote:
>>>
>>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>>
>>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>>> propose we
>>>>
>>>> Remove BulkUpdateHandler interface
>>>> Migrate it's few useful operation to Graph.
>>>>
>>>> Start to provide reification with "standard" only.
>>>> graph.QueryHandler only used to support reification.
>>>
>
> +1 on both of these. They are annoying to code for, and as you say,
> force batch triple changes into the client code where it doesn't
> really belong.
>
> How about removing TransactionHandler as well?
One step at a time!
TransactionHandler is not quite the right abstraction. Its begin() does
not indicate read or write intentions and this is reflected in the Model
transactional interface.
It might be possible to add promotable transactions in TDB, but noting
everywhere an update can occur, so as to promote read->write if necessary,
is not trivial.
If another transaction has committed, and the reader is looking at the
old DB state, then it's not possible (no locks on parts of the DB). So
at the point of promotion, the transaction may abort. Not a nice
programming paradigm. It's a side effect of having triples - what has
been updated where is not coupled to the application data model. Triple
level locking struck me as going to be very expensive for not a lot of
benefit.
My preference is to change Model.begin (i.e. Model implements
Transactional after Transactional has moved to somewhere general).
But that will cause certain users "some issues" :-)
> Also make Dataset extend Transactional instead of copying the methods?
That's merely to isolate Dataset from the internal/graph level
Transactional interface - they ended up the same operations.
From experience, using abstractions at Graph and at Model (DatasetGraph
and Dataset) levels can lead to problems renaming things later. So it
has started as separate - deciding to merge later is possible.
>>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>>> assumption that a store *might* optimize the updates but at least some
>>> of those cases are, or could be made, add(Graph) calls.
>>>
>>> Presumably this would this be a normal deprecate-then-remove-later cycle?
>>
>>
>> The Model operations remain -
>>
>> Model.add(Statement[])
>> Model.add(StmtIterator iter)
>> Model.add(List<Statement> statements)
>>
>> We could also consider simplification at the Model level- that wasn't on my
>> list.
>>
>> It's interesting if you're using Graph level in an application - I'd like to
>> promote the Graph "SPI" as a more formally API after the cleaning up.
>>
>
> I tend to use the Graph SPI in my application as the objects are immutable.
I switch to whichever is most convenient ... but this will all help in
making the SPI more formally a public API. It's not far off that today.
>> Is that code still in use?
>>
>> At the SPI level (Graph API), there is less of a contract on migration.
>>
>> The reasoner does use the bulk update handler - but that's in-codebase so
>> that will just be cleaned up as part of the change. (Elsewhere in the main
>> code base most calls to getBulkUpdateHandler are ... implementations of
>> getBulkUpdateHandler over another graph!)
>>
>> TDB has a bulk update handler to do removeAll and remove(s,p,o) without the
>> problems isomorphic to CCME.
>>
>> SDB has a bulk update handler but the implementation of removeAll is in the
>> graph anyway.
>>
>> In executing on this, I'd do at least a local deprecation cycle to track
>> down and migrate the current code without needing a local big bang.
>>
>> Then at least a minor number version change to 2.10.0 would be good when the
>> change happens.
>>
>> Observation on the deprecate-remove cycle: we know people don't upgrade
>> incrementally because they don't need to.
>>
>> Another issue on the release pipeline is that some users are not testing the
>> development builds, only checking after a release. That's bad for them and
>> less than helpful for us. I don't want the effect of this to be making work
>> for the project. We make the deliverables in exactly the form we release in
>> every night so the only difference is location.
>>
>> Andy
>>
>>>
>>> Dave
>>>
>>
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Stephen Allen <sa...@apache.org>.
On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>>
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>> Remove BulkUpdateHandler interface
>>> Migrate it's few useful operation to Graph.
>>>
>>> Start to provide reification with "standard" only.
>>> graph.QueryHandler only used to support reification.
>>
+1 on both of these. They are annoying to code for, and as you say,
force batch triple changes into the client code where it doesn't
really belong.
How about removing TransactionHandler as well?
Also make Dataset extend Transactional instead of copying the methods?
>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would this be a normal deprecate-then-remove-later cycle?
>
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level- that wasn't on my
> list.
>
> It's interesting if you're using Graph level in an application - I'd like to
> promote the Graph "SPI" as a more formally API after the cleaning up.
>
I tend to use the Graph SPI in my application as the objects are immutable.
> Is that code still in use?
>
> At the SPI level (Graph API), there is less of a contract on migration.
>
> The reasoner does use the bulk update handler - but that's in-codebase so
> that will just be cleaned up as part of the change. (Elsewhere in the main
> code base most calls to getBulkUpdateHandler are ... implementations of
> getBulkUpdateHandler over another graph!)
>
> TDB has a bulk update handler to do removeAll and remove(s,p,o) without the
> problems isomorphic to CCME.
>
> SDB has a bulk update handler but the implementation of removeAll is in the
> graph anyway.
>
> In executing on this, I'd do at least a local deprecation cycle to track
> down and migrate the current code without needing a local big bang.
>
> Then at least a minor number version change to 2.10.0 would be good when the
> change happens.
>
> Observation on the deprecate-remove cycle: we know people don't upgrade
> incrementally because they don't need to.
>
> Another issue on the release pipeline is that some users are not testing the
> development builds, only checking after a release. That's bad for them and
> less than helpful for us. I don't want the effect of this to be making work
> for the project. We make the deliverables in exactly the form we release in
> every night so the only difference is location.
>
> Andy
>
>>
>> Dave
>>
>
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Simon Helsen <sh...@ca.ibm.com>.
Guys,
It would make our lives a lot more painful if you don't follow a
deprecate-then-remove-later cycle for the BulkUpdateHandler interface. We
have client code programmed against this and in our db2 implementation
there is also a custom implementation (for obvious reasons - that was
probably the original motivation for the API in the first place). If you
remove (or even move) the operations in a big bang, we can't gracefully
adopt.
Btw, we *always* test unreleased snapshots as much as possible. An absence
of a deprecate-then-remove-later cycle would make this more difficult.
Also, there are so many separate dependent groups in our organization that
we cannot possibly handle big bang changes.
Finally, this brings me to a question I have posed before. Jena, so far,
has not been very clear on what is public API and what is not. I read here
that the Graph API sits at the SPI level but we have countless clients
which program against this level as well since it is readily available and
easy to reach. There is no indication anywhere (e.g. in the javadoc or
packaging) that these APIs are not public. The first step towards
improving this situation is to declare, perhaps at the package level, which
APIs are public and which are not, so clients know what kind of contract
they are getting themselves into.
thanks
Simon
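
A minimal sketch of what a deprecate-then-remove-later step could look like, so that existing client code keeps compiling for at least one release. All types here are simplified, hypothetical stand-ins (triples are plain strings, and the name addAll is invented for illustration); this is not the real Jena API or the project's actual migration plan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; NOT the real Jena types.
interface Graph {
    void add(String triple);
    List<String> triples();

    // New home for the useful bulk operation (hypothetical name).
    default void addAll(Graph other) {
        for (String t : other.triples()) {
            add(t);
        }
    }

    /** @deprecated transitional accessor; call {@link #addAll(Graph)} instead. */
    @Deprecated
    default BulkUpdateHandler getBulkUpdateHandler() {
        return new DelegatingBulkUpdateHandler(this);
    }
}

/** @deprecated kept for one release cycle so existing callers still compile. */
@Deprecated
interface BulkUpdateHandler {
    void add(Graph other);
}

// The old interface survives as a thin wrapper over the new Graph method.
@SuppressWarnings("deprecation")
final class DelegatingBulkUpdateHandler implements BulkUpdateHandler {
    private final Graph graph;

    DelegatingBulkUpdateHandler(Graph graph) {
        this.graph = graph;
    }

    public void add(Graph other) {
        graph.addAll(other);
    }
}

final class MemGraph implements Graph {
    private final List<String> store = new ArrayList<>();

    public void add(String triple) {
        if (!store.contains(triple)) store.add(triple);
    }

    // Returns a copy so callers can iterate while the graph is modified.
    public List<String> triples() {
        return new ArrayList<>(store);
    }
}

class DeprecationCycleDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        MemGraph source = new MemGraph();
        source.add("s p o");

        MemGraph viaOldApi = new MemGraph();
        viaOldApi.getBulkUpdateHandler().add(source); // old call site, still works

        MemGraph viaNewApi = new MemGraph();
        viaNewApi.addAll(source);                     // migrated call site

        System.out.println(viaOldApi.triples().equals(viaNewApi.triples()));
    }
}
```

With this shape, custom implementations and client code get compiler deprecation warnings during the transition release, and the deprecated types can be deleted in a later one.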
From: Dave Reynolds <da...@gmail.com>
To: dev@jena.apache.org
Date: 09/04/2012 05:46 AM
Subject: Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
On 04/09/12 10:21, Andy Seaborne wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>> Remove BulkUpdateHandler interface
>>> Migrate it's few useful operation to Graph.
>>>
>>> Start to provide reification with "standard" only.
>>> graph.QueryHandler only used to support reification.
>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would this be a normal deprecate-then-remove-later cycle?
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level- that wasn't on
> my list.
>
> It's interesting if you're using Graph level in an application - I'd
> like to promote the Graph "SPI" as a more formally API after the
> cleaning up.
>
> Is that code still in use?
I thought it was but having now checked then no, none of my currently
in-use projects use BulkUpdateHandler directly.
> At the SPI level (Graph API), there is less of a contract on migration.
Agreed. I'm fine with going ahead with the change.
> The reasoner does use the bulk update handler - but that's in-codebase
> so that will just be cleaned up as part of the change.
Sure.
Dave
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Dave Reynolds <da...@gmail.com>.
On 04/09/12 10:21, Andy Seaborne wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>> Remove BulkUpdateHandler interface
>>> Migrate it's few useful operation to Graph.
>>>
>>> Start to provide reification with "standard" only.
>>> graph.QueryHandler only used to support reification.
>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would this be a normal deprecate-then-remove-later cycle?
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level- that wasn't on
> my list.
>
> It's interesting if you're using Graph level in an application - I'd
> like to promote the Graph "SPI" as a more formally API after the
> cleaning up.
>
> Is that code still in use?
I thought it was but having now checked then no, none of my currently
in-use projects use BulkUpdateHandler directly.
> At the SPI level (Graph API), there is less of a contract on migration.
Agreed. I'm fine with going ahead with the change.
> The reasoner does use the bulk update handler - but that's in-codebase
> so that will just be cleaned up as part of the change.
Sure.
Dave
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/12 08:30, Dave Reynolds wrote:
> On 03/09/12 18:33, Andy Seaborne wrote:
>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>> propose we
>>
>> Remove BulkUpdateHandler interface
>> Migrate it's few useful operation to Graph.
>>
>> Start to provide reification with "standard" only.
>> graph.QueryHandler only used to support reification.
>
> Seems reasonable. I've used BulkUpdateHandler in client code on the
> assumption that a store *might* optimize the updates but at least some
> of those cases are, or could be made, add(Graph) calls.
>
> Presumably this would this be a normal deprecate-then-remove-later cycle?
The Model operations remain -
Model.add(Statement[])
Model.add(StmtIterator iter)
Model.add(List<Statement> statements)
We could also consider simplification at the Model level; that wasn't on
my list.
It's interesting if you're using the Graph level in an application - I'd
like to promote the Graph "SPI" as a more formal API after the
cleaning up.
Is that code still in use?
At the SPI level (Graph API), there is less of a contract on migration.
The reasoner does use the bulk update handler - but that's in-codebase
so that will just be cleaned up as part of the change. (Elsewhere in
the main code base most calls to getBulkUpdateHandler are ...
implementations of getBulkUpdateHandler over another graph!)
TDB has a bulk update handler to do removeAll and remove(s,p,o) without
the problems isomorphic to CCME.
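
To illustrate the kind of problem meant here, assuming "CCME" refers to java.util.ConcurrentModificationException: a pattern-based remove that deletes while iterating the live data fails, whereas buffering the matches first does not. This is only a sketch over a toy in-memory set with hypothetical Triple and TinyGraph classes; it says nothing about how TDB actually implements removal.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical toy types for illustration; NOT Jena's Triple or Graph.
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) {
        this.s = s; this.p = p; this.o = o;
    }

    // A null argument acts as a wildcard for that position.
    boolean matches(String s, String p, String o) {
        return (s == null || s.equals(this.s))
            && (p == null || p.equals(this.p))
            && (o == null || o.equals(this.o));
    }

    @Override public boolean equals(Object other) {
        if (!(other instanceof Triple)) return false;
        Triple t = (Triple) other;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override public int hashCode() {
        return Objects.hash(s, p, o);
    }
}

final class TinyGraph {
    private final Set<Triple> triples = new HashSet<>();

    void add(Triple t) { triples.add(t); }
    void delete(Triple t) { triples.remove(t); }
    int size() { return triples.size(); }

    // Naive: deletes from the set while an iterator over it is still open.
    void removeNaive(String s, String p, String o) {
        for (Triple t : triples) {
            if (t.matches(s, p, o)) delete(t);
        }
    }

    // Safe: collect the matches first, then delete from the buffered copy.
    void remove(String s, String p, String o) {
        List<Triple> matched = new ArrayList<>();
        for (Triple t : triples) {
            if (t.matches(s, p, o)) matched.add(t);
        }
        for (Triple t : matched) {
            delete(t);
        }
    }
}

class RemovePatternDemo {
    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.add(new Triple("a", "p", "1"));
        g.add(new Triple("b", "p", "2"));
        g.add(new Triple("c", "q", "3"));
        try {
            g.removeNaive(null, null, null);
        } catch (ConcurrentModificationException e) {
            System.out.println("naive remove failed while iterating");
        }
        g.remove(null, null, null);
        System.out.println("remaining triples: " + g.size());
    }
}
```

Buffering costs memory proportional to the number of matches, which is one reason a store may prefer to implement remove(s,p,o) and removeAll itself rather than have callers loop over find() and delete().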
SDB has a bulk update handler but the implementation of removeAll is in
the graph anyway.
In executing on this, I'd do at least a local deprecation cycle to track
down and migrate the current code without needing a local big bang.
Then at least a minor number version change to 2.10.0 would be good when
the change happens.
Observation on the deprecate-remove cycle: we know people don't upgrade
incrementally because they don't need to.
Another issue on the release pipeline is that some users are not testing
the development builds, only checking after a release. That's bad for
them and less than helpful for us. I don't want the effect of this to
be making work for the project. We build the deliverables every night in
exactly the form we release, so the only difference is location.
Andy
>
> Dave
>
Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
Posted by Dave Reynolds <da...@gmail.com>.
On 03/09/12 18:33, Andy Seaborne wrote:
> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
> propose we
>
> Remove BulkUpdateHandler interface
> Migrate it's few useful operation to Graph.
>
> Start to provide reification with "standard" only.
> graph.QueryHandler only used to support reification.
Seems reasonable. I've used BulkUpdateHandler in client code on the
assumption that a store *might* optimize the updates but at least some
of those cases are, or could be made, add(Graph) calls.
Presumably this would be a normal deprecate-then-remove-later cycle?
Dave