Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2012/09/03 19:33:39 UTC

Evolution: BulkUpdateHandler / Reification / QueryHandler

As part of wanting to tidy up and reduce the "core" of Jena, I'd like to 
propose we

   Remove BulkUpdateHandler interface
     Migrate its few useful operations to Graph.

   Start to provide reification with "standard" only.
     graph.QueryHandler only used to support reification.


== BulkUpdateHandler

The two implementations I know of are

  SimpleBulkUpdateHandler
  UpdateHandlerSDB

A few of its operations are useful but most turn into nothing but loops 
to call add(Triple)/delete(Triple).

Event handling distinguishes each kind of operation but, as far as I 
can see, this becomes individual calls to 
"addedStatement"/"removedStatement" at the Model level, i.e. the 
difference between adding by array, list or iterator gets lost.

The useful operations are:
   add(Graph)
   delete(Graph)
   removeAll()
   remove(s,p,o)

and the slightly bizarre:

   add(Graph, withReifications)
   delete(Graph, withReifications)

(see below about reification)

and the less useful (because they don't relate to the way the storage 
might properly batch changes - the provider shouldn't decide the batch 
boundaries) which turn into add(Triple)/delete(Triple)

   add(Triple [])
   add( List<Triple>)
   add( Iterator<Triple>)
   delete(Triple [])
   delete( List<Triple>)
   delete( Iterator<Triple>)

The only calls to these "add" operations are from ARP, which batches its 
changes into units of 1000, but not a whole parser run. As 
SimpleBulkUpdateHandler turns these into single calls, nothing is gained.

My proposal is that the useful operations are moved to Graph, and the 
code for the withReifications forms migrates to its only callers, in 
ModelCom.
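
A rough sketch of what moving the useful operations onto the graph could look like, assuming loop-based defaults that a storage layer may override. SketchTriple, SketchGraph and MemGraph are made-up stand-ins (plain Strings instead of Nodes); they are not Jena's actual types or signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; not Jena's Triple or Graph.
record SketchTriple(String s, String p, String o) {}

interface SketchGraph {
    void add(SketchTriple t);
    void delete(SketchTriple t);
    /** Matching triples as a fresh list; null in any position is a wildcard. */
    List<SketchTriple> find(String s, String p, String o);

    // The "useful" operations sit on the graph itself, with loop-based
    // defaults that a storage layer is free to override.
    default void add(SketchGraph other) {
        for (SketchTriple t : other.find(null, null, null)) add(t);
    }
    default void delete(SketchGraph other) {
        for (SketchTriple t : other.find(null, null, null)) delete(t);
    }
    default void remove(String s, String p, String o) {
        // find() returns a copy, so deleting while looping is safe.
        for (SketchTriple t : find(s, p, o)) delete(t);
    }
    default void removeAll() { remove(null, null, null); }
}

class MemGraph implements SketchGraph {
    private final Set<SketchTriple> triples = new LinkedHashSet<>();

    public void add(SketchTriple t) { triples.add(t); }
    public void delete(SketchTriple t) { triples.remove(t); }
    public List<SketchTriple> find(String s, String p, String o) {
        List<SketchTriple> out = new ArrayList<>();
        for (SketchTriple t : triples) {
            boolean match = (s == null || s.equals(t.s()))
                         && (p == null || p.equals(t.p()))
                         && (o == null || o.equals(t.o()));
            if (match) out.add(t);
        }
        return out;
    }
    // A store can replace a default with something cheaper than a loop.
    @Override public void removeAll() { triples.clear(); }
}

class GraphOpsDemo {
    public static void main(String[] args) {
        MemGraph a = new MemGraph();
        a.add(new SketchTriple("s1", "p", "o1"));
        a.add(new SketchTriple("s2", "p", "o2"));
        MemGraph b = new MemGraph();
        b.add(a);                     // add(Graph)
        b.remove("s1", null, null);   // remove(s,p,o) with wildcards
        System.out.println(b.find(null, null, null));
    }
}
```

Callers that only know the interface get the loop; a store that can do better overrides the one method it cares about.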

UpdateHandlerSDB:

This only uses the UpdateHandler interface to wrap the calls in 
start/finish bulk update to implicitly increase the scope of bulk 
updates.  But it isn't

== Reification

The intent is to only support the default standard eventually.

Standard can be provided by code, with no retained state (partial 
reifications).  TDB and SDB do not support anything except "standard".

This leads to ....

(graph.)QueryHandler:
Its main use is with reification.  I think we can remove it when 
reification is replaced by a straight code implementation.

	Andy

See also JENA-189

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/12 07:29, Claude Warren wrote:
> +1
>
> I have to agree that this is a nice simplification of the jena complexity.
>   It would be nice to know why they were created in the first place, just to
> ensure that those issues are accounted for.  However, I don't see any
> reason not to do this and several reasons to proceed.
>
> Claude

Good question.

What I want to do is simplify and reduce the Graph layer. 
Graph/Triple/Node is a key abstraction for extension both downwards 
(storage, inference) and upwards (Model, query, client).

I can give my personal, looking-back perspective and remembering I 
wasn't there right at the beginning of the Model API.

And we learn - sometimes things looked to be the right thing at the time 
but don't always turn out as expected either because a design didn't 
work out (internal) or the world has gone in a different direction 
(external).

These features here aren't used or are used so little that they create 
complexity for an extension and for maintenance with very little benefit.

BulkUpdateHandler falls into the internal category.  Batching changes 
was obviously important right from the very first database backed 
storage layer (before even RDB) because doing them in a batch can be 
cheaper than doing them one at a time (e.g. a JDBC commit around a batch 
is much cheaper than a commit for every triple).

BulkUpdateHandler does not meet the needs for that:

1/ The batch size is driven from the client but the correct size is a 
matter for the storage if batching matters at all.

2/ It complicates each application to manage the batching when it could 
be done once in the graph implementation if it matters.  For a library 
function, like a parser, to know the right batching is hard and probably 
messes up its API.

Streaming + storage-side internal batching is better.
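
A minimal sketch of streaming plus storage-side batching, assuming the store alone picks the batch size and callers just add one triple at a time. BatchingStore and its counters are hypothetical; a "commit" here only increments a counter where a real store would do a JDBC commit or similar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the store, not the caller, decides the batch boundaries.
class BatchingStore {
    private final int batchSize;                       // chosen by the storage layer
    private final List<String> pending = new ArrayList<>();
    private int commits = 0;                           // simulated commits
    private int stored = 0;                            // triples written so far

    BatchingStore(int batchSize) { this.batchSize = batchSize; }

    /** Callers stream triples one at a time and never see the batching. */
    void add(String triple) {
        pending.add(triple);
        if (pending.size() >= batchSize) flush();
    }

    /** Write out whatever is buffered, e.g. at the end of a parser run. */
    void flush() {
        if (pending.isEmpty()) return;
        stored += pending.size();
        pending.clear();
        commits++;
    }

    int commits() { return commits; }
    int stored() { return stored; }
}

class BatchingDemo {
    public static void main(String[] args) {
        BatchingStore store = new BatchingStore(1000);
        for (int i = 0; i < 2500; i++) store.add("<s" + i + "> <p> <o> .");
        store.flush();
        System.out.println(store.stored() + " triples in " + store.commits() + " commits");
    }
}
```

A parser feeding this store needs no knowledge of the batch size, so its own API stays a plain stream of triples.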

So keep the operations that have some practical use, for example, adding 
Graph.removeAll, and don't put it off to one side.  It can still be 
overridden.

Reification:

Semweb has moved on and reification is not important - quoting one 
triple leaves the issue of grouping of quoted triples together and often 
fact-units come in the form of more than one triple.  Named graphs are 
playing the role for quoted facts - named graphs postdate reification.

The number of uses of it outside "standard" is very low.  "standard" can 
be done in code over a store of triples; the other modes "minimal" and 
"convenient" need some state to be kept.

http://jena.apache.org/documentation/notes/reification.html#reification-styles

(most of the rest of the documentation remains - the Model API is only 
affected in that there is only one style).
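
To illustrate "standard" done in code over a store of triples, the sketch below recovers the reified triple for a node purely by looking up the rdf:subject, rdf:predicate, rdf:object and rdf:type triples, with no extra state kept. RTriple and StandardReifier are hypothetical names, and treating partial or ambiguous reifications as "no triple" is an assumption of this sketch, not a statement of Jena's behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a triple; Strings replace Nodes.
record RTriple(String s, String p, String o) {}

class StandardReifier {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Objects of all (node, property, ?o) triples in the store. */
    static List<String> objects(List<RTriple> store, String node, String property) {
        List<String> out = new ArrayList<>();
        for (RTriple t : store) {
            if (t.s().equals(node) && t.p().equals(property)) out.add(t.o());
        }
        return out;
    }

    /** The triple that node reifies, computed from the plain triples alone. */
    static Optional<RTriple> reifiedTriple(List<RTriple> store, String node) {
        List<String> s = objects(store, node, RDF + "subject");
        List<String> p = objects(store, node, RDF + "predicate");
        List<String> o = objects(store, node, RDF + "object");
        boolean typed = objects(store, node, RDF + "type").contains(RDF + "Statement");
        // Assumption: a partial or ambiguous reification yields no triple.
        if (!typed || s.size() != 1 || p.size() != 1 || o.size() != 1) {
            return Optional.empty();
        }
        return Optional.of(new RTriple(s.get(0), p.get(0), o.get(0)));
    }
}

class ReifierDemo {
    public static void main(String[] args) {
        String rdf = StandardReifier.RDF;
        List<RTriple> store = new ArrayList<>();
        store.add(new RTriple("_:r", rdf + "type", rdf + "Statement"));
        store.add(new RTriple("_:r", rdf + "subject", "ex:a"));
        store.add(new RTriple("_:r", rdf + "predicate", "ex:knows"));
        store.add(new RTriple("_:r", rdf + "object", "ex:b"));
        System.out.println(StandardReifier.reifiedTriple(store, "_:r"));
    }
}
```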

Keeping the state is an implementation cost and complexity especially 
for persistent storage layers.  Quite a lot of effort for the RDB layer 
went into reification.

So maintain the interface at the Model level - make Graph simpler.

graph.QueryHandler (qQH):

Once upon a time there was RDQL, and an RDQL query is, in SPARQL terms, 
a basic graph pattern, a filter and a projection and nothing else.  qQH 
does that.  SPARQL is a bit more complicated.  qQH isn't the right 
building block for SPARQL - its execution API doesn't extend well into 
a larger framework so we have ended up with some duplication.

So remove it.  It all goes to making graph simpler - and Graph is a key 
abstraction for extension.

	Andy


Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Claude Warren <cl...@xenei.com>.

+1

I have to agree that this is a nice simplification of the jena complexity.
 It would be nice to know why they were created in the first place, just to
ensure that those issues are accounted for.  However, I don't see any
reason not to do this and several reasons to proceed.

Claude




-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
Identity: https://www.identify.nu/user.php?claude@xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

Mario - thanks for the details.

graph.QueryHandler is not related to SPARQL execution (and it changed in 
May in an incompatible way so we have a fairly good idea no one external 
is using it).

	Andy



Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Mario Ds Briggs <ma...@in.ibm.com>.

We do implement a number of the interfaces since we support using DB2 via
the JENA API,
  the standard XXXFactory and related ones for query execution and then
resultset handling
  the Dataset related ones - Dataset, DatasetGraph, Transactional,
GraphStore

>>
Presumably, you don't do anything in special support of reification as
it's handled
<<
Yes.

Mario




Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

On 05/09/12 12:37, Mario Ds Briggs wrote:
> Andy,
>
> In DB2 we extend GraphBase and then override some of the methods... so just
> clarifying

That's useful to know.  Do you hook into Jena in other ways as well?

>>>
> My proposal is that the useful operations are moved to Graph, the code
> for the withReifications forms migrate to the only callers in ModelCom.
> <<
>
> Today when Model.add(Model) is called by end user, the code flows to
>   ModelCom.add(Model)
>   ModelCom.add(Model,boolean)
>   BulkUpdateHandler().add(Graph, boolean)
>
> So you are saying that ModelCom will now call Graph.add(Graph) and so as
> long as one overrides the new Graph.add(Graph) method, ModelCom would
> invoke it.

Yes - the functionality of Model.add(Model) remains the same but the 
Graph operation migrates.

Presumably, you don't do anything in special support of reification as 
it's handled (if there is anything to do - there isn't in the default 
'Standard' mode) in GraphBase.
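
A small sketch of the call flow being described, assuming the Model-level add simply hands over to the graph so that a subclass overriding the graph-level bulk add is what gets invoked. BaseGraph, StoreGraph and SketchModel are invented for illustration and do not mirror GraphBase or ModelCom.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; names do not match Jena's real classes.
class BaseGraph {
    protected final List<String> triples = new ArrayList<>();

    void add(String triple) { triples.add(triple); }

    /** Default bulk add: a plain loop over the other graph's triples. */
    void add(BaseGraph other) {
        for (String t : new ArrayList<>(other.triples)) add(t);
    }

    int size() { return triples.size(); }
}

/** A store-backed graph that overrides the bulk operation. */
class StoreGraph extends BaseGraph {
    int bulkCalls = 0;

    @Override
    void add(BaseGraph other) {
        bulkCalls++;                      // e.g. one multi-row insert
        triples.addAll(other.triples);
    }
}

/** Model-level wrapper: its add(Model) just hands over to the graph. */
class SketchModel {
    final BaseGraph graph;

    SketchModel(BaseGraph graph) { this.graph = graph; }

    SketchModel add(SketchModel other) {
        graph.add(other.graph);           // dynamic dispatch picks the override
        return this;
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        BaseGraph source = new BaseGraph();
        source.add("t1");
        source.add("t2");
        StoreGraph target = new StoreGraph();
        new SketchModel(target).add(new SketchModel(source));
        System.out.println(target.size() + " triples, bulk calls: " + target.bulkCalls);
    }
}
```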

	Andy	

>
> thanks
> Mario

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Mario Ds Briggs <ma...@in.ibm.com>.

Andy,

In DB2 we extend GraphBase and then override some of the methods... so just
clarifying
>>
My proposal is that the useful operations are moved to Graph, the code
for the withReifications forms migrate to the only callers in ModelCom.
<<

Today when Model.add(Model) is called by end user, the code flows to
 ModelCom.add(Model)
 ModelCom.add(Model,boolean)
 BulkUpdateHandler().add(Graph, boolean)

So you are saying that ModelCom will now call Graph.add(Graph) and so as
long as one overrides the new Graph.add(Graph) method, ModelCom would
invoke it.

thanks
Mario




Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Mario Ds Briggs <ma...@in.ibm.com>.

I forgot to mention that, but anyway you got to it, it seems.... Our bulk
handler implementation extended SimpleBulkUpdateHandler and overrode
methods OTHER than the list below (because I guess we didn't want to have
the below methods in our code that simply routed back to the ones we
overrode). The only other remnant is this manager.notifyAddXXX(...) call we
make at the end before returning. I am not sure what the latter did.

Mario





Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

I'll put some deprecations into the codebase on BulkUpdateHandler for

>   add(Triple [])
>   add( List<Triple>)
>   add( Iterator<Triple>)
>   delete(Triple [])
>   delete( List<Triple>)
>   delete( Iterator<Triple>)

(see also SimpleBulkUpdateHandler ... which isn't so simple because of 
events)

and we can see what needs changing and how much.

A first pass didn't look too bad at all - the event handling needs 
checking and I'm not convinced that the current complexity adds 
anything, or that it is even used by anything other than the test suite.

e.g. StatementListener converts all the different calls into multiple 
calls of addedStatement or removedStatement.
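
The folding described above can be sketched as follows, assuming a listener whose bulk notifications all reduce to per-statement calls. FoldingListener is a hypothetical class with Strings standing in for statements; it is not the real StatementListener.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical listener; Strings stand in for statements.
class FoldingListener {
    final List<String> events = new ArrayList<>();

    void addedStatement(String s)   { events.add("added " + s); }
    void removedStatement(String s) { events.add("removed " + s); }

    // Every bulk form is reduced to per-statement calls, so the
    // array/list/iterator distinction is lost by the time it is recorded.
    void addedStatements(String[] ss)         { for (String s : ss) addedStatement(s); }
    void addedStatements(List<String> ss)     { for (String s : ss) addedStatement(s); }
    void addedStatements(Iterator<String> it) { while (it.hasNext()) addedStatement(it.next()); }
    void removedStatements(List<String> ss)   { for (String s : ss) removedStatement(s); }
}

class FoldingDemo {
    public static void main(String[] args) {
        FoldingListener l = new FoldingListener();
        l.addedStatements(new String[] {"a", "b"});
        l.addedStatements(Arrays.asList("c"));
        l.addedStatements(Arrays.asList("d").iterator());
        l.removedStatements(Arrays.asList("a"));
        System.out.println(l.events);
    }
}
```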

	Andy

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Stephen Allen <sa...@apache.org>.

On Wed, Sep 5, 2012 at 3:25 AM, Andy Seaborne <an...@apache.org> wrote:
> On 04/09/12 19:13, Stephen Allen wrote:
>>
>> On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>> On 04/09/12 08:30, Dave Reynolds wrote:
>>>>
>>>>
>>>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>>>
>>>>>
>>>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like
>>>>> to
>>>>> propose we
>>>>>
>>>>>     Remove BulkUpdateHandler interface
>>>>>       Migrate its few useful operations to Graph.
>>>>>
>>>>>     Start to provide reification with "standard" only.
>>>>>       graph.QueryHandler only used to support reification.
>>>>
>>>>
>>
>> +1 on both of these.  They are annoying to code for, and as you say,
>> force batch triple changes into the client code where it doesn't
>> really belong.
>>
>> How about removing TransactionHandler as well?
>
>
> One step at a time!
>
> TransactionHandler is not quite the right abstraction. Its begin() does not
> indicate read or write intentions and this is reflected in the Model
> transactional interface.
>
> It might be possible to add promotable transactions in TDB, but noting
> everywhere an update can occur, so as to promote read->write if necessary,
> is not trivial.
>
> If another transaction has committed, and the reader is looking at the old
> DB state, then it's not possible (no locks on parts of the DB).  So at the
> point of promotion, the transaction may abort.  Not a nice programming
> paradigm.  It's a side effect of having triples - what has been updated
> where is not coupled to the application data model.  Triple level locking
> struck me as going to be very expensive for not a lot of benefit.
>
> My preference is to change Model.begin (i.e. Model implements Transactional
> after transactional moved to somewhere general).

Agreed.  I think having the user specify READ or WRITE at transaction
begin() is the right way to go.  Promoting transactions is fraught
with all the issues you brought up.
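
A minimal sketch of declaring READ or WRITE at begin(), assuming no promotion at all: a READ transaction that tries to write is rejected. TxnMode and TxnStore are hypothetical names for illustration only, and the single-threaded copy-on-begin store is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the intent is declared up front at begin().
enum TxnMode { READ, WRITE }

class TxnStore {
    private final List<String> committed = new ArrayList<>();
    private List<String> working;   // non-null only inside a WRITE transaction
    private TxnMode mode;           // null when no transaction is active

    void begin(TxnMode m) {
        if (mode != null) throw new IllegalStateException("already in a transaction");
        mode = m;
        if (m == TxnMode.WRITE) working = new ArrayList<>(committed);
    }

    void add(String triple) {
        // No promotion: a READ transaction can never start writing.
        if (mode != TxnMode.WRITE) throw new IllegalStateException("not in a WRITE transaction");
        working.add(triple);
    }

    int size() { return (working != null ? working : committed).size(); }

    void commit() {
        if (mode == TxnMode.WRITE) {
            committed.clear();
            committed.addAll(working);
        }
        end();
    }

    /** Ends the transaction; uncommitted changes are dropped. */
    void end() {
        working = null;
        mode = null;
    }
}

class TxnDemo {
    public static void main(String[] args) {
        TxnStore store = new TxnStore();
        store.begin(TxnMode.WRITE);
        store.add("<s> <p> <o> .");
        store.commit();
        store.begin(TxnMode.READ);
        System.out.println(store.size());
        store.end();
    }
}
```

Because the mode is fixed at begin(), the store never has to decide at some later point whether a reader's view is still current enough to promote.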


>
> But that will cause certain users "some issues" :-)
>
>
>> Also make Dataset extend Transactional instead of copying the methods?
>
>
> That's merely to isolate Dataset from the internal/graph level Transactional
> interface - they ended up the same operations.
>
> From experience, using abstractions at Graph and at Model (DatasetGraph and
> Dataset) levels can lead to problems renaming things later.  So it has
> started as separate - deciding to merge later is possible.
>
>
>>>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>>>> assumption that a store *might* optimize the updates but at least some
>>>> of those cases are, or could be made, add(Graph) calls.
>>>>
>>>> Presumably this would be a normal deprecate-then-remove-later
>>>> cycle?
>>>
>>>
>>>
>>> The Model operations remain -
>>>
>>> Model.add(Statement[])
>>> Model.add(StmtIterator iter)
>>> Model.add(List<Statement> statements)
>>>
>>> We could also consider simplification at the Model level- that wasn't on
>>> my
>>> list.
>>>
>>> It's interesting if you're using Graph level in an application - I'd like
>>> to
>>> promote the Graph "SPI" more formally as an API after the cleaning up.
>>>
>>
>> I tend to use the Graph SPI in my application as the objects are
>> immutable.
>
>
> I switch to whichever is most convenient ... but this will all help in
> making the SPI more formally a public API.  It's not far off that today.
>
>
>>> Is that code still in use?
>>>
>>> At the SPI level (Graph API), there is less of a contract on migration.
>>>
>>> The reasoner does use the bulk update handler - but that's in-codebase so
>>> that will just be cleaned up as part of the change.  (Elsewhere in the
>>> main
>>> code base most calls to getBulkUpdateHandler are ... implementations of
>>> getBulkUpdateHandler over another graph!)
>>>
>>> TDB has a bulk update handler to do removeAll and remove(s,p,o) without
>>> the
>>> problems isomorphic to CCME.
>>>
>>> SDB has a bulk update handler but the implementation of removeAll is in
>>> the
>>> graph anyway.
>>>
>>> In executing on this, I'd do at least a local deprecation cycle to track
>>> down and migrate the current code without needing a local big bang.
>>>
>>> Then at least a minor number version change to 2.10.0 would be good when
>>> the
>>> change happens.
>>>
>>> Observation on the deprecate-remove cycle: we know people don't upgrade
>>> incrementally because they don't need to.
>>>
>>> Another issue on the release pipeline is that some users are not testing
>>> the
>>> development builds, only checking after a release.  That's bad for them
>>> and
>>> less than helpful for us.  I don't want the effect of this to be making
>>> work
>>> for the project.  We make the deliverables in exactly the form we release
>>> in
>>> every night so the only difference is location.
>>>
>>>          Andy
>>>
>>>>
>>>> Dave
>>>>
>>>
>

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/12 19:13, Stephen Allen wrote:
> On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
>> On 04/09/12 08:30, Dave Reynolds wrote:
>>>
>>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>>
>>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>>> propose we
>>>>
>>>>     Remove BulkUpdateHandler interface
>>>>       Migrate its few useful operations to Graph.
>>>>
>>>>     Start to provide reification with "standard" only.
>>>>       graph.QueryHandler only used to support reification.
>>>
>
> +1 on both of these.  They are annoying to code for, and as you say,
> force batch triple changes into the client code where it doesn't
> really belong.
>
> How about removing TransactionHandler as well?

One step at a time!

TransactionHandler is not quite the right abstraction. Its begin() does 
not indicate read or write intentions and this is reflected in the Model 
transactional interface.

It might be possible to add promotable transactions in TDB, but noting 
everywhere an update can occur, so as to promote read->write if 
necessary, is not trivial.

If another transaction has committed, and the reader is looking at the 
old DB state, then it's not possible (no locks on parts of the DB).  So 
at the point of promotion, the transaction may abort.  Not a nice 
programming paradigm.  It's a side effect of having triples - what has 
been updated where is not coupled to the application data model.  Triple 
level locking struck me as going to be very expensive for not a lot of 
benefit.

My preference is to change Model.begin (i.e. Model implements 
Transactional after transactional moved to somewhere general).

But that will cause certain users "some issues" :-)

> Also make Dataset extend Transactional instead of copying the methods?

That's merely to isolate Dataset from the internal/graph level 
Transactional interface - they ended up the same operations.

From experience, using abstractions at Graph and at Model (DatasetGraph 
and Dataset) levels can lead to problems renaming things later.  So it 
has started as separate - deciding to merge later is possible.

>>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>>> assumption that a store *might* optimize the updates but at least some
>>> of those cases are, or could be made, add(Graph) calls.
>>>
>>> Presumably this would be a normal deprecate-then-remove-later cycle?
>>
>>
>> The Model operations remain -
>>
>> Model.add(Statement[])
>> Model.add(StmtIterator iter)
>> Model.add(List<Statement> statements)
>>
>> We could also consider simplification at the Model level- that wasn't on my
>> list.
>>
>> It's interesting if you're using Graph level in an application - I'd like to
>> promote the Graph "SPI" more formally as an API after the cleaning up.
>>
>
> I tend to use the Graph SPI in my application as the objects are immutable.

I switch to whichever is most convenient ... but this will all help in 
making the SPI more formally a public API.  It's not far off that today.

>> Is that code still in use?
>>
>> At the SPI level (Graph API), there is less of a contract on migration.
>>
>> The reasoner does use the bulk update handler - but that's in-codebase so
>> that will just be cleaned up as part of the change.  (Elsewhere in the main
>> code base most calls to getBulkUpdateHandler are ... implementations of
>> getBulkUpdateHandler over another graph!)
>>
>> TDB has a bulk update handler to do removeAll and remove(s,p,o) without the
>> problems isomorphic to CCME.
>>
>> SDB has a bulk update handler but the implementation of removeAll is in the
>> graph anyway.
>>
>> In executing on this, I'd do at least a local deprecation cycle to track
>> down and migrate the current code without needing a local big bang.
>>
>> Then at least a minor number version change to 2.10.0 would be good when the
>> change happens.
>>
>> Observation on the deprecate-remove cycle: we know people don't upgrade
>> incrementally because they don't need to.
>>
>> Another issue on the release pipeline is that some users are not testing the
>> development builds, only checking after a release.  That's bad for them and
>> less than helpful for us.  I don't want the effect of this to be making work
>> for the project.  We make the deliverables in exactly the form we release in
>> every night so the only difference is location.
>>
>>          Andy
>>
>>>
>>> Dave
>>>
>>

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Stephen Allen <sa...@apache.org>.

On Tue, Sep 4, 2012 at 2:21 AM, Andy Seaborne <an...@apache.org> wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>>
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>>
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>>    Remove BulkUpdateHandler interface
>>>      Migrate its few useful operations to Graph.
>>>
>>>    Start to provide reification with "standard" only.
>>>      graph.QueryHandler only used to support reification.
>>

+1 on both of these.  They are annoying to code for, and as you say,
force batch triple changes into the client code where it doesn't
really belong.

How about removing TransactionHandler as well?

Also make Dataset extend Transactional instead of copying the methods?
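
To make the Dataset suggestion concrete, here is a minimal sketch. The interfaces below are hypothetical, cut-down stand-ins written for this example (quads are plain strings, begin() takes a boolean), not the real Jena types. The point is only that if Dataset extends Transactional, helper code written against Transactional works on a Dataset without the methods being copied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not the real Jena types.
interface Transactional {
    void begin(boolean write);
    void commit();
    void abort();
    void end();
    boolean isInTransaction();
}

// Dataset inherits the transaction methods instead of redeclaring them.
interface Dataset extends Transactional {
    void add(String quad);
    int size();
}

class InMemoryDataset implements Dataset {
    private final List<String> committed = new ArrayList<>();
    private List<String> working = null;   // non-null only inside a transaction

    public void begin(boolean write) { working = new ArrayList<>(committed); }

    public void commit() {
        requireTransaction();
        committed.clear();
        committed.addAll(working);
        working = null;
    }

    public void abort() { working = null; }
    public void end() { working = null; }
    public boolean isInTransaction() { return working != null; }

    public void add(String quad) {
        requireTransaction();
        working.add(quad);
    }

    public int size() {
        return isInTransaction() ? working.size() : committed.size();
    }

    private void requireTransaction() {
        if (working == null) throw new IllegalStateException("not in a transaction");
    }
}

class TransactionalSketch {
    // Written once against Transactional; usable with any Dataset.
    static void executeWrite(Transactional txn, Runnable action) {
        txn.begin(true);
        try {
            action.run();
            txn.commit();
        } catch (RuntimeException e) {
            txn.abort();
            throw e;
        } finally {
            txn.end();
        }
    }

    public static void main(String[] args) {
        InMemoryDataset ds = new InMemoryDataset();
        executeWrite(ds, () -> ds.add("g s p o"));
        System.out.println(ds.size());
    }
}
```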


>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would be a normal deprecate-then-remove-later cycle?
>
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level - that wasn't on my
> list.
>
> It's interesting if you're using the Graph level in an application - I'd like
> to promote the Graph "SPI" to a more formal API after the cleaning up.
>

I tend to use the Graph SPI in my application as the objects are immutable.


> Is that code still in use?
>
> At the SPI level (Graph API), there is less of a contract on migration.
>
> The reasoner does use the bulk update handler - but that's in-codebase so
> that will just be cleaned up as part of the change.  (Elsewhere in the main
> code base most calls to getBulkUpdateHandler are ... implementations of
> getBulkUpdateHandler over another graph!)
>
> TDB has a bulk update handler to do removeAll and remove(s,p,o) without the
> problems isomorphic to CCME.
>
> SDB has a bulk update handler but the implementation of removeAll is in the
> graph anyway.
>
> In executing on this, I'd do at least a local deprecation cycle to track
> down and migrate the current code without needing a local big bang.
>
> Then at least a minor number version change to 2.10.0 would be good when the
> change happens.
>
> Observation on the deprecate-remove cycle: we know people don't upgrade
> incrementally because they don't need to.
>
> Another issue on the release pipeline is that some users are not testing the
> development builds, only checking after a release.  That's bad for them and
> less than helpful for us.  I don't want the effect of this to be making work
> for the project.  Every night we build the deliverables in exactly the form
> we release them in, so the only difference is location.
>
>         Andy
>
>>
>> Dave
>>
>

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Simon Helsen <sh...@ca.ibm.com>.

Guys,

It would make our lives a lot more painful if you don't follow a
deprecate-then-remove-later cycle for the BulkUpdateHandler interface. We
have client code programmed against it, and our db2 implementation also
contains a custom implementation (for obvious reasons - that was probably
the original motivation for the API in the first place). If you remove (or
even move) the operations in a big bang, we can't adopt the change
gracefully.

Btw, we *always* test unreleased snapshots as much as possible. The absence
of a deprecate-then-remove-later cycle would make this more difficult.
Also, there are so many separate dependent groups in our organization that
we cannot possibly handle big bang changes.

Finally, this brings me to a question I have posed before. Jena, so far,
has not been very clear on what is public API and what is not. I read here
that the Graph API sits at the SPI level, but we have countless clients
that program against this level as well, since it is readily available and
easy to reach. There is no indication anywhere (e.g. in the javadoc or
packaging) that these APIs are not public. The first step towards
improving this situation is to declare, perhaps at the package level, which
APIs are public and which are not, so clients know what kind of contract
they are getting themselves into.

thanks

Simon

From: Dave Reynolds <da...@gmail.com>
To: dev@jena.apache.org
Date: 09/04/2012 05:46 AM
Subject: Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

On 04/09/12 10:21, Andy Seaborne wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>>    Remove BulkUpdateHandler interface
>>>      Migrate its few useful operations to Graph.
>>>
>>>    Start to provide reification with "standard" only.
>>>      graph.QueryHandler only used to support reification.
>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would be a normal deprecate-then-remove-later cycle?
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level - that wasn't on
> my list.
>
> It's interesting if you're using the Graph level in an application - I'd
> like to promote the Graph "SPI" to a more formal API after the
> cleaning up.
>
> Is that code still in use?

I thought it was, but having now checked, no: none of my currently
in-use projects use BulkUpdateHandler directly.

> At the SPI level (Graph API), there is less of a contract on migration.

Agreed. I'm fine with going ahead with the change.

> The reasoner does use the bulk update handler - but that's in-codebase
> so that will just be cleaned up as part of the change.

Sure.

Dave

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Dave Reynolds <da...@gmail.com>.

On 04/09/12 10:21, Andy Seaborne wrote:
> On 04/09/12 08:30, Dave Reynolds wrote:
>> On 03/09/12 18:33, Andy Seaborne wrote:
>>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>>> propose we
>>>
>>>    Remove BulkUpdateHandler interface
>>>      Migrate its few useful operations to Graph.
>>>
>>>    Start to provide reification with "standard" only.
>>>      graph.QueryHandler only used to support reification.
>>
>> Seems reasonable. I've used BulkUpdateHandler in client code on the
>> assumption that a store *might* optimize the updates but at least some
>> of those cases are, or could be made, add(Graph) calls.
>>
>> Presumably this would be a normal deprecate-then-remove-later cycle?
>
> The Model operations remain -
>
> Model.add(Statement[])
> Model.add(StmtIterator iter)
> Model.add(List<Statement> statements)
>
> We could also consider simplification at the Model level - that wasn't on
> my list.
>
> It's interesting if you're using the Graph level in an application - I'd
> like to promote the Graph "SPI" to a more formal API after the
> cleaning up.
>
> Is that code still in use?

I thought it was, but having now checked, no: none of my currently
in-use projects use BulkUpdateHandler directly.

> At the SPI level (Graph API), there is less of a contract on migration.

Agreed. I'm fine with going ahead with the change.

> The reasoner does use the bulk update handler - but that's in-codebase
> so that will just be cleaned up as part of the change.

Sure.

Dave

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/12 08:30, Dave Reynolds wrote:
> On 03/09/12 18:33, Andy Seaborne wrote:
>> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
>> propose we
>>
>>    Remove BulkUpdateHandler interface
>>      Migrate its few useful operations to Graph.
>>
>>    Start to provide reification with "standard" only.
>>      graph.QueryHandler only used to support reification.
>
> Seems reasonable. I've used BulkUpdateHandler in client code on the
> assumption that a store *might* optimize the updates but at least some
> of those cases are, or could be made, add(Graph) calls.
>
> Presumably this would be a normal deprecate-then-remove-later cycle?

The Model operations remain -

Model.add(Statement[])
Model.add(StmtIterator iter)
Model.add(List<Statement> statements)

We could also consider simplification at the Model level - that wasn't on
my list.

It's interesting if you're using the Graph level in an application - I'd
like to promote the Graph "SPI" to a more formal API after the
cleaning up.

Is that code still in use?

At the SPI level (Graph API), there is less of a contract on migration.

The reasoner does use the bulk update handler - but that's in-codebase 
so that will just be cleaned up as part of the change.  (Elsewhere in 
the main code base most calls to getBulkUpdateHandler are ... 
implementations of getBulkUpdateHandler over another graph!)

TDB has a bulk update handler to do removeAll and remove(s,p,o) without 
the problems isomorphic to CCME.
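
As an illustration of the kind of problem meant here, assuming "CCME" refers to java.util.ConcurrentModificationException: deleting from a collection while iterating over that same collection fails, so a removal by pattern has to find the matches first and delete afterwards. The sketch below uses a plain ArrayList of strings as a hypothetical stand-in for a graph; it is not TDB code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class RemoveWhileIterating {
    // Naive approach: delete from the list that is being iterated.
    // Returns true if the iteration failed with a ConcurrentModificationException.
    static boolean naiveRemoveThrows(List<String> triples, String subject) {
        try {
            for (String t : triples) {
                if (t.startsWith(subject + " ")) {
                    triples.remove(t);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safer approach: collect the matches first, then delete them.
    // Returns the number of matching entries that were found.
    static int removeMatching(List<String> triples, String subject) {
        List<String> matches = new ArrayList<>();
        for (String t : triples) {
            if (t.startsWith(subject + " ")) {
                matches.add(t);
            }
        }
        for (String t : matches) {
            triples.remove(t);
        }
        return matches.size();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("s1 p o1", "s1 p o2", "s2 p o3", "s1 p o4");

        List<String> a = new ArrayList<>(data);
        System.out.println("naive removal failed: " + naiveRemoveThrows(a, "s1"));

        List<String> b = new ArrayList<>(data);
        System.out.println("removed: " + removeMatching(b, "s1") + ", left: " + b);
    }
}
```

Copying the matches costs memory proportional to the number of matches, which is one reason a store may prefer to implement removeAll and remove(s,p,o) itself.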

SDB has a bulk update handler but the implementation of removeAll is in 
the graph anyway.

In executing on this, I'd do at least a local deprecation cycle to track 
down and migrate the current code without needing a local big bang.
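
A rough sketch of what one step of such a deprecation cycle could look like. All types here are hypothetical, cut-down shapes invented for this example (triples are plain strings), not the real Jena interfaces. The operation lives on Graph, and the old handler stays for a while as a deprecated wrapper that just delegates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, cut-down shapes for illustration; triples are plain strings
// and the real Jena interfaces have many more operations.
@Deprecated
interface BulkUpdateHandler {
    void add(List<String> triples);
    void removeAll();
}

interface Graph {
    void add(String triple);
    int size();
    void removeAll();                          // operation moved onto Graph
    @Deprecated
    BulkUpdateHandler getBulkUpdateHandler();  // kept for one deprecation cycle
}

class SimpleGraph implements Graph {
    private final List<String> triples = new ArrayList<>();

    public void add(String triple) {
        if (!triples.contains(triple)) triples.add(triple);
    }

    public int size() { return triples.size(); }

    public void removeAll() { triples.clear(); }

    @Deprecated
    public BulkUpdateHandler getBulkUpdateHandler() {
        final Graph graph = this;
        // The old entry point only forwards to the Graph operations.
        return new BulkUpdateHandler() {
            public void add(List<String> batch) {
                for (String t : batch) graph.add(t);
            }
            public void removeAll() { graph.removeAll(); }
        };
    }
}

class DeprecationCycleSketch {
    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        g.getBulkUpdateHandler().add(Arrays.asList("s p o1", "s p o2")); // old style, compiles with a warning
        g.add("s p o3");                                                 // new style
        System.out.println(g.size());
        g.removeAll();
        System.out.println(g.size());
    }
}
```

Existing callers keep compiling and get a deprecation warning pointing them at the Graph methods; the wrapper is deleted in a later release.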

Then at least a minor number version change to 2.10.0 would be good when 
the change happens.

Observation on the deprecate-remove cycle: we know people don't upgrade 
incrementally because they don't need to.

Another issue on the release pipeline is that some users are not testing 
the development builds, only checking after a release.  That's bad for 
them and less than helpful for us.  I don't want the effect of this to 
be making work for the project.  Every night we build the deliverables in
exactly the form we release them in, so the only difference is location.

	Andy

>
> Dave
>

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler

Posted by Dave Reynolds <da...@gmail.com>.

On 03/09/12 18:33, Andy Seaborne wrote:
> As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
> propose we
>
>    Remove BulkUpdateHandler interface
>      Migrate its few useful operations to Graph.
>
>    Start to provide reification with "standard" only.
>      graph.QueryHandler only used to support reification.

Seems reasonable. I've used BulkUpdateHandler in client code on the 
assumption that a store *might* optimize the updates but at least some 
of those cases are, or could be made, add(Graph) calls.

Presumably this would be a normal deprecate-then-remove-later cycle?

Dave