Posted to dev@commonsrdf.apache.org by Andy Seaborne <an...@apache.org> on 2015/03/30 21:46:32 UTC

Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

 > -	void add(Triple triple);
 > +	Triple add(Triple triple);

Returning something from add() does not work for me.

(And even if it did, I would have expected "boolean" as to whether the 
triple was actually added or not.)

1/ I don't understand "possibly mapping" being in the API.  A triple is
a triple. What is the client supposed to do differently with the .equals
one that is returned (it is .equals?)

2/ I am finding a flow back from add operations difficult to deal with.

A sequence of add(Triple) calls can be batched up and only needs to be
performed before another operation is called that can observe the change.

In the case of a remote destination, the overhead per add is significant
(network round trips).  But if the add is delayed, then nothing is
available to return.

For a general interface, this should be

     void add(Triple)
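
A minimal sketch of the batching argument above. BatchingGraph and
Statement are hypothetical names for illustration only, not part of the
Commons RDF API, and the "remote store" is just an in-memory list. Because
add() returns nothing, it can be buffered until an observing call such as
size() forces a flush.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a triple; not the Commons RDF Triple interface. */
final class Statement {
    final String subject;
    final String predicate;
    final String object;

    Statement(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical graph that buffers adds and flushes before any observing call. */
class BatchingGraph {
    private final List<Statement> pending = new ArrayList<>();
    private final List<Statement> stored = new ArrayList<>(); // stands in for a remote store
    private int roundTrips = 0;

    /** Returns nothing, so the work can be deferred. */
    void add(Statement statement) {
        pending.add(statement);
    }

    /** An observing operation: pending adds must be applied first. */
    int size() {
        flush();
        return stored.size();
    }

    /** Number of simulated network round trips so far. */
    int roundTrips() {
        return roundTrips;
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        stored.addAll(pending); // one simulated round trip for the whole batch
        pending.clear();
        roundTrips++;
    }
}
```

With a signature like Triple add(Triple), each call in this sketch would
need its own round trip before it could return the stored value.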

	Andy



Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

Posted by Stian Soiland-Reyes <st...@apache.org>.
Yes, a Triple lookup would work fine just like that - but I am not
sure what this is useful for (unless you want to cast it to an
implementation subclass that provides additional methods).



So that only works if it is OK to query with the previously inserted
BlankNodes - which I would have hoped would work, but which I can see
can get tricky if those BlankNodes came from a different
implementation and need to be 'mapped' somehow.


Let's say there is an independent parser that produces Triples (using
simple.* instances) for

_:b1 foaf:knows _:b2 .
_:b2 foaf:knows _:b1 .

which you then add to a Graph from a secondary implementation, which
has its own BlankNode, Triple etc.

It might do its own BlankNode substitution on insert, e.g. like Jena
TDB, which uses 64-bit integers to map to its disk-based indexes.
Obviously this would have to consistently map both statements to the
same substituted b1 and b2.
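
As a rough sketch of what "consistently map" could mean here:
BlankNodeMapper is a hypothetical class, and using the incoming label
string as the lookup key is an assumption for illustration. It does not
describe how Jena TDB or any Commons RDF implementation actually works.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store-side mapper from incoming blank node labels to internal long ids. */
class BlankNodeMapper {
    private final Map<String, Long> ids = new HashMap<>();
    private long next = 0;

    /** Returns the same internal id every time the same label is seen. */
    long map(String label) {
        return ids.computeIfAbsent(label, key -> next++);
    }
}
```

Both statements above mention b1 and b2, so mapping each label through
the same instance gives the same substituted node in both triples.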

The substituted b1 and b2 (if you looked them up through getTriples)
might or might not have the same internalIdentifier() as the originals
- I don't think we require either now.


Now at a later stage you want to add an additional statement about _:b1 -
I assume it would be OK to simply keep the original _:b1 instance in
memory and re-use that?

As we don't have a graph.addAll(Set<Triple>) or any kind of
'transaction', the Graph would have to support both of the above
and really can't tell which one it is.


Now with getTriples() I should presumably get back (in some form) the
expected triples I just added if I query/filter with that original b1
instance, which would be included in the results for statements with
predicate foaf:knows.

Is the original b1 .equals() to the substituted b1 instance returned
from that query? One should expect so - otherwise how did it pass the
filter?

Triple.equals() is defined from its constituent parts.
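
For illustration, component-wise equality could look roughly like the
sketch below. ComponentTriple is a hypothetical class written for this
example, not the Commons RDF implementation.

```java
import java.util.Objects;

/** Hypothetical triple whose equality depends only on its three parts. */
final class ComponentTriple {
    private final Object subject;
    private final Object predicate;
    private final Object object;

    ComponentTriple(Object subject, Object predicate, Object object) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ComponentTriple)) {
            return false;
        }
        ComponentTriple that = (ComponentTriple) other;
        // Equal exactly when all three components are equal.
        return subject.equals(that.subject)
                && predicate.equals(that.predicate)
                && object.equals(that.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}
```

Under a definition like this, whether the original and the substituted
triple are equal reduces to whether their BlankNodes are equal.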


Now if all of the above is true - then why do you need to look up the
substituted Triple instance?


On 31 March 2015 at 12:02, Andy Seaborne <an...@apache.org> wrote:
> On 31/03/15 11:43, Stian Soiland-Reyes wrote:
>>
>> It might be that the Graph needs a method to look up blank nodes by
>> some measure, without having to query for a big graph pattern of what
>> was just inserted.
>
>
> You mean shorthand for
>
> graph
>   .getTriples(t.getSubject(), t.getPredicate(), t.getObject())
>   .findFirst().get() ;
>
> as the 3 elements are grounded, I'd expect that to be very fast.



-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718

Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

Posted by Andy Seaborne <an...@apache.org>.
On 31/03/15 11:43, Stian Soiland-Reyes wrote:
> It might be that the Graph needs a method to look up blank nodes by
> some measure, without having to query for a big graph pattern of what
> was just inserted.

You mean shorthand for

graph
   .getTriples(t.getSubject(), t.getPredicate(), t.getObject())
   .findFirst().get() ;

as the 3 elements are grounded, I'd expect that to be very fast.



Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

Posted by Stian Soiland-Reyes <st...@apache.org>.
It might be that the Graph needs a method to look up blank nodes by
some measure, without having to query for a big graph pattern of what
was just inserted.

So now if you do

String s = blankNode.internalIdentifier();
graph.add(blankNode, p, blankNode);

and look it up again - then do I understand the contract correctly in
that Graph is not required to return the same internalIdentifier()?

If I get those BlankNodes out again - are they still .equals() to the
inserted blankNode?


On 31 March 2015 at 10:52, Andy Seaborne <an...@apache.org> wrote:
> On 30/03/15 23:10, Peter Ansell wrote:
>>
>> All of the BlankNodes in the Triple that was added may be internally
>> remapped during the add operation. Hence, the .equals result will change when
>> the remapping occurs. Returning the mapped values enables a user to
>> get access to that information.
>>
>> Removing the return value is fine with me if it can't be supported in
>> a performant way. It just removes the simple way for a user to know
>> what BlankNode remapping occurred.
>
>
> Peter,
>
> Great - and I see the change in git.
>
> A simpler example would have been a parser not wanting to use the return
> from add() and an implementation that does not store Triple objects, but has
> data structures using the subject/predicate/object directly. Returned
> triples are just object churn albeit probably quite efficient churn.
>
> (hmm - interesting - so once mapped, the mapping is stable.)
>
>         Andy



-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718

Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

Posted by Andy Seaborne <an...@apache.org>.
On 30/03/15 23:10, Peter Ansell wrote:
> All of the BlankNodes in the Triple that was added may be internally
> remapped during the add operation. Hence, the .equals result will change when
> the remapping occurs. Returning the mapped values enables a user to
> get access to that information.
>
> Removing the return value is fine with me if it can't be supported in
> a performant way. It just removes the simple way for a user to know
> what BlankNode remapping occurred.

Peter,

Great - and I see the change in git.

A simpler example would have been a parser not wanting to use the return 
from add() and an implementation that does not store Triple objects, but 
has data structures using the subject/predicate/object directly. 
Returned triples are just object churn albeit probably quite efficient 
churn.

(hmm - interesting - so once mapped, the mapping is stable.)

	Andy



Re: [14/18] incubator-commonsrdf git commit: Return the Triple that was actually added to Graph, including any mapped components as necessary

Posted by Peter Ansell <an...@gmail.com>.
All of the BlankNodes in the Triple that was added may be internally
remapped during the add operation. Hence, the .equals result will change when
the remapping occurs. Returning the mapped values enables a user to
get access to that information.

Removing the return value is fine with me if it can't be supported in
a performant way. It just removes the simple way for a user to know
what BlankNode remapping occurred.
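
To make the idea concrete, here is a small sketch of an add() that hands
back the remapped values. RemappingGraph, the String[] return type and the
"_:g" label scheme are all assumptions for illustration and not the
Commons RDF API. Actual storage of the triple is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical graph whose add() returns the terms as the store would keep them. */
class RemappingGraph {
    private final Map<String, String> mapping = new HashMap<>();
    private int counter = 0;

    /**
     * Returns {subject, predicate, object} after remapping.
     * Terms starting with "_:" are treated as blank node labels.
     */
    String[] add(String subject, String predicate, String object) {
        return new String[] { remap(subject), predicate, remap(object) };
    }

    private String remap(String term) {
        if (!term.startsWith("_:")) {
            return term; // not a blank node, kept as is
        }
        // The same incoming label always maps to the same internal label.
        return mapping.computeIfAbsent(term, key -> "_:g" + counter++);
    }
}
```

The caller can read the remapping from the returned array, which is the
convenience that would be lost with void add(Triple).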

On 31 March 2015 at 06:46, Andy Seaborne <an...@apache.org> wrote:
>> -     void add(Triple triple);
>> +     Triple add(Triple triple);
>
> Returning something from add() does not work for me.
>
> (And even if it did, I would have expected "boolean" as to whether the
> triple was actually added or not.)
>
> 1/ I don't understand "possibly mapping" being in the API.  A triple is a
> triple. What is the client supposed to do differently with the .equals one
> that is returned (it is .equals?)
>
> 2/ I am finding a flow back from add operations difficult to deal with.
>
> A sequence of add(Triple) calls can be batched up and only needs to be
> performed before another operation is called that can observe the change.
>
> In the case of a remote destination, the overhead per add is significant
> (network round trips).  But if the add is delayed, then nothing is
> available to return.
>
> For a general interface, this should be
>
>     void add(Triple)
>
>         Andy
>
>