Posted to dev@jena.apache.org by Claude Warren <cl...@xenei.com> on 2017/12/24 10:46:45 UTC

Round Trip blank values and alternatives.

Howdy,

I recently asked a question about consistent blank node ID values in SPARQL
queries, and Andy provided a patch that solves the problem for queries
(provided the server consistently calls the blank node by the same ID
across queries).  This is a follow-up to that with a slightly different
twist.

As far as I can tell, the SPARQL specification does not require that blank
node IDs in queries match blank node IDs in the data store, but rather that
blank node positions match (a much trickier kind of matching).

So assume that a portion (P) of a graph (G) is stored in a smaller
remote graph and that queries return consistent blank node IDs.

If a blank node property is updated or added in the P graph, is there a
mechanism that can match the blank node in the G graph so that it can be
updated?

I think that it should be possible to use Hamming distances to determine
which node in G should be modified, but I am not certain that (a) this is
correct and (b) it can be done with SPARQL.

Given quads in graph G

G <s1> <p1> _:1
G <s1> <p1> _:2
G <_:1> <p2> <o1>
G <_:1> <p3> <o2>
G <_:2> <p2> <o3>
G <_:2> <p3> <o4>

and quads in graph P

P <s1> <p1> _:1
P <_:1> <p2> <o1>
P <_:1> <p3> <o3>

where
P <_:1> <p3> <o2>
has been changed to
P <_:1> <p3> <o3>

If I track the change in P, I can locate _:1 in G and update it.
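The change-tracking idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: graphs are plain sets of (subject, predicate, object) tuples, IRIs are bare strings, and the helper names (description, locate, apply_change) are made up for this example rather than taken from Jena. Blank node labels are deliberately ignored when matching; only the recorded pre-change description of the node in P is used.

```python
# Triples are (subject, predicate, object) tuples. Blank nodes are strings
# starting with "_:"; their labels are NOT used for matching in this sketch.
G = {
    ("s1", "p1", "_:1"),
    ("s1", "p1", "_:2"),
    ("_:1", "p2", "o1"),
    ("_:1", "p3", "o2"),
    ("_:2", "p2", "o3"),
    ("_:2", "p3", "o4"),
}

# Description of the blank node in P *before* the tracked change.
P_BEFORE = {("p2", "o1"), ("p3", "o2")}


def description(graph, node):
    """Return the (predicate, object) pairs that have `node` as subject."""
    return {(p, o) for (s, p, o) in graph if s == node}


def blank_subjects(graph):
    """Return every blank node that appears in the subject position."""
    return {s for (s, _, _) in graph if s.startswith("_:")}


def locate(graph, old_description):
    """Blank nodes whose description contains the pre-change description."""
    return sorted(
        node for node in blank_subjects(graph)
        if old_description <= description(graph, node)
    )


def apply_change(graph, node, removed, added):
    """Return a copy of `graph` with the tracked change applied to `node`."""
    result = set(graph)
    result -= {(node, p, o) for (p, o) in removed}
    result |= {(node, p, o) for (p, o) in added}
    return result


matches = locate(G, P_BEFORE)
if len(matches) == 1:
    # Only update when the match is unambiguous.
    G2 = apply_change(G, matches[0], {("p3", "o2")}, {("p3", "o3")})
```

The sketch only updates G when exactly one candidate matches; with zero or several candidates it leaves G untouched, which is the case the rest of this message worries about.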

But it seems like there are a lot of pitfalls here, in that changes to _:1
could make it indistinguishable from _:2.
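To make the pitfall concrete, here is a rough Python sketch of a Hamming-style distance between blank node descriptions. It assumes each description is a set of (predicate, object) pairs and counts the pairs present in one set but not the other; the names distance and closest, and the second "ambiguous" edit, are hypothetical and only serve the illustration.

```python
def distance(desc_a, desc_b):
    """Hamming-style distance: number of (predicate, object) pairs that
    appear in exactly one of the two descriptions."""
    return len(desc_a ^ desc_b)


def closest(candidates, target):
    """Return (best distance, sorted labels of all candidates at that distance)."""
    scored = {label: distance(desc, target) for label, desc in candidates.items()}
    best = min(scored.values())
    return best, sorted(label for label, d in scored.items() if d == best)


# Descriptions of the two blank nodes in G from the example.
G_NODES = {
    "_:1": {("p2", "o1"), ("p3", "o2")},
    "_:2": {("p2", "o3"), ("p3", "o4")},
}

# The node in P after the edit from the example (p3 now points at o3).
p_after = {("p2", "o1"), ("p3", "o3")}
unique = closest(G_NODES, p_after)

# A different, hypothetical edit that lands halfway between both nodes.
p_ambiguous = {("p2", "o3"), ("p3", "o2")}
tie = closest(G_NODES, p_ambiguous)
```

For the edit in the example, _:1 is the single nearest node. For the hypothetical second edit, both nodes are equally distant, so the distance alone cannot say which node in G to update.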

Does anyone have any pointers for how this might be resolved?  Is there any
good research in this area and if so what is this research topic called and
where can I find papers talking about it?

Claude
-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Round Trip blank values and alternatives.

Posted by Andy Seaborne <an...@apache.org>.

On 24/12/17 16:46, Claude Warren wrote:
 > Is the use of pseudo URI <_:label> specified in the SPARQL
 > recommendation or is it a Fuseki/Jena specific usage?

Specific to Jena - it's not even a legal URI.

 >
 >
 > Section 6.6 of the RDF Concepts recommendation
 > (https://www.w3.org/TR/rdf-concepts/#section-blank-nodes) states:
 >
 > /RDF makes no reference to any internal structure of blank nodes.
 > Given two blank nodes, it is *possible* to determine whether or not they
 > are the same./
 >
 > (emphasis mine)
 >
 > Is the word "possible" a typo in the above sentence?  It seems that
 > the preceding sentence would indicate the word should be "*impossible*".

No - the text is correct.

How do you tell any two things apart?  Different names don't prove anything.

They look different or they behave differently.

So add a triple with one as subject - does the other one also get that 
triple or not?

When the graph is symmetric they may logically not denote different things,
but that holds only until the graph changes. It is also a matter of the
*interpretation* of blank nodes, not of the nodes themselves (each of which
is a syntactic node in a graph/data structure).

_:a :p 1 .
_:b :p 1 .

Two blank nodes - but logically (interpretation) no extra information in 
the domain of discourse over one triple.
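The "add a triple and see who gets it" test can be sketched in Python. This is an illustration only: the graph is a set of tuples, and the added triple (:q 2) and the helper name description are invented for the example.

```python
def description(graph, node):
    """Return the (predicate, object) pairs that have `node` as subject."""
    return {(p, o) for (s, p, o) in graph if s == node}


# The two-triple graph from the example above.
graph = {("_:a", ":p", 1), ("_:b", ":p", 1)}

# Syntactically there are two nodes in the graph ...
nodes = {s for (s, _, _) in graph}

# ... even though they currently look alike.
same_before = description(graph, "_:a") == description(graph, "_:b")

# Add a triple with one of them as subject (":q 2" is made up here).
graph.add(("_:a", ":q", 2))

# The other node did not get the new triple, so they were different nodes.
same_after = description(graph, "_:a") == description(graph, "_:b")
```

Before the change the two descriptions are equal; after it only _:a carries the new triple, which is the sense in which the two nodes can be told apart.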

     Andy


Re: Round Trip blank values and alternatives.

Posted by Andy Seaborne <an...@apache.org>.

1/ Use a pseudo URI <_:label> in the query pattern.

IRI(bnode) generates <_:label>

(They seem to print as "_:Blabel" in expressions but the label is really 
"label").

SELECT (<_:abc> AS ?B) (iri(?B) as ?U) (str(?U) AS ?S) {}


NodeFunctions.str(Node) has currently-commented-out code for str(bnode) 
-> label so, if enabled, the query can scan-filter for it.
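As a rough illustration of option 1, a client could build the query text around a known label. The Python sketch below only assembles strings; it assumes the server is Jena/Fuseki (the <_:label> form is Jena-specific and not a legal URI elsewhere) and that labels are stable across requests. The helper names and the conservative label check are inventions for this example, not part of any Jena API.

```python
import re

# Deliberately conservative label check for this sketch; it is NOT the full
# blank node label grammar, just enough to avoid breaking out of the <...>.
_LABEL = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def bnode_iri(label):
    """Return the Jena-specific pseudo URI form <_:label> for a label."""
    if not _LABEL.fullmatch(label):
        raise ValueError(f"unsupported blank node label: {label!r}")
    return f"<_:{label}>"


def describe_query(label):
    """Build a SELECT that lists the properties of one labelled blank node."""
    return f"SELECT ?p ?o WHERE {{ {bnode_iri(label)} ?p ?o }}"


query = describe_query("abc")
```

The resulting string would then be sent to the endpoint with whatever SPARQL client is already in use.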


2/ For really detailed tracking, rdf-delta and RDF Patch give you a 
stream of changes, and that stream includes blank node labels.

JENA-1435 means you can add whatever services to Fuseki that you like. 
Pushing a locally calculated RDF patch is one possibility (there's a 
module in rdf-delta for that) - a full, low-level set of adds and deletes 
of triples to a remote dataset.

That's what we use in $job for having a client-side copy of a shared 
database.  Actually, we do away with the shared database and only keep 
the change log as the cache is persistent.
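The idea of a locally calculated set of adds and deletes can be sketched in Python. This is a toy under stated assumptions: graphs are sets of tuples, blank node labels are shared between client and server, and the output lines are only loosely modelled on RDF Patch rows (TX/TC for transaction begin/commit, A for add, D for delete). Check the rdf-delta documentation for the real syntax; nothing here uses the rdf-delta API.

```python
def diff(before, after):
    """Return (deletes, adds) as sorted lists of triples."""
    return sorted(before - after), sorted(after - before)


def term(value):
    """Write blank nodes as _:label and everything else as <iri> (sketch only)."""
    return value if value.startswith("_:") else f"<{value}>"


def to_patch_lines(deletes, adds):
    """Render the change as patch-like text rows inside one transaction."""
    lines = ["TX ."]
    lines += [f"D {term(s)} {term(p)} {term(o)} ." for (s, p, o) in deletes]
    lines += [f"A {term(s)} {term(p)} {term(o)} ." for (s, p, o) in adds]
    lines.append("TC .")
    return lines


def apply_patch(graph, deletes, adds):
    """Apply the change to another copy that uses the same blank node labels."""
    return (set(graph) - set(deletes)) | set(adds)


# The client-side graph P from the original message, before and after the edit.
p_before = {("s1", "p1", "_:1"), ("_:1", "p2", "o1"), ("_:1", "p3", "o2")}
p_after = {("s1", "p1", "_:1"), ("_:1", "p2", "o1"), ("_:1", "p3", "o3")}

deletes, adds = diff(p_before, p_after)
patch = to_patch_lines(deletes, adds)
```

Because the change carries the blank node label itself, the receiving side does not have to guess which node was meant, provided both sides agree on the labels.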

     Andy
