Posted to dev@jena.apache.org by Claude Warren <cl...@xenei.com> on 2017/12/26 18:17:54 UTC

RDF Diff/patch

Howdy,

I am working on a tool that can create UpdateRequests that will convert one
Dataset into another.

The basic idea is to extract the quads sorted by (g,s,p,o) and then perform
a diff on the lists (like a text diff but each quad is a "line").
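A minimal sketch of that idea, written in Python with the standard difflib
module rather than Jena and java-diff-utils. It assumes each quad is a plain
(g, s, p, o) tuple of strings; the function name diff_quads is hypothetical
and not part of any library.

```python
from difflib import SequenceMatcher


def diff_quads(old_quads, new_quads):
    """Return (deleted, inserted) quad lists from a line-style diff."""
    # Tuples compare element by element, so this sorts by g, then s, p, o.
    old_sorted = sorted(old_quads)
    new_sorted = sorted(new_quads)
    matcher = SequenceMatcher(a=old_sorted, b=new_sorted, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" means a run of old quads is swapped for a run of new ones.
        if tag in ("delete", "replace"):
            deleted.extend(old_sorted[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(new_sorted[j1:j2])
    return deleted, inserted
```

Because both lists are sorted and contain no duplicates, quads common to both
datasets line up in the same relative order, which is what lets a text-style
diff treat each quad as a "line".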

The result is that I can create statements to insert into and delete from
one dataset to make it "identical" to the other.  Identical in this case
means that corresponding models in the two datasets are isomorphic.
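As a rough illustration of turning such a diff into update statements, the
sketch below renders deleted and inserted quads as SPARQL Update text. It is
not the Jena UpdateRequest API: the helper names are hypothetical, terms are
assumed to be already written in SPARQL syntax, and None stands for the
default graph. Only ground quads are covered, since SPARQL does not allow
blank nodes in DELETE DATA.

```python
def quad_block(quads):
    """Serialize quads as SPARQL quad data, wrapping named graphs in GRAPH."""
    lines = []
    for g, s, p, o in quads:
        triple = f"{s} {p} {o} ."
        # None is this sketch's marker for the default graph.
        lines.append(triple if g is None else f"GRAPH {g} {{ {triple} }}")
    return "\n".join("  " + line for line in lines)


def to_sparql_update(deleted, inserted):
    """Build one update request: deletes first, then inserts."""
    operations = []
    if deleted:
        operations.append("DELETE DATA {\n" + quad_block(deleted) + "\n}")
    if inserted:
        operations.append("INSERT DATA {\n" + quad_block(inserted) + "\n}")
    return " ;\n".join(operations)
```

Calling to_sparql_update with the two lists produced by a quad diff yields a
single request that can be sent to the dataset being converted.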

Is anyone else interested in this?

Claude

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: RDF Diff/patch

Posted by Claude Warren <cl...@xenei.com>.
Currently I am using https://github.com/Claudenw/java-diff-utils (forked
from https://github.com/dnaumenko/java-diff-utils -- no changes yet).

I start with the assumption that the datastore will always produce the same
ID for a given blank node across queries.  I assume the ID will change if
the node is deleted and reinserted, but as long as nothing changes I assume
the ID stays the same.  If that assumption does not hold, the diff probably
won't work correctly.

I basically perform a query against the 2 datasets to produce ordered
g,s,p,o quads.

I feed the results into the diff/patch routine.

Currently, if the blank nodes have different ids, both triples would be
deleted and a new one inserted in the first case, and just one triple would
be deleted in the second case.
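The two cases from the question quoted below can be worked through with a
small Python sketch. It treats blank node labels as opaque, stable
identifiers, which is the assumption stated above, and uses a plain set
difference as a stand-in for the real diff routine. The plan function is
hypothetical.

```python
def plan(old, new):
    """Return (deletes, inserts), comparing blank node labels literally."""
    old, new = set(old), set(new)
    return sorted(old - new), sorted(new - old)


dataset1 = [("_:a", "a", "my:type"), ("_:b", "a", "my:type")]

# Case 1: dataset2 holds one triple with a label not seen in dataset1.
deletes1, inserts1 = plan(dataset1, [("_:c", "a", "my:type")])

# Case 2: dataset2 holds one triple that reuses the label _:a.
deletes2, inserts2 = plan(dataset1, [("_:a", "a", "my:type")])

print(deletes1, inserts1)
print(deletes2, inserts2)
```

In case 1 both old triples are deleted and the _:c triple is inserted; in
case 2 only the _:b triple is deleted and nothing is inserted.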

The code is at https://github.com/Claudenw/rdf-diff-patch (sorry Andy, I got
"rdf" and "patch" in the name -- I'll change it if I can find another good
descriptor -- alternatively, we might be able to generate RDF-patch format
output).

Use PatchFactory to create the patch object and UpdateFactory to create the
UpdateRequest.

This code does need the recent fixes for jena-querybuilder 3.7.0-SNAPSHOT.

I have only been working on this for a couple of days and there are several
places to improve it.


   1. I think the diff/patch routine has some equality plugin points that
   might make matching different blank node ids within a graph possible in the
   diff processing.
   2. Since the patch generated by java-diff-utils would have both the
   delete and the insert quads it should be possible to create models for each
   named graph in the quad list, perform some queries against them to remove
   any blank nodes that are the "same" (your choice of definition for "same")
   and perform mapping between old and new node ids.
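One possible shape for the second idea, as a Python sketch under stated
assumptions. Here two blank nodes count as the "same" when the deleted and
inserted quads state exactly the same facts about them within the same
graph; that definition, and every name in the code, is hypothetical. The
approach is deliberately naive: links between two blank nodes are reduced to
a placeholder, so cycles and nested structures are not really handled.

```python
from collections import defaultdict


def is_blank(term):
    return term.startswith("_:")


def signatures(quads):
    """Map each blank node to the set of facts stated about it."""
    sig = defaultdict(set)
    for g, s, p, o in quads:
        if is_blank(s):
            sig[s].add((g, "out", p, "_:" if is_blank(o) else o))
        if is_blank(o):
            sig[o].add((g, "in", p, "_:" if is_blank(s) else s))
    return {node: frozenset(facts) for node, facts in sig.items()}


def match_blank_nodes(deleted, inserted):
    """Pair old and new blank nodes by signature and cancel matching quads."""
    candidates = defaultdict(list)
    for node, sig in sorted(signatures(inserted).items()):
        candidates[sig].append(node)
    mapping = {}
    for node, sig in sorted(signatures(deleted).items()):
        if candidates[sig]:
            mapping[node] = candidates[sig].pop(0)  # one-to-one pairing

    def rename(term):
        return mapping.get(term, term)

    # Rewrite deleted quads with the new labels, remembering the originals.
    pairs = [((g, rename(s), p, rename(o)), (g, s, p, o)) for g, s, p, o in deleted]
    cancelled = {renamed for renamed, _ in pairs} & set(inserted)
    keep_deleted = [quad for renamed, quad in pairs if renamed not in cancelled]
    keep_inserted = [quad for quad in inserted if quad not in cancelled]
    return mapping, keep_deleted, keep_inserted
```

A delete/insert pair that becomes identical after renaming is dropped from
both lists, and the returned mapping records which old id corresponds to
which new id.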

There are lots of edge cases to explore here.

Claude


On Wed, Dec 27, 2017 at 4:26 PM, ajs6f <aj...@apache.org> wrote:

> I'm curious too, Claude. Is the idea that one assumes that bnodes are
> already using the same pool of labels, or something like that? IOW, if I
> have dataset1:
>
> _:a a my:type .
> _:b a my:type .
>
> and dataset2:
>
> _:c a my:type .
>
> and I want to convert dataset1 into dataset2, will your algorithm delete
> both triples and add a new one, or just remove a triple, and if so, is that
> deterministic? If dataset2 is instead:
>
> _:a a my:type .
>
> will the algorithm only remove one triple and be done, or remove both and
> add a new one?
>
> ajs6f

Re: RDF Diff/patch

Posted by ajs6f <aj...@apache.org>.
I'm curious too, Claude. Is the idea that one assumes that bnodes are already using the same pool of labels, or something like that? IOW, if I have dataset1:

_:a a my:type .
_:b a my:type .

and dataset2:

_:c a my:type .

and I want to convert dataset1 into dataset2, will your algorithm delete both triples and add a new one, or just remove a triple, and if so, is that deterministic? If dataset2 is instead:

_:a a my:type .

will the algorithm only remove one triple and be done, or remove both and add a new one?

ajs6f

> On Dec 27, 2017, at 11:00 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> It would be interesting to see especially the handling of blank node cycles and other structures.
> 
> Please don't call it "RDF Patch" or a name similar to that - that term is already used.
> 
>    Andy


Re: RDF Diff/patch

Posted by Andy Seaborne <an...@apache.org>.
It would be interesting to see especially the handling of blank node
cycles and other structures.

Please don't call it "RDF Patch" or a name similar to that - that term
is already used.

     Andy
