Posted to users@jena.apache.org by Chris Tomlinson <ch...@gmail.com> on 2019/05/16 17:35:16 UTC

question about RDFPatch headers

Hi,

We’re building an editing service for our RDF Linked Data Service and are thinking of using at least some of the features of RDFPatch/RDFDelta.

We use named graphs for the various Entities that we model: Works, Persons, Places, Lineages and so on. We want to include some headers in the patch indicating which graphs are updated and which graphs are created by the patch, so that the editing service has easy access to this information without analyzing the patch and doing other work to discover what’s being created and so on.

At first we thought of using a couple of keywords, like graph and create:

H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
H id … .

but org.seaborne.patch.PatchHeader uses a Map, so we can only have one H graph … and one H create … in the patch. We’ve considered two alternatives. The first is to use a String of comma-separated graphIds:

H graph "0686c69d-8f89-4496-acb5-744f0157a8db , 3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1" .
H create "b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a , 0157a8db-acb5-4496-8f89-0686c69d744f" .

which is plausible, but in some cases the list of graphIds could become quite long, and very large strings could become an issue down the line.
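
For illustration, the comma-separated variant could be read and written with something like the sketch below. It is only a rough sketch: GraphIdList, parse and format are hypothetical names, not part of RDFPatch, and it assumes the header value has already been obtained as a plain Java String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RDFPatch: converts between a
// comma-separated header value and a list of graph IDs.
final class GraphIdList {
    private GraphIdList() {}

    // Split a value such as "id1 , id2" into trimmed, non-empty IDs.
    static List<String> parse(String headerValue) {
        List<String> ids = new ArrayList<>();
        for (String part : headerValue.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Join IDs back into the " , " separated form used in the example above.
    static String format(List<String> ids) {
        return String.join(" , ", ids);
    }
}
```

The string still grows linearly with the number of graphs, so this does not remove the concern about very large values.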

A second idea was to add the notion of a preamble to the patch using PS, for preamble start, and PE, for preamble end, which would separate our extensions from the defined RDFPatch structure:

PS .
H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
PE .
H id … .
TX .
…

We would then pre-parse the patch payload up to the PE and submit the remainder to RDFPatch, and so on.
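
A rough sketch of what that pre-parse step might look like follows. PS and PE are our proposed extension, not part of RDFPatch, and PreambleSplitter, Result and split are hypothetical names. The sketch assumes each preamble line has the form "H keyword value ." and that the preamble, if present, starts on the first line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-parser for the proposed PS/PE preamble.
// PS and PE are an assumed extension, not part of the RDFPatch format.
final class PreambleSplitter {
    private PreambleSplitter() {}

    // Multi-valued preamble headers plus the remaining patch text.
    static final class Result {
        final Map<String, List<String>> headers = new LinkedHashMap<>();
        String remainder = "";
    }

    static Result split(String payload) {
        Result result = new Result();
        String[] lines = payload.split("\\R");
        // No preamble: hand the whole payload on unchanged.
        if (lines.length == 0 || !lines[0].trim().equals("PS .")) {
            result.remainder = payload;
            return result;
        }
        int i = 1;
        boolean closed = false;
        while (i < lines.length) {
            String line = lines[i++].trim();
            if (line.equals("PE .")) {
                closed = true;
                break;
            }
            if (line.isEmpty()) {
                continue;
            }
            // Expect "H <keyword> <value> ."
            String[] parts = line.split("\\s+", 3);
            if (parts.length < 3 || !parts[0].equals("H") || !parts[2].endsWith(".")) {
                throw new IllegalArgumentException("Unexpected preamble line: " + line);
            }
            String value = parts[2].substring(0, parts[2].length() - 1).trim();
            result.headers.computeIfAbsent(parts[1], k -> new ArrayList<>()).add(value);
        }
        if (!closed) {
            throw new IllegalArgumentException("Preamble not terminated by PE");
        }
        // Everything after PE is what would be submitted to RDFPatch.
        result.remainder = String.join("\n", Arrays.copyOfRange(lines, i, lines.length));
        return result;
    }
}
```

The remainder string would then be passed to the normal patch reader, and the collected headers kept by the editing service.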

A third possibility is to extend RDFPatch to use a different signature for the Map in PatchHeader. This seems rather involved.

So we’re asking what approaches others might have taken for this sort of use case, or how best to accommodate this in RDFPatch as is.

Thanks very much,
Chris

Re: question about RDFPatch headers

Posted by Chris Tomlinson <ch...@gmail.com>.

Hi Andy,

We appreciate the ideas. If we go with a linking approach we’ll have to add some more machinery, which is fine.

Thanks again,
Chris


> On May 17, 2019, at 10:40 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> Hi Chris,
> 
> If the "meta" part becomes complicated, it might be better to put a link in the header that goes to another file.  There is a balance to be struck between arbitrary structures and simple processing.
> 
> It does make some sense to have a multi-valued patch header.
> 
> (all one line with or without separating comma)
> 
> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
> 
> Having the header as a map makes mixing header entries and reprocessing them work better.  No assumed order creeps in and no confusion about duplicates for things that must be unique (like id). cf. HTTP headers.
> 
> If you think the metadata is going to get large, then a link to elsewhere may be better for other reasons, like using the metadata without needing to access the patch in the log.
> 
>    Andy
> 
> On 16/05/2019 18:35, Chris Tomlinson wrote:
>> Hi,
>> We’re building an editing service for our RDF Linked Data Service and are thinking to use at least some of the features of RDFPatch/RDFDelta.
>> We use named graphs for the various Entities that we model: Works, Persons, Places, Lineages and so on. We are wanting to include in the patch some headers indicating the graphs that are being updated in the patch and the graphs that are created in the patch. We want this information to help the editing service have easy access to this information w/o analyzing the patch and doing other work to discover what’s being created and so on.
>> At first we thought of using a couple of keywords like, graph and create:
>> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
>> H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
>> H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
>> H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
>> H id … .
>> but org.seaborne.patch.PatchHeader uses a Map so we can only one H graph … and one H create … in the patch. Two alternatives we’ve considered are to use a String of comma separated graphIds:
>> H graph "0686c69d-8f89-4496-acb5-744f0157a8db , 3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1” .
>> H create "b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a , 0157a8db-acb5-4496-8f89-0686c69d744f” .
>> which is plausible but in some cases the list of graphIds could become quite long and so this could be an issue down the line with very large strings.
>> A second idea was to add the notion of a preamble to the patch using PS, for preamble start, and PE, for preamble end, which would separate our extensions from the defined RDFPatch structure:
>> PS .
>> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
>> H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
>> H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
>> H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
>> PE .
>> H id … .
>> TX .
>> …
>> We would then pre-parse the patch payload up to the PE and submit the remainder to RDFPatch, and so on.
>> A 3rd possibility is to consider some extension to RDFPatch to use a different signature for the Map in PatchHeader. This seems rather involved.
>> So we’re asking what approaches others might have taken for this sort of use-case or how best to accommodate this in RDFPatch as is.
>> Thanks very much,
>> Chris

Re: question about RDFPatch headers

Posted by Andy Seaborne <an...@apache.org>.

Hi Chris,

If the "meta" part becomes complicated, it might be better to put a link
in the header that goes to another file.  There is a balance to be struck
between arbitrary structures and simple processing.

It does make some sense to have a multi-valued patch header.

(all one line with or without separating comma)

H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
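
As a rough illustration of reading such a line, the sketch below pulls every IRI out of a single header row, with or without separating commas. It is a standalone sketch under assumptions: MultiValueHeader and iris are hypothetical names, the RDFPatch parser itself is not used, and the values are assumed to be IRIs in angle brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for a multi-valued header row such as
// "H graph <a> <b> ." ; this is not the RDFPatch parser.
final class MultiValueHeader {
    private MultiValueHeader() {}

    // An IRI in angle brackets, without whitespace or nested brackets.
    private static final Pattern IRI = Pattern.compile("<([^<>\\s]*)>");

    // Return the IRIs that follow the given keyword; commas are ignored.
    static List<String> iris(String line, String keyword) {
        String trimmed = line.trim();
        String prefix = "H " + keyword + " ";
        if (!trimmed.startsWith(prefix) || !trimmed.endsWith(".")) {
            throw new IllegalArgumentException("Not an 'H " + keyword + "' row: " + line);
        }
        List<String> values = new ArrayList<>();
        Matcher m = IRI.matcher(trimmed.substring(prefix.length()));
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }
}
```

Because the row is still a single entry per keyword, it stays compatible with keeping the header as a map.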

Having the header as a map makes mixing header entries and reprocessing
them work better.  No assumed order creeps in and no confusion about
duplicates for things that must be unique (like id). cf. HTTP headers.

If you think the metadata is going to get large, then a link to
elsewhere may be better for other reasons, like using the metadata
without needing to access the patch in the log.

     Andy
