Posted to dev@jena.apache.org by Patrick Hoeffel <pa...@issinc.com> on 2015/09/01 20:37:39 UTC

Tripe/Quad/Quint

Looking at the Jena source code, it is clear that Jena natively supports Triples and Quads throughout the persistence and query layers. I was recently asked whether Jena could support the notion of a Quint, and I didn't know the answer. I do see some use of Node[] in various places in the source, suggesting that "beyond-quads" might have been thought about, but it isn't something you hear much talk of, especially when simple reification can solve the problem. The only reason to consider a quint over reification is raw performance, and that is precisely the use case driving my question. I have a very large graph application with a specific need to extend the triple with qualifying information, and the research I've been able to find indicates that a quint will perform better than a reified quad or triple.

Thanks for any insight you may be able to offer,

Patrick Hoeffel
Software Engineer
Intelligent Software Solutions (www.issinc.com)

Re: Tripe/Quad/Quint

Posted by Stian Soiland-Reyes <st...@apache.org>.

I guess you could also create inference rules for constructing the
reified triple, and that could be applied without direct Jena
modifications. In such an approach the graph would not contain the
actual statement, only its reification (and other statements about the
statement provenance).

That would obviously give a performance hit on any 'normal' queries on
those only-inferrable triples. The inference cache would help if the
graph is small, though.
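
As a rough illustration of the rule idea, here is a minimal sketch in plain Python rather than Jena's rule engine. The tuple layout, the example IRIs and the function name infer_statements are assumptions made for this sketch, not anything from the Jena codebase. It derives the plain triple for every node that carries a complete reification and ignores partial ones:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def infer_statements(triples):
    """Derive (s, p, o) for each node that has rdf:subject, rdf:predicate and rdf:object."""
    wanted = (RDF + "subject", RDF + "predicate", RDF + "object")
    parts = {}
    for s, p, o in triples:
        if p in wanted:
            # Sketch simplification: a repeated part on one node overwrites the earlier value.
            parts.setdefault(s, {})[p] = o
    inferred = set()
    for found in parts.values():
        if len(found) == 3:  # only complete reifications yield a statement
            inferred.add(tuple(found[name] for name in wanted))
    return inferred
```

Running something like this on every query is the performance hit mentioned above; caching the result only pays off while the graph stays small.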




On 2 September 2015 at 14:37, Andy Seaborne <an...@apache.org> wrote:
> (in addition to Stian's answer - especially the temporal aspects)
>
> "could support" - yes
> "does support" - no
>
> There is no assumption that, just because there is a reified statement,
> the statement itself is in the data.
>
> Some systems store a unique id for every triple or quad, which implies that
> the quad is asserted.  Quints usually only cover the case where the statement
> is in the data as well as reified.  If used with time, that gets very messy.
>
> And what happens if that quad is added twice? By different people? Which
> provenance applies if there is only one quint?
>
> Another thing about reification is that it can be partial: the data only
> contains
>
> <#reif1> rdf:subject <http://example/r> .
> <#reif1> rdf:predicate <http://example/prop> .
>
> and is used to say someone made a statement but you don't know the object
> (yet?).
>
> In the past, Jena has had some quite complicated reification support. Being
> unusual and not used much, it was costly in terms of dev overhead and
> performance and storage space.  Hence its removal from the core system.
>
> What could be done is specialised (compact) tables for complete
> reifications, but that isn't in the codebase.
>
> (early SPARQL grammars even had syntax for reification!)
>
>         Andy

-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Tripe/Quad/Quint

Posted by Andy Seaborne <an...@apache.org>.

(in addition to Stian's answer - especially the temporal aspects)

"could support" - yes
"does support" - no

There is no assumption that, just because there is a reified statement, 
the statement itself is in the data.

Some systems store a unique id for every triple or quad, which implies 
that the quad is asserted.  Quints usually only cover the case where the 
statement is in the data as well as reified.  If used with time, that 
gets very messy.

And what happens if that quad is added twice? By different people? 
Which provenance applies if there is only one quint?
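
The duplicate-add question can be made concrete with a small sketch. This is plain Python with hypothetical example IRIs, not Jena code; the two container layouts are assumptions chosen only to contrast the two possible behaviours:

```python
quad = ("http://example/g", "http://example/r", "http://example/prop", "http://example/o")

# Design 1: exactly one fifth slot per quad; a second add replaces the first.
one_slot = {}
one_slot[quad] = "http://example/alice"
one_slot[quad] = "http://example/bob"

# Design 2: the fifth slot is part of the key; both adds survive as separate quints.
many = set()
many.add(quad + ("http://example/alice",))
many.add(quad + ("http://example/bob",))

print(one_slot[quad])  # http://example/bob
print(len(many))       # 2
```

With one slot the first contributor's provenance is silently lost; with the fifth element in the key the same quad is stored twice, and every quad-level query has to de-duplicate.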

Another thing about reification is that it can be partial: the data only 
contains

<#reif1> rdf:subject <http://example/r> .
<#reif1> rdf:predicate <http://example/prop> .

and is used to say someone made a statement but you don't know the 
object (yet?).
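
A store that only handles complete reifications would need to tell these cases apart. A minimal Python sketch of such a check follows; the function name missing_parts and the tuple representation are assumptions made for illustration:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARTS = ("subject", "predicate", "object")

def missing_parts(triples):
    """Map each reification node to the rdf:subject/predicate/object parts it lacks."""
    seen = {}
    for s, p, _ in triples:
        if p.startswith(RDF) and p[len(RDF):] in PARTS:
            seen.setdefault(s, set()).add(p[len(RDF):])
    return {node: set(PARTS) - have for node, have in seen.items()}

data = [
    ("#reif1", RDF + "subject", "http://example/r"),
    ("#reif1", RDF + "predicate", "http://example/prop"),
]
print(missing_parts(data))  # {'#reif1': {'object'}}
```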

In the past, Jena has had some quite complicated reification support. 
Being unusual and not used much, it was costly in terms of dev overhead 
and performance and storage space.  Hence its removal from the core system.

What could be done is specialised (compact) tables for complete 
reifications, but that isn't in the codebase.

(early SPARQL grammars even had syntax for reification!)

	Andy

Re: Tripe/Quad/Quint

Posted by Stian Soiland-Reyes <st...@apache.org>.

You can have multiple TDB stores, each holding Quads. So in that sense
you can have Quints. However, to query across them you would have to do
a federated query, which sounds like overkill.

Unless you start mixing in permissions and privacy (which can also be
achieved at graph level) then I don't see a big reason for needing
Quints.

For storing qualifying information, would a Nanopub structure work
well? http://nanopub.org/wordpress/?page_id=65

In nanopublications you would have 4 graphs:

In the main graph (or whatever is your graph of choice), we declare a
Nanopublication:

@prefix nanopub: <http://www.nanopub.org/nschema#> .

<nanopub1> a nanopub:Nanopublication ;
    nanopub:hasAssertion <assertion1> ;
    nanopub:hasProvenance <provenance1> ;
    nanopub:hasPublicationInfo <publication1> .

In the Assertion graph <assertion1> - the thing that has been stated/claimed:

  <http://dbpedia.org/resource/Coffee> :causes
      <http://dbpedia.org/resource/Thirst> .

Obviously as a separate graph you can extend this to multiple triples.

In the Provenance graph <provenance1>, you can say things like who
made the claim, e.g. the provenance of <assertion1>:

  <assertion1> prov:wasAttributedTo <http://orcid.org/0000-0002-5711-4872> ;
       pav:createdOn "2014-12-02T15:00:00Z"^^xsd:dateTime .
  <http://orcid.org/0000-0002-5711-4872> foaf:name "Alasdair J G Gray" .

In this provenance it does not matter how <assertion1> was made, e.g.
Alasdair could simply have said it out loud in a coffee shop.
(prov:location to the rescue!) Nanopublications are commonly used to
structure assertions in RDF that are otherwise not structured, e.g.
from textual claims in a medical journal paper.

In the Publication graph <publication1> you provide the metadata about
the nanopublication itself, e.g. it could have been made by an
automatic annotation software, and then curated by Stian.

  <nanopub1> pav:curatedBy <http://orcid.org/0000-0001-9842-9718> ;
    pav:createdWith <https://github.com/stain/superannnotator> ;
    pav:createdOn "2015-12-02T15:00:00Z"^^xsd:dateTime .

  <http://orcid.org/0000-0001-9842-9718> foaf:name "Stian Soiland-Reyes" .

In this way there is a clean separation of what was said, who said it
and where, and how the RDF description was made. E.g. I won't claim that
Alasdair said his name was "Alasdair J G Gray" - and we know the
utterance was made in 2014 even though the RDF was made in 2015.
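
To show how the qualifying information is reached through plain quads, here is a minimal Python sketch of the structure above. The abbreviated names (dbpedia:, orcid:), the graph label "head" and the function qualifiers_for are assumptions made for this sketch; a real store would hold full IRIs:

```python
HEAD, ASSERTION, PROVENANCE, PUBINFO = "head", "assertion1", "provenance1", "publication1"

quads = [
    # (graph, subject, predicate, object)
    (HEAD, "nanopub1", "nanopub:hasAssertion", ASSERTION),
    (HEAD, "nanopub1", "nanopub:hasProvenance", PROVENANCE),
    (HEAD, "nanopub1", "nanopub:hasPublicationInfo", PUBINFO),
    (ASSERTION, "dbpedia:Coffee", ":causes", "dbpedia:Thirst"),
    (PROVENANCE, ASSERTION, "prov:wasAttributedTo", "orcid:0000-0002-5711-4872"),
    (PUBINFO, "nanopub1", "pav:curatedBy", "orcid:0000-0001-9842-9718"),
]

def qualifiers_for(triple, quads):
    """Return (predicate, object) pairs stated about any graph that contains `triple`."""
    graphs = {g for g, s, p, o in quads if (s, p, o) == triple}
    return sorted((p, o) for g, s, p, o in quads if s in graphs)

print(qualifiers_for(("dbpedia:Coffee", ":causes", "dbpedia:Thirst"), quads))
# [('prov:wasAttributedTo', 'orcid:0000-0002-5711-4872')]
```

The lookup takes two passes over quads (find the assertion graph, then find statements about it), which is the cost being weighed against a single quint lookup.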

Note that the nanopub.org examples have not been updated in a while and
use an outdated prefix for pav; the correct namespace is
<http://purl.org/pav/>

-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718