Posted to users@jena.apache.org by "Perry, Kevin" <ke...@etm.at> on 2021/07/13 05:35:53 UTC

Include named graph data when not using a GRAPH clause

Hi!

We're currently extending an application using both Metaphactory and
Blazegraph to also support Jena, or more specifically Fuseki.

In a perfect world, we'd be using the exact same SPARQL queries for
both Fuseki and Blazegraph.
Unfortunately we hit a bit of a snag when it comes to the handling of
the default and named graphs[1].

We are used to not including a GRAPH clause in our queries and
receiving results from all graphs (i.e. both from the default and named
graphs).
With Fuseki, we only get results from either the default graph or any
named graph matching a GRAPH clause.

Is there some *configuration setting* (or maybe something equally
trivial) to have Fuseki behave similarly to Blazegraph in this respect?
At least according to the article[1], there is supposed to be a setting
("[...] Apache Jena offer options to change their default behavior"),
but it unfortunately doesn't go into more detail.

In the long run, we do intend to rework the queries and how we store
data (i.e. putting everything into named graphs) but unfortunately
that's not something we can afford to do right away.

Kevin

[1] https://blog.metaphacts.com/the-default-graph-demystified

Re: Include named graph data when not using a GRAPH clause

Posted by Andy Seaborne <an...@apache.org>.


On 14/07/2021 07:02, Perry, Kevin wrote:
> Thank you, that already helps a lot.
> It is not _quite_ what we had in mind, though.
> 
> While `unionDefaultGraph` makes the default graph a union of all the
> named graphs, we can then no longer query data from the actual default
> graph.
> 
> Assuming we add three people, two of them in named graphs:
>      ex:Alice a foaf:Person
>      GRAPH ex:graph1 { ex:Bob a foaf:Person }
>      GRAPH ex:graph2 { ex:Carl a foaf:Person }
> 
> When we run the query
>      SELECT * WHERE { ?person a foaf:Person }
> 
> Without `unionDefaultGraph` we only get `ex:Alice`.
> With `unionDefaultGraph` we get `ex:Bob` and `ex:Carl`.
> 
> We would like to get _all three_ people as a result, i.e. a union of
> the default graph and all the named graphs.

There is no simple configuration option that does that. The default graph
is not a graph with a hidden name.

It can be accessed as <urn:x-arq:DefaultGraph> but it is not stored like 
that.

unionDefaultGraph is aimed at the case where data is managed in named
graphs but presented to queries as a single graph.

In SPARQL Update, the same distinction shows up as NAMED versus ALL, as
in CLEAR NAMED (named graphs only) or CLEAR ALL (named graphs and the
default graph).
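
As a rough illustration of the three views discussed here, the following
is a toy model in plain Java, not Jena code. Graphs are reduced to sets of
subject names, and the class and method names (DatasetViews, plainDefault,
unionOfNamed, defaultPlusUnion) are made up for this sketch. It uses the
three people from the quoted example above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical names, no Jena involved): each graph is
// reduced to the set of subjects typed as foaf:Person in it.
class DatasetViews {
    final Set<String> defaultGraph = new LinkedHashSet<>();
    final Map<String, Set<String>> namedGraphs = new LinkedHashMap<>();

    // Adds a person to a named graph, creating the graph on first use.
    void addNamed(String graph, String person) {
        namedGraphs.computeIfAbsent(graph, g -> new LinkedHashSet<>()).add(person);
    }

    // No GRAPH clause and no special setting: only the stored default graph.
    Set<String> plainDefault() {
        return new LinkedHashSet<>(defaultGraph);
    }

    // unionDefaultGraph: the union of all named graphs; the stored
    // default graph is not part of this view.
    Set<String> unionOfNamed() {
        Set<String> out = new LinkedHashSet<>();
        namedGraphs.values().forEach(out::addAll);
        return out;
    }

    // The view asked for in this thread: stored default graph plus the union.
    Set<String> defaultPlusUnion() {
        Set<String> out = plainDefault();
        out.addAll(unionOfNamed());
        return out;
    }

    // Builds the three-person example from this thread.
    static DatasetViews example() {
        DatasetViews d = new DatasetViews();
        d.defaultGraph.add("ex:Alice");
        d.addNamed("ex:graph1", "ex:Bob");
        d.addNamed("ex:graph2", "ex:Carl");
        return d;
    }

    public static void main(String[] args) {
        DatasetViews d = example();
        System.out.println(d.plainDefault());
        System.out.println(d.unionOfNamed());
        System.out.println(d.defaultPlusUnion());
    }
}
```

The first two views correspond to the results reported with and without
`unionDefaultGraph`; the third is the combined view that needs both
<urn:x-arq:DefaultGraph> and <urn:x-arq:UnionGraph>.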

The query

SELECT *
FROM <urn:x-arq:DefaultGraph>
FROM <urn:x-arq:UnionGraph>
WHERE {
   ?s ?p ?o
}

works, and it is equivalent to adding default-graph-uri parameters to
the HTTP request:

wget -O - 'http://localhost:3030/ds?default-graph-uri=urn:x-arq:DefaultGraph&default-graph-uri=urn:x-arq:UnionGraph&query=SELECT * { ?s ?p ?o }'

So the endpoint can be given as
"host?default-graph-uri=...&default-graph-uri=...", which works if the
calling software library accepts endpoints that already carry a query
string.

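import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.sparql.util.QueryExecUtils;

// Endpoint URL that already carries the two default-graph-uri parameters.
String URL = "http://localhost:3030/ds?default-graph-uri=urn:x-arq:DefaultGraph&default-graph-uri=urn:x-arq:UnionGraph";

// Run the query against that endpoint and print the results.
try ( QueryExecution qExec = QueryExecutionFactory.sparqlService(URL, "SELECT * { ?s ?p ?o }") ) {
    QueryExecUtils.executeQuery(qExec);
}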
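
If the calling library does not accept an endpoint that already has a
query string, the full request URL can also be assembled by hand. The
following is a minimal sketch using only the Java standard library; the
class and method names (UnionEndpoint, requestUrl) are invented here, and
the host and dataset name are the ones from the example above. It
form-encodes the parameter values, which the unencoded URLs above leave
to the client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena: builds a SPARQL query URL whose
// default graph is the stored default graph plus the union of named graphs.
class UnionEndpoint {
    static final String DEFAULT_GRAPH = "urn:x-arq:DefaultGraph";
    static final String UNION_GRAPH = "urn:x-arq:UnionGraph";

    // Form-style encoding: spaces become '+', reserved characters become %XX.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Appends both default-graph-uri parameters and the query to the service URL.
    // Assumes the service URL has no query string of its own.
    static String requestUrl(String service, String query) {
        return service
                + "?default-graph-uri=" + enc(DEFAULT_GRAPH)
                + "&default-graph-uri=" + enc(UNION_GRAPH)
                + "&query=" + enc(query);
    }

    public static void main(String[] args) {
        System.out.println(requestUrl("http://localhost:3030/ds", "SELECT * { ?s ?p ?o }"));
    }
}
```

The resulting string can be passed to any HTTP client as a GET request.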

A similar setup can be written in assemblers for TDB1 - this will not 
work for TDB2 because of the way transactions work.

PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tdb1: <http://jena.hpl.hp.com/2008/tdb#>
PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX sdb: <http://jena.hpl.hp.com/2007/sdb#>

[] rdf:type fuseki:Server ;
    .

<#tdb> rdf:type fuseki:Service ;
     fuseki:name "ds" ;
     fuseki:endpoint  [ fuseki:operation fuseki:query ; ] ;
     fuseki:dataset <#dataset> ;
.

<#dataset> rdf:type ja:RDFDataset ;
     ja:defaultGraph <#dftGraph>
     .

<#dftGraph> rdf:type ja:UnionModel ;
    ja:subModel <#graph1> ;
    ja:subModel <#graph2> ;
    .

<#graph1> rdf:type tdb1:GraphTDB ;
     tdb1:dataset <#base> ;
     .

<#graph2> rdf:type tdb1:GraphTDB ;
     tdb1:dataset <#base> ;
     tdb1:graphName <urn:x-arq:UnionGraph> ;
     .


<#base> rdf:type    tdb1:DatasetTDB ;
     tdb1:location "DB1" ;
     .

All of these are slower than using tdb:unionDefaultGraph.

     Andy


Re: Include named graph data when not using a GRAPH clause

Posted by "Perry, Kevin" <ke...@etm.at>.

Thank you, that already helps a lot.
It is not _quite_ what we had in mind, though.

While `unionDefaultGraph` makes the default graph a union of all the
named graphs, we can then no longer query data from the actual default
graph.

Assuming we add three people, two of them in named graphs:
    ex:Alice a foaf:Person
    GRAPH ex:graph1 { ex:Bob a foaf:Person }
    GRAPH ex:graph2 { ex:Carl a foaf:Person }

When we run the query
    SELECT * WHERE { ?person a foaf:Person }

Without `unionDefaultGraph` we only get `ex:Alice`.
With `unionDefaultGraph` we get `ex:Bob` and `ex:Carl`.

We would like to get _all three_ people as a result, i.e. a union of
the default graph and all the named graphs.

Kevin


Re: Include named graph data when not using a GRAPH clause

Posted by Andy Seaborne <an...@apache.org>.

Hi Kevin,

The configuration setting you are looking for is "union default graph".

With this, the default graph of the dataset is a view of the union of
all the named graphs.

For TDB2 (TDB1 is similar), the server configuration would include
something like:

:dataset_tdb2 rdf:type  tdb2:DatasetTDB2 ;
     tdb2:location "DB2" ;
     tdb2:unionDefaultGraph true ;
     .

     Andy

https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html
