Posted to users@jena.apache.org by Olivier Rossel <ol...@gmail.com> on 2012/08/01 17:50:24 UTC

Re: Basic federation in Jena

> 4/ You can use a subselect to restrict the remote query part:
>
>
> SERVICE <...> {
>    SELECT * {
>    ...
>    } LIMIT 300
> }

I tried this query:
SELECT DISTINCT ?comment WHERE {
SERVICE
<http://api.talis.com/stores/bbc-backstage/services/sparql>
{ ?thCenturyClassicalComposers0
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/class/yago/20thCenturyClassicalComposers>
 }
SERVICE <http://dbpedia.org/sparql> {SELECT
?thCenturyClassicalComposers0 ?comment WHERE {
?thCenturyClassicalComposers0
<http://www.w3.org/2000/01/rdf-schema#comment> ?comment   } }
}

It returns results in a reasonable time.

Then I remove ?thCenturyClassicalComposers0 from the sub-SELECT:


SELECT DISTINCT ?comment WHERE {
SERVICE
<http://api.talis.com/stores/bbc-backstage/services/sparql>
{ ?thCenturyClassicalComposers0
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/class/yago/20thCenturyClassicalComposers>
 }
SERVICE <http://dbpedia.org/sparql> {SELECT ?comment WHERE {
?thCenturyClassicalComposers0
<http://www.w3.org/2000/01/rdf-schema#comment> ?comment   } }
}

This query now takes MUCH MUCH longer, and eventually fails with a 509
HttpException.

Any idea why the query plan goes so wrong when
?thCenturyClassicalComposers0 is absent from the sub-SELECT?

Re: Basic federation in Jena

Posted by Andy Seaborne <an...@apache.org>.
On 01/08/12 16:50, Olivier Rossel wrote:
>> 4/ You can use a subselect to restrict the remote query part:
>>
>>
>> SERVICE <...> {
>>     SELECT * {
>>     ...
>>     } LIMIT 300
>> }
>
> I tried this query:
> SELECT DISTINCT ?comment WHERE {
> SERVICE
> <http://api.talis.com/stores/bbc-backstage/services/sparql>
> { ?thCenturyClassicalComposers0
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://dbpedia.org/class/yago/20thCenturyClassicalComposers>
>   }
> SERVICE <http://dbpedia.org/sparql> {SELECT
> ?thCenturyClassicalComposers0 ?comment WHERE {
> ?thCenturyClassicalComposers0
> <http://www.w3.org/2000/01/rdf-schema#comment> ?comment   } }
> }
>
> It returns results in a reasonable time.
>
> Then I remove ?thCenturyClassicalComposers0 from the sub-SELECT:
>
>
> SELECT DISTINCT ?comment WHERE {
> SERVICE
> <http://api.talis.com/stores/bbc-backstage/services/sparql>
> { ?thCenturyClassicalComposers0
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://dbpedia.org/class/yago/20thCenturyClassicalComposers>
>   }
> SERVICE <http://dbpedia.org/sparql> {SELECT ?comment WHERE {
> ?thCenturyClassicalComposers0
> <http://www.w3.org/2000/01/rdf-schema#comment> ?comment   } }
> }
>
> This query now takes MUCH MUCH longer, and eventually fails with a 509
> HttpException.
>
> Any idea why the query plan goes so wrong when
> ?thCenturyClassicalComposers0 is absent from the sub-SELECT?
>

Because in the second query you are joining the intermediate results of

SERVICE 1:
?thCenturyClassicalComposers0

with

SERVICE 2:
?comment

i.e. the two sides share no variable, so it is an unconstrained join
(a cross product), which happens to be done inefficiently.

The ?thCenturyClassicalComposers0 inside SERVICE 2 is not the same
variable as the one in SERVICE 1 if you remove it from the sub-select:
once it is no longer projected, it is not visible outside the sub-select.
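
The effect can be reproduced with a small sketch in Python. This is only a
toy model of joining solution mappings, not how ARQ actually evaluates
SERVICE; the helper names (compatible, join, project), the short variable
name "composer" and the data are all made up for illustration.

```python
def compatible(a, b):
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Naive nested-loop join over lists of solution mappings (dict: variable -> value)."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def project(solutions, variables):
    """Keep only the named variables, as the projection of a sub-SELECT does."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


# Hypothetical data standing in for the results of the two remote endpoints.
composers = [{"composer": "ex:A"}, {"composer": "ex:B"}]  # SERVICE 1
comments = [                                              # SERVICE 2, before projection
    {"composer": "ex:A", "comment": "comment on A"},
    {"composer": "ex:B", "comment": "comment on B"},
    {"composer": "ex:C", "comment": "comment on C"},
]

# First query: the sub-SELECT projects ?composer, so the join is constrained.
constrained = join(composers, project(comments, ["composer", "comment"]))

# Second query: only ?comment is projected, so no variable is shared and
# every left row pairs with every right row.
unconstrained = join(composers, project(comments, ["comment"]))

print(len(constrained))    # 2
print(len(unconstrained))  # 6
```

With real endpoints the right-hand side is every rdfs:comment the remote
store is willing to return, so the second form grows with the product of
the two result sizes.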

Try looking at it with

http://www.sparql.org/query-validator.html

and select "SPARQL algebra (general optimizations)"; you will see
?/thCenturyClassicalComposers0 (note the ?/), which is a variable that
has been renamed because it is hidden.

Any chance of readable queries?  A few prefixes perhaps?

	Andy