Posted to users@jena.apache.org by Rakebul HASAN <ha...@inria.fr> on 2014/02/04 17:26:11 UTC

Federated queries with bounded blank nodes

Hi,
I have been experimenting with federated SPARQL queries and I ran into a possible bug.

I am using Fuseki 1.0.0. I have two SPARQL endpoints.

The first endpoint contains data about books (modified from the book example in the Fuseki Data directory). Endpoint: http://localhost:3030/books/query

@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix :      <http://example.org/book/> .
@prefix ns:    <http://example.org/ns#> .
@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .

:book5  dc:creator  "J.K. Rowling" ;
        dc:title    "Harry Potter and the Order of the Phoenix" .

:book3  dc:creator  _:b0 ;
        dc:title    "Harry Potter and the Prisoner Of Azkaban" .

:book8  dc:creator  <http://www-sop.inria.fr/members/Alice> ;
        dc:title    "Distributed Query Processing for Linked Data" .

:book1  dc:creator  "J.K. Rowling" ;
        dc:title    "Harry Potter and the Philosopher's Stone" .

:book6  dc:creator  "J.K. Rowling" ;
        dc:title    "Harry Potter and the Half-Blood Prince" .

:book4  dc:title  "Harry Potter and the Goblet of Fire" .

_:b0    vcard:FN  "J.K. Rowling" ;
        vcard:N   [ vcard:Family  "Rowling" ;
                    vcard:Given   "Joanna"
                  ] .

:book2  dc:creator  _:b0 ;
        dc:title    "Harry Potter and the Chamber of Secrets" .

:book7  dc:creator  "J.K. Rowling" ;
        dc:title    "Harry Potter and the Deathly Hallows" .


The second endpoint contains data about people. Endpoint: http://localhost:3031/persons/query

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix cert:  <http://www.w3.org/ns/auth/cert#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://www-sop.inria.fr/members/Charlie>
        a           foaf:Person ;
        foaf:knows  <http://www-sop.inria.fr/members/Alice> , <http://www-sop.inria.fr/members/Bob> ;
        foaf:name   "Charlie" .

<http://www-sop.inria.fr/members/Alice>
        a           foaf:Person ;
        foaf:knows  <http://www-sop.inria.fr/members/Charlie> , <http://www-sop.inria.fr/members/Bob> ;
        foaf:name   "Alice" .

<http://www-sop.inria.fr/members/Bob>
        a           foaf:Person ;
        foaf:knows  <http://www-sop.inria.fr/members/Charlie> , <http://www-sop.inria.fr/members/Alice> ;
        foaf:name   "Bob" .


I am trying to run a federated SPARQL query with the SERVICE keyword to find the books whose authors have a name specified in the second endpoint. The query looks like this:

PREFIX : <http://example/>
PREFIX  dc:     <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * where
{
  ?book dc:title ?title .
  ?book dc:creator ?author.
 
  SERVICE <http://localhost:3031/persons/query>
     { ?author foaf:name ?name  }

}

This query should return only one result:

<http://example.org/book/book8> "Distributed Query Processing for Linked Data" <http://www-sop.inria.fr/members/Alice>  "Alice"   


It incorrectly returns:


------------------------------------------------------------------------------------------------------------------------------------------
| book                            | title                                          | author                                  | name      |
==========================================================================================================================================
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Bob"     |
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Alice"   |
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Charlie" |
| <http://example.org/book/book8> | "Distributed Query Processing for Linked Data" | <http://www-sop.inria.fr/members/Alice> | "Alice"   |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Bob"     |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Alice"   |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Charlie" |
------------------------------------------------------------------------------------------------------------------------------------------

This problem occurs because a subquery is sent to the second endpoint for each ?author binding obtained from the first endpoint, resulting in the following subqueries for the second endpoint:

SELECT  * WHERE   { "J.K. Rowling" <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { _:b0 <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { "J.K. Rowling" <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { "J.K. Rowling" <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { <http://www-sop.inria.fr/members/Alice> <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { _:b0 <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { "J.K. Rowling" <http://xmlns.com/foaf/0.1/name> ?name }

(taken from the Fuseki log)


The problem is caused by these two subqueries:
SELECT  * WHERE   { _:b0 <http://xmlns.com/foaf/0.1/name> ?name }
SELECT  * WHERE   { _:b0 <http://xmlns.com/foaf/0.1/name> ?name }

Because blank nodes in a query pattern are treated like variables, each of these subqueries returns Bob, Alice, and Charlie from the second endpoint:

-------------
| name      |
=============
| "Bob"     |
| "Alice"   |
| "Charlie" |
-------------
Then this is joined with the first part of the query and produces the incorrect results below:
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Bob"     |
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Alice"   |
| <http://example.org/book/book2> | "Harry Potter and the Chamber of Secrets"      | _:b0                                    | "Charlie" |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Bob"     |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Alice"   |
| <http://example.org/book/book3> | "Harry Potter and the Prisoner Of Azkaban"     | _:b0                                    | "Charlie" |

I think the bound blank nodes from the first endpoint should not have been sent to the second endpoint in the subqueries. The scope of blank nodes is local, and this should also be taken into account in this case.
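
To make the mechanism concrete, here is a minimal Python sketch of what seems to happen when a bound ?author value is substituted into the SERVICE pattern. It is a toy model only, not Jena or ARQ code: terms are plain strings, and the names PERSONS, acts_as_variable, match and names_for are made up for the illustration.

```python
# Toy model only: not Jena/ARQ code. Terms are plain strings;
# "?x" is a variable and "_:x" is a blank node label.

PERSONS = [
    ("<http://www-sop.inria.fr/members/Alice>", "foaf:name", '"Alice"'),
    ("<http://www-sop.inria.fr/members/Bob>", "foaf:name", '"Bob"'),
    ("<http://www-sop.inria.fr/members/Charlie>", "foaf:name", '"Charlie"'),
]


def acts_as_variable(term):
    # In a SPARQL query pattern, a blank node label behaves like a variable.
    return term.startswith("?") or term.startswith("_:")


def match(pattern, triples):
    """Return the triples matched by a single triple pattern."""
    return [
        t for t in triples
        if all(acts_as_variable(p) or p == v for p, v in zip(pattern, t))
    ]


def names_for(author):
    # Mimics substituting the bound ?author value into the SERVICE pattern.
    return [o for (_, _, o) in match((author, "foaf:name", "?name"), PERSONS)]


print(names_for("<http://www-sop.inria.fr/members/Alice>"))  # ['"Alice"']
print(names_for("_:b0"))  # ['"Alice"', '"Bob"', '"Charlie"']
print(names_for('"J.K. Rowling"'))  # []
```

In this model the IRI and the literal are compared as constants, while _:b0 matches every subject, which is the same shape as the extra rows for book2 and book3.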


Finally, if I change the order of the SERVICE part in the query, it does return the correct result.

PREFIX : <http://example/>
PREFIX  dc:     <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * where
{
 
  SERVICE <http://localhost:3031/persons/query>
     { ?author foaf:name ?name  }
  ?book dc:title ?title .
  ?book dc:creator ?author.

}


----------------------------------------------------------------------------------------------------------------------------------------
| author                                  | name    | book                            | title                                          |
========================================================================================================================================
| <http://www-sop.inria.fr/members/Alice> | "Alice" | <http://example.org/book/book8> | "Distributed Query Processing for Linked Data" |
----------------------------------------------------------------------------------------------------------------------------------------



I was wondering if it is a bug or if I am missing something here.


Best regards,
Rakebul Hasan


Re: Federated queries with bounded blank nodes

Posted by Paul Gearon <ge...@ieee.org>.
Going on the Federated Query document, at the end of section 4:

"Note that blank nodes are unique to any document which serializes them.
Also, SERVICE calls depend on the SPARQL Protocol [SPROT] which transfers
serialized RDF documents making blank nodes unique between service calls."

http://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/#variableService

So your reported behavior does indeed look (to me) like a bug.

A workaround is to modify the original query to:

PREFIX.....
SELECT * where
{
  ?book dc:title ?title .
  ?book dc:creator ?author .
  FILTER (!isBlank(?author)) .
  SERVICE <http://localhost:3031/persons/query>
     { ?author foaf:name ?name  }
}

In this case isIRI could be used instead of !isBlank. However, the latter
is more general for variables being passed through to a service, and should
be the basis for the internal fix.
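
As a rough illustration of the difference between the two filters, the following Python sketch drops bindings before they would be substituted into a remote pattern. It is only a guess at the shape of such a fix, not Jena or ARQ code; is_blank, is_iri and bindings_for_service are hypothetical helpers, and terms are modelled as plain strings.

```python
# Hypothetical sketch, not Jena/ARQ code: keep only the bindings that are
# safe to substitute into a remote SERVICE pattern.

def is_blank(term):
    # Blank node labels are written "_:label" in this toy model.
    return term.startswith("_:")


def is_iri(term):
    # IRIs are written "<...>" in this toy model.
    return term.startswith("<") and term.endswith(">")


def bindings_for_service(bindings, var, keep=lambda t: not is_blank(t)):
    """Filter bindings on `var` before they are sent to the remote endpoint."""
    return [b for b in bindings if keep(b[var])]


authors = [
    {"author": '"J.K. Rowling"'},
    {"author": "_:b0"},
    {"author": "<http://www-sop.inria.fr/members/Alice>"},
]

# Equivalent of FILTER (!isBlank(?author)): the literal and the IRI survive.
not_blank = bindings_for_service(authors, "author")
# Equivalent of FILTER (isIRI(?author)): only the IRI survives.
only_iri = bindings_for_service(authors, "author", keep=is_iri)

print([b["author"] for b in not_blank])
print([b["author"] for b in only_iri])
```

The !isBlank variant still passes literals through, which matters when the variable appears in object position in the remote pattern. The isIRI variant would drop those bindings too.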

Regards,
Paul

