Posted to dev@marmotta.apache.org by "Dietmar Glachs (JIRA)" <ji...@apache.org> on 2017/10/02 12:04:02 UTC

[jira] [Assigned] (MARMOTTA-603) SPARQL OPTIONAL issues

     [ https://issues.apache.org/jira/browse/MARMOTTA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dietmar Glachs reassigned MARMOTTA-603:
---------------------------------------

    Assignee:     (was: Dietmar Glachs)

> SPARQL OPTIONAL issues
> ----------------------
>
>                 Key: MARMOTTA-603
>                 URL: https://issues.apache.org/jira/browse/MARMOTTA-603
>             Project: Marmotta
>          Issue Type: Bug
>          Components: KiWi Triple Store
>    Affects Versions: 3.3.0
>            Reporter: Rupert Westenthaler
>            Priority: Critical
>
> The SPARQL implementation of the KiWi triple store seems to have issues with the evaluation of OPTIONAL segments of SPARQL queries. Test data and test queries are provided below.
> h2. Data
> {code}
> 	<urn:test.org:place.1> rdf:type schema:Place ;
> 		schema:geo <urn:test.org:geo.1> ;
> 		schema:name "Place 1" .
> 	<urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
> 		schema:latitude "16"^^xsd:double ;
> 		schema:longitude "17"^^xsd:double ;
> 		schema:elevation "123"^^xsd:int .
> 	<urn:test.org:place.2> rdf:type schema:Place ;
> 		schema:geo <urn:test.org:geo.2> ;
> 		schema:name "Place 2" .
> 	<urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
> 		schema:latitude "15"^^xsd:double ;
> 		schema:longitude "16"^^xsd:double ;
> 		schema:elevation "99"^^xsd:int .
> 	<urn:test.org:place.3> rdf:type schema:Place ;
> 		schema:geo <urn:test.org:geo.3> ;
> 		schema:name "Place 3" .
> 	<urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
> 		schema:latitude "15"^^xsd:double ;
> 		schema:longitude "17"^^xsd:double .
> 	<urn:test.org:place.4> rdf:type schema:Place ;
> 		schema:geo <urn:test.org:geo.4> ;
> 		schema:name "Place 4" .
> 	<urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
> 		schema:longitude "17"^^xsd:double ;
> 		schema:elevation "123"^^xsd:int .
> {code}
> Note that `geo.1` and `geo.2` define latitude, longitude and elevation. `geo.3` has no elevation, and `geo.4` is missing the latitude to simulate invalid geo coordinate data.
> h2. Test Case 1
> The following query uses an OPTIONAL graph pattern that includes both `schema:latitude` and `schema:longitude`. This assumes that the user only wants lat/long values for locations that define both.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>         }
>     }
> {code}
> This query translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>             (bgp (triple ?entity schema:geo ?location))
>             (bgp
>                 (triple ?location schema:latitude ?lat)
>                 (triple ?location schema:longitude ?long)
>              ))))
> {code}
> The expected results are
> {code}
>     entity,location,lat,long
>     urn:test.org:place.1,urn:test.org:geo.1,16,17
>     urn:test.org:place.2,urn:test.org:geo.2,15,16
>     urn:test.org:place.3,urn:test.org:geo.3,15,17
>     urn:test.org:place.4,urn:test.org:geo.4,,
> {code}
> All four locations are expected in the result set as the `OPTIONAL` graph pattern is translated to a `leftjoin` with `triple ?entity schema:geo ?location`.
> However, for `geo.4` no value is expected for either `?lat` or `?long`, because this resource only defines a longitude and therefore does not match
> {code}
>     (bgp
>         (triple ?location schema:latitude ?lat)
>         (triple ?location schema:longitude ?long)
>     )
> {code}
> Marmotta responds with
> {code}
>     entity,location,lat,long
>     urn:test.org:place.1,urn:test.org:geo.1,16,17
>     urn:test.org:place.2,urn:test.org:geo.2,15,16
>     urn:test.org:place.3,urn:test.org:geo.3,15,17
>     urn:test.org:place.4,urn:test.org:geo.4,,17
> {code}
> Note that the longitude is returned for the resource `geo.4`, although the OPTIONAL pattern as a whole does not match it.
> h2. Test Case 2
> As a variation we now also include the `schema:elevation` in the OPTIONAL graph pattern.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             ?location schema:elevation ?alt .
>         }
>     }
> {code}
> This query translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                    (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>             (bgp (triple ?entity schema:geo ?location))
>             (bgp
>                 (triple ?location schema:latitude ?lat)
>                 (triple ?location schema:longitude ?long)
>                 (triple ?location schema:elevation ?alt)
>             ))))
> {code}
> The expected result would have 4 result rows where `lat`, `long` and `alt` values are only provided for `geo.1` and `geo.2`.
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,,,
>     urn:test.org:place.4,urn:test.org:geo.4,,,
> {code}
> With this query Marmotta behaves very strangely, as the results depend on the ordering of the triple patterns in the `OPTIONAL` graph pattern. I will not include all variations but just provide two examples:
> {code}
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             ?location schema:elevation ?alt .
>         }
> {code}
> gives
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
>     urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
>     urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
> {code}
> while
> {code}
>         OPTIONAL {
>             ?location schema:longitude ?long .
>             ?location schema:latitude ?lat .
>             ?location schema:elevation ?alt .
>         }
> {code}
> gives
> {code}
>     entity,location,long,lat,alt
>     urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
>     urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
> {code}
> This behavior further indicates that `OPTIONAL` graph patterns are processed incorrectly.
> h2. Test Case 3
> Modifying the query to 
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>         }
>         OPTIONAL {
>             ?location schema:elevation ?alt .
>         }
>     }
> {code}
> gives a result similar to _Test Case 1_: there are 4 result rows, but for `geo.4` we again get the unexpected value for `?long`.
> h2. Test Case 4
> This test case assumes that the user wants `lat` and `long` only when both are defined, and optionally wants `alt` as well, but only for resources that have such a valid location.
> {code}
>     PREFIX schema: <http://schema.org/>
>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>     SELECT * WHERE {
>         ?entity schema:geo ?location
>         OPTIONAL {
>             ?location schema:latitude ?lat .
>             ?location schema:longitude ?long .
>             OPTIONAL {
>                 ?location schema:elevation ?alt .
>             }
>         }
>     }
> {code}
> This translates to the following algebra
> {code}
>     (base <http://example/base/>
>         (prefix ((schema: <http://schema.org/>)
>                    (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
>             (leftjoin
>                 (bgp (triple ?entity schema:geo ?location))
>                 (leftjoin
>                     (bgp
>                         (triple ?location schema:latitude ?lat)
>                         (triple ?location schema:longitude ?long)
>                     )
>                         (bgp (triple ?location schema:elevation ?alt))))))
> {code}
> So the `lat` and `long` values are combined with `alt` in a `leftjoin`. The result is then used in another `leftjoin` with the results of `?entity schema:geo ?location`. The expected results are therefore as follows
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,15,17,
>     urn:test.org:place.4,urn:test.org:geo.4,,,
> {code}
> Marmotta however returns
> {code}
>     entity,location,lat,long,alt
>     urn:test.org:place.1,urn:test.org:geo.1,16,17,123
>     urn:test.org:place.2,urn:test.org:geo.2,15,16,99
>     urn:test.org:place.3,urn:test.org:geo.3,15,17,
>     urn:test.org:place.4,urn:test.org:geo.4,,17,123
> {code}
> All test cases show that OPTIONAL query segments are not correctly evaluated by the SPARQL implementation of the KiWi triple store.
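
The expected results discussed in the issue follow from the SPARQL algebra: a basic graph pattern only yields a solution when all of its triple patterns match, and `leftjoin` either extends a left-hand solution with a compatible right-hand solution or keeps it unchanged. The following is a minimal Python sketch of that evaluation over the test data from the issue. It is an illustration only, not Marmotta or KiWi code: the function names (`match_bgp`, `left_join`, `rows`) are hypothetical, literals are simplified to plain strings without datatypes, and filters are not handled.

```python
# Minimal reference evaluator for BGP matching and SPARQL LeftJoin.
# Literals are simplified to plain strings (datatypes dropped); all
# function names here are illustrative, not part of Marmotta or KiWi.

DATA = {
    1: {"latitude": "16", "longitude": "17", "elevation": "123"},
    2: {"latitude": "15", "longitude": "16", "elevation": "99"},
    3: {"latitude": "15", "longitude": "17"},    # no elevation
    4: {"longitude": "17", "elevation": "123"},  # no latitude
}

TRIPLES = []
for n, props in DATA.items():
    place, geo = f"urn:test.org:place.{n}", f"urn:test.org:geo.{n}"
    TRIPLES.append((place, "schema:geo", geo))
    TRIPLES.extend((geo, f"schema:{name}", value) for name, value in props.items())


def match_bgp(patterns, triples):
    """Return every solution (dict) that matches ALL triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in triples:
                candidate = dict(solution)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def compatible(a, b):
    """Two solutions are compatible if their shared variables agree."""
    return all(b[var] == val for var, val in a.items() if var in b)


def left_join(left, right):
    """LeftJoin without a filter: extend a left solution with every
    compatible right solution, or keep it unchanged if there is none."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged or [l])
    return result


def rows(solutions, variables):
    """Render solutions as sorted CSV-like rows; unbound means empty."""
    return sorted(",".join(s.get(v, "") for v in variables) for s in solutions)


ENTITY = [("?entity", "schema:geo", "?location")]
LAT = ("?location", "schema:latitude", "?lat")
LONG = ("?location", "schema:longitude", "?long")
ALT = ("?location", "schema:elevation", "?alt")

base = match_bgp(ENTITY, TRIPLES)
case1 = left_join(base, match_bgp([LAT, LONG], TRIPLES))
case2 = left_join(base, match_bgp([LAT, LONG, ALT], TRIPLES))
case4 = left_join(base, left_join(match_bgp([LAT, LONG], TRIPLES),
                                  match_bgp([ALT], TRIPLES)))

for row in rows(case1, ["?entity", "?location", "?lat", "?long"]):
    print(row)
```

In this sketch `geo.4` never receives a partial binding, because the right-hand side of the `leftjoin` is computed as a whole before it is joined, and the order of the triple patterns inside the basic graph pattern does not change the solution set. For the nested query of Test Case 4, the sketch keeps `lat` and `long` for `geo.3` (only the inner OPTIONAL on the elevation fails to match) and leaves all three variables unbound for `geo.4`.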


