Posted to users@jena.apache.org by Johnathon Harris <jm...@gmail.com> on 2012/05/28 17:54:03 UTC

Inexplicable behaviour with OPTIONAL in Jena

(Apologies if this comes through on the mailing list twice; it appears I used an
old address for the list the first time: jena-users@incubator.apache.org)

Hello all,
Here is an example of a scenario where I cannot explain why the query behaves 
the way it does regarding an OPTIONAL. I expect it's because the query is 
flawed but I'd appreciate an explanation as to why. The Jena reference might 
be a red herring- it behaves the same in TDB or SDB and with the latest ARQ 
version (2.9.0-incubating).

Scenario: A system records when a group of activities have taken place by 
storing triples in a named graph each time they are recorded. The location and 
date of the group are also recorded.

For a given graph, we select which activities were completed. However it is 
possible that multiple groups may have been submitted on the same day, so we 
also want to know if an activity which wasn't recorded in one group was 
present in another (on the same day and location).

Example data:
Create 2 records of activities, both in the same location but on different 
days.

Graph A URI: http://example/ontology/graph#graphA
Records 2 activities, only one of which is complete.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ont="http://example/ontology/data#">

  <rdf:Description rdf:nodeID="A0">
    <rdf:type rdf:resource="http://example/ontology/data#ActivityCollection"/>
    <ont:location rdf:datatype="http://www.w3.org/2001/XMLSchema#string">LocationName</ont:location>
    <ont:activity_date rdf:nodeID="A1"/>
    <ont:activityA rdf:nodeID="A2"/>
    <ont:activityB rdf:nodeID="A3"/>
  </rdf:Description>

  <rdf:Description rdf:nodeID="A1">
    <rdf:type rdf:resource="http://example/ontology/data#Date"/>
    <ont:primitive_value>2012-05-26Z</ont:primitive_value>
  </rdf:Description>

  <rdf:Description rdf:nodeID="A2">
    <rdf:type rdf:resource="http://example/ontology/data#Boolean"/>
    <ont:primitive_value>true</ont:primitive_value>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A3">
    <rdf:type rdf:resource="http://example/ontology/data#Boolean"/>
    <!-- Is NOT set. -->
  </rdf:Description>
</rdf:RDF>

Graph B URI: http://example/ontology/graph#graphB
Records 2 activities, both of which are complete.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ont="http://example/ontology/data#">

  <rdf:Description rdf:nodeID="A8">
    <rdf:type rdf:resource="http://example/ontology/data#ActivityCollection"/>
    <ont:location rdf:datatype="http://www.w3.org/2001/XMLSchema#string">LocationName</ont:location>
    <ont:activity_date rdf:nodeID="A1"/>
    <ont:activityA rdf:nodeID="A2"/>
    <ont:activityB rdf:nodeID="A3"/>
  </rdf:Description>

  <rdf:Description rdf:nodeID="A1">
    <rdf:type rdf:resource="http://example/ontology/data#Date"/>
    <ont:primitive_value>2012-05-25Z</ont:primitive_value>
  </rdf:Description>

  <rdf:Description rdf:nodeID="A2">
    <rdf:type rdf:resource="http://example/ontology/data#Boolean"/>
    <ont:primitive_value>true</ont:primitive_value>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A3">
    <rdf:type rdf:resource="http://example/ontology/data#Boolean"/>
    <ont:primitive_value>true</ont:primitive_value>
  </rdf:Description>
</rdf:RDF>

*Now for the inexplicable part:* Run this query to find the tasks completed 
for graphA:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ont: <http://example/ontology/data#>

SELECT ?date ?location ?task ?taskComplete ?otherGraph
WHERE {
  GRAPH <http://example/ontology/graph#graphA> {

    # Locate the activity, its date and location.
    ?activity rdf:type ont:ActivityCollection ;
              ont:location ?location ;
              ont:activity_date ?dateNode .
    ?dateNode ont:primitive_value ?date .

    {
      # The activity has been done.
      ?activity ?task ?taskNodeTrue .
      ?taskNodeTrue rdf:type ont:Boolean .
      ?taskNodeTrue ont:primitive_value ?taskComplete .
      FILTER (?taskComplete = 'true')

    } UNION {
      # The activity has not been done.
      ?activity ?task ?taskNodeNotSet .
      ?taskNodeNotSet rdf:type ont:Boolean .
      OPTIONAL {
        ?taskNodeNotSet ont:primitive_value ?taskNotSet .
      }
      FILTER (!bound(?taskNotSet))

      # Detect if any other activity has been recorded for this date and location.
      OPTIONAL {
        GRAPH ?otherGraph {
          ?otherTrueForm ont:location ?location ;
                         ont:activity_date ?otherDateNode ;
                         ?task ?otherTaskNode .
          ?otherDateNode ont:primitive_value ?date .
          ?otherTaskNode ont:primitive_value ?otherPrimitiveValue .
          FILTER (?otherPrimitiveValue = 'true')
        }
      }

    }
  }
}

*You will only get a single result.* I don't expect that, since the part which
selects in 'otherGraph' is OPTIONAL. And indeed, if you remove that optional
part entirely, you get the 2 activities you would expect. How can this be, given
that the W3C recommendation states:

- if the optional part does not match, it creates no bindings but does not 
eliminate the solution

My suspicion is that this is related to nesting the GRAPH patterns, since that 
isn't something ever mentioned in the W3C spec. Hoping someone can shed some 
light on what is going on here.

Regards, John Harris.

RE: Inexplicable behaviour with OPTIONAL in Jena

Posted by Robert Vesse <rv...@yarcdata.com>.
Hi Johnathon

I saw your question on Answers.SemanticWeb.com and answered there as well, but I will repeat the answer here for completeness.

The problem is not to do with Jena but with your pattern nesting as you suspected.

The OPTIONAL is in the right-hand side of your UNION, whereas I suspect you actually meant to place it outside the UNION. By placing it inside one side of the UNION, it only applies to matches found on the right-hand side of the UNION (of which there do not appear to be any).

Moving one of the closing brackets to before the OPTIONAL, so that the OPTIONAL sits outside the UNION, does the trick.
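
As a sketch of that change (not run against the example data here), the query below is the one from the original post with a single bracket moved. It assumes the OPTIONAL in question is the second one, which contains GRAPH ?otherGraph; the first OPTIONAL, which tests for an unset value, stays inside the UNION branch.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ont: <http://example/ontology/data#>

SELECT ?date ?location ?task ?taskComplete ?otherGraph
WHERE {
  GRAPH <http://example/ontology/graph#graphA> {

    # Locate the activity, its date and location.
    ?activity rdf:type ont:ActivityCollection ;
              ont:location ?location ;
              ont:activity_date ?dateNode .
    ?dateNode ont:primitive_value ?date .

    {
      # The activity has been done.
      ?activity ?task ?taskNodeTrue .
      ?taskNodeTrue rdf:type ont:Boolean .
      ?taskNodeTrue ont:primitive_value ?taskComplete .
      FILTER (?taskComplete = 'true')

    } UNION {
      # The activity has not been done.
      ?activity ?task ?taskNodeNotSet .
      ?taskNodeNotSet rdf:type ont:Boolean .
      OPTIONAL {
        ?taskNodeNotSet ont:primitive_value ?taskNotSet .
      }
      FILTER (!bound(?taskNotSet))
    }   # The UNION now closes here, before the second OPTIONAL.

    # Detect if any other activity has been recorded for this date and location.
    # This OPTIONAL is now applied after ?date, ?location and ?task are bound.
    OPTIONAL {
      GRAPH ?otherGraph {
        ?otherTrueForm ont:location ?location ;
                       ont:activity_date ?otherDateNode ;
                       ?task ?otherTaskNode .
        ?otherDateNode ont:primitive_value ?date .
        ?otherTaskNode ont:primitive_value ?otherPrimitiveValue .
        FILTER (?otherPrimitiveValue = 'true')
      }
    }
  }
}
```

One consequence of this placement is that the OPTIONAL is evaluated for rows from both UNION branches, so ?otherGraph may also be bound for completed tasks (possibly to graphA itself, since GRAPH ?otherGraph can match any named graph). If that is unwanted, a filter such as FILTER (?otherGraph != <http://example/ontology/graph#graphA>) inside the OPTIONAL is one possible way to exclude it.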

Hope this helps

Rob