Posted to dev@jena.apache.org by "Shawn Smith (Jira)" <ji...@apache.org> on 2019/10/17 21:16:00 UTC

[jira] [Created] (JENA-1770) Spilling bindings with OPTIONAL leads to wrong answers

Shawn Smith created JENA-1770:
---------------------------------

             Summary: Spilling bindings with OPTIONAL leads to wrong answers
                 Key: JENA-1770
                 URL: https://issues.apache.org/jira/browse/JENA-1770
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 3.13.1
            Reporter: Shawn Smith


A query like the following, in which some variables are bound only inside an OPTIONAL pattern, may return wrong answers when bindings spill to disk:
{code}
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
SELECT  ?name ?mbox
WHERE
  { ?x  foaf:name  ?name
    OPTIONAL
      { ?x  foaf:mbox  ?mbox }
  }
ORDER BY ASC(?mbox)
{code}
This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.

The root cause is that BindingOutputStream emits a VARS row based on the first binding, but it doesn't emit a new VARS row when a subsequent binding contains additional variables. The values of those additional variables are then missing from the spill file.

The BindingOutputStream.needVars() method will cause a second VARS row to be emitted when a new binding is missing variables, but not when it has extras.  This logic may be inverted from what was intended.
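
To illustrate the check I would expect, here is a minimal standalone sketch. It is not Jena's code: VarsRowSketch, needsNewHeader() and write() are hypothetical names, bindings are modelled as plain maps, and writing "-" for an unbound column is an assumption of this sketch. The idea is to emit a new VARS row whenever a binding uses a variable that the current header lacks, while a binding that is merely missing variables reuses the current header.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a VARS-row writer. This is NOT Jena's
// BindingOutputStream; a binding is modelled as a map from variable name to value.
final class VarsRowSketch {

    // A new header is needed only when the binding has a variable the header lacks.
    static boolean needsNewHeader(List<String> header, Set<String> bindingVars) {
        return header == null || !header.containsAll(bindingVars);
    }

    // Serializes the bindings into spill-file-like lines.
    static List<String> write(List<Map<String, String>> bindings) {
        List<String> out = new ArrayList<>();
        List<String> header = null;
        for (Map<String, String> binding : bindings) {
            if (needsNewHeader(header, binding.keySet())) {
                // Keep the existing columns and append the unseen variables.
                Set<String> merged = new LinkedHashSet<>();
                if (header != null) {
                    merged.addAll(header);
                }
                merged.addAll(binding.keySet());
                header = new ArrayList<>(merged);
                out.add("VARS ?" + String.join(" ?", header) + " .");
            }
            StringBuilder row = new StringBuilder();
            for (String var : header) {
                String value = binding.get(var);
                // Assumption of this sketch: "-" marks a column with no value.
                row.append(value == null ? "-" : "\"" + value + "\"").append(' ');
            }
            out.add(row.append('.').toString());
        }
        return out;
    }
}
```

With this check, a binding that adds ?2 after a binding that only has ?1 triggers a second VARS row, so its value for ?2 is kept. The sketch appends new variables to the end of the header, so its column order may differ from what BindingOutputStream would write.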

There's a TestDistinctDataBag test case below that reproduces the problem. It generates a spill file like this:
{code}
VARS ?1 .
"A" .
"A" .
{code}
when a correct spill file would be:
{code}
VARS ?1 .
"A" .
VARS ?2 ?1 .
"B" "A" .
{code}

If you run it, you may notice that it fails with a spill threshold of 2 but passes with a higher threshold:
{code:java}
@Test public void testOptionalVariables()
{
    // Set up a situation where the second binding in a spill file binds more
    // variables than the first binding
    BindingMap binding1 = BindingFactory.create();
    binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));

    BindingMap binding2 = BindingFactory.create();
    binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
    binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));

    List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
    List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
    List<Binding> distinct = new ArrayList<>();

    DistinctDataBag<Binding> db = new DistinctDataBag<>(
            new ThresholdPolicyCount<Binding>(2),
            SerializationFactoryFinder.bindingSerializationFactory(),
            new BindingComparator(new ArrayList<SortCondition>()));
    try
    {
        db.addAll(undistinct);
        Iterator<Binding> iter = db.iterator();
        while (iter.hasNext())
        {
            distinct.add(iter.next());
        }
        Iter.close(iter);
    }
    finally
    {
        db.close();
    }

    assertEquals(control.size(), distinct.size());
    assertTrue(ResultSetCompare.equalsByTest(control, distinct, NodeUtils.sameTerm));
}
{code}


