Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2019/10/29 17:06:00 UTC
[jira] [Resolved] (JENA-1770) Spilling bindings with OPTIONAL leads to wrong answers
[ https://issues.apache.org/jira/browse/JENA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne resolved JENA-1770.
---------------------------------
Fix Version/s: Jena 3.14.0
Resolution: Fixed
> Spilling bindings with OPTIONAL leads to wrong answers
> ------------------------------------------------------
>
> Key: JENA-1770
> URL: https://issues.apache.org/jira/browse/JENA-1770
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.13.1
> Reporter: Shawn Smith
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 3.14.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> A query like the following, in which some variables are optional, may return wrong answers when bindings are spilled to disk:
> {code}
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?name ?mbox
> WHERE
>   { ?x foaf:name ?name
>     OPTIONAL
>       { ?x foaf:mbox ?mbox }
>   }
> ORDER BY ASC(?mbox)
> {code}
> This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.
> The root cause is that BindingOutputStream emits a VARS row based on the first binding, but it doesn't emit a new VARS row when a subsequent binding contains additional variables.
> The BindingOutputStream.needVars() method causes a new VARS row to be emitted when a binding is missing variables listed in the current VARS row, but not when it binds extra variables that the row lacks. This check may be the inverse of what was intended.
> There's a TestDistinctDataBag test case below that reproduces the problem. It generates a spill file like this:
> {code}
> VARS ?1 .
> "A" .
> "A" .
> {code}
> when a correct spill file would be:
> {code}
> VARS ?1 .
> "A" .
> VARS ?2 ?1 .
> "B" "A" .
> {code}
> If you run it, you may notice that it fails with a spill threshold of 2 but passes with a higher threshold:
> {code:java}
> @Test public void testOptionalVariables()
> {
>     // Set up a situation where the second binding in a spill file binds more
>     // variables than the first binding
>     BindingMap binding1 = BindingFactory.create();
>     binding1.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     BindingMap binding2 = BindingFactory.create();
>     binding2.add(Var.alloc("1"), NodeFactory.createLiteral("A"));
>     binding2.add(Var.alloc("2"), NodeFactory.createLiteral("B"));
>     List<Binding> undistinct = Arrays.asList(binding1, binding2, binding1);
>     // Expected result: de-duplicated in memory, without spilling
>     List<Binding> control = Iter.toList(Iter.distinct(undistinct.iterator()));
>     List<Binding> distinct = new ArrayList<>();
>     DistinctDataBag<Binding> db = new DistinctDataBag<>(
>             new ThresholdPolicyCount<Binding>(2),
>             SerializationFactoryFinder.bindingSerializationFactory(),
>             new BindingComparator(new ArrayList<SortCondition>()));
>     try
>     {
>         db.addAll(undistinct);
>         Iterator<Binding> iter = db.iterator();
>         while (iter.hasNext())
>         {
>             distinct.add(iter.next());
>         }
>         Iter.close(iter);
>     }
>     finally
>     {
>         db.close();
>     }
>     assertEquals(control.size(), distinct.size());
>     assertTrue(ResultSetCompare.equalsByTest(control, distinct, NodeUtils.sameTerm));
> }
> {code}
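To make the needVars() behaviour described in the report concrete, here is a minimal, stdlib-only sketch of a spill writer. It is not Jena's BindingOutputStream: the class SpillWriterSketch, its methods and the Map-based binding model are hypothetical, and the actual fix in Jena 3.14.0 may look different. It assumes each binding is an ordered map from variable name to an already-serialized term, with ?2 placed before ?1 in the second binding so that the output mirrors the spill files quoted above.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical model of a spill writer; NOT Jena's BindingOutputStream.
// A binding is an ordered map from variable name to an already-serialized term.
public class SpillWriterSketch {

    // Check as described in the report: a new VARS row is written only when the
    // binding lacks a variable from the current header. Extra variables go unnoticed.
    static boolean needVarsReported(List<String> header, Map<String, String> binding) {
        if (header == null) return true;
        for (String v : header) {
            if (!binding.containsKey(v)) return true;
        }
        return false;
    }

    // Sketched fix (an assumption): write a new VARS row whenever the variable
    // sets differ in either direction.
    static boolean needVarsFixed(List<String> header, Map<String, String> binding) {
        if (header == null) return true;
        return !new HashSet<>(header).equals(binding.keySet());
    }

    static String write(List<Map<String, String>> bindings, boolean fixed) {
        StringBuilder out = new StringBuilder();
        List<String> header = null;
        for (Map<String, String> b : bindings) {
            boolean need = fixed ? needVarsFixed(header, b) : needVarsReported(header, b);
            if (need) {
                header = new ArrayList<>(b.keySet());
                out.append("VARS");
                for (String v : header) out.append(" ?").append(v);
                out.append(" .\n");
            }
            // Only header variables are written; anything else in the binding is dropped.
            StringJoiner row = new StringJoiner(" ", "", " .\n");
            for (String v : header) row.add(b.get(v));
            out.append(row);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> b1 = new LinkedHashMap<>();
        b1.put("1", "\"A\"");
        Map<String, String> b2 = new LinkedHashMap<>();
        b2.put("2", "\"B\"");
        b2.put("1", "\"A\"");
        List<Map<String, String>> spill = Arrays.asList(b1, b2);
        System.out.print(write(spill, false)); // drops ?2 = "B"
        System.out.print(write(spill, true));  // re-emits VARS ?2 ?1
    }
}
{code}
With fixed = false the sketch produces the first (incorrect) spill file from the report, and with fixed = true the second (correct) one. Comparing the variable sets in both directions is the simplest safe choice; it costs an extra VARS row whenever the set of bound variables changes.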
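The reading side shows why the incorrect spill file yields too few distinct results. The sketch below is a hypothetical, stdlib-only parser for the simplified text format quoted in the report (not Jena's BindingInputStream); the class SpillReaderSketch and its read method are illustrative names. It assumes one row per line, whitespace-separated terms with no embedded spaces, and a trailing " ." on every row. Each VARS row replaces the column header for the rows that follow, and terms are matched to variables by position.
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the simplified spill-file text; NOT Jena's BindingInputStream.
// Assumes one row per line, whitespace-separated terms with no embedded spaces,
// and a trailing " ." on every row.
public class SpillReaderSketch {

    static List<Map<String, String>> read(String text) {
        List<Map<String, String>> rows = new ArrayList<>();
        List<String> vars = new ArrayList<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            // Strip the trailing "." terminator, then split into terms.
            if (line.endsWith(".")) line = line.substring(0, line.length() - 1).trim();
            String[] tokens = line.split("\\s+");
            if (tokens[0].equals("VARS")) {
                // A VARS row replaces the column header for every following row.
                vars = new ArrayList<>();
                for (int i = 1; i < tokens.length; i++) vars.add(tokens[i].substring(1)); // drop '?'
            } else {
                // Terms are matched to header variables by position.
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < tokens.length && i < vars.size(); i++) row.put(vars.get(i), tokens[i]);
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String reported = "VARS ?1 .\n\"A\" .\n\"A\" .\n";
        String correct = "VARS ?1 .\n\"A\" .\nVARS ?2 ?1 .\n\"B\" \"A\" .\n";
        System.out.println(read(reported));                       // [{1="A"}, {1="A"}]
        System.out.println(read(correct));                        // [{1="A"}, {2="B", 1="A"}]
        System.out.println(new HashSet<>(read(reported)).size()); // 1
        System.out.println(new HashSet<>(read(correct)).size());  // 2
    }
}
{code}
Read back, the two rows of the incorrect file are identical, so a distinct step over them keeps one binding where two are expected. This is consistent with the size mismatch that the test's assertEquals reports when spilling occurs.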
--
This message was sent by Atlassian Jira
(v8.3.4#803005)