You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2019/10/29 17:06:00 UTC
[jira] [Resolved] (JENA-1771) Spilling combined with DISTINCT ..
ORDER BY returns rows in the wrong order
[ https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne resolved JENA-1771.
---------------------------------
Fix Version/s: Jena 3.14.0
Resolution: Fixed
> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
> Key: JENA-1771
> URL: https://issues.apache.org/jira/browse/JENA-1771
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.13.1
> Reporter: Shawn Smith
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 3.14.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs. This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT *
> WHERE
> { ?x :p ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
> (order ((asc ?v))
> (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions. As a result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making things work correctly. For example, adding a LIMIT 5 clause to the query above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
> You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
> qe.getContext().set(ARQ.spillToDiskThreshold, 4); // add this line
> ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query:
> PREFIX : <http://example/>
> SELECT DISTINCT *
> WHERE
> { ?x :p ?v }
> ORDER BY ASC(?v)
> LIMIT 5
> Got: 5 --------------------------------
> -------------
> | x | v |
> =============
> | :x1 | 1 |
> | :x2 | 2 |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x | v |
> ============
> | :x1 | 1 |
> | :x2 | 2 |
> | :x3 | 3 |
> | :x3a | 3 |
> | :x4 | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.assertTrue(Assert.java:22)
> at junit.framework.TestCase.assertTrue(TestCase.java:192)
> at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
> at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
> at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
> at junit.framework.TestCase.runBare(TestCase.java:141)
> at junit.framework.TestResult$1.protect(TestResult.java:122)
> at junit.framework.TestResult.runProtected(TestResult.java:142)
> at junit.framework.TestResult.run(TestResult.java:125)
> at junit.framework.TestCase.run(TestCase.java:129)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)