You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2019/10/24 10:11:00 UTC

[jira] [Assigned] (JENA-1771) Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

     [ https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne reassigned JENA-1771:
-----------------------------------

    Assignee: Andy Seaborne

> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
>                 Key: JENA-1771
>                 URL: https://issues.apache.org/jira/browse/JENA-1771
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs. This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
>   (order ((asc ?v))
>     (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions. As a result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making things work correctly. For example, adding a LIMIT 5 clause to the query above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
> You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
>     qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
>     ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query: 
> PREFIX  :     <http://example/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> LIMIT   5
> Got: 5 --------------------------------
> -------------
> | x    | v  |
> =============
> | :x1  | 1  |
> | :x2  | 2  |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x    | v |
> ============
> | :x1  | 1 |
> | :x2  | 2 |
> | :x3  | 3 |
> | :x3a | 3 |
> | :x4  | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
> 	at junit.framework.Assert.fail(Assert.java:57)
> 	at junit.framework.Assert.assertTrue(Assert.java:22)
> 	at junit.framework.TestCase.assertTrue(TestCase.java:192)
> 	at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
> 	at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
> 	at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
> 	at junit.framework.TestCase.runBare(TestCase.java:141)
> 	at junit.framework.TestResult$1.protect(TestResult.java:122)
> 	at junit.framework.TestResult.runProtected(TestResult.java:142)
> 	at junit.framework.TestResult.run(TestResult.java:125)
> 	at junit.framework.TestCase.run(TestCase.java:129)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)