You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Shawn Smith (Jira)" <ji...@apache.org> on 2019/10/17 23:09:00 UTC

[jira] [Updated] (JENA-1771) Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order

     [ https://issues.apache.org/jira/browse/JENA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Smith updated JENA-1771:
------------------------------
    Description: 
It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs. This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
PREFIX : <http://example.org/>
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Here's the query plan for this query:
{code:java}
(distinct
  (order ((asc ?v))
    (bgp (triple ?x <http://example.org/p> ?v))))
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:
{quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
{quote}
But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions. As a result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and doesn't preserve the ORDER BY ASC(?v) requirement.

Note that some query plans will reorder the ORDER BY and DISTINCT, making things work correctly. For example, adding a LIMIT clause to the query above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.

You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}
For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:
{code:java}
Query: 
PREFIX  :     <http://example/>

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5 --------------------------------
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
	at junit.framework.Assert.fail(Assert.java:57)
	at junit.framework.Assert.assertTrue(Assert.java:22)
	at junit.framework.TestCase.assertTrue(TestCase.java:192)
	at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
	at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
	at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
{code}

  was:
It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs.  This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.

Consider the following query:
{code:java}
SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
{code}
Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:

bq. The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions.  As a result, the DISTINCT operation doesn't preserve the ORDER BY ASC(?v) requirement.

You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:
{code:java}
void runTestSelect(Query query, QueryExecution qe)
{
    qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
    ...
{code}

For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:
{code}
Query: 
PREFIX  :     <http://example/>

SELECT DISTINCT  *
WHERE
  { ?x  :p  ?v }
ORDER BY ASC(?v)
LIMIT   5

Got: 5 --------------------------------
-------------
| x    | v  |
=============
| :x1  | 1  |
| :x2  | 2  |
| :x10 | 10 |
| :x11 | 11 |
| :x12 | 12 |
-------------
Expected: 5 -----------------------------
------------
| x    | v |
============
| :x1  | 1 |
| :x2  | 2 |
| :x3  | 3 |
| :x3a | 3 |
| :x4  | 4 |
------------

junit.framework.AssertionFailedError: Results do not match
	at junit.framework.Assert.fail(Assert.java:57)
	at junit.framework.Assert.assertTrue(Assert.java:22)
	at junit.framework.TestCase.assertTrue(TestCase.java:192)
	at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
	at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
	at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
{code}


> Spilling combined with DISTINCT .. ORDER BY returns rows in the wrong order
> ---------------------------------------------------------------------------
>
>                 Key: JENA-1771
>                 URL: https://issues.apache.org/jira/browse/JENA-1771
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Priority: Major
>
> It looks like Jena assumes that OpDistinct preserves order, but order is not preserved when spilling occurs. This is only a problem when the ARQ.spillToDiskThreshold setting has been configured.
> Consider the following query:
> {code:java}
> PREFIX : <http://example.org/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> {code}
> Here's the query plan for this query:
> {code:java}
> (distinct
>   (order ((asc ?v))
>     (bgp (triple ?x <http://example.org/p> ?v))))
> {code}
> Jena executes the ORDER BY ASC(?v) before the DISTINCT, relying on the SPARQL requirement:
> {quote}The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
> {quote}
> But, when spilling, QueryIterDistinct (which executes OpDistinct) creates a DistinctDataBag with a BindingComparator without any sort conditions. As a result, the DISTINCT operation sorts using "compareBindingsSyntactic()" and doesn't preserve the ORDER BY ASC(?v) requirement.
> Note that some query plans will reorder the ORDER BY and DISTINCT, making things work correctly. For example, adding a LIMIT clause to the query above results in a "(top (5 (asc ?v))" operation that doesn't suffer from the bug.
> You can reproduce this by injecting the following into QueryTest.java then running the ARQTestRefEngine tests:
> {code:java}
> void runTestSelect(Query query, QueryExecution qe)
> {
>     qe.getContext().set(ARQ.spillToDiskThreshold, 4);   // add this line
>     ...
> {code}
> For example, "ARQTestRefEngine -> Algebra optimizations -> QueryTest.opt-top-05" will fail with:
> {code:java}
> Query: 
> PREFIX  :     <http://example/>
> SELECT DISTINCT  *
> WHERE
>   { ?x  :p  ?v }
> ORDER BY ASC(?v)
> LIMIT   5
> Got: 5 --------------------------------
> -------------
> | x    | v  |
> =============
> | :x1  | 1  |
> | :x2  | 2  |
> | :x10 | 10 |
> | :x11 | 11 |
> | :x12 | 12 |
> -------------
> Expected: 5 -----------------------------
> ------------
> | x    | v |
> ============
> | :x1  | 1 |
> | :x2  | 2 |
> | :x3  | 3 |
> | :x3a | 3 |
> | :x4  | 4 |
> ------------
> junit.framework.AssertionFailedError: Results do not match
> 	at junit.framework.Assert.fail(Assert.java:57)
> 	at junit.framework.Assert.assertTrue(Assert.java:22)
> 	at junit.framework.TestCase.assertTrue(TestCase.java:192)
> 	at org.apache.jena.sparql.junit.QueryTest.runTestSelect(QueryTest.java:284)
> 	at org.apache.jena.sparql.junit.QueryTest.runTestForReal(QueryTest.java:201)
> 	at org.apache.jena.sparql.junit.EarlTestCase.runTest(EarlTestCase.java:88)
> 	at junit.framework.TestCase.runBare(TestCase.java:141)
> 	at junit.framework.TestResult$1.protect(TestResult.java:122)
> 	at junit.framework.TestResult.runProtected(TestResult.java:142)
> 	at junit.framework.TestResult.run(TestResult.java:125)
> 	at junit.framework.TestCase.run(TestCase.java:129)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)