You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by robbinspg <gi...@git.apache.org> on 2017/02/23 12:30:12 UTC

[GitHub] spark pull request #17039: [SPARK-19710] Fix ordering of rows in query resul...

GitHub user robbinspg opened a pull request:

    https://github.com/apache/spark/pull/17039

    [SPARK-19710] Fix ordering of rows in query results

    ## What changes were proposed in this pull request?
    Changes to SQLQueryTests to make the order of the results constant.
    Where possible ORDER BY has been added to match the existing expected output
    
    ## How was this patch tested?
    Test runs on x86, zLinux (big endian), ppc (big endian)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robbinspg/spark-1 SPARK-19710

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17039
    
----
commit fbc46a6f5ec2a4aaf3c6b4d5d776ccc7b114842f
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-12-21T10:06:41Z

    o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray test
    fails on big endian. Only change byte order on little endian

commit 30e20be2c199cc57a6a85547770dfa6fc3d32752
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-12-22T09:14:14Z

    Simplify setting  of byte order

commit 145c76a2ce4b53726c209f04e0a230692b395369
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-12-22T16:08:38Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit f0e77f29f1dca2198a87efa28fb01fd247162ceb
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-12-23T08:55:15Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit 1bc1adf48dae6b1047ff4d4e3d467ffec88abe12
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-12-23T08:56:42Z

    remove redundant comment

commit ea259fc7a00b3aab1dd554ec9850d057407b1875
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-01-03T10:29:01Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit f4b76a779df4c9b952114a59907a27a2eddd9898
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-03T13:21:41Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit b5571ea47408f2027a73368adf763b0f0d60eba0
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-13T13:13:05Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit 191777387b4b76afd8442ddc9b33815bd487dfe6
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-14T10:57:47Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit a832b740a5e29b25e8e31cd5de73a311f855a12d
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-20T13:18:05Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit bafe31ccbefdc80d434e34baae8600cb6ef26c56
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-23T11:29:32Z

    Merge branch 'master' of https://github.com/apache/spark.git

commit 950415f98c4574532da9dd089e5a9b027d3683d8
Author: Pete Robbins <ro...@gmail.com>
Date:   2017-02-23T11:33:53Z

    Update tests to produce reliably ordered results

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17039


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @gatorsmile  I'm glad it wasn't just me that found it complex ;-)
    
    I've modified the patch to remove an unnecessary change as that query was not ordered and the test suite code handles that case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    **[Test build #73508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73508/testReport)** for PR 17039 at commit [`4a4d7ad`](https://github.com/apache/spark/commit/4a4d7ad4b349e49dc4cb81235f796e360dd183f8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    A small follow-up (I got the code to run). We could use the following in `SQLQueryTestSuite.getNormalizedResult`:
    ```scala
          val baseDf = session.sql(sql)
          val (df, isSorted) = baseDf.logicalPlan match {
            case Sort(ordering, true, child) =>
              val sort = Sort(ordering ++ child.output.map(SortOrder(_, Ascending)), true, child)
              (Dataset.ofRows(session, sort), true)
            case _ =>
              (baseDf, false)
          }
    
          val schema = df.schema
          // Get answer, but also get rid of the #1234 expression ids that show up in explain plans
          val answer = df.queryExecution.hiveResultString().map(_.replaceAll("#\\d+", "#x"))
    
          // If the output is not pre-sorted, sort it.
          if (isSorted) (schema, answer) else (schema, answer.sorted)
    ```
    
    If I run this, it also changes results in the following files:
    
    - group-analytics.sql.out
    - in-set-operations.sql.out
    - order-by-nulls-ordering.sql.out


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    How about the more pragmatic approach. I think relation algebra only guarantees ordering when an order by is the top level operation. Why not just check that, and if we find one, add all output columns to the order by?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @hvanhovell Is that possible the SQL queries are used to verify the behavior of ORDER BY? Do you think we should explicitly leave a comment to say SQLQueryTestSuite will not be used for this goal?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73349/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @robbinspg `SQLQueryTestQuite` checks if the result is sorted, if it isn't the results get sorted. The problem with this sort check is that it does not check if the output is completely sorted, and thus causing test failures when rows have the same ordering. The current PR fixes issues with the current tests, but it does not guarantee that this won't happen again. Perhaps it is an idea to make a modification to the `SQLQueryTestQuite.getNormalizedResult`, that always does a complete sort (while respecting the predefined sort order)?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    ok there were a couple of similar issues such as in-set-operations query 9, group-analytics.sql.out query 21 and 22


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    I think that the current "order if not currently ordered" in the test suite is good for checking the set of results for unordered queries.
    
    If ordered at all then the results should be deterministic given the input data and query are part of the test otherwise it is a bad test. So... I think this PR is the way to go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    LGTM - merging to master! Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    The major issue is we do not know the original intention of users' query. The query might purposely check whether the result set is sorted or not. Thus, the existing test suite design is conservative to avoid adding any sort as long as users specify the ORDER BY clause. For example, 
    ```SQL
    SELECT c1, c2, sum(c1) FROM tab1 GROUP BY c1, c2 ORDER BY c1, c2
    ```
    In the above example, although the order by clause does not contain all the columns, the result set is always sorted. Thus, our test suite should not sort it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    The code didn't respect the ordering because the analyzer adds a project on top; this is for cases where you sort by a column that is not in the final projection. We could also pattern match on that, but I don't think we should be adding too much magic.
    
    Let me merge this, and we can pick this up if it ever becomes an issue again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    **[Test build #73349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73349/testReport)** for PR 17039 at commit [`950415f`](https://github.com/apache/spark/commit/950415f98c4574532da9dd089e5a9b027d3683d8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @robbinspg yeah let's use the approach that I suggested. That at least makes sure we won't have to fix this again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @hvanhovell @gatorsmile  I agree that would be a better solution however I don't know how to achieve that being unfamiliar with this code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    I have just taken a look at this. Modifying the sort does not really help. I am fine with this approach.
    
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Jenkins retest please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    **[Test build #73349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73349/testReport)** for PR 17039 at commit [`950415f`](https://github.com/apache/spark/commit/950415f98c4574532da9dd089e5a9b027d3683d8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    ok so here is an example of output I'm not sure is correct:
    
    in-order-by
    
    -- !query 17
    SELECT Count(DISTINCT( t1a )),
           t1b
    FROM   t1
    WHERE  t1h NOT IN (SELECT t2h
                       FROM   t2
                       where t1a = t2a
                       order by t2d DESC nulls first
                       )
    GROUP  BY t1a,
              t1b
    ORDER  BY t1b DESC nulls last
    -- !query 17 schema
    struct<count(DISTINCT t1a):bigint,t1b:smallint>
    -- !query 17 output
    1	10
    1	10
    1	16
    1	6
    1	8
    1	NULL
    
    That is the "new" output with your change but it doesn't actually match what you'd expect from that query (it isn't t1b DESC) which would be
    
    1	16
    1	10
    1	10
    1	8
    1	6
    1	NULL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @hvanhovell I made a try and just realized it might be hard to automatically detect whether it is a complete sort and respect the predefined sort order. The order by clauses could be complex expressions and the output attributes of queries might be the alias of complex expressions too. Thus, maybe we still can keep the existing conservative way (i.e., as long as the test query specifies the order-by clauses, we expect the query result have deterministic orders)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Hmmmm - apparently the ORDER BY is not the top node in that plan... Lemme check


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @hvanhovell Yes! This is caused by partial sort + duplication values on the sorted columns. Since all these result sets are pretty small, maybe another idea is to sort the whole result when we found duplication values on the sorted columns? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73508/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    **[Test build #73508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73508/testReport)** for PR 17039 at commit [`4a4d7ad`](https://github.com/apache/spark/commit/4a4d7ad4b349e49dc4cb81235f796e360dd183f8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...

Posted by robbinspg <gi...@git.apache.org>.

Github user robbinspg commented on the issue:

    https://github.com/apache/spark/pull/17039
  
    @hvanhovell So I backed out the changes in this PR, implemented your change to SQLQueryTestSuite.getNormalizedResult, regenerated the golden results files and the tests all pass on my x86 and big endian platforms.
    
    results files that were changed:
    sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out
    sql/core/src/test/resources/sql-tests/results/order-by-nulls-ordering.sql.out
    sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out
    sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out
    sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
    sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-joins.sql.out
    
    
    So, should I abandon this PR and go with your solution? I can submit your change in a PR along with updated results files if you want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org