
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/01/28 00:55:32 UTC

[GitHub] spark pull request #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows a...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/16724

    [SPARK-19352][WIP][SQL] Keep sort order of rows after external sorter when writing

    ## What changes were proposed in this pull request?
    
    WIP
    
    ## How was this patch tested?
    
    Will add test case later.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 keep-sort-order-after-external-sorter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16724
    
----
commit 93d380620c411dc33c14a4787f2ceee28e9c155c
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-01-28T00:45:11Z

    Keep sort order of rows after external sorter when writing.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16724: [SPARK-19352][SQL] Preserve sort order when savin...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16724#discussion_r100687176
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
    @@ -487,6 +487,36 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
         }
       }
     
    +  test("SPARK-19352: Keep sort order of rows after external sorter when writing") {
    --- End diff --
    
    Again, this is not guaranteed, so we should not test it.
    
    This is an optimization and advanced users can leverage this to preserve the sort order, but it may change in the future.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72100/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72751/testReport)** for PR 16724 at commit [`b1ce030`](https://github.com/apache/spark/commit/b1ce0308cf44ca5bad60a4e954f6169a3c80967e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72119/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)** for PR 16724 at commit [`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72119/testReport)** for PR 16724 at commit [`3c040b6`](https://github.com/apache/spark/commit/3c040b664b7aeb0d1ee78272f79140a34ec30ef6).



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    @cloud-fan Is the current change not suitable? We can change it to only preserve data order when specifying partitioning and no bucketing for the output.
    
    This change only adds a new constructor to `UnsafeKVExternalSorter`; I don't think there is any other API change. Because the output data goes through this external sorter, the data order definitely changes without this change. I don't think we can preserve data order with a workaround that doesn't touch `UnsafeKVExternalSorter`.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    The optimization point sounds good to me. I will create another JIRA/PR for it if the current change is not merged in the end.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Doesn't `sortWithinPartitions("userId", "timestamp")` make the rows for each `userId` contiguous?



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    @cloud-fan Yeah, I think it is reasonable that the ordering should be preserved if the writer doesn't specify partitioning and bucketing (we already do this now), or if the specified partitioning is the same as that of the input data (currently we don't do this). This is reasonable because bucketing can be thought of as a kind of data partitioning, and I think the data ordering after a repartition can't be guaranteed.
    
    @igozali I think you don't need to bucket the data in each partition, right?



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72096/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Preserve sort order when saving datas...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    @cloud-fan ok. I will look into it.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72116/testReport)** for PR 16724 at commit [`3c040b6`](https://github.com/apache/spark/commit/3c040b664b7aeb0d1ee78272f79140a34ec30ef6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by igozali <gi...@git.apache.org>.

Github user igozali commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    My original use case for sorting the output files based on timestamp using Spark was to use the output files with some other machine learning framework which might not readily work well with very large data files, like TensorFlow or Theano. The benefit that I was trying to get was to offload the sorting to Spark, since even if I ended up with large CSV files I could potentially mmap the CSV files to be used with the subsequent frameworks (TF/Theano).
    
    I thought this could be a relatively common use case, but from the impressions I'm getting from this discussion, I wonder if this is not a paradigm that Spark supports or encourages?



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72106/testReport)** for PR 16724 at commit [`c341cfa`](https://github.com/apache/spark/commit/c341cfa4ff095ee97e99c76f9c1051e22666b038).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    I don't think we should respect the ordering of input data in the writer. The current `DataFrameWriter` doesn't allow users to write data out in order.
    
    However, there is an opportunity for optimization: when the data is already partitioned, the writer doesn't need to sort the data by partition columns anymore.
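The optimization described above comes down to a prefix check: if the ordering the writer needs (the partition columns) is already a leading prefix of the ordering the data has, the extra sort can be skipped. A minimal sketch in plain Scala, assuming orderings are represented as column-name sequences; `SortCheck` and `orderingSatisfied` are hypothetical names for illustration and not part of Spark's API:

```scala
// Hypothetical helper, not Spark's API: decides whether a writer-side sort could be skipped.
object SortCheck {
  /** True when `required` is a leading prefix of `actual`, column by column. */
  def orderingSatisfied(required: Seq[String], actual: Seq[String]): Boolean =
    required.length <= actual.length &&
      required.zip(actual).forall { case (r, a) => r == a }
}
```

Under these assumptions, data sorted by `userId, timestamp` satisfies a writer that partitions by `userId`, while data sorted only by `timestamp` does not.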


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72751/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)** for PR 16724 at commit [`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)** for PR 16724 at commit [`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).



[GitHub] spark issue #16724: [SPARK-19352][SQL] Preserve sort order when saving datas...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Close this in favor of #16898.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72106/testReport)** for PR 16724 at commit [`c341cfa`](https://github.com/apache/spark/commit/c341cfa4ff095ee97e99c76f9c1051e22666b038).



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    > oh good catch. Then it seems like df.repartition($"userId").sortWithinPartitions("timestamp") won't produce a result set as we expected.
    
    Just realized `df.repartition($"userId").sortWithinPartitions("userId", "timestamp")` will produce a result set as we expected, can we optimize this case?
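The difference between the two sort specifications can be seen without Spark. Since `repartition($"userId")` hash-partitions the rows, several users may share one partition; sorting that partition by `timestamp` alone interleaves them, while sorting by `(userId, timestamp)` keeps each user's rows together. A minimal sketch using plain Scala collections as a stand-in for a single partition; `Event`, `ContiguityDemo`, and the sample rows are hypothetical:

```scala
// Plain Scala collections stand in for one Spark partition; no Spark API is used.
final case class Event(userId: Int, timestamp: Long)

object ContiguityDemo {
  /** True when all rows for each userId sit next to each other. */
  def usersContiguous(rows: Seq[Event]): Boolean = {
    // Collapse consecutive duplicates into runs; a userId that starts two runs is split.
    val runs = rows.map(_.userId).foldLeft(List.empty[Int]) {
      case (head :: tail, id) if head == id => head :: tail
      case (acc, id)                        => id :: acc
    }
    runs.distinct.length == runs.length
  }

  // Two users that happen to land in the same partition.
  val partition: Seq[Event] =
    Seq(Event(1, 30L), Event(2, 10L), Event(1, 20L), Event(2, 40L))

  // Analogous to sortWithinPartitions("timestamp"): users interleave.
  val byTimestamp: Seq[Event] = partition.sortBy(_.timestamp)

  // Analogous to sortWithinPartitions("userId", "timestamp"): users stay grouped.
  val byUserThenTimestamp: Seq[Event] = partition.sortBy(e => (e.userId, e.timestamp))
}
```

Here `usersContiguous(byTimestamp)` is false and `usersContiguous(byUserThenTimestamp)` is true, which is the property a writer partitioned by `userId` could rely on.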


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)** for PR 16724 at commit [`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4).



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    @cloud-fan OK. I see. If we don't want to add an implicit penalty to the existing API, the only way I can think of now is a config to preserve the sort order. This config can be in `SQLConf`, or we can just have an option for it in `DataFrameWriter`, like `maxRecordsPerFile`.
    
    E.g.,
    
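        import org.apache.spark.sql.functions.{array, col, explode}
        import spark.implicits._ // for the $"..." column syntax

        // Three rows per id, repartitioned by id and sorted by value within each partition.
        val df = spark.range(100)
          .select($"id", explode(array(col("id") + 1, col("id") + 2, col("id") + 3)).as("value"))
          .repartition($"id")
          .sortWithinPartitions($"value".desc).toDF()
    
        df.write
          .option("preserveSortOrder", true) // proposed option; default is false
          .partitionBy("id")
          .parquet(tempDir)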




[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    cc @cloud-fan @rxin @hvanhovell 



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72116/testReport)** for PR 16724 at commit [`3c040b6`](https://github.com/apache/spark/commit/3c040b664b7aeb0d1ee78272f79140a34ec30ef6).



[GitHub] spark pull request #16724: [SPARK-19352][SQL] Preserve sort order when savin...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16724#discussion_r100687252
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
    @@ -487,6 +487,36 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
         }
       }
     
    +  test("SPARK-19352: Keep sort order of rows after external sorter when writing") {
    --- End diff --
    
    got it.



[GitHub] spark pull request #16724: [SPARK-19352][SQL] Preserve sort order when savin...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16724#discussion_r100687150
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -369,7 +371,78 @@ object FileFormatWriter extends Logging {
             context = taskAttemptContext)
         }
     
    +    // Returns the partition path given a partition key.
    +    private val getPartitionStringFunc = UnsafeProjection.create(
    +      Seq(Concat(partitionStringExpression)), description.partitionColumns)
    +
    +    // Returns the data columns to be written given an input row
    +    private val getOutputRow = UnsafeProjection.create(
    +      description.dataColumns, description.allColumns)
    +
         override def execute(iter: Iterator[InternalRow]): Set[String] = {
    +      val outputOrderingExprs = description.outputOrdering.map(_.child)
    --- End diff --
    
    this duplicates too much code



[GitHub] spark pull request #16724: [SPARK-19352][SQL] Preserve sort order when savin...

Posted by viirya <gi...@git.apache.org>.

Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/16724



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    @cloud-fan Even if the specified partitioning is the same as that of the input data, we still need the sort. Rows with the same partition values are not guaranteed to be contiguous, so you can't write all rows for a given partition value at once with a single OutputWriter.
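
To illustrate the contiguity point, here is a minimal plain-Scala sketch with no Spark dependency. `countWriterOpens` is a hypothetical helper, not part of `FileFormatWriter`; it assumes a writer that keeps only one output file open at a time and has to open a new one whenever the partition value changes between consecutive rows.

```scala
// Hypothetical helper: counts how many times a writer that keeps a single
// file open would have to open a file while consuming rows in this order.
def countWriterOpens(partitionValues: Seq[String]): Int =
  if (partitionValues.isEmpty) 0
  else 1 + partitionValues.zip(partitionValues.tail).count { case (a, b) => a != b }

// Rows of one task, represented only by their partition value.
val unsorted = Seq("b", "a", "b", "a")
val sorted = unsorted.sorted

// Unsorted input forces a new open at every change of partition value.
val opensUnsorted = countWriterOpens(unsorted)
// Sorted input needs exactly one open per distinct partition value.
val opensSorted = countWriterOpens(sorted)
```

Under these assumptions, sorting by partition value is what lets each partition's file be opened once and closed for good.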



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72097/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72116/
    Test FAILed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72119/testReport)** for PR 16724 at commit [`3c040b6`](https://github.com/apache/spark/commit/3c040b664b7aeb0d1ee78272f79140a34ec30ef6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    > Just realized df.repartition($"userId").sortWithinPartitions("userId", "timestamp") will produce a result set as we expected, can we optimize this case?
    
    I may have misunderstood you. The result set should be partitioned by "userId" and sorted by "timestamp". But within each partition, the rows with the same "userId" are not contiguous.
    
    What we want is for the rows with the same "userId" to be contiguous in each partition, with their `timestamp` values sorted.
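
The difference between sorting a partition by "timestamp" alone and by ("userId", "timestamp") can be sketched in plain Scala. This is only a model: an in-memory `Seq` stands in for one shuffle partition and `sortBy` stands in for `sortWithinPartitions`; `Event` and `isContiguous` are hypothetical names introduced for the illustration.

```scala
// One partition that happens to hold rows for two different users.
final case class Event(userId: String, timestamp: Long)

val partition = Seq(Event("u1", 3L), Event("u2", 1L), Event("u1", 2L), Event("u2", 4L))

// True when all rows of each user sit next to each other.
def isContiguous(rows: Seq[Event]): Boolean = {
  // Collapse runs of equal ids; contiguous means no id starts a second run.
  val runs = rows.map(_.userId).foldLeft(List.empty[String]) { (acc, id) =>
    if (acc.headOption.contains(id)) acc else id :: acc
  }
  runs.distinct.size == runs.size
}

// Models sortWithinPartitions("timestamp"): users end up interleaved.
val byTimestamp = partition.sortBy(_.timestamp)

// Models sortWithinPartitions("userId", "timestamp"): each user's rows are
// grouped together and ordered by timestamp within the group.
val byUserThenTimestamp = partition.sortBy(e => (e.userId, e.timestamp))
```

In this model only the two-column sort yields contiguous "userId" groups with sorted timestamps.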
    




[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    If we accept that the API does not guarantee preserving the sort order, then the change in this PR is not reasonable, as it incurs a performance penalty.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72751/testReport)** for PR 16724 at commit [`b1ce030`](https://github.com/apache/spark/commit/b1ce0308cf44ca5bad60a4e954f6169a3c80967e).



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Oh, good catch. Then it seems `df.repartition($"userId").sortWithinPartitions("timestamp")` won't produce the result set we expected.
    
    So there is currently no way to write out partitioned, sorted data. @viirya can you think of a workaround? Adding a new API may not be a good idea.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    A data save API that doesn't respect the ordering of the data sounds strange to me. If that were the case, then, taken to the extreme, we could remove the final Sort operator, if any, whenever we output the data. Besides, you couldn't perform a simple task like "write out the csv files for customers sorted by buying amount in each city".
    
    Actually, when the data is not partitioned, saving sorted data keeps the ordering, since we don't do the external sort, if I understand it correctly. End users might find it hard to figure out why saving partitioned and sorted data doesn't keep the ordering.
    
    




[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72106/
    Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)** for PR 16724 at commit [`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    I think it's hard to reason about how and when to preserve the data ordering when writing, considering partitioning and bucketing. For now it looks like the ordering **should** be preserved if the writer doesn't specify partitioning and bucketing, or if the specified partitioning is the same as that of the input data.
    
    If we implement the optimization mentioned above, then we can preserve the ordering when the specified partitioning is the same as that of the input data.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    > sortWithinPartitions("userId", "timestamp") doesn't it make the userId continuous?
    
    Oh, yes. I overlooked that.
    
    So you recommend that we only optimize this case to preserve the sort order. Sounds good.



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16724: [SPARK-19352][SQL] Keep sort order of rows after externa...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    retest this please.



[GitHub] spark pull request #16724: [SPARK-19352][SQL] Preserve sort order when savin...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16724#discussion_r100687248
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -369,7 +371,78 @@ object FileFormatWriter extends Logging {
             context = taskAttemptContext)
         }
     
    +    // Returns the partition path given a partition key.
    +    private val getPartitionStringFunc = UnsafeProjection.create(
    +      Seq(Concat(partitionStringExpression)), description.partitionColumns)
    +
    +    // Returns the data columns to be written given an input row
    +    private val getOutputRow = UnsafeProjection.create(
    +      description.dataColumns, description.allColumns)
    +
         override def execute(iter: Iterator[InternalRow]): Set[String] = {
    +      val outputOrderingExprs = description.outputOrdering.map(_.child)
    --- End diff --
    
    Mainly because there are two types of iterators: one is [UnsafeRow, UnsafeRow], the other is just [UnsafeRow].
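
As a rough illustration of the two iterator shapes, the plain-Scala sketch below shows one way a key/value iterator could be adapted so that both paths share a single write loop. It is not what the PR does: `Row` is a stand-in for `UnsafeRow`, and `writeAll` is a hypothetical helper that just collects what it would have written.

```scala
// Stand-in for a row; the real code works with UnsafeRow.
type Row = String

// Single write loop shared by both code paths; returns what was "written".
def writeAll(rows: Iterator[Row]): Vector[Row] = rows.toVector

// Path 1: a sorter yields (sort key, row) pairs; drop the key before writing.
val keyed: Iterator[(Int, Row)] = Iterator(1 -> "a", 2 -> "b")
val fromKeyed = writeAll(keyed.map { case (_, row) => row })

// Path 2: rows arrive directly, with no key attached.
val plain: Iterator[Row] = Iterator("a", "b")
val fromPlain = writeAll(plain)
```

Whether such an adapter is acceptable in the real writer depends on the cost of the extra indirection per row, which this sketch does not measure.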



[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16724
  
    **[Test build #72100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)** for PR 16724 at commit [`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1).

