
Posted to issues@spark.apache.org by "Maciej Bryński (JIRA)" <ji...@apache.org> on 2016/10/05 11:52:20 UTC

[jira] [Created] (SPARK-17786) [SPARK 2.0] Sorting algorithm gives higher skewness of output

Maciej Bryński created SPARK-17786:
--------------------------------------

             Summary: [SPARK 2.0] Sorting algorithm gives higher skewness of output
                 Key: SPARK-17786
                 URL: https://issues.apache.org/jira/browse/SPARK-17786
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Maciej Bryński


Hi,
I'm using df.sort("column") to sort my data before saving it to Parquet.

With Spark 1.6.2, all output partitions were similar in size.
With Spark 2.0.0, three of the partitions are much bigger than the rest.
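For context, a global sort in Spark range-partitions the rows using split points estimated from a sample of the sort column. Because of this, how representative that sample is directly affects how evenly rows are spread across output partitions. The pure-Python sketch below is a simplified illustration, not Spark's RangePartitioner. It picks split points from a sample, counts rows per range, and reports skew as the largest partition divided by the mean partition size. The helper names (range_bounds, partition_sizes, skew) are hypothetical.

```python
import bisect
import random
from collections import Counter


def range_bounds(sample, num_partitions):
    """Pick num_partitions - 1 evenly spaced split points from a sorted sample."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_partitions] for i in range(1, num_partitions)]


def partition_sizes(keys, bounds):
    """Count keys per range partition; a key equal to a bound goes to the lower range."""
    counts = Counter(bisect.bisect_left(bounds, k) for k in keys)
    return [counts.get(i, 0) for i in range(len(bounds) + 1)]


def skew(sizes):
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    return max(sizes) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    keys = list(range(100_000))
    random.seed(0)
    # A small sample gives noisier split points than a large one.
    for sample_size in (16, 10_000):
        bounds = range_bounds(random.sample(keys, sample_size), 8)
        print(sample_size, round(skew(partition_sizes(keys, bounds)), 3))
```

Comparing partition sizes this way on the 1.6.2 and 2.0.0 outputs would give one number to put on the difference described above.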

Can I go back to the previous sorting behaviour?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
