Posted to issues@spark.apache.org by "angerszhu (Jira)" <ji...@apache.org> on 2021/02/25 11:44:00 UTC

[jira] [Created] (SPARK-34537) Repartition miss/duplicated data

angerszhu created SPARK-34537:
---------------------------------

             Summary: Repartition miss/duplicated data
                 Key: SPARK-34537
                 URL: https://issues.apache.org/jira/browse/SPARK-34537
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: angerszhu
         Attachments: image-2021-02-25-19-43-49-687.png

We have the following SQL statement:
{code:java}
INSERT OVERWRITE TABLE t1
SELECT /*+ repartition(300) */ * from t2{code}
Below are the SQL metrics of the repartition ShuffleExchange. We can see that the number of shuffle records written and the number of records read are not the same.

In the result table, some data is missing and some data is duplicated.
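
One way to confirm this symptom independently of the shuffle metrics is to compare the source and result rows as multisets. The following is only a minimal Python sketch on in-memory tuples, not Spark code and not how this issue was actually diagnosed; the helper name diff_rows and the sample rows are hypothetical. In Spark itself a similar check could presumably be written with EXCEPT ALL queries between t2 and t1.

```python
from collections import Counter


def diff_rows(source_rows, result_rows):
    """Compare two row collections as multisets.

    Returns (missing, extra):
      missing: rows that appear fewer times in the result than in the source
      extra:   rows that appear more times in the result than in the source
    """
    src = Counter(source_rows)
    dst = Counter(result_rows)
    missing = src - dst  # Counter subtraction keeps only positive counts
    extra = dst - src
    return missing, extra


# Hypothetical sample data standing in for t2 (source) and t1 (result).
source = [(1, "a"), (2, "b"), (3, "c")]
result = [(1, "a"), (1, "a"), (3, "c")]

missing, duplicated = diff_rows(source, result)
print("missing:", dict(missing))
print("duplicated:", dict(duplicated))
```

In this sample the total row counts match even though one row is lost and another is duplicated, so a plain count comparison between the two tables would not be enough to detect the problem.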

!image-2021-02-25-19-41-09-575.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
