Posted to issues@spark.apache.org by "angerszhu (Jira)" <ji...@apache.org> on 2021/02/25 11:44:00 UTC
[jira] [Created] (SPARK-34537) Repartition miss/duplicated data
angerszhu created SPARK-34537:
---------------------------------
Summary: Repartition miss/duplicated data
Key: SPARK-34537
URL: https://issues.apache.org/jira/browse/SPARK-34537
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.1
Reporter: angerszhu
Attachments: image-2021-02-25-19-43-49-687.png
We have a SQL statement:
{code:java}
INSERT OVERWRITE TABLE t1
SELECT /*+ repartition(300) */ * FROM t2
{code}
Below are the SQL metrics of the repartition ShuffleExchange. The number of shuffle records written does not match the number of records read.
In the result table, some rows are missing and some rows are duplicated.
!image-2021-02-25-19-41-09-575.png!
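One way to confirm the missing and duplicated rows independently of the shuffle metrics is to compare the source and result tables as multisets of rows. Below is a minimal sketch in plain Python, assuming small row samples have already been collected from t2 (source) and t1 (result). The helper name diff_rows and the sample rows are hypothetical and are not part of Spark or of the actual data.

```python
from collections import Counter


def diff_rows(source_rows, result_rows):
    """Compare two row collections as multisets.

    Returns (missing, extra):
      missing: rows that occur fewer times in the result than in the source
      extra:   rows that occur more times in the result than in the source
    Each row maps to the size of the count difference.
    """
    src = Counter(source_rows)
    dst = Counter(result_rows)
    # Counter subtraction keeps only strictly positive counts.
    missing = dict(src - dst)
    extra = dict(dst - src)
    return missing, extra


# Hypothetical sample rows standing in for t2 (source) and t1 (result).
t2_rows = [(1, "a"), (2, "b"), (3, "c")]
t1_rows = [(1, "a"), (1, "a"), (3, "c")]

missing, extra = diff_rows(t2_rows, t1_rows)
print("missing:", missing)
print("extra:", extra)
```

If both dictionaries are empty, the two samples contain the same rows with the same multiplicities. In this made-up sample the total row count is the same on both sides, so a plain count comparison would not reveal the problem.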