Posted to issues@spark.apache.org by "Jarred Li (Jira)" <ji...@apache.org> on 2023/01/15 03:45:00 UTC
[jira] [Created] (SPARK-42069) Data duplicate or data lost with non-deterministic function
Jarred Li created SPARK-42069:
---------------------------------
Summary: Data duplicate or data lost with non-deterministic function
Key: SPARK-42069
URL: https://issues.apache.org/jira/browse/SPARK-42069
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.3, 3.1.0, 3.0.0
Reporter: Jarred Li
When writing a table with shuffled data and a non-deterministic function, data may be duplicated or lost when a task attempt is retried.
For example:
{quote}insert overwrite table target_table partition(ds)
select ... from a join b join c...
distribute by ds, cast(rand()*10 as int){quote}
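To make the failure mode concrete, here is a minimal pure-Python sketch (not Spark code, and not from the original report) of why a rand()-based distribute-by key breaks under retries: each task attempt re-rolls the random partition key, so a retried attempt can route the same rows to different shuffle partitions than the original attempt did. If some reducers consumed the first attempt's output and others consume the retry's, rows can be written twice or not at all. The seeds and the 50/50 reducer split below are illustrative assumptions.

```python
import random

def map_task(rows, seed):
    # One attempt of a map task: assign each row to one of 10 shuffle
    # partitions via int(rand() * 10). A fresh attempt uses a fresh RNG
    # state, so the assignment is non-deterministic across attempts.
    rng = random.Random(seed)
    return {row: int(rng.random() * 10) for row in rows}

rows = list(range(100))

# First attempt of the map task.
attempt1 = map_task(rows, seed=1)
# The task is retried (e.g. after an executor loss); rand() re-rolls,
# producing a different row -> partition assignment.
attempt2 = map_task(rows, seed=2)

# Hypothetical failure scenario: reducers 0-4 already fetched and
# committed output from attempt 1, while reducers 5-9 fetch from the
# retried attempt 2.
committed = {r for r, p in attempt1.items() if p < 5}
refetched = {r for r, p in attempt2.items() if p >= 5}

written = committed | refetched
duplicated = committed & refetched   # rows written by both attempts
lost = set(rows) - written           # rows written by neither attempt

print(f"{len(duplicated)} rows duplicated, {len(lost)} rows lost")
```

With a deterministic partition key (e.g. distribute by ds, pmod(hash(id), 10)), both attempts would produce identical assignments and neither set would be non-empty.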
--
This message was sent by Atlassian Jira
(v8.20.10#820010)