Posted to issues@spark.apache.org by "Jarred Li (Jira)" <ji...@apache.org> on 2023/01/15 03:45:00 UTC

[jira] [Created] (SPARK-42069) Data duplicated or lost with non-deterministic function

Jarred Li created SPARK-42069:
---------------------------------

             Summary: Data duplicated or lost with non-deterministic function
                 Key: SPARK-42069
                 URL: https://issues.apache.org/jira/browse/SPARK-42069
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.3, 3.1.0, 3.0.0
            Reporter: Jarred Li


When writing to a table through a shuffle whose distribution keys include a non-deterministic function, data may be duplicated or lost when a task attempt is retried.

 

For example:
{quote}insert overwrite table target_table partition(ds)
select ... from a join b join c...
distribute by ds, cast(rand()*10 as int){quote}
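The failure mode here is that the second distribution key is non-deterministic: if a shuffle map task attempt fails and is retried, rand() is re-evaluated and yields different values, so the retried attempt routes rows to different reduce partitions than the first attempt did. Reducers that already fetched the first attempt's output can then see some rows twice (duplicates), while rows re-routed elsewhere may never be read at all (loss). A minimal sketch of a workaround, assuming the join output exposes some stable key column (hypothetically named id here), is to derive the salt bucket from a deterministic expression instead of rand(), for example with the built-in hash and pmod functions:

{quote}-- pmod(hash(id), 10) yields the same 0-9 bucket for a given id on every attempt
insert overwrite table target_table partition(ds)
select ... from a join b join c...
distribute by ds, pmod(hash(id), 10){quote}

With a deterministic bucket, a retried map task sends each row to the same reducer as the original attempt, so retries no longer change which rows land in which shuffle partition.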


