Posted to issues@spark.apache.org by "Heping Wang (Jira)" <ji...@apache.org> on 2023/01/02 08:26:00 UTC

[jira] [Commented] (SPARK-41813) UnsafeHashedRelation read method needs to confirm the correctness of the data

    [ https://issues.apache.org/jira/browse/SPARK-41813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653525#comment-17653525 ] 

Heping Wang commented on SPARK-41813:
-------------------------------------

If this can be improved, I will submit a PR to fix it.



{code:java}
// in org.apache.spark.sql.execution.joins.UnsafeHashedRelation#read
assert(binaryMap.numKeys() == nKeys && binaryMap.numValues() == nValues,
  "The number of keys and values actually read must match the nKeys and nValues recorded in the header")
{code}
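
For context, here is a minimal self-contained sketch (in plain Java, with illustrative names, not the actual Spark implementation) of the failure mode such an assert guards against: a count-prefixed stream whose header disagrees with the entries actually materialized, e.g. because duplicate keys were written by a corrupted broadcast.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class HeaderCheckDemo {
    // Read a count-prefixed map and verify the header count against
    // the number of distinct keys actually materialized.
    public static Map<Integer, Integer> read(DataInputStream in) throws IOException {
        int nKeys = in.readInt();               // header: declared key count
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < nKeys; i++) {
            map.put(in.readInt(), in.readInt());
        }
        // The proposed safeguard: fail fast on inconsistency instead of
        // silently computing on wrong data.
        if (map.size() != nKeys) {
            throw new IOException("header declares " + nKeys
                    + " keys but only " + map.size() + " were read");
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(2);                     // header claims 2 keys
        out.writeInt(1); out.writeInt(10);
        out.writeInt(1); out.writeInt(20);   // duplicate key: only 1 survives
        try {
            read(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
            System.out.println("no error detected");
        } catch (IOException e) {
            System.out.println("corruption detected: " + e.getMessage());
        }
    }
}
{code}

Without the check, the reader above would silently return a one-entry map while the header promised two; with it, the mismatch surfaces at deserialization time.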



> UnsafeHashedRelation read method needs to confirm the correctness of the data
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-41813
>                 URL: https://issues.apache.org/jira/browse/SPARK-41813
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.8, 3.3.1
>            Reporter: Heping Wang
>            Priority: Major
>              Labels: improvement
>
> Recently we encountered the thread-safety issue of [SPARK-31511|https://issues.apache.org/jira/browse/SPARK-31511] in production.  The version we use, 2.4.3, does not include that fix, which led to data errors.  I think this is a serious error: the data broadcast by the Driver was inconsistent with the data on the Executor.  The Executor side should confirm the correctness of the data when reading it: the numKeys and numValues read from the file header should match the data actually read.  This check should be added to prevent wrong data from entering the computation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org