Posted to issues@spark.apache.org by "mcdull_zhang (Jira)" <ji...@apache.org> on 2022/03/13 12:23:00 UTC

[jira] [Updated] (SPARK-38542) UnsafeHashedRelation should serialize numKeys out

     [ https://issues.apache.org/jira/browse/SPARK-38542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mcdull_zhang updated SPARK-38542:
---------------------------------
    Description: 
At present, UnsafeHashedRelation does not write out numKeys during serialization, so a deserialized UnsafeHashedRelation always has numKeys equal to 0. As a result, every UnsafeRow returned by UnsafeHashedRelation.keys() has numFields equal to 0, which can lead to missing or incorrect data.
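The sketch below is a minimal illustration of this failure mode. It assumes the relation serializes itself through java.io.Externalizable. The classes BuggyRelation and FixedRelation and the helper roundTrip are hypothetical stand-ins, not Spark code. The buggy variant never writes numKeys, so after deserialization that field keeps the 0 set by the no-arg constructor. The fixed variant writes numKeys and reads it back in the same order.

```scala
import java.io._

// Hypothetical stand-in for a relation serialized via Externalizable.
// Mirrors the reported bug: numKeys is never written out.
class BuggyRelation(var numKeys: Int, var numFields: Int) extends Externalizable {
  // Public no-arg constructor required by Externalizable; numKeys starts at 0.
  def this() = this(0, 0)

  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeInt(numFields) // numKeys is silently dropped here
  }

  override def readExternal(in: ObjectInput): Unit = {
    numFields = in.readInt() // numKeys keeps its constructor default of 0
  }
}

// Same stand-in with the fix: numKeys is written and read back in order.
class FixedRelation(var numKeys: Int, var numFields: Int) extends Externalizable {
  def this() = this(0, 0)

  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeInt(numKeys)
    out.writeInt(numFields)
  }

  override def readExternal(in: ObjectInput): Unit = {
    numKeys = in.readInt()
    numFields = in.readInt()
  }
}

object NumKeysDemo {
  // Serializes `obj` with Java serialization and reads it back.
  def roundTrip[T <: Externalizable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val buggy = roundTrip(new BuggyRelation(2, 5))
    val fixed = roundTrip(new FixedRelation(2, 5))
    println(s"buggy numKeys after round trip: ${buggy.numKeys}")
    println(s"fixed numKeys after round trip: ${fixed.numKeys}")
  }
}
```

Running NumKeysDemo prints 0 for the buggy relation and 2 for the fixed one. In the report, the rows built by keys() take their field count from numKeys, which is how the lost value surfaces as numFields = 0.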

For example, SubqueryBroadcastExec calls HashedRelation.keys():
{code:scala}
val broadcastRelation = child.executeBroadcast[HashedRelation]().value
val (iter, expr) = if (broadcastRelation.isInstanceOf[LongHashedRelation]) {
  (broadcastRelation.keys(), HashJoin.extractKeyExprAt(buildKeys, index))
} else {
  (broadcastRelation.keys(),
    BoundReference(index, buildKeys(index).dataType, buildKeys(index).nullable))
}{code}
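The else branch above evaluates a BoundReference against each row returned by keys(). The sketch below models why a key row with numFields = 0 can yield wrong values. It assumes the UnsafeRow layout: a null bitset (8 bytes per 64 fields) followed by one 8-byte slot per field. KeyRowMisread and its helpers are a simplified, hypothetical model, not Spark's UnsafeRow. With a field count of 0, the computed bitset width is 0, so reading field 0 returns the bitset word instead of the stored key.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Simplified, hypothetical model of a fixed-width row: a null bitset
// (8 bytes per 64 fields) followed by one 8-byte slot per field.
object KeyRowMisread {
  def bitSetWidthInBytes(numFields: Int): Int = ((numFields + 63) / 64) * 8

  // Offset of field `ordinal`, given the field count the reader believes the row has.
  def fieldOffset(numFields: Int, ordinal: Int): Int =
    bitSetWidthInBytes(numFields) + ordinal * 8

  // Writes a one-field row holding `value`, with no null bits set.
  def writeSingleLongRow(value: Long): ByteBuffer = {
    val buf = ByteBuffer.allocate(bitSetWidthInBytes(1) + 8).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(0, 0L)                    // null bitset: field 0 is not null
    buf.putLong(fieldOffset(1, 0), value) // the key value itself
    buf
  }

  // Reads field `ordinal` using the reader's (possibly wrong) field count.
  def readLong(buf: ByteBuffer, assumedNumFields: Int, ordinal: Int): Long =
    buf.getLong(fieldOffset(assumedNumFields, ordinal))

  def main(args: Array[String]): Unit = {
    val row = writeSingleLongRow(42L)
    println(readLong(row, assumedNumFields = 1, ordinal = 0)) // correct offset
    println(readLong(row, assumedNumFields = 0, ordinal = 0)) // lands on the null bitset
  }
}
```

Running KeyRowMisread prints 42 for the correct field count and 0 (the null bitset word) when the field count is 0.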

> UnsafeHashedRelation should serialize numKeys out
> -------------------------------------------------
>
>                 Key: SPARK-38542
>                 URL: https://issues.apache.org/jira/browse/SPARK-38542
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: mcdull_zhang
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
