Posted to issues@spark.apache.org by "Liwei Lin (JIRA)" <ji...@apache.org> on 2016/08/17 02:51:20 UTC
[jira] [Commented] (SPARK-17093) Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423777#comment-15423777 ]
Liwei Lin commented on SPARK-17093:
-----------------------------------
Oh, the interpreted evaluation codepath indeed forgot to {{copy()}} somewhere. I'll submit a patch shortly, thanks.
> Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-17093
> URL: https://issues.apache.org/jira/browse/SPARK-17093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Josh Rosen
> Priority: Critical
>
> The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled:
> {code}
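> import org.apache.spark.sql.internal.SQLConf
>
> // Runs inside a Spark SQL test suite, which supplies withSQLConf and the implicits needed for toDS()
> withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
>   val data = Array(Array((1, 2), (3, 4)))
>   val ds = spark.sparkContext.parallelize(data).toDS()
>   assert(ds.collect() === data)
> }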
> {code}
> When whole-stage codegen is enabled (the default), this works fine. When it's disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
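The symptom of a missing {{copy()}} can be reproduced in miniature. The sketch below is a hypothetical illustration, not Spark's actual code: it assumes, as the report guesses, that the interpreted path decodes each array element into one reused mutable object and stores that object's reference without copying it. The names ReusedPair, AliasingDemo, decode and copyEachElement are made up for this example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a mutable row object that a decoder reuses across elements.
final class ReusedPair(var a: Int, var b: Int) {
  def copy(): ReusedPair = new ReusedPair(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object AliasingDemo {
  // Decodes every input element into the same reused holder.
  // With copyEachElement = false, the output stores the same reference repeatedly,
  // so every slot reflects only the last element written into the holder.
  def decode(input: Seq[(Int, Int)], copyEachElement: Boolean): Seq[(Int, Int)] = {
    val reused = new ReusedPair(0, 0)
    val out = ArrayBuffer.empty[ReusedPair]
    for ((x, y) <- input) {
      reused.a = x
      reused.b = y
      // Copying snapshots the current values; skipping it aliases the holder.
      out += (if (copyEachElement) reused.copy() else reused)
    }
    out.map(_.toTuple).toSeq
  }
}
```

With copyEachElement = false, decoding Seq((1, 2), (3, 4)) yields Seq((3, 4), (3, 4)), the same "last element repeated" pattern seen in the report; with true, it yields the original input back.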
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)