Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/03/23 14:30:41 UTC

[jira] [Commented] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

    [ https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938438#comment-15938438 ] 

Apache Spark commented on SPARK-19716:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/17398

> Dataset should allow by-name resolution for struct type elements in array
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19716
>                 URL: https://issues.apache.org/jira/browse/SPARK-19716
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: the {{a}} and {{c}} columns are extracted by name to build the {{Data}}.
> However, if the struct is inside an array, e.g. the schema is {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a Dataset with {{case class ComplexData(arr: Seq[Data])}}, the conversion fails. The reason is that, to allow compatible types (e.g. converting {{a: int}} to {{case class A(a: Long)}}), we add a cast for each field except struct type fields, because a struct type is flexible and its number of columns can mismatch. We should probably also skip the cast for array and map types.
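
A minimal reproduction sketch of the two cases described above, assuming a local SparkSession is available. The object name {{ByNameResolutionExample}} and its helper methods are hypothetical and used only for illustration; {{Data}} and {{ComplexData}} are the case classes from the description. The nested case is wrapped in {{Try}} because, per the description, it fails without the fix and should succeed once by-name resolution also applies to structs inside arrays.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Case classes from the issue description. They are declared at the top
// level so that Spark can derive encoders for them.
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

// Hypothetical helper object, for illustration only.
object ByNameResolutionExample {

  // Flat case: the schema is (a: int, b: int, c: int). Converting to
  // Dataset[Data] resolves `a` and `c` by name and ignores `b`.
  def flatCase(spark: SparkSession): Array[Data] = {
    import spark.implicits._
    Seq((1, 2, 3)).toDF("a", "b", "c").as[Data].collect()
  }

  // Nested case: the schema is arr: array<struct<a: int, b: int, c: int>>.
  // Returns a Failure on versions affected by this issue.
  def nestedCase(spark: SparkSession): Try[Array[ComplexData]] = {
    import spark.implicits._
    val df = spark.sql(
      "SELECT array(named_struct('a', 1, 'b', 2, 'c', 3)) AS arr")
    Try(df.as[ComplexData].collect())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-19716-sketch")
      .getOrCreate()
    try {
      println(flatCase(spark).mkString(", "))
      println(nestedCase(spark).map(_.mkString(", ")))
    } finally {
      spark.stop()
    }
  }
}
```

Whether the nested case prints a success or a failure depends on whether the Spark build in use includes the change for this issue.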


