Posted to issues@spark.apache.org by "Tycho Grouwstra (JIRA)" <ji...@apache.org> on 2015/10/30 23:19:27 UTC

[jira] [Created] (SPARK-11431) Allow exploding arrays of structs in DataFrames

Tycho Grouwstra created SPARK-11431:
---------------------------------------

             Summary: Allow exploding arrays of structs in DataFrames
                 Key: SPARK-11431
                 URL: https://issues.apache.org/jira/browse/SPARK-11431
             Project: Spark
          Issue Type: New Feature
          Components: SQL
            Reporter: Tycho Grouwstra


I am creating DataFrames from some [JSON data](http://www.kayak.com/h/explore/api?airport=AMS) and would like to explode an array of structs (which are common in JSON) into separate rows, so that I can start analyzing the data with GraphX. I believe many others would find this useful as well, since so much web data is in JSON format.

This feature would build upon the existing `explode` functionality added to DataFrames by [~marmbrus], which currently throws an error when called on such arrays of `InternalRow`s. The cause is `explode`'s use of the `schemaFor` function to infer column types. That approach is insufficient for Rows, since the Row type itself does not carry the field names and types that are needed. The alternative for such cases would be to take that information from the DataFrame's existing schema instead.
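
To illustrate the idea only, here is a rough sketch in plain Python. It is a toy model, not Spark code and not the eventual patch: `explode_structs`, the dict-based schema layout and the sample data are all hypothetical. The struct values are bare tuples that carry no field names, so the names have to be looked up in the existing schema:

```python
# Toy model in plain Python; this is not Spark code and every name is hypothetical.

def explode_structs(rows, schema, column):
    """Return one output row per struct in the array stored under `column`."""
    col_type = schema[column]
    if not (isinstance(col_type, dict) and col_type.get("type") == "array_of_struct"):
        raise TypeError(f"column {column!r} is not an array of structs")
    # Field names come from the existing schema; the tuples themselves carry none.
    field_names = col_type["fields"]

    exploded = []
    for row in rows:
        # Keep every other column as-is and repeat it for each array element.
        rest = {k: v for k, v in row.items() if k != column}
        for element in row[column]:
            exploded.append({**rest, **dict(zip(field_names, element))})
    return exploded


# Made-up sample data: struct values are bare tuples, like untyped rows.
schema = {
    "id": "string",
    "items": {"type": "array_of_struct", "fields": ["name", "qty"]},
}
rows = [
    {"id": "a", "items": [("x", 1), ("y", 2)]},
    {"id": "b", "items": []},
]

result = explode_structs(rows, schema, "items")
```

In this toy version, a row whose array is empty simply produces no output rows, and asking to explode a column that the schema does not describe as an array of structs raises a `TypeError`.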

I'm working on a patch that would add this functionality, so stay tuned until I've figured that out. I'm new here, though, so I would appreciate some feedback.



