Posted to user@spark.apache.org by Eugene Morozov <ev...@gmail.com> on 2015/10/10 16:06:00 UTC

DataFrame Explode for ArrayBuffer[Any]

Hi,

I have a DataFrame with several columns I'd like to explode. Each of the
columns to explode holds an ArrayBuffer of some other type.
I'd assumed the following code is perfectly legitimate to use as an explode
function for any given ArrayBuffer: for any row whose column holds a
collection, it should produce several rows, one per element of that
collection:

dataFrame.explode(inputColumn, outputColumn) { a: ArrayBuffer[Any] => a }

But instead I got an exception:
java.lang.UnsupportedOperationException: Schema for type Any is not
supported
at
org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
at
org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
at org.apache.spark.sql.DataFrame.explode(DataFrame.scala:1116)
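My guess at why this happens: explode seems to derive the output schema
from the lambda's static element type, and Any carries no usable type
information, since it erases to plain java.lang.Object at runtime. A tiny
plain-Scala illustration with ClassTag (Catalyst itself uses TypeTag-based
reflection, judging by the stack trace, but the erasure point is the same):

```scala
import scala.reflect.{classTag, ClassTag}

// Any erases to java.lang.Object, so no SQL type can be derived from it;
// a concrete parameter like String keeps its class information.
def runtimeElementClass[B: ClassTag]: Class[_] = classTag[B].runtimeClass
```

Here runtimeElementClass[String] is classOf[String], but
runtimeElementClass[Any] is just classOf[Object], which explains why
schemaFor gives up on Any.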

Is there any way to do what I want here?


I'm not really good at Scala yet, so if I know the exact type of a
particular ArrayBuffer's elements, how can I specify it instead of Any?
Let's say I have the following:
val dataType = ..., then how can I use it in explode?
dataFrame.explode(inputColumn, outputColumn) { a: ArrayBuffer[  /* dataType
*/  ] => a }
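For reference, what I've tried so far does compile when I hard-code a
concrete element type. A sketch, assuming the elements happen to be Strings
(the element type and the cast are my assumptions about the data, not
anything Spark infers):

```scala
import scala.collection.mutable.ArrayBuffer

// Assuming the elements really are Strings, this version compiles because
// Catalyst can derive a schema for String:
//   dataFrame.explode(inputColumn, outputColumn) { a: ArrayBuffer[String] => a }
//
// If the column is statically typed ArrayBuffer[Any], casting each element
// to the known concrete type also gives explode a usable element type:
def asStrings(a: ArrayBuffer[Any]): ArrayBuffer[String] =
  a.map(_.asInstanceOf[String])
// dataFrame.explode(inputColumn, outputColumn)(asStrings)
```

But this only works with a type written out in the source; I still don't
see how to plug in a runtime dataType value.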

Thank you in advance.
--
Be well!
Jean Morozov