You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by William Kinney <wi...@gmail.com> on 2016/10/07 02:58:40 UTC
spark 2.0.1, union on non-null and null String dataframes causing
ClassCastException UTF8String cannot be cast to java.lang.String
It seems when doing a union on a DF where one DF contains lit(null) or null
for a String, causes a:
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String
cannot be cast to java.lang.String
when doing getString(i) on a Row within forEachPartition.
Stack:
Caused by: java.lang.ClassCastException:
org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
at org.apache.spark.sql.Row$class.getString(Row.scala:249)
at
org.apache.spark.sql.catalyst.expressions.GenericRow.getString(rows.scala:192)
at $anonfun$1.apply(<console>:48)
Can easily reproduce with:
val df0 = spark.sparkContext.parallelize(1 to 10).map { i =>
(i, i.toString)
}.toDF("i", "iString")
val a = df0.select($"i" as "i", lit(null) as "iString")
/**
* b.printSchema
* root
* |-- i: integer (nullable = false)
* |-- iString: string (nullable = true)
*/
val b = a.union(df0)
b.foreachPartition { p =>
if (p.hasNext) {
val row = p.next()
// throws java.lang.ClassCastException:
org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
println(row.getString(1))
}
}
Has anyone seen this issue? Shall I create a ticket?
Thanks.
Will