You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by William Kinney <wi...@gmail.com> on 2016/10/07 02:58:40 UTC

spark 2.0.1, union on non-null and null String dataframes causing ClassCastException UTF8String cannot be cast to java.lang.String

It seems when doing a union on a DF where one DF contains lit(null) or null
for a String, causes a:
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String
cannot be cast to java.lang.String

when doing getString(i) on a Row within forEachPartition.

Stack:

Caused by: java.lang.ClassCastException:
org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
  at org.apache.spark.sql.Row$class.getString(Row.scala:249)
  at
org.apache.spark.sql.catalyst.expressions.GenericRow.getString(rows.scala:192)
  at $anonfun$1.apply(<console>:48)


Can easily reproduce with:

val df0 = spark.sparkContext.parallelize(1 to 10).map { i =>
  (i, i.toString)
}.toDF("i", "iString")

val a = df0.select($"i" as "i", lit(null) as "iString")
/**
  *  b.printSchema
  *  root
  *   |-- i: integer (nullable = false)
  *   |-- iString: string (nullable = true)
  */
val b = a.union(df0)

b.foreachPartition { p =>
  if (p.hasNext) {
    val row = p.next()
    // throws java.lang.ClassCastException:
org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
    println(row.getString(1))
  }
}

Has anyone seen this issue? Shall I create a ticket?

Thanks.
Will