Posted to dev@spark.apache.org by invkrh <in...@gmail.com> on 2014/12/03 17:50:23 UTC

scala.MatchError on SparkSQL when creating ArrayType of StructType

Hi,

I am using Spark SQL on the 1.1.0 branch.

The following code throws a scala.MatchError at
org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247):

val scm = StructType(inputRDD.schema.fields.init :+
      StructField("list",
        ArrayType(
          StructType(
            Seq(StructField("date", StringType, nullable = false),
              StructField("nbPurchase", IntegerType, nullable = false)))),
        nullable = false))

// purchaseRDD is an RDD[Row] whose schema corresponds to scm.
// It is transformed from inputRDD.
val schemaRDD = hiveContext.applySchema(purchaseRDD, scm)
schemaRDD.registerTempTable("t_purchase")
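The scm expression keeps every field of inputRDD's schema except the last one (`init`) and appends (`:+`) the new "list" field. A minimal sketch of that idiom on a plain Scala collection, assuming a hypothetical `Field` case class in place of Spark's `StructField` and made-up input field names (the real fields of inputRDD are not shown in the post):

```scala
// Hypothetical stand-in for StructField: only the name and the nullable flag matter here.
final case class Field(name: String, nullable: Boolean)

// Hypothetical input schema; the real fields of inputRDD are not shown in the post.
val inputFields = Seq(Field("id", false), Field("name", true), Field("raw", true))

// `init` drops the last field; `:+` appends the new "list" field at the end.
val newFields = inputFields.init :+ Field("list", false)

println(newFields.map(_.name).mkString(", ")) // prints "id, name, list"
```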

Here is the stack trace:
scala.MatchError: ArrayType(StructType(List(StructField(date,StringType,true), StructField(n_reachat,IntegerType,true))),true) (of class org.apache.spark.sql.catalyst.types.ArrayType)
	at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
	at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
	at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
	at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

The strange thing is that the nullable flags of the date and nbPurchase fields
are true in the error message, while they are false in the code. If I set both
to true, it works. But in fact, these fields should not be nullable.
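The value reported by a scala.MatchError is the value that failed to match, here the Cast's target type, and that type differs from the declared one in its nested nullability flags. A plain structural comparison therefore treats the two types as different, while a comparison that skips nullability flags treats them as equal. A minimal sketch of that distinction, using hypothetical toy types (`DType`, `FieldT`, `StructT`, `ArrayT`) rather than Spark's own classes:

```scala
// Hypothetical toy data-type model; these are not Spark's classes.
sealed trait DType
case object StringT extends DType
case object IntT extends DType
final case class FieldT(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: Seq[FieldT]) extends DType
final case class ArrayT(elementType: DType, containsNull: Boolean) extends DType

object DTypes {
  // Compare two types recursively, ignoring every nullable/containsNull flag.
  def equalsIgnoreNullability(a: DType, b: DType): Boolean = (a, b) match {
    case (ArrayT(ea, _), ArrayT(eb, _)) => equalsIgnoreNullability(ea, eb)
    case (StructT(fa), StructT(fb)) =>
      fa.size == fb.size && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && equalsIgnoreNullability(x.dataType, y.dataType)
      }
    case _ => a == b
  }
}

// Array of (date, nbPurchase) structs with the given nullability on both fields.
def purchases(nullable: Boolean): DType =
  ArrayT(
    StructT(Seq(FieldT("date", StringT, nullable), FieldT("nbPurchase", IntT, nullable))),
    containsNull = true)

val declared = purchases(nullable = false) // as declared in the code
val target = purchases(nullable = true)    // as reported in the MatchError

println(declared == target)                               // false: flags differ
println(DTypes.equalsIgnoreNullability(declared, target)) // true: same shape otherwise
```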

Here is what I find at Cast.scala:247 on the 1.1.0 branch:

  private[this] lazy val cast: Any => Any = dataType match {
    case StringType => castToString
    case BinaryType => castToBinary
    case DecimalType => castToDecimal
    case TimestampType => castToTimestamp
    case BooleanType => castToBoolean
    case ByteType => castToByte
    case ShortType => castToShort
    case IntegerType => castToInt
    case FloatType => castToFloat
    case LongType => castToLong
    case DoubleType => castToDouble
  }
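The excerpt has no case for ArrayType or StructType, so whenever a Cast's target type is a complex type, evaluating the lazy `cast` value falls through the match and throws scala.MatchError. A minimal sketch of the same failure mode with hypothetical toy types (not Spark's classes); here the match is in a `def`, so it runs on each call, whereas in Cast it is a `lazy val` evaluated on first use:

```scala
// Hypothetical toy types; the match below mirrors the shape of the Cast excerpt.
sealed trait DType
case object StringT extends DType
case object IntT extends DType
final case class ArrayT(elementType: DType, containsNull: Boolean) extends DType

def castFunction(target: DType): Any => Any = target match {
  case StringT => (v: Any) => v.toString
  case IntT    => (v: Any) => v.toString.toInt
  // No case for ArrayT: the match is not exhaustive (the compiler only warns).
}

val result =
  try { castFunction(ArrayT(IntT, containsNull = true)); "no error" }
  catch { case e: MatchError => s"MatchError: ${e.getMessage}" }

println(result)
```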

Any idea? Thank you.

Hao



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/scala-MatchError-on-SparkSQL-when-creating-ArrayType-of-StructType-tp9623.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: scala.MatchError on SparkSQL when creating ArrayType of StructType

Posted by Yin Huai <hu...@gmail.com>.
It seems you hit https://issues.apache.org/jira/browse/SPARK-4245, which was
fixed in 1.2.

Thanks,

Yin
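If upgrading to 1.2 is not an option, the original poster's observation that the job works once the nested fields are declared nullable suggests a stopgap: relax every nested nullability flag in the schema before calling applySchema, at the cost of losing the non-null guarantee on those fields. A minimal sketch of that recursive rewrite over hypothetical toy types (not Spark's API); porting it to Spark 1.1's StructType, StructField and ArrayType is an untested assumption:

```scala
// Hypothetical toy types standing in for Spark SQL's StructType/StructField/ArrayType.
sealed trait DType
case object StringT extends DType
case object IntT extends DType
final case class FieldT(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: Seq[FieldT]) extends DType
final case class ArrayT(elementType: DType, containsNull: Boolean) extends DType

// Rewrite a type so that every nested field and array element is nullable.
def asNullable(t: DType): DType = t match {
  case StructT(fields) =>
    StructT(fields.map(f => f.copy(dataType = asNullable(f.dataType), nullable = true)))
  case ArrayT(elem, _) => ArrayT(asNullable(elem), containsNull = true)
  case other => other // primitive types carry no nullability flag of their own
}

val declared = ArrayT(
  StructT(Seq(
    FieldT("date", StringT, nullable = false),
    FieldT("nbPurchase", IntT, nullable = false))),
  containsNull = true)

println(asNullable(declared))
```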
