Posted to dev@phoenix.apache.org by "Josh Mahonin (JIRA)" <ji...@apache.org> on 2015/09/23 19:57:04 UTC

[jira] [Commented] (PHOENIX-2287) Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row

    [ https://issues.apache.org/jira/browse/PHOENIX-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904916#comment-14904916 ] 

Josh Mahonin commented on PHOENIX-2287:
---------------------------------------

Updated patch adds support for Spark 1.5.0 and is backwards compatible down to 1.3.0 (manually tested; Spark version profiles may be worth looking at in the future).

In 1.5.0, they've gone and explicitly hidden the GenericMutableRow data structure. Fortunately, we are able to use the external-facing 'Row' data type instead, which is backwards compatible and should remain compatible in future releases as well.
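
For reference, here's a minimal sketch of the idea (the toRow helper below is hypothetical, not the actual patch code): rows get built through the public org.apache.spark.sql.Row factory rather than catalyst's internal GenericMutableRow, since Row is the stable, external-facing contract:

{code:scala}
import org.apache.spark.sql.Row

// Hypothetical helper: given the column values read out of a Phoenix
// ResultSet, build a row via the public Row API instead of the
// now-internal GenericMutableRow.
def toRow(columnValues: Seq[Any]): Row = Row.fromSeq(columnValues)

// e.g. toRow(Seq(1L, "test_row_1")) produces a Row that Spark SQL
// accepts on 1.3.x through 1.5.x.
{code}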

As part of the update, Spark SQL deprecated a constructor on their 'DecimalType'. In updating this, I exposed a new issue: we don't carry forward the precision and scale of the underlying PDecimal type through to Spark. For now I've set it to use the Spark defaults, but I'll create another issue for that specifically. I've included an ignored integration test in this patch as well.
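
Roughly, the type mapping now looks like the sketch below (illustrative only; the method name is made up, and (38, 18) is Spark 1.5.0's system-default precision/scale for DecimalType):

{code:scala}
import org.apache.phoenix.schema.types.{PDataType, PDecimal}
import org.apache.spark.sql.types.{DataType, DecimalType}

// Illustrative sketch of the Phoenix-to-Spark type mapping. The old
// no-arg DecimalType constructor is deprecated in Spark 1.5.0, so an
// explicit precision/scale must be supplied. Ideally these would be
// carried forward from the Phoenix column metadata; for now the Spark
// defaults are used.
def phoenixToSparkType(pType: PDataType[_]): DataType = pType match {
  case _: PDecimal => DecimalType(38, 18) // TODO: use the column's real precision/scale
  // ... other Phoenix-to-Spark mappings elided ...
  case other => sys.error(s"unmapped Phoenix type: $other")
}
{code}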

[~maghamravikiran] Could you take a look?

> Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>         Environment: - HBase 1.1.1 running in standalone mode on OS X
> - Spark 1.5.0
> - Phoenix 4.5.2
>            Reporter: Babar Tareen
>         Attachments: PHOENIX-2287.patch
>
>
> Running the DataFrame example on the Spark Plugin page (https://phoenix.apache.org/phoenix_spark.html) results in the following exception. The same code works as expected with Spark 1.4.1.
> {code:scala}
>     import org.apache.spark.SparkContext
>     import org.apache.spark.sql.SQLContext
>     import org.apache.phoenix.spark._
>     val sc = new SparkContext("local", "phoenix-test")
>     val sqlContext = new SQLContext(sc)
>     val df = sqlContext.load(
>       "org.apache.phoenix.spark",
>       Map("table" -> "TABLE1", "zkUrl" -> "127.0.0.1:2181")
>     )
>     df
>       .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
>       .select(df("ID"))
>       .show
> {code}
> Exception
> {quote}
> java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
>     at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) ~[scala-library-2.11.4.jar:na]
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) ~[scala-library-2.11.4.jar:na]
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) ~[scala-library-2.11.4.jar:na]
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) ~[spark-sql_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.scheduler.Task.run(Task.scala:88) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.11-1.5.0.jar:1.5.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)