Posted to dev@toree.apache.org by "Haifeng Li (JIRA)" <ji...@apache.org> on 2017/07/30 11:35:00 UTC
[jira] [Updated] (TOREE-428) Can't use case class in the Scala notebook
[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haifeng Li updated TOREE-428:
-----------------------------
Description:
Docker image:
jupyter/all-spark-notebook:latest
How the container was started:
docker run -it --rm -p 8888:8888 jupyter/all-spark-notebook:latest
or
docker ps -a
docker start -i containerID
Steps to reproduce:
Visit http://localhost:8888
Start a spylon-kernel notebook
Run the code below:
{code:java}
import spark.implicits._

// Read the raw text file and split each line on commas.
val p = spark.sparkContext.textFile("../Data/person.txt")
val pmap = p.map(_.split(","))
pmap.collect()
{code}
The output:
res0: Array[Array[String]] = Array(Array(Barack, Obama, 53), Array(George, Bush, 68), Array(Bill, Clinton, 68))
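(For reference, judging from that output, ../Data/person.txt presumably contains comma-separated lines like the following; the file contents are an assumption reconstructed from the collected result.)
{code}
Barack,Obama,53
George,Bush,68
Bill,Clinton,68
{code}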
{code:java}
// Define a case class and map each split line onto an instance of it.
case class Persons(first_name: String, last_name: String, age: Int)
val personRDD = pmap.map(p => Persons(p(0), p(1), p(2).toInt))
personRDD.take(1)
{code}
The error message:
{code:java}
org.apache.spark.SparkDriverExecutionException: Execution error
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1186)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
... 39 elided
Caused by: java.lang.ArrayStoreException: [LPersons;
at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
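For context, [LPersons; is the JVM's internal name for Array[Persons], and an ArrayStoreException thrown from ScalaRunTime.array_update means an element's runtime class did not match the component type of the array it was being stored into, which suggests the driver and the tasks ended up with two different copies of the Persons class. A minimal standalone illustration of the underlying JVM behavior (not Spark-specific):
{code:java}
// Plain-JVM illustration of ArrayStoreException: store a value whose
// runtime class is incompatible with the array's actual component type.
val boxes: Array[Object] = new Array[String](1).asInstanceOf[Array[Object]]
boxes(0) = Integer.valueOf(42) // throws java.lang.ArrayStoreException
{code}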
The same code works in spark-shell. From the error message, I suspect the driver program does not correctly handle the notebook-defined case class Persons when storing the results collected from the RDD partitions.
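A possible workaround, sketched below and not verified against this kernel: map to plain tuples before collecting, so the collected array's component type is a standard-library class rather than the notebook-defined Persons, and attach column names afterwards if a tabular view is wanted (the toDF column names here are illustrative).
{code:java}
// Hypothetical workaround sketch: collect tuples instead of a
// notebook-defined case class, then name the columns on the driver.
import spark.implicits._

val p = spark.sparkContext.textFile("../Data/person.txt")
val pmap = p.map(_.split(","))

// Tuples are standard-library classes, so the driver and the tasks
// agree on the component type of the array that take() fills in.
val personTuples = pmap.map(a => (a(0), a(1), a(2).toInt))
personTuples.take(1)

// Optional: a tabular view with named columns.
val personDF = personTuples.toDF("first_name", "last_name", "age")
personDF.show()
{code}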
> Can't use case class in the Scala notebook
> ------------------------------------------
>
> Key: TOREE-428
> URL: https://issues.apache.org/jira/browse/TOREE-428
> Project: TOREE
> Issue Type: Bug
> Components: Build
> Reporter: Haifeng Li
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)