Posted to user@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/07/15 04:12:12 UTC

count on RDD yields NoClassDefFoundError on 1.0.1

Changing the subject since this doesn’t appear to be related to Spark SQL
specifically. I’m on a 1.0.1 EC2 cluster.

On Mon, Jul 14, 2014 at 12:05 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> Are you sure the code running on the cluster has been updated?

I’m launching the cluster using spark-ec2, so I’m assuming that’s been
taken care of.

> If the above doesn't fix it, the following would be helpful:
>  - The full stack trace
>
Here’s the stack trace:

scala> tweets
res13: org.apache.spark.rdd.RDD[Tweet] = MappedRDD[18] at map at <console>:32

scala> tweets.count
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 756 (task 27.0:11)
14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: Could not initialize class $line36.$read$
    at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
    at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 749 (task 27.0:4)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 747 (task 27.0:2)
14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
    at $line36.$read$$iwC.<init>(<console>:6)
    at $line36.$read.<init>(<console>:48)
    at $line36.$read$.<init>(<console>:52)
    at $line36.$read$.<clinit>(<console>)
    at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
    at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 746 (task 27.0:1)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 753 (task 27.0:8)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 752 (task 27.0:7)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 750 (task 27.0:5)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 755 (task 27.0:10)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 745 (task 27.0:0)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 748 (task 27.0:3)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 751 (task 27.0:6)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 754 (task 27.0:9)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 757 (task 27.0:13)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 758 (task 27.0:11)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 760 (task 27.0:1)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 759 (task 27.0:2)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 761 (task 27.0:12)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 762 (task 27.0:4)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 763 (task 27.0:8)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 764 (task 27.0:10)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 765 (task 27.0:0)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 766 (task 27.0:20)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 771 (task 27.0:13)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 767 (task 27.0:3)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 769 (task 27.0:7)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 768 (task 27.0:9)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 770 (task 27.0:5)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 772 (task 27.0:11)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 774 (task 27.0:14)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 776 (task 27.0:17)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 775 (task 27.0:16)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 777 (task 27.0:6)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 773 (task 27.0:15)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 778 (task 27.0:21)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 781 (task 27.0:4)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 780 (task 27.0:20)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 779 (task 27.0:10)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 782 (task 27.0:0)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 784 (task 27.0:5)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 783 (task 27.0:8)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 786 (task 27.0:14)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 785 (task 27.0:7)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 787 (task 27.0:16)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 788 (task 27.0:9)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 789 (task 27.0:15)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 790 (task 27.0:6)
14/07/15 02:04:04 WARN TaskSetManager: Lost TID 791 (task 27.0:4)
14/07/15 02:04:04 ERROR TaskSetManager: Task 27.0:4 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 27.0:4 failed 4 times, most recent failure: Exception failure in TID 791 on host ip-10-231-146-237.ec2.internal: java.lang.NoClassDefFoundError: Could not initialize class
        $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
        $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

scala>
14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 0 on ip-10-237-184-110.ec2.internal: Uncaught exception
14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 1 on ip-10-231-146-237.ec2.internal: Uncaught exception
14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 2 on ip-10-144-192-36.ec2.internal: remote Akka client disassociated

The definition of Tweet is as follows:

case class Tweet(
  user: String,
  created_at: String,
  text: String,
  is_retweet: Boolean
)

>  - The queryExecution from the SchemaRDD (i.e. println(sql("SELECT ...").queryExecution))

I was able to reproduce the problem this time without a query.

Nick

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Aaron Davidson <il...@gmail.com>.
Yup, that does sound right, assuming you build Spark from source before
copying it around (or just use a prebuilt package). You'll probably also
want to keep the stuff in "~/spark/conf", though, so copy that out before
mucking with the internals.

Note that if you're just trying to get an updated version running quickly
and easily, you can just download the prebuilt package, untar it, and only
copy the file named something like "lib/spark-assembly-1.0.1-hadoop1.0.4.jar"
over the equivalent file inside the ~/spark/lib directory (make sure to
replace the old one; the name doesn't matter), then use
"~/spark-ec2/copy-dir ~/spark/lib".

Note that this jar swap is sort of a hack which should usually work for
maintenance version upgrades (since the jars are the only things that
should change), but it is not guaranteed to work, especially when
upgrading across bigger version differences.

Hopefully that makes sense and perhaps even clarifies some of the details
of Spark's deployment.

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Nicholas Chammas <ni...@gmail.com>.
Yeah, I’m beginning to think something was missed in spark-ec2 with the
1.0.1 release. Now, the 1.0.1 EC2 script does appear to have been updated
<https://github.com/apache/spark/blob/branch-1.0/ec2/spark_ec2.py#L73>, at
least partially.

So are you saying just try the following?

   - Launch 1.0.1 cluster normally using spark-ec2
   - rm ~/spark on all nodes (or maybe just the master)
   - Download the 1.0.1 source package
   <http://spark.apache.org/releases/spark-release-1-0-1.html> to the
   master, and copy it to the slaves using ~/spark-ec2/copy-dir ~/spark
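
Concretely, I imagine that would look something like this (the paths and
the exact package name are my guesses, not verified):

    # on the master
    rm -rf ~/spark
    # download and unpack the 1.0.1 package from the release page as ~/spark, then:
    ~/spark-ec2/copy-dir ~/spark    # copy it out to the slaves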

Is that correct? Sorry about the basic question. I’ve been happily
oblivious to deployment details thanks to spark-ec2.

Nick

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Aaron Davidson <il...@gmail.com>.
I don't believe the spark-ec2 scripts have been updated for 1.0.1, so you
may have to download the release yourself on the master node, and rsync it
(using "~/spark-ec2/copy-dir ~/spark") to the other workers.

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Nicholas Chammas <ni...@gmail.com>.
For the record, this same code against the same dataset works fine on a
1.0.0 EC2 cluster.


On Tue, Jul 15, 2014 at 12:36 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Like this:
>
> val tweets = raw.map(_.split('\t')).map(t => Tweet(t(0), t(1), t(2), t(3).toBoolean))
>
> raw is just an RDD of tab-delimited strings.
>
> scala> raw
> res35: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at repartition at <console>:23
>
> Nick
> ​
>
>
> On Tue, Jul 15, 2014 at 12:16 AM, Yin Huai <hu...@gmail.com> wrote:
>
>> Hi Nick,
>>
>> How was tweets generated?
>>
>> Thanks,
>>
>> Yin
>>
>>
>> On Mon, Jul 14, 2014 at 7:12 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> Changing the subject since this doesn’t appear to be related to Spark
>>> SQL specifically. I’m on a 1.0.1 EC2 cluster.
>>>
>>> On Mon, Jul 14, 2014 at 12:05 AM, Michael Armbrust <
>>> michael@databricks.com> wrote:
>>>
>>> Are you sure the code running on the cluster has been updated?
>>>
>>> I’m launching the cluster using spark-ec2 so I’m assuming that’s been
>>> taken care of.
>>>
>>> If the above doesn't fix it, the following would be helpful:
>>>>  - The full stack trace
>>>>
>>> Here’s the stack trace:
>>>
>>> scala> tweets
>>> res13: org.apache.spark.rdd.RDD[Tweet] = MappedRDD[18] at map at <console>:32
>>>
>>> scala> tweets.count14/07/15 02:04:04 WARN TaskSetManager: Lost TID 756 (task 27.0:11)14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
>>> java.lang.NoClassDefFoundError: Could not initialize class $line36.$read$
>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 749 (task 27.0:4)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 747 (task 27.0:2)14/07/15 02:04:04 WARN TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
>>> java.lang.ExceptionInInitializerError
>>>     at $line36.$read$$iwC.<init>(<console>:6)
>>>     at $line36.$read.<init>(<console>:48)
>>>     at $line36.$read$.<init>(<console>:52)
>>>     at $line36.$read$.<clinit>(<console>)
>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>     at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 746 (task 27.0:1)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 753 (task 27.0:8)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 752 (task 27.0:7)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 750 (task 27.0:5)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 755 (task 27.0:10)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 745 (task 27.0:0)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 748 (task 27.0:3)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 751 (task 27.0:6)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 754 (task 27.0:9)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 757 (task 27.0:13)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 758 (task 27.0:11)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 760 (task 27.0:1)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 759 (task 27.0:2)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 761 (task 27.0:12)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 762 (task 27.0:4)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 763 (task 27.0:8)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 764 (task 27.0:10)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 765 (task 27.0:0)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 766 (task 27.0:20)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 771 (task 27.0:13)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 767 (task 27.0:3)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 769 (task 27.0:7)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 768 (task 27.0:9)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 770 (task 27.0:5)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 772 (task 27.0:11)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 774 (task 27.0:14)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 776 (task 27.0:17)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 775 (task 27.0:16)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 777 (task 27.0:6)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 773 (task 27.0:15)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 778 (task 27.0:21)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 781 (task 27.0:4)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 780 (task 27.0:20)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 779 (task 27.0:10)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 782 (task 27.0:0)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 784 (task 27.0:5)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 783 (task 27.0:8)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 786 (task 27.0:14)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 785 (task 27.0:7)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 787 (task 27.0:16)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 788 (task 27.0:9)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 789 (task 27.0:15)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 790 (task 27.0:6)14/07/15 02:04:04 WARN TaskSetManager: Lost TID 791 (task 27.0:4)14/07/15 02:04:04 ERROR TaskSetManager: Task 27.0:4 failed 4 times; aborting job
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 27.0:4 failed 4 times, most recent failure: Exception failure in TID 791 on host ip-10-231-146-237.ec2.internal: java.lang.NoClassDefFoundError: Could not initialize class
>>>         $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>         $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:32)
>>>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>         org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>>>         org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>         org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>>>         org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>         org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         java.lang.Thread.run(Thread.java:744)
>>> Driver stacktrace:
>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>     at scala.Option.foreach(Option.scala:236)
>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> scala> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 0 on ip-10-237-184-110.ec2.internal: Uncaught exception
>>> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 1 on ip-10-231-146-237.ec2.internal: Uncaught exception
>>> 14/07/15 02:04:04 ERROR TaskSchedulerImpl: Lost executor 2 on ip-10-144-192-36.ec2.internal: remote Akka client disassociated
>>>
>>> The definition of Tweet is as follows:
>>>
>>> case class Tweet(
>>>   user: String,
>>>   created_at: String,
>>>   text: String,
>>>   is_retweet: Boolean
>>> )
>>>
>>>>   - The queryExecution from the SchemaRDD (i.e. println(sql("SELECT ...").queryExecution))
>>>
>>> I was able to reproduce the problem this time without a query.
>>>
>>> Nick
>>>
>>
>>
>

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Nicholas Chammas <ni...@gmail.com>.
Like this:

val tweets = raw
  .map(_.split('\t'))
  .map(t => Tweet(t(0), t(1), t(2), t(3).toBoolean))

raw is just an RDD of tab-delimited strings.

scala> raw
res35: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at repartition at <console>:23
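
For anyone who wants to reproduce this end to end, here is a minimal
sketch of the whole thing as one spark-shell session. The S3 path and
the partition count are made-up placeholders, not the values from my
actual session (sc is the SparkContext the shell provides), and the
length filter is a defensive addition so t(3).toBoolean can't throw on
a malformed row:

// Hypothetical spark-shell session; the path and partition count
// below are placeholders, not the actual values from the cluster.
case class Tweet(
  user: String,
  created_at: String,
  text: String,
  is_retweet: Boolean
)

val raw = sc.textFile("s3n://example-bucket/tweets.tsv").repartition(24)

val tweets = raw
  .map(_.split('\t'))
  .filter(_.length == 4) // skip malformed rows rather than throw
  .map(t => Tweet(t(0), t(1), t(2), t(3).toBoolean))

tweets.count() // the action that triggers the NoClassDefFoundError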

Nick


On Tue, Jul 15, 2014 at 12:16 AM, Yin Huai <hu...@gmail.com> wrote:

> Hi Nick,
>
> How was tweets generated?
>
> Thanks,
>
> Yin

Re: count on RDD yields NoClassDefFoundError on 1.0.1

Posted by Yin Huai <hu...@gmail.com>.
Hi Nick,

How was tweets generated?

Thanks,

Yin

