Posted to user@spark.apache.org by Honey Joshi <ho...@ideata-analytics.com> on 2014/07/01 11:41:35 UTC

Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

Hi,
I am trying to run a project that takes data as a DStream and, after various
operations, dumps the data into a Shark table. I am getting the following
error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task 0.0:0 failed 1 times (most recent failure: Exception failure:
java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
be cast to org.apache.spark.rdd.HadoopPartition)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Can someone please explain the cause of this error? I am also using a
SparkContext alongside the existing StreamingContext.
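
For reference, here is a minimal sketch of the kind of setup I am describing
(the names, the socket source, and the Shark write step are illustrative
placeholders, not the actual project code):

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToSharkSketch {
  def main(args: Array[String]) {
    // Streaming context that receives and transforms the incoming data.
    val ssc = new StreamingContext("local[2]", "StreamToSharkSketch", Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)
    val cleaned = lines.filter(_.nonEmpty).map(_.toLowerCase)

    // A second, separately created context in the same job; in the real
    // project this is the Shark context used to load each batch into a table.
    val sc = new SparkContext("local[2]", "shark-writer")

    cleaned.foreachRDD { rdd =>
      // ... operations on rdd, then load the result into the Shark table
      //     through the second context ...
      println("batch size: " + rdd.count())
    }

    ssc.start()
    ssc.awaitTermination()
  }
}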

Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

Posted by Honey Joshi <ho...@ideata-analytics.com>.
On Wed, July 2, 2014 2:00 am, Mayur Rustagi wrote:
> Two job contexts cannot share data. Are you collecting the data to the
> master and then sending it to the other context?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
>
> On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi <
> honeyjoshi@ideata-analytics.com> wrote:
>
>> On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
>>
>>> Ideally you should be converting the RDD to a SchemaRDD?
>>> Are you creating a UnionRDD to join across DStream RDDs?
>>>
>>>
>>>
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
>>> <honeyjoshi@ideata-analytics.com
>>>
>>>
>>>> wrote:
>>>>
>>>>
>>>
>>>> Hi,
>>>> I am trying to run a project which takes data as a DStream and dumps
>>>> the data in the Shark table after various operations. I am getting
>>>> the following error :
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
>>>> Task 0.0:0 failed 1 times (most recent failure: Exception failure:
>>>> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
>>>> be cast to org.apache.spark.rdd.HadoopPartition)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>>>> at scala.Option.foreach(Option.scala:236)
>>>> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>
>>>>
>>>>
>>>> Can someone please explain the cause of this error, I am also using
>>>> a Spark Context with the existing Streaming Context.
>>>>
>>>>
>>>>
>>>
>>
>> I am using Spark 0.9.0-incubating, so it doesn't have anything to do
>> with SchemaRDD. This error is probably coming up because I am trying to use
>> one Spark context and one Shark context in the same job. Is there any way
>> to incorporate two contexts in one job? Regards
>>
>>
>> Honey Joshi
>> Ideata-Analytics
>>
>>
>>
>
Both of these contexts execute independently, but they were still giving us
issues, mostly because of Scala's lazy evaluation. The error was coming up
because I was using one Spark context and one Shark context in the same job.
I got it resolved by stopping the existing Spark context before creating the
Shark context. Thanks for your help Mayur.
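
For anyone hitting the same issue, a rough sketch of the shape of the fix
(the Shark calls, the staging path, and the table name here are illustrative
assumptions, not the exact project code):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import shark.SharkEnv

// Phase 1: run the streaming job and stage its output somewhere durable
// (e.g. HDFS), since the two contexts cannot share in-memory RDDs.
val ssc = new StreamingContext("local[2]", "stream-phase", Seconds(10))
// ... DStream operations that write each processed batch to /tmp/staged_batch ...
ssc.start()
// ... block until the streaming work we care about has finished ...

// Stop the streaming context so its SparkContext is shut down before the
// Shark context is created; only one context is alive at a time.
ssc.stop()

// Phase 2: create the Shark context and load the staged data into the table.
// SharkEnv.initWithSharkContext and the LOAD DATA statement are assumptions;
// adjust them to your Shark version and table layout.
val sharkCtx = SharkEnv.initWithSharkContext("shark-phase")
sharkCtx.sql("LOAD DATA INPATH '/tmp/staged_batch' INTO TABLE my_table")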


Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

Posted by Mayur Rustagi <ma...@gmail.com>.
Two job contexts cannot share data. Are you collecting the data to the
master and then sending it to the other context?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Wed, Jul 2, 2014 at 11:57 AM, Honey Joshi <
honeyjoshi@ideata-analytics.com> wrote:

> On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
> > Ideally you should be converting the RDD to a SchemaRDD?
> > Are you creating a UnionRDD to join across DStream RDDs?
> >
> >
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi <https://twitter.com/mayur_rustagi>
> >
> >
> >
> >
> > On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
> > <honeyjoshi@ideata-analytics.com
> >
> >> wrote:
> >>
> >
> >> Hi,
> >> I am trying to run a project which takes data as a DStream and dumps the
> >>  data in the Shark table after various operations. I am getting the
> >> following error :
> >>
> >> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> >> Task 0.0:0 failed 1 times (most recent failure: Exception failure:
> >> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
> >> be cast to org.apache.spark.rdd.HadoopPartition)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
> >> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
> >> at scala.Option.foreach(Option.scala:236)
> >> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>
> >>
> >> Can someone please explain the cause of this error, I am also using a
> >> Spark Context with the existing Streaming Context.
> >>
> >>
> >
>
> I am using Spark 0.9.0-incubating, so it doesn't have anything to do with
> SchemaRDD. This error is probably coming up because I am trying to use one
> Spark context and one Shark context in the same job. Is there any way to
> incorporate two contexts in one job?
> Regards
>
> Honey Joshi
> Ideata-Analytics
>
>

Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

Posted by Honey Joshi <ho...@ideata-analytics.com>.
On Wed, July 2, 2014 1:11 am, Mayur Rustagi wrote:
> Ideally you should be converting the RDD to a SchemaRDD?
> Are you creating a UnionRDD to join across DStream RDDs?
>
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
>
> On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi
> <honeyjoshi@ideata-analytics.com
>
>> wrote:
>>
>
>> Hi,
>> I am trying to run a project which takes data as a DStream and dumps the
>>  data in the Shark table after various operations. I am getting the
>> following error :
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
>> Task 0.0:0 failed 1 times (most recent failure: Exception failure:
>> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
>> be cast to org.apache.spark.rdd.HadoopPartition)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>> at scala.Option.foreach(Option.scala:236)
>> at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>>
>> Can someone please explain the cause of this error, I am also using a
>> Spark Context with the existing Streaming Context.
>>
>>
>

I am using Spark 0.9.0-incubating, so it doesn't have anything to do with
SchemaRDD. This error is probably coming up because I am trying to use one
Spark context and one Shark context in the same job. Is there any way to
incorporate two contexts in one job?
Regards

Honey Joshi
Ideata-Analytics


Re: Error: UnionPartition cannot be cast to org.apache.spark.rdd.HadoopPartition

Posted by Mayur Rustagi <ma...@gmail.com>.
Ideally you should be converting the RDD to a SchemaRDD?
Are you creating a UnionRDD to join across DStream RDDs?
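
Roughly what I have in mind, sketched with the Spark 1.0 Spark SQL API (names
are illustrative; note this API is not available in 0.9.0, as mentioned
elsewhere in the thread):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the streamed rows.
case class Event(id: Int, payload: String)

val sc = new SparkContext("local[2]", "schemardd-sketch")
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit conversion RDD[Event] -> SchemaRDD

val events = sc.parallelize(Seq(Event(1, "a"), Event(2, "b")))

// Register the (now Schema)RDD as a table and query it with SQL.
events.registerAsTable("events")
val matching = sqlContext.sql("SELECT id FROM events WHERE payload = 'a'")
matching.collect().foreach(println)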


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Tue, Jul 1, 2014 at 3:11 PM, Honey Joshi <honeyjoshi@ideata-analytics.com
> wrote:

> Hi,
> I am trying to run a project which takes data as a DStream and dumps the
> data in the Shark table after various operations. I am getting the
> following error :
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task 0.0:0 failed 1 times (most recent failure: Exception failure:
> java.lang.ClassCastException: org.apache.spark.rdd.UnionPartition cannot
> be cast to org.apache.spark.rdd.HadoopPartition)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>         at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>         at scala.Option.foreach(Option.scala:236)
>         at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Can someone please explain the cause of this error, I am also using a
> Spark Context with the existing Streaming Context.
>