Posted to user@spark.apache.org by dachuan <hd...@gmail.com> on 2013/10/29 14:54:50 UTC

met a problem while running a streaming example program

Hi,

I tried the clickstream example and it ran into an exception; has anybody
seen this before?

Since the program uses "local[2]", I ran it on my local machine.

thanks in advance,
dachuan.

Log Snippet 1:

13/10/29 08:50:25 INFO scheduler.DAGScheduler: Submitting 46 missing tasks
from Stage 12 (MapPartitionsRDD[63] at combineByKey at
ShuffledDStream.scala:41)
13/10/29 08:50:25 INFO local.LocalTaskSetManager: Size of task 75 is 4230
bytes
13/10/29 08:50:25 INFO local.LocalScheduler: Running 75
13/10/29 08:50:25 INFO spark.CacheManager: Cache key is rdd_9_0
13/10/29 08:50:25 INFO spark.CacheManager: Computing partition
org.apache.spark.rdd.BlockRDDPartition@0
13/10/29 08:50:25 WARN storage.BlockManager: Putting block rdd_9_0 failed
13/10/29 08:50:25 INFO local.LocalTaskSetManager: Loss was due to
java.io.NotSerializableException
java.io.NotSerializableException:
org.apache.spark.streaming.examples.clickstream.PageView

Log Snippet 2:
org.apache.spark.SparkException: Job failed: Task 12.0:0 failed more than 4
times; aborting job java.io.NotSerializableException:
org.apache.spark.streaming.examples.clickstream.PageView
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
        at
org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)

The two commands used to run this app:
./run-example
org.apache.spark.streaming.examples.clickstream.PageViewGenerator 44444 10
./run-example
org.apache.spark.streaming.examples.clickstream.PageViewStream
errorRatePerZipCode localhost 44444
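
The root cause is that Spark's default (Java) serialization can only handle classes that implement java.io.Serializable. A minimal stand-in class (with hypothetical fields, not the actual PageView definition from the examples) reproduces the same exception outside Spark:

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class NotSerializableDemo {
    // Stand-in for the examples' PageView class: note that it does NOT
    // implement java.io.Serializable.
    static class PageView {
        String url = "http://example.com/index.html";
        int zipCode = 43210;
    }

    public static void main(String[] args) throws Exception {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            // Java serialization, which Spark uses by default when it
            // caches or shuffles RDD elements.
            out.writeObject(new PageView());
            System.out.println("serialized OK");
        } catch (NotSerializableException e) {
            // Same failure Spark reports when putting block rdd_9_0.
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

This is why the task fails repeatedly and the job is aborted: every attempt to cache or shuffle a PageView hits the same serialization error.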

Re: met a problem while running a streaming example program

Posted by dachuan <hd...@gmail.com>.
Yes, it works after checking out branch-0.8.

thanks.


On Tue, Oct 29, 2013 at 12:51 PM, Patrick Wendell <pw...@gmail.com> wrote:

> If you just add the "extends Serializable" changes from here it should
> work.



-- 
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210

Re: met a problem while running a streaming example program

Posted by Patrick Wendell <pw...@gmail.com>.
If you just add the "extends Serializable" changes from here it should work.
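
The linked change makes the Scala PageView class extend Serializable; at the JVM level this is equivalent to marking the class `implements java.io.Serializable`, after which the same round-trip succeeds (again a sketch with hypothetical fields, not the actual PageView definition):

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFixDemo {
    // With Serializable added, Java serialization of this class works.
    static class PageView implements Serializable {
        private static final long serialVersionUID = 1L;
        String url = "http://example.com/index.html";
        int zipCode = 43210;
    }

    public static void main(String[] args) throws Exception {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new PageView()); // no longer throws
        }
        System.out.println("serialized OK");
    }
}
```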

On Tue, Oct 29, 2013 at 9:36 AM, Patrick Wendell <pw...@gmail.com> wrote:
> This was fixed on 0.8 branch and master:
> https://github.com/apache/incubator-spark/pull/63/files
>
> - Patrick

Re: met a problem while running a streaming example program

Posted by Patrick Wendell <pw...@gmail.com>.
This was fixed on 0.8 branch and master:
https://github.com/apache/incubator-spark/pull/63/files

- Patrick


Re: met a problem while running a streaming example program

Posted by Thunder Stumpges <th...@gmail.com>.
I vaguely remember running into this same error. The log says
"java.io.NotSerializableException:
org.apache.spark.streaming.examples.clickstream.PageView" -- can you
check the PageView class in the examples and make sure it has the
@serializable annotation? I seem to remember having to add it.

good luck,
Thunder

