Posted to user@spark.apache.org by Mars Max <ma...@baidu.com> on 2014/12/16 04:00:38 UTC
Fetch Failed caused job failed.
While I was running a Spark MR job, there was a FetchFailed(BlockManagerId(47, xxxxxxxxxx.com, 40975, 0), shuffleId=2, mapId=5, reduceId=286), followed by many retries, and the job finally failed.
The log showed the following error. Has anybody met this error, or is it a known issue in Spark? Thanks.
14/12/16 10:43:43 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:265)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior exception:
org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task 18305
14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID 18305)
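A note on the EOFError at the top of that traceback: read_int in pyspark's serializers.py reads a 4-byte length prefix from the worker's stream and raises EOFError when the stream ends early. A simplified sketch of that behavior (not the exact Spark 1.1 source):

```python
import io
import struct

def read_int(stream):
    # Read a 4-byte big-endian length prefix, as pyspark's serializers do;
    # an empty read means the peer closed the stream mid-frame.
    data = stream.read(4)
    if not data:
        raise EOFError
    return struct.unpack("!i", data)[0]

# A well-formed frame decodes normally:
print(read_int(io.BytesIO(struct.pack("!i", 286))))  # → 286
```

So the Python-side EOFError is a symptom rather than the cause: the JVM side stopped feeding data, and the log's own "This may have been caused by a prior exception" line points at the FetchFailedException as the real failure.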
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fetch-Failed-caused-job-failed-tp20697.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Re: Fetch Failed caused job failed.
Posted by "Ma,Xi" <ma...@baidu.com>.
Actually there was still a fetch failure. However, after I upgraded Spark to 1.1.1, this error did not occur again.
Thanks,
Mars
Re: Re: Fetch Failed caused job failed.
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
So the fetch failure error is gone? Can you paste the code that you are executing? What is the size of the data and your cluster setup?
Thanks
Best Regards
Re: Fetch Failed caused job failed.
Posted by "Ma,Xi" <ma...@baidu.com>.
Hi Das,
Thanks for your advice.
I'm not sure what the point of setting memoryFraction to 1 is. I tried to rerun the test with the following parameters in spark-defaults.conf, but it failed again:
spark.rdd.compress true
spark.akka.frameSize 50
spark.storage.memoryFraction 0.8
spark.core.connection.ack.wait.timeout 6000
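(Editor's aside on the memoryFraction question: spark.storage.memoryFraction caps the share of each executor's heap that the block manager may use for cached RDD blocks, so pushing it to 1 would let the cache claim the whole heap, leaving little headroom for shuffle buffers and task execution. Rough arithmetic, assuming a hypothetical 4 GB executor heap that is not stated anywhere in this thread:)

```python
def storage_cap_gb(heap_gb, memory_fraction):
    # Upper bound on the heap the block manager may use for cached RDD blocks.
    return heap_gb * memory_fraction

heap_gb = 4.0  # hypothetical executor heap size; not from this thread

print(storage_cap_gb(heap_gb, 0.8))  # the 0.8 used above → 3.2 GB for storage
print(storage_cap_gb(heap_gb, 1.0))  # the suggested 1 → the entire heap
```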
14/12/16 16:45:08 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
command = pickleSer._read_with_length(infile)
File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
length = read_int(stream)
File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
raise EOFError
EOFError
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I suspect that something went wrong in the shuffle stage, but I'm not sure what the error is.
Thanks,
Mars
Re: Fetch Failed caused job failed.
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You could try setting the following while creating the SparkContext:
.set("spark.rdd.compress","true")
.set("spark.storage.memoryFraction","1")
.set("spark.core.connection.ack.wait.timeout","600")
.set("spark.akka.frameSize","50")
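Put together, the suggestion above would look roughly like this in a PySpark driver. The app name is a placeholder, and the sketch falls back to plain key/value pairs when pyspark is not importable, since the same settings can equally go into conf/spark-defaults.conf:

```python
# The four suggested settings, as plain key/value pairs.
settings = {
    "spark.rdd.compress": "true",
    "spark.storage.memoryFraction": "1",
    "spark.core.connection.ack.wait.timeout": "600",
    "spark.akka.frameSize": "50",
}

try:
    from pyspark import SparkConf, SparkContext
    # "fetch-failed-repro" is a made-up application name.
    conf = SparkConf().setAppName("fetch-failed-repro")
    for key, value in settings.items():
        conf.set(key, value)
    # sc = SparkContext(conf=conf)  # create the context with these settings
except ImportError:
    # pyspark not available on this machine; put the pairs into
    # conf/spark-defaults.conf instead.
    pass
```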
Thanks
Best Regards
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior
> exception:
> org.apache.spark.shuffle.FetchFailedException: Fetch failed:
> BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task
> 18305
> 14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID
> 18305)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Fetch-Failed-caused-job-failed-tp20697.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>