Posted to user@spark.apache.org by Mars Max <ma...@baidu.com> on 2014/12/16 04:00:38 UTC
Fetch Failed caused job failed.
While I was running a Spark MR job, there was a FetchFailed(BlockManagerId(47, xxxxxxxxxx.com, 40975, 0), shuffleId=2, mapId=5, reduceId=286), followed by many retries, and the job finally failed.
The log showed the following error. Has anybody met this error, or is it a known issue in Spark? Thanks.
14/12/16 10:43:43 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:265)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior exception:
org.apache.spark.shuffle.FetchFailedException: Fetch failed: BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task 18305
14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID 18305)
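A note on the EOFError at the top of that traceback: read_int in pyspark's serializers.py reads a 4-byte length prefix from the worker's stream and raises EOFError when the stream ends early. A simplified sketch of that behavior (not the exact Spark 1.1 source):

```python
import io
import struct

def read_int(stream):
    # Read a 4-byte big-endian length prefix, as pyspark's serializers do;
    # an empty read means the peer closed the stream mid-frame.
    data = stream.read(4)
    if not data:
        raise EOFError
    return struct.unpack("!i", data)[0]

# A well-formed frame decodes normally:
print(read_int(io.BytesIO(struct.pack("!i", 286))))  # → 286
```

So the Python-side EOFError is a symptom rather than the cause: the JVM side stopped feeding data, and the log's own "This may have been caused by a prior exception" line points at the FetchFailedException as the real failure.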
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fetch-Failed-caused-job-failed-tp20697.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Re: Fetch Failed caused job failed.
Posted by "Ma,Xi" <ma...@baidu.com>.
Actually there was still a fetch failure. However, after I upgraded Spark to 1.1.1, this error did not occur again.
Thanks,
Mars
Re: Re: Fetch Failed caused job failed.
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
So the fetch failure error is gone? Can you paste the code that you are executing? What is the size of the data and your cluster setup?
Thanks
Best Regards
Re: Fetch Failed caused job failed.
Posted by "Ma,Xi" <ma...@baidu.com>.
Hi Das,
Thanks for your advice.
I'm not sure what the point of setting memoryFraction to 1 is. I tried to rerun the test with the following parameters in spark-defaults.conf, but it failed again:
spark.rdd.compress true
spark.akka.frameSize 50
spark.storage.memoryFraction 0.8
spark.core.connection.ack.wait.timeout 6000
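(Editor's aside on the memoryFraction question: spark.storage.memoryFraction caps the share of each executor's heap that the block manager may use for cached RDD blocks, so pushing it to 1 would let the cache claim the whole heap, leaving little headroom for shuffle buffers and task execution. Rough arithmetic, assuming a hypothetical 4 GB executor heap that is not stated anywhere in this thread:)

```python
def storage_cap_gb(heap_gb, memory_fraction):
    # Upper bound on the heap the block manager may use for cached RDD blocks.
    return heap_gb * memory_fraction

heap_gb = 4.0  # hypothetical executor heap size; not from this thread

print(storage_cap_gb(heap_gb, 0.8))  # the 0.8 used above → 3.2 GB for storage
print(storage_cap_gb(heap_gb, 1.0))  # the suggested 1 → the entire heap
```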
14/12/16 16:45:08 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/spark/spark-1.1/python/pyspark/worker.py", line 75, in main
command = pickleSer._read_with_length(infile)
File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 146, in _read_with_length
length = read_int(stream)
File "/home/spark/spark-1.1/python/pyspark/serializers.py", line 464, in read_int
raise EOFError
EOFError
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I suspect that something went wrong in the shuffle stage, but I'm not sure what the error is.
Thanks,
Mars
Re: Fetch Failed caused job failed.
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You could try setting the following while creating the SparkContext:
.set("spark.rdd.compress","true")
.set("spark.storage.memoryFraction","1")
.set("spark.core.connection.ack.wait.timeout","600")
.set("spark.akka.frameSize","50")
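Put together, the suggestion above would look roughly like this in a PySpark driver. The app name is a placeholder, and the sketch falls back to plain key/value pairs when pyspark is not importable, since the same settings can equally go into conf/spark-defaults.conf:

```python
# The four suggested settings, as plain key/value pairs.
settings = {
    "spark.rdd.compress": "true",
    "spark.storage.memoryFraction": "1",
    "spark.core.connection.ack.wait.timeout": "600",
    "spark.akka.frameSize": "50",
}

try:
    from pyspark import SparkConf, SparkContext
    # "fetch-failed-repro" is a made-up application name.
    conf = SparkConf().setAppName("fetch-failed-repro")
    for key, value in settings.items():
        conf.set(key, value)
    # sc = SparkContext(conf=conf)  # create the context with these settings
except ImportError:
    # pyspark not available on this machine; put the pairs into
    # conf/spark-defaults.conf instead.
    pass
```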
Thanks
Best Regards
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 ERROR PythonRDD: This may have been caused by a prior
> exception:
> org.apache.spark.shuffle.FetchFailedException: Fetch failed:
> BlockManagerId(47, nmg01-taihang-d11609.nmg01.baidu.com, 40975, 0) 2 5 286
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:68)
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at
>
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:78)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at
>
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> at
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/16 10:43:43 INFO CoarseGrainedExecutorBackend: Got assigned task
> 18305
> 14/12/16 10:43:43 INFO Executor: Running task 623.0 in stage 5.0 (TID
> 18305)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Fetch-Failed-caused-job-failed-tp20697.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>