Posted to issues@spark.apache.org by "Hong Shen (JIRA)" <ji...@apache.org> on 2014/12/20 10:01:13 UTC
[jira] [Created] (SPARK-4909) "Error communicating with MapOutputTracker" when running a big Spark job
Hong Shen created SPARK-4909:
--------------------------------
Summary: "Error communicating with MapOutputTracker" when running a big Spark job
Key: SPARK-4909
URL: https://issues.apache.org/jira/browse/SPARK-4909
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Hong Shen
When I ran a Spark job with 38788 map tasks and 997 reduce tasks, the job failed. Here is the log:
14/12/20 15:11:18 ERROR spark.MapOutputTrackerWorker: Error communicating with MapOutputTracker
java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:162)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:43)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:41)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:117)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:114)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
The serialized mapOutputStatuses is more than 15 MB, and more than 500 executors ask the driver for the map output locations of the shuffle at roughly the same time. Because the driver replies to every executor with the full map output locations, the asks inevitably time out.
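A back-of-the-envelope check shows why the 120-second ask times out. The 15 MB payload size and the 500-executor fan-out below are the figures from this report; the rest is simple arithmetic, not Spark code:

```python
STATUS_SIZE_MB = 15   # serialized MapOutputStatuses size, per this report
NUM_EXECUTORS = 500   # executors asking the driver at roughly the same time

# The driver answers each ask with the full status payload, so its outbound
# traffic for this one shuffle is the product of the two figures:
driver_outbound_mb = STATUS_SIZE_MB * NUM_EXECUTORS
print(driver_outbound_mb)  # 7500 MB pushed through a single driver
```

Roughly 7.5 GB has to leave one driver process in a burst, so many executors will still be waiting when their 120-second futures expire.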
Maybe we can optimize this so the driver does not send the map output locations to every executor directly, for example by using a broadcast variable.
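The broadcast idea can be sketched as follows. This is a hypothetical illustration in plain Python, not Spark code: `reply_per_executor` stands in for today's behavior, where the driver serializes and pushes the statuses once per asker, while `reply_via_broadcast` serializes once and hands every asker the same shared bytes, the way a broadcast variable would let executors fetch one copy instead of each receiving its own:

```python
import pickle

# Fake stand-in for MapOutputStatuses: (host, output size) per map task.
statuses = [("host-%d" % (i % 40), i * 1024) for i in range(38788)]

def reply_per_executor(num_askers):
    # Current behavior: one full payload produced and pushed per asking executor.
    return [pickle.dumps(statuses) for _ in range(num_askers)]

def reply_via_broadcast(num_askers):
    # Proposed behavior: serialize once on the driver; every asker then
    # references the same shared copy instead of a fresh push.
    blob = pickle.dumps(statuses)
    return [blob] * num_askers
```

Every executor ends up with identical bytes either way; the difference is that the driver's per-ask cost drops from "serialize and send 15 MB" to "hand out a reference", which is exactly what the timeout needs.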
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)