Posted to user@spark.apache.org by Al M <al...@gmail.com> on 2014/12/03 15:55:51 UTC

Failed fetch: "Could not get block(s)"

I am using Spark 1.1.1.  I am seeing an issue that only appears when I run in
standalone cluster mode with at least 2 workers.  The workers are on
separate physical machines.
I am performing a simple join on 2 RDDs.  After the join I call first() on
the joined RDD (in Scala) to get the first result.  When first() runs
on Worker A it works fine; when it runs on Worker B I get a
'Fetch Failure' error.
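For reference, the job is roughly of the shape below; the object name, master URL
and data in this sketch are placeholders rather than my actual job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits needed for join() on Spark 1.1

object JoinFirstRepro {
  def main(args: Array[String]): Unit = {
    // Placeholder master URL; the real cluster is a standalone master with two
    // workers on separate physical machines.
    val conf = new SparkConf()
      .setAppName("join-first-repro")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)

    // Two small key-value RDDs standing in for the real inputs.
    val left  = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
    val right = sc.parallelize(Seq((1, "x"), (2, "y")))

    // Simple join followed by first(); first() triggers the shuffle fetch that
    // fails whenever the task runs on the second worker.
    val joined = left.join(right)
    println(joined.first())

    sc.stop()
  }
}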
I looked at the stderr log under the work directory for Worker B.  It shows the
following exception:

INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 2 remote fetches in 2 ms
ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(, )
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
    at org.apache.spark.network.ConnectionManager$$anon$10$$anonfun$run$15.apply(ConnectionManager.scala:866)
    .....
It is trying to connect from Worker B to the ConnectionManager for the
BlockManager on Worker A.  It manages to connect, but the request always times
out.  When I try to connect via telnet I see the same thing: it connects, but I
don't get anything back from the host.
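For what it's worth, the 60 seconds in the exception appears to correspond to the
ConnectionManager ack timeout (spark.core.connection.ack.wait.timeout in the 1.1
configuration docs).  Purely as an experiment rather than a fix, something like the
following could raise that wait, to see whether the fetch eventually succeeds or
the connection is simply dead:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: raising the ack wait masks the symptom rather than fixing any
// underlying connectivity problem between the two workers.
val conf = new SparkConf()
  .setAppName("join-first-repro")
  .set("spark.core.connection.ack.wait.timeout", "120")  // seconds; default is 60
val sc = new SparkContext(conf)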
I noticed that two other people reported this issue:
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Freezing-while-running-TPC-H-query-5-td14902.html
Unfortunately there was no meaningful progress.
