Posted to user@spark.apache.org by Tianshuo Deng <td...@twitter.com.INVALID> on 2015/03/23 05:16:11 UTC
SocketTimeout only when launching lots of executors
Hi, spark users.
When running a Spark application with lots of executors (300+), I see the following failures:
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
When I reduce the number of executors, the Spark app runs fine. From the stack trace, it looks like many executors downloading dependencies from the driver at the same time are causing the driver to time out. Is that plausible?
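For context, this is roughly how the app ships its dependencies (a hypothetical sketch, the paths and app name are placeholders, not our real files):

    import org.apache.spark.{SparkConf, SparkContext}

    object DepsExample {
      def main(args: Array[String]): Unit = {
        // Placeholder app name; master is supplied by spark-submit.
        val sc = new SparkContext(new SparkConf().setAppName("deps-example"))

        // In Spark 1.x, files/jars registered this way are served by the
        // driver's HTTP file server; each executor downloads them in
        // updateDependencies() before running its first task, which is
        // the Utils.fetchFile call in the stack trace above.
        sc.addJar("/path/to/extra-deps.jar")
        sc.addFile("/path/to/lookup-table.dat")

        sc.stop()
      }
    }

With 300+ executors that means 300+ roughly simultaneous HTTP fetches against the single driver.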
Has anyone experienced similar issues, or does anyone have suggestions?
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: SocketTimeout only when launching lots of executors
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
It seems your driver is getting flooded by that many executors and hence
it times out. There are some configuration options, like
spark.akka.timeout, that you could try playing with. More information
is available here:
http://spark.apache.org/docs/latest/configuration.html
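For example, something like this might be a starting point (a sketch only; the values are illustrative, not tuned recommendations. spark.files.fetchTimeout is the setting that should govern the dependency downloads in your stack trace, while spark.akka.timeout covers Akka control messages):

    import org.apache.spark.{SparkConf, SparkContext}

    object TimeoutExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("timeout-example") // placeholder name
          // Timeout (in seconds, Spark 1.x) that executors use when
          // fetching files added via SparkContext.addFile()/addJar()
          // from the driver. Default is 60.
          .set("spark.files.fetchTimeout", "120")
          // Akka communication timeout in seconds. Default is 100.
          .set("spark.akka.timeout", "300")
        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }

Raising the fetch timeout just gives the driver more headroom under the burst; distributing the jars some other way (e.g. putting them on HDFS) would reduce the load on the driver itself.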
Thanks
Best Regards
On Mon, Mar 23, 2015 at 9:46 AM, Tianshuo Deng <td...@twitter.com.invalid>
wrote:
> Hi, spark users.
>
> When running a Spark application with lots of executors (300+), I see
> the following failures:
>
> java.net.SocketTimeoutException: Read timed out
> [stack trace snipped; quoted in full above]
>
> When I reduce the number of executors, the Spark app runs fine. From the
> stack trace, it looks like many executors downloading dependencies from
> the driver at the same time are causing the driver to time out. Is that
> plausible?
>
> Has anyone experienced similar issues, or does anyone have suggestions?
>
> Thanks