Posted to user@spark.apache.org by Tianshuo Deng <td...@twitter.com.INVALID> on 2015/03/23 05:16:11 UTC

SocketTimeout only when launching lots of executors

Hi, Spark users.

When running a Spark application with a large number of executors (300+), I see the following failures:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

When I reduce the number of executors, the Spark app runs fine. From the stack trace, it looks like many executors fetching dependencies from the driver at the same time is causing the reads to time out. Is that plausible?
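
The fetch in the trace goes through org.apache.spark.util.Utils.doFetchFile, which downloads dependencies over HTTP from the driver. As a sketch of a workaround I am considering (assuming a Spark 1.x release where spark.files.fetchTimeout is the per-fetch read timeout in seconds; the app name and value below are illustrative):

    // Sketch only: raise the read timeout executors use when fetching
    // dependencies added via --jars/--files from the driver.
    // Assumes spark.files.fetchTimeout is in seconds (default 60).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")                   // illustrative name
      .set("spark.files.fetchTimeout", "120") // double the default
    val sc = new SparkContext(conf)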

Has anyone experienced similar issues or have any suggestions?

Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: SocketTimeout only when launching lots of executors

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
It seems your driver is getting flooded by that many executors and
hence the reads time out. There are configuration options such as
spark.akka.timeout that you could try tuning. More information is
available here:
http://spark.apache.org/docs/latest/configuration.html
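
For example, something like this (the values are illustrative starting
points, not tuned recommendations; both properties take seconds in
Spark 1.x):

    // Illustrative only: raise the timeouts mentioned above.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.akka.timeout", "300")   // default is 100 seconds
      .set("spark.akka.askTimeout", "60") // default is 30 seconds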

Thanks
Best Regards
