Posted to user@spark.apache.org by Thijs Haarhuis <th...@oranggo.com> on 2018/10/16 06:56:37 UTC

SocketTimeoutException with SparkR when using the latest R version

Hi all,

I am running into a problem where, once in a while, my job fails with the following exception(s):
java.net.SocketTimeoutException: Accept timed out
                at java.net.PlainSocketImpl.socketAccept(Native Method)
                at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
                at java.net.ServerSocket.implAccept(ServerSocket.java:545)
                at java.net.ServerSocket.accept(ServerSocket.java:513)
                at org.apache.spark.api.r.RRunner.compute(RRunner.scala:77)
                at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:436)
                at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:418)
                at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
                at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
                at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
                at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
                at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
                at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
                at org.apache.spark.scheduler.Task.run(Task.scala:108)
                at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)
18/10/16 08:47:18:388 INFO CoarseGrainedExecutorBackend: Got assigned task 2059
18/10/16 08:47:18:388 INFO Executor: Running task 22.0 in stage 21.0 (TID 2059)
18/10/16 08:47:18:391 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
18/10/16 08:47:18:391 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/10/16 08:47:18:394 ERROR Executor: Exception in task 22.0 in stage 21.0 (TID 2059)
java.net.SocketException: Broken pipe (Write failed)
                at java.net.SocketOutputStream.socketWrite0(Native Method)
                at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
                at java.net.SocketOutputStream.write(SocketOutputStream.java:155)

It does not happen on every run; it typically shows up only after about 15 runs or so.
I am using Apache Livy to submit R scripts to the cluster, which runs CentOS 7.5 with Spark 2.2.1 and R 3.5.1.
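
For context, the failing stack goes through FlatMapGroupsInRExec and RRunner.compute, which as far as I understand corresponds to a SparkR gapply() call. The script I submit is roughly of this shape (a simplified sketch; the real data, grouping column, and per-group function are placeholders):

library(SparkR)

# Simplified sketch of the kind of script I submit. Under Livy the session
# already exists, so sparkR.session() may be redundant there, but it makes
# the sketch standalone.
sparkR.session(appName = "gapply-timeout-sketch")

df <- createDataFrame(mtcars)

# Expected schema of the data.frame returned by the per-group function.
schema <- structType(
  structField("cyl", "double"),
  structField("mean_mpg", "double")
)

# gapply() hands each group to an R worker process on the executor;
# the JVM side (RRunner.compute) waits for that worker to connect back,
# which is where the "Accept timed out" shows up in the stack trace.
result <- gapply(
  df,
  "cyl",
  function(key, x) {
    data.frame(cyl = key[[1]], mean_mpg = mean(x$mpg))
  },
  schema
)

head(collect(result))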

Another system, running the older CentOS 6.6 and R 3.2.2, executes the same R script very stably, without any problems.

Is it possible that this R version is somehow not compatible with Spark 2.2.1?
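
In case it is simply a timeout under load rather than an incompatibility, I could also try raising the documented SparkR timeout settings. I am not sure whether they govern the executor-side accept in RRunner.compute, so this is only a guess:

library(SparkR)

# A guess rather than a known fix: raise the documented SparkR timeout settings
# when creating the session (or pass the same keys under "conf" in the Livy
# session request).
#   spark.r.backendConnectionTimeout - timeout in seconds for the connection
#     between the R process and the backend (default 6000)
#   spark.r.heartBeatInterval - heartbeat interval in seconds sent from the
#     backend to the R process to keep the connection from timing out
#     (default 100)
sparkR.session(
  appName = "sparkr-timeout-tuning",
  sparkConfig = list(
    spark.r.backendConnectionTimeout = "8000",
    spark.r.heartBeatInterval = "60"
  )
)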

Thanks
Thijs