Posted to user@spark.apache.org by VG <vl...@gmail.com> on 2016/07/23 13:37:29 UTC

Error in collecting RDD as a Map - IOException in collectAsMap

Please suggest if I am doing something wrong or an alternative way of doing
this.

I have an RDD with two values, as follows:
JavaPairRDD<String, Long> rdd

When I execute rdd.collectAsMap()
it always fails with an IOException.
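[For context, a minimal sketch of the pattern in question; the data here is made up, and note that collectAsMap() pulls the entire RDD back to the driver as a java.util.Map:]

import java.util.Arrays;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR");
// Made-up pairs standing in for the real data.
JavaPairRDD<String, Long> rdd = jsc.parallelizePairs(
        Arrays.asList(new Tuple2<>("a", 1L), new Tuple2<>("b", 2L)));
// collectAsMap() fetches every task result to the driver over the network,
// which is where the connection failure below surfaces.
Map<String, Long> result = rdd.collectAsMap();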


16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /192.168.1.3:58179
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105)
    at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Connection timed out: no further information: /192.168.1.3:58179
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
16/07/23 19:03:58 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms

Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by Andrew Ehrlich <an...@aehrlich.com>.
+1 for the misleading error. Messages about failing to connect often mean that an executor has died. If so, dig into the executor logs and find out why the executor died (out of memory, perhaps). 

Andrew
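
[If an executor did die from memory pressure, one knob to try is giving the JVMs more headroom. A sketch only; the sizes are placeholders, not recommendations:]

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Placeholder sizes; tune to the data actually being collected.
SparkConf conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("DR")
        .set("spark.executor.memory", "2g"); // heap per executor
JavaSparkContext jsc = new JavaSparkContext(conf);

[Note that in local mode everything runs inside the driver JVM, so driver memory, set before the JVM starts via spark-submit --driver-memory or -Xmx, is the knob that actually matters there.]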

> On Jul 23, 2016, at 11:39 AM, VG <vl...@gmail.com> wrote:
> 
> Hi Pedro,
> 
> Based on your suggestion, I deployed this on an AWS node and it worked fine.
> Thanks for your advice.
> 
> I am still trying to figure out the issues in the local environment.
> Anyway, thanks again.
> 
> -VG
> 
> [remainder of quoted thread snipped; the earlier messages appear in full below]


Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by VG <vl...@gmail.com>.
Hi Pedro,

Based on your suggestion, I deployed this on an AWS node and it worked fine.
Thanks for your advice.

I am still trying to figure out the issues in the local environment.
Anyway, thanks again.

-VG

On Sat, Jul 23, 2016 at 9:26 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> Have you changed spark-env.sh or spark-defaults.conf from the default? It
> looks like Spark is trying to address local workers via a network
> address (e.g. 192.168……) instead of localhost (localhost, 127.0.0.1,
> 0.0.0.0,…). Additionally, that network address doesn't resolve correctly.
> You might also check /etc/hosts to make sure that you don't have anything
> weird going on.
>
> One last thing to check: are you running Spark within a VM and/or
> Docker? If networking isn't set up correctly there, you may also run into
> trouble.
>
> What would be helpful is to know everything about your setup that might
> affect networking.
>
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>
>
> [earlier quoted messages snipped; they appear in full elsewhere in this thread]

Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by Marco Mistroni <mm...@gmail.com>.
Hi VG, I believe the error message is misleading. I had a similar one with
PySpark yesterday after calling a count on a DataFrame, where the real
error was an incorrect user-defined function being applied.
Please send me some sample code with a trimmed-down version of the data
and I'll see if I can reproduce it.
Kind regards,
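
[For reference, a trimmed-down reproduction along these lines would be enough to run on another machine; the class name and data are hypothetical:]

import java.util.Arrays;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class CollectAsMapRepro {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR");
        // Tiny stand-in for the real data set.
        JavaPairRDD<String, Long> rdd = jsc.parallelizePairs(
                Arrays.asList(new Tuple2<>("x", 1L), new Tuple2<>("y", 2L)));
        Map<String, Long> map = rdd.collectAsMap();
        System.out.println(map);
        jsc.stop();
    }
}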

On 23 Jul 2016 4:57 pm, "Pedro Rodriguez" <sk...@gmail.com> wrote:

Have you changed spark-env.sh or spark-defaults.conf from the default? It
looks like Spark is trying to address local workers via a network
address (e.g. 192.168……) instead of localhost (localhost, 127.0.0.1,
0.0.0.0,…). Additionally, that network address doesn't resolve correctly.
You might also check /etc/hosts to make sure that you don't have anything
weird going on.

One last thing to check: are you running Spark within a VM and/or
Docker? If networking isn't set up correctly there, you may also run into
trouble.

What would be helpful is to know everything about your setup that might
affect networking.

—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni

pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn
<https://www.linkedin.com/in/pedrorodriguezscience>

[earlier quoted messages snipped; they appear in full elsewhere in this thread]

Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by Pedro Rodriguez <sk...@gmail.com>.
Have you changed spark-env.sh or spark-defaults.conf from the default? It looks like Spark is trying to address local workers via a network address (e.g. 192.168……) instead of localhost (localhost, 127.0.0.1, 0.0.0.0,…). Additionally, that network address doesn't resolve correctly. You might also check /etc/hosts to make sure that you don't have anything weird going on.

One last thing to check: are you running Spark within a VM and/or Docker? If networking isn't set up correctly there, you may also run into trouble.

What would be helpful is to know everything about your setup that might affect networking.
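
[As a concrete sketch of the kind of change meant here, assuming the goal is to pin Spark to the loopback interface rather than the LAN address: either export SPARK_LOCAL_IP=127.0.0.1 in conf/spark-env.sh, or set the driver host in code:]

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Pin the driver to loopback so task results aren't fetched via a
// LAN address like 192.168.1.3 that may not route back correctly.
SparkConf conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("DR")
        .set("spark.driver.host", "127.0.0.1");
JavaSparkContext jsc = new JavaSparkContext(conf);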

—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni

pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn

On July 23, 2016 at 9:10:31 AM, VG (vlinked@gmail.com) wrote:

Hi Pedro,

Apologies for not adding this earlier. 

This is running on a local cluster set up as follows.
JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR");

Any suggestions based on this?

The ports are not blocked by firewall. 

Regards,



[earlier quoted messages snipped; they appear in full elsewhere in this thread]





Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by VG <vl...@gmail.com>.
Hi Pedro,

Apologies for not adding this earlier.

This is running on a local cluster set up as follows.
JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR");

Any suggestions based on this?

The ports are not blocked by firewall.

Regards,
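
[One quick way to sanity-check the firewall claim is to try the exact address from the stack trace by hand. A throwaway sketch; the port is ephemeral and changes on every run, so copy it from the current log:]

import java.net.InetSocketAddress;
import java.net.Socket;

// Attempt a plain TCP connect to the address Spark failed to reach.
try (Socket socket = new Socket()) {
    socket.connect(new InetSocketAddress("192.168.1.3", 58179), 5000);
    System.out.println("reachable");
} catch (Exception e) {
    System.out.println("not reachable: " + e);
}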



On Sat, Jul 23, 2016 at 8:35 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> Make sure that you don't have ports firewalled. You don't really give much
> information to work from, but it looks like the master can't access the
> worker nodes for some reason. If you give more information on the cluster,
> networking, etc., it would help.
>
> For example, on AWS you can create a security group which allows all
> traffic from itself to itself. If you are using something like ufw on
> Ubuntu then you probably need to know the IP addresses of the worker nodes
> beforehand.
>
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>
>
> [quoted original message and stack trace snipped; see the top of this thread]

Re: Error in collecting RDD as a Map - IOException in collectAsMap

Posted by Pedro Rodriguez <sk...@gmail.com>.
Make sure that you don't have ports firewalled. You don't really give much information to work from, but it looks like the master can't access the worker nodes for some reason. If you give more information on the cluster, networking, etc., it would help.

For example, on AWS you can create a security group which allows all traffic from itself to itself. If you are using something like ufw on Ubuntu then you probably need to know the IP addresses of the worker nodes beforehand.

—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni

pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn

On July 23, 2016 at 7:38:01 AM, VG (vlinked@gmail.com) wrote:

Please suggest if I am doing something wrong or an alternative way of doing this. 

I have an RDD with two values, as follows:
JavaPairRDD<String, Long> rdd

When I execute rdd.collectAsMap()
it always fails with an IOException.


[stack trace snipped; identical to the one in the original post]