Posted to user@spark.apache.org by Pillis W <pi...@gmail.com> on 2014/02/07 08:09:24 UTC

Akka Connection refused - standalone cluster using spark-0.9.0

I have a "Connection Refused" error on the first worker (standalone cluster
- no YARN, Mesos). No firewalls, and can ping master-worker nodes from the
other.

Master process started manually. It is running and can see Web UI at 8080.

Using "spark-0.9.0-incubating-bin-hadoop2.tgz"

===============================================
spark-0.9.0-incubating-bin-hadoop2]$ ./bin/spark-class
org.apache.spark.deploy.worker.Worker  spark://s1.machine.org:7077
14/02/07 07:00:58 INFO Utils: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/02/07 07:00:58 WARN Utils: Your hostname, s2.machine.org resolves to a
loopback address: 127.0.0.1; using 192.168.64.122 instead (on interface
eth1)
14/02/07 07:00:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
14/02/07 07:00:59 INFO Slf4jLogger: Slf4jLogger started
14/02/07 07:00:59 INFO Remoting: Starting remoting
14/02/07 07:00:59 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkWorker@s2:49614]
14/02/07 07:00:59 INFO Worker: Starting Spark worker s2:49614 with 1 cores,
853.0 MB RAM
14/02/07 07:00:59 INFO Worker: Spark home:
/home/vagrant/spark-0.9.0-incubating-bin-hadoop2
14/02/07 07:00:59 INFO WorkerWebUI: Started Worker web UI at http://s2:8081
14/02/07 07:00:59 INFO Worker: Connecting to master
spark://s1.machine.org:7077...
14/02/07 07:00:59 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@s2:49614] -> [akka.tcp://
sparkMaster@s1.machine.org:7077]: Error [Association failed with
[akka.tcp://sparkMaster@s1.machine.org:7077]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@s1.machine.org:7077]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: s1.machine.org/192.168.64.121:7077
]
...
14/02/07 07:00:59 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef:
Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from
Actor[akka://sparkWorker/user/Worker#607746123] to
Actor[akka://sparkWorker/deadLetters] was not delivered. [1] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.

...

14/02/07 07:01:59 ERROR Worker: All masters are unresponsive! Giving up.
===============================================
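
Note the warning near the top of the log: s2.machine.org resolves to the
loopback address 127.0.0.1, so Spark falls back to guessing a non-loopback
interface. A hedged sketch of pinning both sides explicitly before starting
the daemons, using the addresses this log happens to report (treat them as
illustrative):

    # on the master (s1), before starting it:
    export SPARK_MASTER_IP=192.168.64.121

    # on the worker (s2), before starting it:
    export SPARK_LOCAL_IP=192.168.64.122
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.64.121:7077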

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Pei-Lun Lee <pl...@appier.com>.
Same problem here. I am using 0.9.0 on EC2.
All worker nodes died at the same time, a few minutes after starting.
Setting SPARK_MASTER_IP didn't help.
Any suggestions are appreciated.

Here's the master log:

Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp
:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar
-Dspark.akka.logLifecycleEvents=true
-Djava.library.path=/root/hadoop-native/ -Xms512m -Xmx512m
org.apache.spark.deploy.master.Master --ip ZZZZZZ.ZZZZZ.ZZZZZZ.ZZZZ --port
7077 --webui-port 8080
========================================

log4j:WARN No appenders could be found for logger
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
14/02/07 09:07:05 INFO Master: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/02/07 09:07:05 INFO Master: Starting Spark master at spark://
master.spark.XXXXXX.info:7077
14/02/07 09:07:05 INFO MasterWebUI: Started Master web UI at
http://master:8080
14/02/07 09:07:05 INFO Master: I have been elected leader! New state: ALIVE
14/02/07 09:07:07 INFO Master: Registering worker
master.spark.XXXXXX.info:35973 with 16 cores, 57.5 GB RAM
14/02/07 09:07:07 INFO Master: Registering worker
master.spark.XXXXXX.info:42106 with 16 cores, 57.5 GB RAM
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106
got disassociated, removing it.
14/02/07 09:10:01 INFO Master: Removing worker
worker-20140207090706-ip-XX-XXX-XXX-XX.us-west-2.compute.internal-42106 on
ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106
got disassociated, removing it.
14/02/07 09:10:01 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkMaster/deadLetters] to
Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40XX.XXX.XXX.XX%3A56933-1#1326712555]
was not delivered. [1] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973
got disassociated, removing it.
14/02/07 09:10:01 INFO Master: Removing worker
worker-20140207090706-ip-YY-YYY-YYY-YYY.us-west-2.compute.internal-35973 on
ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973
got disassociated, removing it.
14/02/07 09:10:01 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkMaster/deadLetters] to
Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40YY.YYY.YYY.YYY%3A58332-2#-579824674]
was not delivered. [2] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973
got disassociated, removing it.
14/02/07 09:10:01 ERROR EndpointWriter: AssociationError [akka.tcp://
sparkMaster@master.spark.XXXXXX.info:7077] ->
[akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973]:
Error [Association failed with
[akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973]]
[
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkWorker@ip-YY-YYY-YYY-YYY.us-west-2.compute.internal:35973]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused:
ip-YY-YYY-YYY-YYY.us-west-2.compute.internal/YY.YYY.YYY.YYY:35973
]
14/02/07 09:10:01 INFO Master:
akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106
got disassociated, removing it.
14/02/07 09:10:01 ERROR EndpointWriter: AssociationError [akka.tcp://
sparkMaster@master.spark.XXXXXX.info:7077] ->
[akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106]:
Error [Association failed with
[akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106]]
[
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkWorker@ip-XX-XXX-XXX-XX.us-west-2.compute.internal:42106]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused:
ip-XX-XXX-XXX-XX.us-west-2.compute.internal/XX.XXX.XXX.XX:42106
]
...



2014-02-07 Sourav Chandra <so...@livestream.com>:

> ...

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Sourav Chandra <so...@livestream.com>.
What is the output of 'host s1.machine.org' if you execute it from your
worker machine?

ping may still succeed when this fails; if it does fail, it means no DNS
entry is present for this machine (s1.machine.org).

Two alternatives could be:
 - add a DNS entry
 - start the master with the SPARK_MASTER_IP=<master ip address> env variable set
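
As a concrete sketch of the second alternative (the address below is the
master IP from the worker's error log; the stop/start scripts are assumed
to be the sbin ones shipped with 0.9.0):

    # on the worker: check DNS resolution of the master's name
    host s1.machine.org

    # on the master: bind to an explicit IP and restart
    export SPARK_MASTER_IP=192.168.64.121
    ./sbin/stop-master.sh
    ./sbin/start-master.sh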

Thanks,
Sourav


On Fri, Feb 7, 2014 at 12:39 PM, Pillis W <pi...@gmail.com> wrote:

> I have a "Connection Refused" error on the first worker (standalone
> cluster - no YARN, Mesos). No firewalls, and can ping master-worker nodes
> from the other.
>
> Master process started manually. It is running and can see Web UI at 8080.
>
> Using "spark-0.9.0-incubating-bin-hadoop2.tgz"
>
> ===============================================
> spark-0.9.0-incubating-bin-hadoop2]$ ./bin/spark-class
> org.apache.spark.deploy.worker.Worker  spark://s1.machine.org:7077
> 14/02/07 07:00:58 INFO Utils: Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 14/02/07 07:00:58 WARN Utils: Your hostname, s2.machine.org resolves to a
> loopback address: 127.0.0.1; using 192.168.64.122 instead (on interface
> eth1)
> 14/02/07 07:00:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
> 14/02/07 07:00:59 INFO Slf4jLogger: Slf4jLogger started
> 14/02/07 07:00:59 INFO Remoting: Starting remoting
> 14/02/07 07:00:59 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkWorker@s2:49614]
> 14/02/07 07:00:59 INFO Worker: Starting Spark worker s2:49614 with 1
> cores, 853.0 MB RAM
> 14/02/07 07:00:59 INFO Worker: Spark home:
> /home/vagrant/spark-0.9.0-incubating-bin-hadoop2
> 14/02/07 07:00:59 INFO WorkerWebUI: Started Worker web UI at
> http://s2:8081
> 14/02/07 07:00:59 INFO Worker: Connecting to master
> spark://s1.machine.org:7077...
> 14/02/07 07:00:59 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@s2:49614] -> [akka.tcp://
> sparkMaster@s1.machine.org:7077]: Error [Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: s1.machine.org/192.168.64.121:7077
> ]
> 14/02/07 07:00:59 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@s2:49614] -> [akka.tcp://
> sparkMaster@s1.machine.org:7077]: Error [Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: s1.machine.org/192.168.64.121:7077
> ]
> 14/02/07 07:00:59 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@s2:49614] -> [akka.tcp://
> sparkMaster@s1.machine.org:7077]: Error [Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: s1.machine.org/192.168.64.121:7077
> ]
> 14/02/07 07:00:59 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@s2:49614] -> [akka.tcp://
> sparkMaster@s1.machine.org:7077]: Error [Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@s1.machine.org:7077]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: s1.machine.org/192.168.64.121:7077
> ]
> 14/02/07 07:00:59 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef:
> Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from
> Actor[akka://sparkWorker/user/Worker#607746123] to
> Actor[akka://sparkWorker/deadLetters] was not delivered. [1] dead letters
> encountered. This logging can be turned off or adjusted with configuration
> settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
>
> ...
>
> 14/02/07 07:01:59 ERROR Worker: All masters are unresponsive! Giving up.
> ===============================================
>



-- 

Sourav Chandra

Senior Software Engineer

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

sourav.chandra@livestream.com

o: +91 80 4121 8723

m: +91 988 699 3746

skype: sourav.chandra

Livestream

"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd
Block, Koramangala Industrial Area,

Bangalore 560034

www.livestream.com

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Gino Bustelo <lb...@gmail.com>.
I've been playing with the amplab docker scripts, and I needed to set spark.driver.host to the driver host's IP: one that all Spark processes can reach.
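
A hedged sketch of doing that on a 0.9-era build, assuming SPARK_JAVA_OPTS
is still honored (as it was in that release) and using a made-up
docker-bridge address:

    # make the driver advertise an address every Spark process can reach
    export SPARK_JAVA_OPTS="-Dspark.driver.host=172.17.0.2"
    # then launch the driver program as usual, e.g.:
    ./bin/run-example org.apache.spark.examples.SparkPi spark://master:7077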

> On May 28, 2014, at 4:35 AM, jaranda <jo...@bsc.es> wrote:
> 
> Same here, got stuck at this point. Any hints on what might be going on?

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by jaranda <jo...@bsc.es>.
Same here, got stuck at this point. Any hints on what might be going on?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Akka-Connection-refused-standalone-cluster-using-spark-0-9-0-tp1297p6463.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by "Li, Rui" <ru...@intel.com>.
Hi Akhil,

Thanks for your e-mail.
I figured out that the previous problem was because Akka uses the FQDN in the master URL, while the worker was using only the short hostname. I set STANDALONE_SPARK_MASTER_HOST=`hostname -f` in spark-env.sh, and now the worker can connect to the master successfully. I checked the master web UI and can see the worker listed there.
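
For reference, the change amounts to one line in spark-env.sh (the conf
path is an assumption, based on the executor launch command later in this
message):

    # /usr/lib/spark/conf/spark-env.sh
    # advertise the FQDN so the URL workers dial matches what Akka binds to
    export STANDALONE_SPARK_MASTER_HOST=`hostname -f`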

However, I ran into another problem when I tried to run an example with:
./run-example org.apache.spark.examples.SparkPi spark://server-1633.novalocal:7077
The stack trace is:
14/02/26 10:06:10 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/02/26 10:06:10 WARN TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi$$anonfun$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1592)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1513)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1749)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1768)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)

I verified that the spark-examples-assembly jar is included in the classpath and SparkPi$$anonfun$1.class is in that jar.
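
(For what it's worth, this pattern usually means the jar is on the driver's
classpath but never reaches the executors, so task deserialization on the
worker cannot find the anonymous function class. One hedged workaround on
0.9 is to put the jar on every node's classpath via spark-env.sh; the jar
path below is an assumption, and this is a sketch, not a confirmed fix for
this report:

    # in spark-env.sh on all nodes; picked up by compute-classpath.sh
    export SPARK_CLASSPATH=/usr/lib/spark/examples/spark-examples-assembly-0.9.0.jar
)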

I then checked the logs of the master and the worker. I found the following error in the master's log:
14/02/26 10:06:07 INFO Master: Registering app SparkPi
14/02/26 10:06:07 INFO Master: Registered app SparkPi with ID app-20140226100607-0000
14/02/26 10:06:07 INFO Master: Launching executor app-20140226100607-0000/0 on worker worker-20140226100542-server-1633.novalocal-7078
14/02/26 10:06:10 INFO Master: akka.tcp://spark@server-1633.novalocal:35408 got disassociated, removing it.
14/02/26 10:06:10 INFO Master: Removing app app-20140226100607-0000
14/02/26 10:06:10 INFO Master: akka.tcp://spark@server-1633.novalocal:35408 got disassociated, removing it.
14/02/26 10:06:10 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.20.20%3A34071-2#-1821201423] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/26 10:06:10 INFO Master: akka.tcp://spark@server-1633.novalocal:35408 got disassociated, removing it.
14/02/26 10:06:10 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@server-1633.novalocal:7077] -> [akka.tcp://spark@server-1633.novalocal:35408]: Error [Association failed with [akka.tcp://spark@server-1633.novalocal:35408]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@server-1633.novalocal:35408]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: server-1633.novalocal/192.168.20.20:35408
...

I found the following in the worker's log:
14/02/26 10:05:42 INFO Worker: Connecting to master spark://server-1633.novalocal:7077...
14/02/26 10:05:43 INFO Worker: Successfully registered with master spark://server-1633.novalocal:7077
14/02/26 10:06:07 INFO Worker: Asked to launch executor app-20140226100607-0000/0 for SparkPi
14/02/26 10:06:07 INFO ExecutorRunner: Launch command: "java" "-cp" ":/usr/lib/spark/conf:/usr/lib/spark/jars/spark-assembly-0.9.0-hadoop2.2.0.jar:/etc/hadoop/conf" "-Djava.library.path=/usr/lib/spark/lib:/usr/lib/hadoop/lib/native" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@server-1633.novalocal:35408/user/CoarseGrainedScheduler" "0" "server-1633.novalocal" "8" "akka.tcp://sparkWorker@server-1633.novalocal:7078/user/Worker" "app-20140226100607-0000"
14/02/26 10:06:10 INFO Worker: Asked to kill executor app-20140226100607-0000/0
14/02/26 10:06:10 INFO ExecutorRunner: Killing process!
14/02/26 10:06:10 INFO ExecutorRunner: Runner thread for executor app-20140226100607-0000/0 interrupted
14/02/26 10:06:11 INFO Worker: Executor app-20140226100607-0000/0 finished with state KILLED
14/02/26 10:06:11 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40192.168.20.20%3A34539-2#1991828737] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
...

Any ideas will be appreciated.
From: Akhil Das [mailto:akhil@mobipulse.in]
Sent: Tuesday, February 25, 2014 11:29 PM
To: user@spark.apache.org
Subject: Re: Akka Connection refused - standalone cluster using spark-0.9.0

...

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Akhil Das <ak...@mobipulse.in>.
Hi Rui,

If you are getting a "Connection refused" exception, you can resolve it by
checking:

=> Master is running on the specific host

 - netstat -at | grep 7077

You will get something similar to:

 - tcp        0      0 akhldz.master.io:7077     *:*     LISTEN

If that is the case, then from your worker machine do a

 - host akhldz.master.io  (replace akhldz.master.io with your master host;
   if something goes wrong, add a host entry in your /etc/hosts file)
 - telnet akhldz.master.io 7077  (if this does not connect, your worker
   won't connect either)

=> Adding a host entry in /etc/hosts

Open /etc/hosts on your worker machine and add the following entry (example):

192.168.100.20   akhldz.master.io

PS: In the above case, Pillis had two IP addresses with the same hostname,
e.g.:
192.168.100.40  s1.machine.org
192.168.100.41  s1.machine.org

Hope that helps. Please do post your stack trace if that doesn't solve your
problem.
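
As a concrete illustration of that PS, a hypothetical broken /etc/hosts and
its fix:

    # broken: two addresses claim the same hostname, so lookups can
    # return either one and the two sides disagree about who is who
    192.168.100.40  s1.machine.org
    192.168.100.41  s1.machine.org

    # fixed: one address per hostname
    192.168.100.40  s1.machine.org
    192.168.100.41  s2.machine.org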



On Tue, Feb 25, 2014 at 7:33 PM, Li, Rui <ru...@intel.com> wrote:

> ...



-- 
Thanks
Best Regards

RE: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by "Li, Rui" <ru...@intel.com>.
Hi Pillis,

I met with the same problem here. Could you share how you solved the issue more specifically?
I added an entry in /etc/hosts, but it doesn't help.

From: Pillis W [mailto:pillis.work@gmail.com]
Sent: Sunday, February 09, 2014 4:49 AM
To: user@spark.incubator.apache.org
Subject: Re: Akka Connection refused - standalone cluster using spark-0.9.0

...



Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Pillis W <pi...@gmail.com>.
I fixed my issue - two IP addresses had the same hostname.
Regards
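
A quick way to spot that condition from a node, as a sketch (the hostname
is illustrative):

    # more than one line with the same name means the resolver
    # can hand back either address
    grep -w 's1.machine.org' /etc/hosts
    # cross-check what the name actually resolves to right now
    getent hosts s1.machine.org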




On Fri, Feb 7, 2014 at 12:59 PM, Soumya Simanta <so...@gmail.com> wrote:

> ...
>

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by irina <fe...@gmail.com>.
Hi ssimanta,
were you able to resolve the problem where the standalone Scala program
fails but the Spark REPL works just fine? I am getting the same issue...
Thanks,
Irina



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Akka-Connection-refused-standalone-cluster-using-spark-0-9-0-tp1297p15684.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by Soumya Simanta <so...@gmail.com>.
I see similar logs but only when I try to run a standalone Scala program.
The whole setup works just fine if I'm using the spark-shell/REPL.
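
That difference makes sense: the shell wires up the master URL, the driver
address, and the classpath for you, while a standalone program has to
supply them itself. A hedged pre-spark-submit launch sketch (class name,
jar path, and address are made up; SPARK_CLASSPATH and SPARK_JAVA_OPTS are
assumed to be honored as they were in 0.9):

    # jar must be visible to the driver; also pass it to SparkContext
    # inside the program so executors can fetch it
    export SPARK_CLASSPATH=/path/to/myapp.jar
    # advertise a driver address the cluster can connect back to
    export SPARK_JAVA_OPTS="-Dspark.driver.host=192.168.1.10"
    ./bin/spark-class com.example.MyApp spark://master:7077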




On Fri, Feb 7, 2014 at 3:05 PM, mohankreddy <mr...@beanatomics.com> wrote:

> ...

Re: Akka Connection refused - standalone cluster using spark-0.9.0

Posted by mohankreddy <mr...@beanatomics.com>.
Here's more information. I have the master up, but when I try to bring the
workers up I get the following error.

log4j:WARN No appenders could be found for logger
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
14/02/07 15:01:17 INFO Worker: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/02/07 15:01:17 INFO Worker: Starting Spark worker yyyyyyy:58020 with 16
cores, 67.0 GB RAM
14/02/07 15:01:17 INFO Worker: Spark home: /opt/spark
14/02/07 15:01:17 INFO WorkerWebUI: Started Worker web UI at
http://yyyyyyyyy:8081
14/02/07 15:01:17 INFO Worker: Connecting to master spark://xxxxx/:7077...
14/02/07 15:01:17 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef:
Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from
Actor[akka://sparkWorker/user/Worker#2037095035] to
Actor[akka://sparkWorker/deadLetters] was not delivered. [1] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/07 15:01:37 INFO Worker: Connecting to master spark://xxxxx/:7077...
14/02/07 15:01:37 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef:
Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from
Actor[akka://sparkWorker/user/Worker#2037095035] to
Actor[akka://sparkWorker/deadLetters] was not delivered. [2] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/07 15:01:57 INFO Worker: Connecting to master spark://xxxx/:7077...
14/02/07 15:01:57 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef:
Message [org.apache.spark.deploy.DeployMessages$RegisterWorker] from
Actor[akka://sparkWorker/user/Worker#2037095035] to
Actor[akka://sparkWorker/deadLetters] was not delivered. [3] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/07 15:02:17 ERROR Worker: All masters are unresponsive! Giving up.



PS: I masked the IPs 
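
One detail worth checking in the log above: the worker dials
"spark://xxxxx/:7077", with a slash between the host and the port. If that
slash is really in the configured master URL rather than an artifact of the
masking, the worker is connecting to a malformed address; the expected form
is (hostname is a placeholder):

    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077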



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Akka-Connection-refused-standalone-cluster-using-spark-0-9-0-tp1297p1311.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.