Posted to user@hive.apache.org by Somnath Pandeya <So...@infosys.com> on 2015/01/05 08:07:28 UTC

spark worker nodes getting disassociated while running hive on spark

Hi,

I have set up a Spark 1.2 standalone cluster and am trying to run Hive on Spark by following the link below.

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

I got the latest build of Hive on Spark from git and tried running a few queries. The queries run fine for some time, and after that I get the following errors.

Error on master node
15/01/05 12:16:59 INFO actor.LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40xx.xx.xx.xx%3A34823-1#1101564287] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/01/05 12:16:59 INFO master.Master: akka.tcp://sparkWorker@machinename:58392 got disassociated, removing it.
15/01/05 12:16:59 INFO master.Master: Removing worker worker-20150105120340-machine-58392 on indhyhdppocap03.infosys-platforms.com:58392

Error on slave node

15/01/05 12:20:21 INFO transport.ProtocolStateActor: No response from remote. Handshake timed out or transport failure detector triggered.
15/01/05 12:20:21 INFO actor.LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40machineName%3A7077-1#-1301148631] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
15/01/05 12:20:21 INFO worker.Worker: Disassociated [akka.tcp://sparkWorker@machineName:58392] -> [akka.tcp://sparkMaster@machineName:7077] Disassociated !
15/01/05 12:20:21 ERROR worker.Worker: Connection to master failed! Waiting for master to reconnect...
15/01/05 12:20:21 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@machineName:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
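
To compare when each side noticed the disconnect, the disassociation lines can be pulled out of the two logs with a small script. The following is only a minimal Python sketch: it assumes the "yy/MM/dd HH:mm:ss LEVEL logger: message" layout seen in the lines above, the function name disassociation_events is made up for illustration, and the sample lines are copied from this message. The master and worker clocks may not be in sync, so any gap between the two logs is only a rough indication.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines above:
# "yy/MM/dd HH:mm:ss LEVEL logger: message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([\w.$]+): (.*)$"
)


def disassociation_events(lines):
    """Return (timestamp, level, logger, message) for lines mentioning a disassociation."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        stamp, level, logger, message = match.groups()
        if "disassociated" in message.lower():
            when = datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")
            events.append((when, level, logger, message))
    return events


# Sample lines copied from the master and worker logs in this message.
sample = [
    "15/01/05 12:16:59 INFO master.Master: akka.tcp://sparkWorker@machinename:58392 got disassociated, removing it.",
    "15/01/05 12:20:21 ERROR worker.Worker: Connection to master failed! Waiting for master to reconnect...",
    "15/01/05 12:20:21 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@machineName:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].",
]

events = disassociation_events(sample)
for when, level, logger, _message in events:
    print(when.isoformat(), level, logger)
```

Running the same function over the complete master and worker log files would show whether the master dropped the worker before the worker reported the failure.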


Please help.

-Somnath

**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely 
for the use of the addressee(s). If you are not the intended recipient, please 
notify the sender by e-mail and delete the original message. Further, you are not 
to copy, disclose, or distribute this e-mail or its contents to any other person and 
any such actions are unlawful. This e-mail may contain viruses. Infosys has taken 
every reasonable precaution to minimize this risk, but is not liable for any damage 
you may sustain as a result of any virus in this e-mail. You should carry out your 
own virus checks before opening the e-mail or attachment. Infosys reserves the 
right to monitor and review the content of all messages sent to or from this e-mail 
address. Messages sent to or from this e-mail address may be stored on the 
Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS***

Re: spark worker nodes getting disassociated while running hive on spark

Posted by Xuefu Zhang <xz...@cloudera.com>.
Hi Somnath,

The error seems to have nothing to do with Hive. I haven't seen this problem,
but I wonder whether your cluster has a configuration issue, especially in the
timeout values for network communication. The default values have worked fine
for us.
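
One way to check whether any timeout values have been overridden is to scan the Spark properties file for explicitly set timeout or heartbeat properties. Below is a minimal Python sketch, assuming a spark-defaults.conf-style file with one whitespace-separated "key value" pair per line. The function name timeout_settings and the example file contents are hypothetical and for illustration only; the actual property names and their defaults should be checked against the configuration documentation for your Spark version.

```python
def timeout_settings(conf_text):
    """Collect explicitly set properties whose name mentions a timeout or heartbeat."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key and value separated by whitespace
        if len(parts) != 2:
            continue  # skip keys without a value
        key, value = parts
        if "timeout" in key.lower() or "heartbeat" in key.lower():
            found[key] = value.strip()
    return found


# Hypothetical file contents, for illustration only.
example_conf = """
# spark-defaults.conf
spark.master            spark://machineName:7077
spark.akka.timeout      100
spark.executor.memory   2g
"""

print(timeout_settings(example_conf))
```

An empty result would suggest that the file leaves these settings at their defaults, although they can also be set elsewhere, for example on the command line or in the Hive configuration.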

If the problem persists, please provide detailed information about your
Spark build and Hive build.

Thanks,
Xuefu



