Posted to hdfs-user@hadoop.apache.org by Chun-fan Ivan Liao <iv...@ivangelion.tw> on 2013/08/15 03:34:31 UTC

Machine hangs from time to time

Hi,

We are using Hadoop 1.0.3 on Ubuntu 12.04.2 LTS. Hadoop servers include 1
NN/JT, 1 SNN/DN & several DNs.

From time to time, some of the servers hang: they cannot be pinged, the
screen goes black, they do not respond to keyboard input, and they lose
their connection to the NN. Recently, one DN hung even when there was no
job running. The hangs do not happen on all machines; they usually occur
on a few specific DNs.

How should we tackle this problem? Does the system leave a trace when it
crashes or hangs?

Any help would be greatly appreciated.

RE: Machine hangs from time to time

Posted by Leo Leung <ll...@ddn.com>.
I doubt this is related to Hadoop / Java code,
since you mention there is no keyboard / console response and it happens only on specific DNs.

You may want to enable or check Linux abrtd (a base Linux tool) to help troubleshoot system-level crashes (if any).

Find out whether this is related to hardware, such as a thermal dissipation problem (running too hot).
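
As a rough illustration of that kind of check, here is a minimal Python sketch that scans kernel log lines for common signs of memory exhaustion, lockups, or hardware trouble. The pattern list is an assumption (kernel message wording varies by version and driver), the function names are hypothetical, and a hard hang may leave nothing in the logs at all.

```python
import re

# Assumed patterns; kernel wording differs between versions and drivers,
# so treat this list as a starting point rather than a complete catalogue.
PATTERNS = {
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "panic": re.compile(
        r"kernel panic|soft lockup|blocked for more than", re.IGNORECASE
    ),
    "hardware": re.compile(
        r"machine check|mce:|temperature above threshold|thermal",
        re.IGNORECASE,
    ),
}


def scan_log_lines(lines):
    """Return a dict mapping each category to the log lines that match it."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_log_file(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return scan_log_lines(handle)
```

After a reboot, something like `scan_log_file("/var/log/kern.log")` (path assumed; rotated copies such as `kern.log.1` may hold the last lines written before the hang) gives a quick first look at each affected DN.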

Hope this helps.
Good luck.


From: Chun-fan Ivan Liao [mailto:ivan@ivangelion.tw]
Sent: Wednesday, August 14, 2013 6:35 PM
To: user@hadoop.apache.org
Subject: Machine hangs from time to time

Hi,

We are using Hadoop 1.0.3 on Ubuntu 12.04.2 LTS. Hadoop servers include 1 NN/JT, 1 SNN/DN & several DNs.

From time to time, some of the servers hang: they cannot be pinged, the screen goes black, they do not respond to keyboard input, and they lose their connection to the NN. Recently, one DN hung even when there was no job running. The hangs do not happen on all machines; they usually occur on a few specific DNs.

How should we tackle this problem? Does the system leave a trace when it crashes or hangs?

Any help would be greatly appreciated.



Re: Machine hangs from time to time

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
How many map/reduce slots are you running per TT? How much memory is available per node? Did you enable memory management? - See http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring 
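
To make the slot and memory questions concrete, here is a back-of-the-envelope Python sketch. It assumes the slot counts come from `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, and the per-task heap from the `-Xmx` value in `mapred.child.java.opts`. The 2048 MB reserve for the OS and the DN/TT daemons is an assumed figure, and the function names are hypothetical.

```python
def worst_case_task_heap_mb(map_slots, reduce_slots, child_heap_mb):
    """Heap needed if every slot on the node runs a task at full -Xmx."""
    return (map_slots + reduce_slots) * child_heap_mb


def is_oversubscribed(map_slots, reduce_slots, child_heap_mb,
                      node_ram_mb, reserve_mb=2048):
    """True if task heaps plus an assumed OS/daemon reserve exceed RAM.

    A JVM uses more memory than its heap, so this understates real usage.
    """
    needed = worst_case_task_heap_mb(map_slots, reduce_slots, child_heap_mb)
    return needed + reserve_mb > node_ram_mb
```

For example, 8 map slots and 4 reduce slots at 1024 MB each need 12288 MB of heap alone, so `is_oversubscribed(8, 4, 1024, 8192)` returns `True` for a node with 8 GB of RAM. A node in that state can swap heavily and appear hung under load.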

Thanks,
+Vinod

On Aug 14, 2013, at 6:34 PM, Chun-fan Ivan Liao wrote:

> Hi,
> 
> We are using Hadoop 1.0.3 on Ubuntu 12.04.2 LTS. Hadoop servers include 1 NN/JT, 1 SNN/DN & several DNs. 
> 
> From time to time, some of the servers hang: they cannot be pinged, the screen goes black, they do not respond to keyboard input, and they lose their connection to the NN. Recently, one DN hung even when there was no job running. The hangs do not happen on all machines; they usually occur on a few specific DNs.
> 
> How should we tackle this problem? Does the system leave a trace when it crashes or hangs?
> 
> Any help would be greatly appreciated. 
> 
> 


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.
