Posted to common-user@hadoop.apache.org by Sagar Naik <sn...@attributor.com> on 2008/12/01 22:00:49 UTC

Hadoop datanode crashed - SIGBUS

Couple of the datanodes crashed with the following error
The /tmp is 15% occupied

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
#
[Too many errors, abort]

Please suggest how I should go about debugging this particular problem.


-Sagar

Re: Hadoop datanode crashed - SIGBUS

Posted by Chris Collins <ch...@scoutlabs.com>.
I had some pretty bad issues with leaks in _07.  _10, by the way, has a lot
of bug fixes; I don't know whether it would fix this problem.  As for flags,
I wouldn't know.  One thing you could try is to match the memory region
that the program counter falls in.  If you use jstack or jmap (I can't
remember which), it will give you a dump of all the loaded libraries and
their memory address ranges.  From that you may see whether the program
counter matches anything interesting.
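Chris's program-counter lookup can be sketched as follows. This is a hypothetical illustration, not from the thread: the sample mappings, the library paths, and the helper name `find_mapping` are all made up; only the pc value (0xb4edcb6a) comes from the crash header above.

```python
# Sketch: given text in /proc/<pid>/maps format, find which mapping
# contains the faulting program counter from the crash header.
def find_mapping(maps_text, pc):
    for line in maps_text.splitlines():
        addr_range = line.split()[0]                      # e.g. "b4e00000-b4f00000"
        start, end = (int(x, 16) for x in addr_range.split("-"))
        if start <= pc < end:                             # pc falls inside this mapping
            return line
    return None

# Made-up sample; on a live node you would read /proc/10111/maps instead.
sample = """\
b4e00000-b4f00000 r-xp 00000000 08:01 123 /usr/java/jre/lib/i386/libzip.so
b7f00000-b7f21000 r-xp 00000000 08:01 456 /lib/ld-2.3.6.so"""

print(find_mapping(sample, 0xB4EDCB6A))
```

On a real node, feed it the actual `/proc/<pid>/maps` contents (or the library list jmap prints) to see which shared object, if any, the crash address falls in.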

Other than that I would go with Brian's recommendations.

C
On Dec 1, 2008, at 1:59 PM, Sagar Naik wrote:

>
> hi,
> I don't have additional information on it. If you know any other flags
> that I need to turn on, please tell me. The flags currently on are
> "-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC
> -Dcom.sun.management.jmxremote".
> But this is what is listed in the stdout (datanode.out) file:
>
> Java version :
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)
>
>
> I will try to stress test the memory.
>
> -Sagar


Re: Hadoop datanode crashed - SIGBUS

Posted by Sagar Naik <sn...@attributor.com>.
hi,
I don't have additional information on it. If you know any other flags that I
need to turn on, please tell me. The flags currently on are
"-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC
-Dcom.sun.management.jmxremote".
But this is what is listed in the stdout (datanode.out) file:

Java version :
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)


I will try to stress test the memory.

-Sagar

Chris Collins wrote:
> Was there anything mentioned as part of the tombstone message about
> "problematic frame"?  What Java are you using?  There are a few
> reasons for SIGBUS errors; one is illegal address alignment, but from
> Java that's very unlikely.  There were some issues with the native zip
> library in older VMs.  As Brian pointed out, sometimes this points to
> a hardware issue.
>
> C


Re: Hadoop datanode crashed - SIGBUS

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
FYI: the datanode does not run any user code and does not link with any
native/JNI code.

Raghu.

Chris Collins wrote:
> Was there anything mentioned as part of the tombstone message about
> "problematic frame"?  What Java are you using?  There are a few reasons
> for SIGBUS errors; one is illegal address alignment, but from Java that's
> very unlikely.  There were some issues with the native zip library in
> older VMs.  As Brian pointed out, sometimes this points to a hardware issue.
> 
> C


Re: Hadoop datanode crashed - SIGBUS

Posted by Chris Collins <ch...@scoutlabs.com>.
Was there anything mentioned as part of the tombstone message about
"problematic frame"?  What Java are you using?  There are a few
reasons for SIGBUS errors; one is illegal address alignment, but from
Java that's very unlikely.  There were some issues with the native zip
library in older VMs.  As Brian pointed out, sometimes this points to
a hardware issue.

C
On Dec 1, 2008, at 1:32 PM, Sagar Naik wrote:

>
>
> Brian Bockelman wrote:
>> Hardware/memory problems?
> I'm not sure.
>>
>> SIGBUS is relatively rare; it sometimes indicates a hardware error  
>> in the memory system, depending on your arch.
>>
> *uname -a : *
> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST  
> 2006 i686 i686 i386 GNU/Linux
> *top's top*
> Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,   
> 0.0% si
> Mem:   8288280k total,  1575680k used,  6712600k free,     5392k  
> buffers
> Swap: 16386292k total,       68k used, 16386224k free,   522408k  
> cached
>
> 8-core Xeon, 2 GHz
>
> Thanks to Brian
>
> -Sagar


Re: Hadoop datanode crashed - SIGBUS

Posted by Sagar Naik <sn...@attributor.com>.
None of the jobs uses compression, for sure.

-Sagar
Brian Bockelman wrote:
> I'd run memcheck overnight on the nodes that caused the problem, just 
> to be sure.
>
> Another (unlikely) possibility is that the JNI callouts for the native 
> libraries Hadoop use (for the Compression codecs, I believe) have 
> crashed or were set up wrong, and died fatally enough to take out the 
> JVM.  Are you using any compression?  Does your job complete 
> successfully in "local" mode, if the crash correlates well with a job 
> running?
>
> Brian
>


Re: Hadoop datanode crashed - SIGBUS

Posted by Brian Bockelman <bb...@cse.unl.edu>.
I'd run memcheck overnight on the nodes that caused the problem, just  
to be sure.

Another (unlikely) possibility is that the JNI callouts for the native  
libraries Hadoop use (for the Compression codecs, I believe) have  
crashed or were set up wrong, and died fatally enough to take out the  
JVM.  Are you using any compression?  Does your job complete  
successfully in "local" mode, if the crash correlates well with a job  
running?

Brian

On Dec 1, 2008, at 3:32 PM, Sagar Naik wrote:

>
>
> Brian Bockelman wrote:
>> Hardware/memory problems?
> I'm not sure.
>>
>> SIGBUS is relatively rare; it sometimes indicates a hardware error  
>> in the memory system, depending on your arch.
>>
> *uname -a : *
> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST  
> 2006 i686 i686 i386 GNU/Linux
> *top's top*
> Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,   
> 0.0% si
> Mem:   8288280k total,  1575680k used,  6712600k free,     5392k  
> buffers
> Swap: 16386292k total,       68k used, 16386224k free,   522408k  
> cached
>
> 8-core Xeon, 2 GHz
>
> Thanks to Brian
>
> -Sagar


Re: Hadoop datanode crashed - SIGBUS

Posted by Sagar Naik <sn...@attributor.com>.

Brian Bockelman wrote:
> Hardware/memory problems?
I'm not sure.
>
> SIGBUS is relatively rare; it sometimes indicates a hardware error in 
> the memory system, depending on your arch.
>
*uname -a : *
Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 
i686 i686 i386 GNU/Linux
*top's top*
Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  0.0% si
Mem:   8288280k total,  1575680k used,  6712600k free,     5392k buffers
Swap: 16386292k total,       68k used, 16386224k free,   522408k cached

8-core Xeon, 2 GHz

> Brian
>

Thanks to Brian

-Sagar

Re: Hadoop datanode crashed - SIGBUS

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hardware/memory problems?

SIGBUS is relatively rare; it sometimes indicates a hardware error in  
the memory system, depending on your arch.
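Brian's point can be double-checked directly from the crash header: the number in parentheses is the raw signal number, and on Linux signal 7 is indeed SIGBUS. A small sketch (the header string is copied verbatim from the report above):

```python
import re
import signal

# Crash header line from the datanode report above.
header = "#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408"

# Pull out the hex signal number in parentheses, e.g. "(0x7)".
num = int(re.search(r"\((0x[0-9a-fA-F]+)\)", header).group(1), 16)

# On Linux this prints the name SIGBUS; signal numbers differ per platform.
print(num, signal.Signals(num).name)
```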

Brian

On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:

> Couple of the datanodes crashed with the following error
> The /tmp is 15% occupied
>
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
> #
> [Too many errors, abort]
>
> Please suggest how I should go about debugging this particular problem.
>
>
> -Sagar