Posted to common-user@hadoop.apache.org by Sagar Naik <sn...@attributor.com> on 2008/12/01 22:00:49 UTC
Hadoop datanode crashed - SIGBUS
A couple of the datanodes crashed with the following error
(/tmp is only 15% full):
#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
#
[Too many errors, abort]
Please suggest how I should go about debugging this problem.
-Sagar
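A first step is usually to locate the full HotSpot crash report: on a fatal signal the VM writes an hs_err_pid&lt;pid&gt;.log into the process working directory, falling back to /tmp. A quick way to look (paths are illustrative; the datanode's working directory may differ):

```shell
# Search a common spot for crash reports left behind by the dead JVM;
# also check the directory the datanode was started from.
find /tmp -maxdepth 2 -name 'hs_err_pid*.log' 2>/dev/null
```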
Re: Hadoop datanode crashed - SIGBUS
Posted by Chris Collins <ch...@scoutlabs.com>.
I had some pretty bad issues with leaks in _07; _10, by the way, has a
lot of bug fixes, though I don't know whether it would fix this problem.
As for flags, I wouldn't know. One thing you could try is to match the
program counter against a memory region. If you use jstack or jmap (I
can't remember which), it will give you a dump of all the libraries and
their memory address ranges. From that you may see whether the program
counter matches anything interesting.
Other than that I would go with Brian's recommendations.
C
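Concretely, the matching Chris describes can be sketched in a few lines of bash. The two mappings below are invented stand-ins for the "Dynamic libraries" section of a real hs_err_pid log (or /proc/&lt;pid&gt;/maps); only the program counter comes from the actual crash:

```shell
#!/bin/bash
# Match the faulting program counter against library address ranges.
# Hypothetical mappings, one range and path per line:
maps='b4e00000-b4f40000 /usr/java/jdk1.6.0_07/jre/lib/i386/server/libjvm.so
b7a00000-b7a20000 /usr/java/jdk1.6.0_07/jre/lib/i386/libzip.so'

pc=0xb4edcb6a   # program counter from the SIGBUS line

while read -r range lib; do
  lo=0x${range%-*}   # start of the mapping
  hi=0x${range#*-}   # end of the mapping
  if (( pc >= lo && pc < hi )); then
    match=$lib
    echo "$pc falls inside $lib"
  fi
done <<< "$maps"
```

With real input, the library the PC lands in (JVM, a native zip/compression library, etc.) narrows down where the fault occurred.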
On Dec 1, 2008, at 1:59 PM, Sagar Naik wrote:
>
> Hi,
> I don't have any additional information. If you know of any other
> flags I need to turn on, please tell me. The flags currently enabled
> are "-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC
> -Dcom.sun.management.jmxremote"
> But this is what is listed in the stdout (datanode.out) file:
>
> Java version :
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)
>
>
> I will try to stress test the memory.
>
> -Sagar
>
Re: Hadoop datanode crashed - SIGBUS
Posted by Sagar Naik <sn...@attributor.com>.
Hi,
I don't have any additional information. If you know of any other flags
I need to turn on, please tell me. The flags currently enabled are
"-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC
-Dcom.sun.management.jmxremote"
But this is what is listed in the stdout (datanode.out) file:
Java version:
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)
I will try to stress test the memory.
-Sagar
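For the next crash, the full report can be captured somewhere predictable. A sketch, assuming the Sun JDK 6 `-XX:ErrorFile` option (the path here is illustrative):

```shell
# Appended to the datanode's JVM options (e.g. HADOOP_OPTS in
# hadoop-env.sh); %p expands to the pid of the crashing process.
export HADOOP_OPTS="$HADOOP_OPTS -XX:ErrorFile=/var/log/hadoop/hs_err_pid%p.log"
```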
Chris Collins wrote:
> Was there anything mentioned as part of the tombstone message about a
> "problematic frame"? What Java are you using? There are a few reasons
> for SIGBUS errors; one is illegal address alignment, but from Java
> that's very unlikely. There were some issues with the native zip
> library in older VMs. As Brian pointed out, sometimes this points to
> a hardware issue.
>
> C
Re: Hadoop datanode crashed - SIGBUS
Posted by Raghu Angadi <ra...@yahoo-inc.com>.
FYI: the datanode does not run any user code and does not link with any
native/JNI code.
Raghu.
Chris Collins wrote:
> Was there anything mentioned as part of the tombstone message about a
> "problematic frame"? What Java are you using? There are a few reasons
> for SIGBUS errors; one is illegal address alignment, but from Java
> that's very unlikely. There were some issues with the native zip
> library in older VMs. As Brian pointed out, sometimes this points to
> a hardware issue.
>
> C
Re: Hadoop datanode crashed - SIGBUS
Posted by Chris Collins <ch...@scoutlabs.com>.
Was there anything mentioned as part of the tombstone message about a
"problematic frame"? What Java are you using? There are a few reasons
for SIGBUS errors; one is illegal address alignment, but from Java
that's very unlikely. There were some issues with the native zip
library in older VMs. As Brian pointed out, sometimes this points to
a hardware issue.
C
On Dec 1, 2008, at 1:32 PM, Sagar Naik wrote:
>
>
> Brian Bockelman wrote:
>> Hardware/memory problems?
> I'm not sure.
>>
>> SIGBUS is relatively rare; it sometimes indicates a hardware error
>> in the memory system, depending on your arch.
>>
> *uname -a:*
> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
> *top's top:*
> Cpu(s): 0.1% us, 1.1% sy, 0.0% ni, 98.0% id, 0.8% wa, 0.0% hi, 0.0% si
> Mem: 8288280k total, 1575680k used, 6712600k free, 5392k buffers
> Swap: 16386292k total, 68k used, 16386224k free, 522408k cached
>
> 8-core Xeon, 2 GHz
>
>> Brian
> Thanks to Brian
>
> -Sagar
Re: Hadoop datanode crashed - SIGBUS
Posted by Sagar Naik <sn...@attributor.com>.
None of the jobs use compression, for sure.
-Sagar
Brian Bockelman wrote:
> I'd run memcheck overnight on the nodes that caused the problem, just
> to be sure.
>
> Another (unlikely) possibility is that the JNI callouts for the native
> libraries Hadoop uses (for the compression codecs, I believe) have
> crashed or were set up wrong, and died fatally enough to take out the
> JVM. Are you using any compression? Does your job complete
> successfully in "local" mode, if the crash correlates well with a job
> running?
>
> Brian
Re: Hadoop datanode crashed - SIGBUS
Posted by Brian Bockelman <bb...@cse.unl.edu>.
I'd run memcheck overnight on the nodes that caused the problem, just
to be sure.
Another (unlikely) possibility is that the JNI callouts for the native
libraries Hadoop uses (for the compression codecs, I believe) have
crashed or were set up wrong, and died fatally enough to take out the
JVM. Are you using any compression? Does your job complete
successfully in "local" mode, if the crash correlates well with a job
running?
Brian
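Brian's overnight memory-test suggestion could look like the following, using the memtester utility as one concrete option (a sketch; memtester is assumed to be installed, and the sizing heuristic is illustrative):

```shell
# Size an overnight memtester run to ~90% of currently free RAM,
# leaving headroom for the OS and the running daemons.
free_kb=$(awk '/MemFree/ {print $2}' /proc/meminfo)
test_mb=$(( free_kb * 9 / 10 / 1024 ))
echo "nohup memtester ${test_mb}M 1 > memtester.log &"
```

The echoed command is what you would actually run on the suspect node; errors logged by memtester point to bad DIMMs.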
On Dec 1, 2008, at 3:32 PM, Sagar Naik wrote:
>
>
> Brian Bockelman wrote:
>> Hardware/memory problems?
> I'm not sure.
>>
>> SIGBUS is relatively rare; it sometimes indicates a hardware error
>> in the memory system, depending on your arch.
>>
> *uname -a:*
> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
> *top's top:*
> Cpu(s): 0.1% us, 1.1% sy, 0.0% ni, 98.0% id, 0.8% wa, 0.0% hi, 0.0% si
> Mem: 8288280k total, 1575680k used, 6712600k free, 5392k buffers
> Swap: 16386292k total, 68k used, 16386224k free, 522408k cached
>
> 8-core Xeon, 2 GHz
>
>> Brian
> Thanks to Brian
>
> -Sagar
Re: Hadoop datanode crashed - SIGBUS
Posted by Sagar Naik <sn...@attributor.com>.
Brian Bockelman wrote:
> Hardware/memory problems?
I'm not sure.
>
> SIGBUS is relatively rare; it sometimes indicates a hardware error in
> the memory system, depending on your arch.
>
*uname -a:*
Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
*top's top:*
Cpu(s): 0.1% us, 1.1% sy, 0.0% ni, 98.0% id, 0.8% wa, 0.0% hi, 0.0% si
Mem: 8288280k total, 1575680k used, 6712600k free, 5392k buffers
Swap: 16386292k total, 68k used, 16386224k free, 522408k cached

8-core Xeon, 2 GHz
> Brian
Thanks to Brian
-Sagar
Re: Hadoop datanode crashed - SIGBUS
Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hardware/memory problems?
SIGBUS is relatively rare; it sometimes indicates a hardware error in
the memory system, depending on your arch.
Brian
On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:
> A couple of the datanodes crashed with the following error
> (/tmp is only 15% full):
>
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> # SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
> #
> [Too many errors, abort]
>
> Please suggest how I should go about debugging this problem.
>
>
> -Sagar