Posted to user@hbase.apache.org by zward3x <pa...@gmail.com> on 2010/03/07 21:06:45 UTC

Task process exit with nonzero status of 134...

Hi

We have a small job which indexes an HBase table with Lucene. The map
phase just emits HBase rows and the reduce phase builds the Lucene index.

In the middle of the map phase, while the reduce tasks are doing the copy,
we are getting "Task process exit with nonzero status of 134." on the
reduce side.

I found only one reference
(http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/200907.mbox/%3C4A65872E.8060302@vionto.com%3E)
with the same problem. But I checked and my Java version is OK.

Also, I cannot find any mention of errors in the logs. Is there any way to
get a log when the Java process crashes?

best
-- 
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814144.html
Sent from the HBase User mailing list archive at Nabble.com.
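
A note on the status code itself: in POSIX shells, an exit status of 128+N
means the process was killed by signal N, so 134 is 128 + 6 (SIGABRT),
meaning the task JVM aborted rather than exiting normally. A minimal
sketch, with no Hadoop involved:

```shell
# 134 = 128 + 6: the child process died from SIGABRT (signal 6),
# the signal a JVM raises when it aborts on a fatal internal error.
sh -c 'kill -ABRT $$'
echo "exit status: $?"   # prints "exit status: 134"
```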


Re: Task process exit with nonzero status of 134...

Posted by zward3x <pa...@gmail.com>.
Rolling back to u17 fixed our problem.

Thanks for the information.


Ferdy-2 wrote:
> 
> Hi,
> 
> We have had a lot of these crashes in the past. Random jobs were 
> crashing with error code 134. Our environment is also linux-amd64. We 
> tried all sorts of Hadoop versions,  and JVM deployments, but it did not 
> have any positive effect.
> 
> We finally figured out it was a deep-rooted hardware problem. 
> Communication between different cores of the CPU could get corrupted 
> every once in a while. This was due to a bad combination of the 
> mainboard, CPU and/or memory. In our case the problem was solved by 
> replacing all mainboards.
> 
> We could pinpoint and reproduce the problem using the following bash 
> command (run as root):
> 
> while /bin/true; do taskset -c 0 echo -ne '\0272G@\0306\0256yY\0210\0304\0004\0327A\0024\0343\0034\0252\0016V\r\0232\0024\0334\0233\0333\0356\0311A\0367\0375Ewgkk\0253\0373\0351\007%' | taskset -c 2 hexdump -b; done | grep 0000020 | grep -v 351
> 
> If you see any output on the console, it means your hardware is 
> affected. If you see no output for several minutes (or perhaps an 
> hour), your machine is unlikely to be broken.
> 
> Hope this is of any help to you.
> 
> Ferdy
> 
> zward3x wrote:
>> Thanks for all help.
>>
>> Will install u17, hope that this will resolve issue.
>>
>>
>>
>> Jean-Daniel Cryans-2 wrote:
>>   
>>> As I feared, you use the unholy u18... please revert to u17.
>>>
>>> See this thread for more information:
>>> http://www.mail-archive.com/common-user@hadoop.apache.org/msg04633.html
>>>
>>> J-D
>>>
>>> On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pa...@gmail.com>
>>> wrote:
>>>     
>>>> $ java -version
>>>> java version "1.6.0_18"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>>>>
>>>> there is nothing in stderr, but here is part from stdout
>>>>
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
>>>> #
>>>> # JRE version: 6.0_18-b07
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>>>> linux-amd64 )
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x2de34e]
>>>> #
>>>> # An error report file with more information is saved as:
>>>> #
>>>> /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>>> #
>>>>
>>>> Also, file which is mentioned above (hs_err_pid12633.log) does not
>>>> exist.
>>>>
>>>>
>>>>
>>>> Jean-Daniel Cryans-2 wrote:
>>>>       
>>>>>> i'm using hadoop 0.20.1 and hbase 0.20.3
>>>>>>           
>>>>> Sorry I meant java version.
>>>>>
>>>>>         
>>>>>> i already try to put
>>>>>>
>>>>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>>>>
>>>>>> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find
>>>>>> any
>>>>>> file on that path.
>>>>>>           
>>>>> Todd doesn't talk about that, he said:
>>>>>
>>>>>         
>>>>>> Generally along with a nonzero exit code you should see something in
>>>>>> the stderr for that attempt. If you look on the TaskTracker inside
>>>>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>>>>> useful?
>>>>>>           
>>>>>         
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>       
>>>     
>>
>>   
> 
> 

-- 
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27827879.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Task process exit with nonzero status of 134...

Posted by Ferdy <fe...@kalooga.com>.
Hi,

We have had a lot of these crashes in the past. Random jobs were 
crashing with error code 134. Our environment is also linux-amd64. We 
tried all sorts of Hadoop versions and JVM deployments, but none of it 
had any positive effect.

We finally figured out it was a deep-rooted hardware problem. 
Communication between different cores of the CPU could get corrupted 
every once in a while. This was due to a bad combination of the 
mainboard, CPU and/or memory. In our case the problem was solved by 
replacing all mainboards.

We could pinpoint and reproduce the problem using the following bash 
command (run as root):

while /bin/true; do taskset -c 0 echo -ne '\0272G@\0306\0256yY\0210\0304\0004\0327A\0024\0343\0034\0252\0016V\r\0232\0024\0334\0233\0333\0356\0311A\0367\0375Ewgkk\0253\0373\0351\007%' | taskset -c 2 hexdump -b; done | grep 0000020 | grep -v 351

If you see any output on the console, it means your hardware is 
affected. If you see no output for several minutes (or perhaps an 
hour), your machine is unlikely to be broken.

Hope this is of any help to you.

Ferdy

zward3x wrote:
> Thanks for all help.
>
> Will install u17, hope that this will resolve issue.
>
>
>
> Jean-Daniel Cryans-2 wrote:
>   
>> As I feared, you use the unholy u18... please revert to u17.
>>
>> See this thread for more information:
>> http://www.mail-archive.com/common-user@hadoop.apache.org/msg04633.html
>>
>> J-D
>>
>> On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pa...@gmail.com>
>> wrote:
>>     
>>> $ java -version
>>> java version "1.6.0_18"
>>> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>>>
>>> there is nothing in stderr, but here is part from stdout
>>>
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
>>> #
>>> # JRE version: 6.0_18-b07
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>>> linux-amd64 )
>>> # Problematic frame:
>>> # V  [libjvm.so+0x2de34e]
>>> #
>>> # An error report file with more information is saved as:
>>> #
>>> /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>> #
>>>
>>> Also, file which is mentioned above (hs_err_pid12633.log) does not exist.
>>>
>>>
>>>
>>> Jean-Daniel Cryans-2 wrote:
>>>       
>>>>> i'm using hadoop 0.20.1 and hbase 0.20.3
>>>>>           
>>>> Sorry I meant java version.
>>>>
>>>>         
>>>>> i already try to put
>>>>>
>>>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>>>
>>>>> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find
>>>>> any
>>>>> file on that path.
>>>>>           
>>>> Todd doesn't talk about that, he said:
>>>>
>>>>         
>>>>> Generally along with a nonzero exit code you should see something in
>>>>> the stderr for that attempt. If you look on the TaskTracker inside
>>>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>>>> useful?
>>>>>           
>>>>         
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>>       
>>     
>
>   

Re: Task process exit with nonzero status of 134...

Posted by zward3x <pa...@gmail.com>.
Thanks for all the help.

We will install u17 and hope that this resolves the issue.



Jean-Daniel Cryans-2 wrote:
> 
> As I feared, you use the unholy u18... please revert to u17.
> 
> See this thread for more information:
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg04633.html
> 
> J-D
> 
> On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pa...@gmail.com>
> wrote:
>>
>>
>> $ java -version
>> java version "1.6.0_18"
>> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>>
>> there is nothing in stderr, but here is part from stdout
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
>> #
>> # JRE version: 6.0_18-b07
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
>> linux-amd64 )
>> # Problematic frame:
>> # V  [libjvm.so+0x2de34e]
>> #
>> # An error report file with more information is saved as:
>> #
>> /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>> #
>>
>> Also, file which is mentioned above (hs_err_pid12633.log) does not exist.
>>
>>
>>
>> Jean-Daniel Cryans-2 wrote:
>>>
>>>> i'm using hadoop 0.20.1 and hbase 0.20.3
>>>
>>> Sorry I meant java version.
>>>
>>>>
>>>> i already try to put
>>>>
>>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>>
>>>> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find
>>>> any
>>>> file on that path.
>>>
>>> Todd doesn't talk about that, he said:
>>>
>>>> Generally along with a nonzero exit code you should see something in
>>>> the stderr for that attempt. If you look on the TaskTracker inside
>>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>>> useful?
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814883.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Task process exit with nonzero status of 134...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
As I feared, you use the unholy u18... please revert to u17.

See this thread for more information:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg04633.html

J-D

On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pa...@gmail.com> wrote:
>
>
> $ java -version
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>
> there is nothing in stderr, but here is part from stdout
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
> #
> # JRE version: 6.0_18-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
> linux-amd64 )
> # Problematic frame:
> # V  [libjvm.so+0x2de34e]
> #
> # An error report file with more information is saved as:
> #
> /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
>
> Also, file which is mentioned above (hs_err_pid12633.log) does not exist.
>
>
>
> Jean-Daniel Cryans-2 wrote:
>>
>>> i'm using hadoop 0.20.1 and hbase 0.20.3
>>
>> Sorry I meant java version.
>>
>>>
>>> i already try to put
>>>
>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>
>>> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find any
>>> file on that path.
>>
>> Todd doesn't talk about that, he said:
>>
>>> Generally along with a nonzero exit code you should see something in
>>> the stderr for that attempt. If you look on the TaskTracker inside
>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>> useful?
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: Task process exit with nonzero status of 134...

Posted by zward3x <pa...@gmail.com>.

$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

There is nothing in stderr, but here is part of stdout:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x2de34e]
#
# An error report file with more information is saved as:
#
/hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Also, the file mentioned above (hs_err_pid12633.log) does not exist.



Jean-Daniel Cryans-2 wrote:
> 
>> i'm using hadoop 0.20.1 and hbase 0.20.3
> 
> Sorry I meant java version.
> 
>>
>> i already try to put
>>
>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>
>> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find any
>> file on that path.
> 
> Todd doesn't talk about that, he said:
> 
>> Generally along with a nonzero exit code you should see something in
>> the stderr for that attempt. If you look on the TaskTracker inside
>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>> useful?
> 
> 

-- 
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Task process exit with nonzero status of 134...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> i'm using hadoop 0.20.1 and hbase 0.20.3

Sorry, I meant the Java version.

>
> i already try to put
>
> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>
> in hadoop-env.sh as HADOOP_OPTS but after reduce crash i did not find any
> file on that path.

That's not what Todd suggested; he said:

> Generally along with a nonzero exit code you should see something in
> the stderr for that attempt. If you look on the TaskTracker inside
> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
> useful?
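
Todd's pointer can be turned into a quick check on the TaskTracker node.
A minimal sketch, assuming a POSIX shell; the default log directory below
is only a placeholder, not taken from this thread:

```shell
# List every per-attempt stderr file under the TaskTracker's userlogs dir,
# so the failed attempt's stderr can be inspected directly.
HADOOP_LOG_DIR=${HADOOP_LOG_DIR:-/var/log/hadoop}   # placeholder default
find "$HADOOP_LOG_DIR/userlogs" -maxdepth 2 -name stderr 2>/dev/null
```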

Re: Task process exit with nonzero status of 134...

Posted by zward3x <pa...@gmail.com>.
Hi

I'm using Hadoop 0.20.1 and HBase 0.20.3.

I already tried putting

-XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log

in hadoop-env.sh as HADOOP_OPTS, but after the reduce crash I did not find
any file at that path.
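
One possible explanation for the flag having no effect (an assumption on my
part, not something confirmed in this thread): HADOOP_OPTS in hadoop-env.sh
is picked up by the Hadoop daemons, while in Hadoop 0.20 the forked task
JVMs take their flags from the mapred.child.java.opts property. A hedged
sketch for mapred-site.xml; the heap size and log path are examples only:

```xml
<!-- Example only: pass the ErrorFile flag to the child task JVMs.
     The -Xmx value and the log path are placeholders. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -XX:ErrorFile=/var/log/hadoop/java_error%p.log</value>
</property>
```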



Jean-Daniel Cryans-2 wrote:
> 
> Inline.
> 
> J-D
> 
> On Sun, Mar 7, 2010 at 12:06 PM, zward3x <pa...@gmail.com>
> wrote:
>>
>> Hi
>>
>> we are having small job which indexes hbase table with lucene. Map
>> process
>> will just emit hbase rows and reduce will create lucene index.
>>
>> In the middle of map process, while reduce jobs doing copy, we are
>> getting
>> "Task process exit with nonzero status of 134." on reduce side.
>>
>> I find only one reference
>> (http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/200907.mbox/%3C4A65872E.8060302@vionto.com%3E)
>> with the same problem. But I checked and my Java version is OK.
> 
> I had these problems in the past because of bad RAM too. BTW, which
> version do you use?
> 
>>
>> Also i cannot find any mention of errors in log. Is there any way to get
>> log
>> when java process crashed?
> 
> Yes read Todd's answer in there
> http://74.125.155.132/search?q=cache:bRriMx4QUH4J:mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/200907.mbox/%253C4A6F047E.3040400@vionto.com%253E+mapreduce+code+134&cd=1&hl=en&ct=clnk&gl=us
> 
> J-D
> 
> 

-- 
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814668.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Task process exit with nonzero status of 134...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Inline.

J-D

On Sun, Mar 7, 2010 at 12:06 PM, zward3x <pa...@gmail.com> wrote:
>
> Hi
>
> we are having small job which indexes hbase table with lucene. Map process
> will just emit hbase rows and reduce will create lucene index.
>
> In the middle of map process, while reduce jobs doing copy, we are getting
> "Task process exit with nonzero status of 134." on reduce side.
>
> I find only one reference
> (http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/200907.mbox/%3C4A65872E.8060302@vionto.com%3E)
> with the same problem. But I checked and my Java version is OK.

I had these problems in the past because of bad RAM too. BTW, which
version do you use?

>
> Also i cannot find any mention of errors in log. Is there any way to get log
> when java process crashed?

Yes, read Todd's answer here:
http://74.125.155.132/search?q=cache:bRriMx4QUH4J:mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/200907.mbox/%253C4A6F047E.3040400@vionto.com%253E+mapreduce+code+134&cd=1&hl=en&ct=clnk&gl=us

J-D