Posted to common-user@hadoop.apache.org by sunww <sp...@outlook.com> on 2016/04/24 15:07:31 UTC

System auto reboot When MR runs

Hi
    I'm using Hadoop 2.7 with cgroups enabled on Red Hat 7.1.

    When I run large MR jobs, some NodeManager machines reboot automatically.
    If I use DefaultLCEResourcesHandler instead of CgroupsLCEResourcesHandler, the MR jobs run fine.
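To confirm which handler a NodeManager is actually configured with before and after the switch, a small script can read its yarn-site.xml. This is a minimal sketch, assuming the handler is selected through the standard yarn.nodemanager.linux-container-executor.resources-handler.class property (which only matters when the LinuxContainerExecutor is in use) and that DefaultLCEResourcesHandler is the default when the property is absent. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property that selects the resources handler for the LinuxContainerExecutor.
HANDLER_KEY = "yarn.nodemanager.linux-container-executor.resources-handler.class"
# Assumed default when the property is not set in yarn-site.xml.
DEFAULT_HANDLER = (
    "org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler"
)


def resources_handler(yarn_site_xml: str) -> str:
    """Return the configured LCE resources handler class from yarn-site.xml text."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == HANDLER_KEY:
            return (prop.findtext("value") or "").strip()
    # Property absent: fall back to the assumed default handler.
    return DEFAULT_HANDLER


def uses_cgroups(yarn_site_xml: str) -> bool:
    """True if the configured handler is CgroupsLCEResourcesHandler."""
    return resources_handler(yarn_site_xml).endswith(".CgroupsLCEResourcesHandler")


# Example configuration with cgroups enabled (illustrative values only).
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
  </property>
</configuration>"""
```

Reading the file from the NodeManager's own config directory and passing its text to uses_cgroups() tells you whether that node is on the cgroups path that coincides with the reboots.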
    
    The /var/crash/127.0.0.1-2016.04.23-21:52:08/vmcore-dmesg.txt looks like this:
CPU: 29 PID: 63957 Comm: java Not tainted 3.10.0-229.el7.x86_64 #1
...
...
[15770.097168] Call Trace:
[15770.097536]  [<ffffffff810afe39>] ? pick_next_task_fair+0x129/0x1d0
[15770.097905]  [<ffffffff81608b97>] __schedule+0x127/0x7c0
[15770.098271]  [<ffffffff81609259>] schedule+0x29/0x70
[15770.098633]  [<ffffffff810d2293>] futex_wait_queue_me+0xd3/0x130
[15770.098992]  [<ffffffff810d2e09>] futex_wait+0x179/0x280
[15770.099353]  [<ffffffff8101b983>] ? native_sched_clock+0x13/0x80
[15770.099698]  [<ffffffff8101b9f9>] ? sched_clock+0x9/0x10
[15770.100057]  [<ffffffff810addfe>] ? sched_slice.isra.51+0x5e/0xc0
[15770.100419]  [<ffffffff810ad7b8>] ? __enqueue_entity+0x78/0x80
[15770.100783]  [<ffffffff810d4e9e>] do_futex+0xfe/0x5b0
[15770.101143]  [<ffffffff810a8f44>] ? wake_up_new_task+0x104/0x160
[15770.101496]  [<ffffffff810d53d0>] SyS_futex+0x80/0x180
[15770.101852]  [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b
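When comparing a trace like this against the one in a kernel bug report, it can help to reduce it to the frames the unwinder trusts. A minimal sketch in Python, assuming the x86 dmesg frame format shown above, where a `?` generally marks a return address found on the stack that could not be confirmed as part of the real call chain. `parse_call_trace` is a hypothetical helper, not part of any crash-analysis tool.

```python
import re

# Matches frames like
#   "[15770.097905]  [<ffffffff81608b97>] __schedule+0x127/0x7c0"
#   "[15770.097536]  [<ffffffff810afe39>] ? pick_next_task_fair+0x129/0x1d0"
# capturing the optional "?" marker and the symbol name.
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(dmesg_text, reliable_only=True):
    """Return symbol names from a kernel call trace, innermost frame first.

    Frames marked with "?" are skipped unless reliable_only is False.
    """
    frames = []
    for line in dmesg_text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            # Header lines such as "Call Trace:" or "..." do not match.
            continue
        if reliable_only and m.group("unreliable"):
            continue
        frames.append(m.group("sym"))
    return frames
```

Applied to the trace above, the reliable frames reduce to a futex wait path (`futex_wait_queue_me` → `schedule` → `__schedule`), which is the part worth matching against a bug report.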


Any suggestions would be appreciated. Thanks.

RE: System auto reboot When MR runs

Posted by sunww <sp...@outlook.com>.
It may be a kernel bug.

The bug report is at https://bugs.centos.org/print_bug_page.php?bug_id=7770
