Posted to user@hbase.apache.org by Asaf Mesika <as...@gmail.com> on 2012/07/12 17:09:37 UTC
HDFS + HBASE process high cpu usage
Hi,
I have a cluster of 3 DN/RS and another computer hosting NN/Master.
For some reason, two of the DataNode nodes are showing a high load average (~17).
In "top" I can see that the HDFS and HBase processes are the ones using most of the CPU (95%).
When inspecting both the HDFS and HBase processes with JVisualVM on the problematic nodes, I can clearly see that CPU usage is high.
Any ideas why this is happening on those two nodes (and why the third is resting happily)?
All three machines have roughly the same hardware.
The cluster (both HBase and HDFS) is not in use at the moment (during my inspection).
Neither the HDFS nor the HBase logs show any particular activity.
Any leads on where I should look next would be appreciated.
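One generic way to narrow this down is to map the busy CPU to individual JVM threads with top -H and jstack (a sketch; the TID below is a made-up example, not from this cluster):

```shell
# Step 1 (manual): top -H -p <jvm-pid> shows per-thread CPU; note the TID
# of the hottest thread.
# Step 2: jstack prints native thread ids in hex as "nid=0x...", so
# convert the decimal TID from top to hex:
tid=12345                      # hypothetical TID taken from top -H
printf 'nid=0x%x\n' "$tid"     # prints nid=0x3039
# Step 3 (manual): jstack <jvm-pid> | grep -A 20 'nid=0x3039'
# shows the Java stack of that thread.
```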
Thanks!
Asaf
Re: HDFS + HBASE process high cpu usage
Posted by Asaf Mesika <as...@gmail.com>.
Thanks a lot!
That must have been it.
Unfortunately I couldn't really test this command, since the ops team rebooted the entire computer room during maintenance, and that fixed the issue.
(This room is a lab room of course)
Asaf
On Jul 13, 2012, at 4:27 AM, Esteban Gutierrez wrote:
> date -s "`date`"
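A minimal sketch of that workaround (assuming a Linux box hit by the July 2012 leap-second bug; the clock reset must run as root, so this only prints the command):

```shell
now="$(date)"              # capture the current wall-clock time
fix="date -s \"${now}\""   # re-setting the same time clears the stuck
                           # kernel timer state left by the leap second
echo "run as root: ${fix}"
```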
Re: HDFS + HBASE process high cpu usage
Posted by deanforwever2010 <de...@gmail.com>.
Maybe there is a slow query involved.
I ran into the same problem: I found that I was querying 100 thousand
columns of a single row, and HBase became unresponsive and stopped working.
2012/7/13 Esteban Gutierrez <es...@cloudera.com>
> Hi Asaf,
>
> By any chance, has this issue been going on in your boxes for the last
> few days? I wouldn't be surprised by so many calls to futex from the JVM
> itself, but since you are describing the same symptoms as the leap second
> issue, it would be good to know which OS you are using, whether NTP is/was
> running, and whether the boxes have been restarted since Jul 1. If the
> leap second issue is the cause, then simply running date -s "`date`" as
> root will lower the CPU usage.
>
> regards,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
>
>
> On Thu, Jul 12, 2012 at 10:12 AM, Asaf Mesika <as...@gmail.com>
> wrote:
>
> > Just adding more information.
> > The following is the histogram output of 'strace -p <hdfs-pid> -f -C',
> > which ran for 10 seconds. For some reason futex accounts for 65% of the time.
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  65.06   11.097387         103    108084     53662 futex
> >  12.00    2.047692      170641        12         3 restart_syscall
> >   8.73    1.488824       23263        64           accept
> >   6.99    1.192192        5624       212           poll
> >   6.60    1.125829       22517        50           epoll_wait
> >   0.26    0.045039         506        89           close
> >   0.19    0.031703         170       187           sendto
> >   0.04    0.007508         110        68           setsockopt
> >   0.03    0.005558          27       209           recvfrom
> >   0.02    0.003000         375         8           sched_yield
> >   0.02    0.002999         107        28         1 epoll_ctl
> >   0.01    0.002000         125        16           open
> >   0.01    0.001999         167        12           getsockname
> >   0.01    0.001156          36        32           write
> >   0.01    0.001000         100        10           fstat
> >   0.01    0.001000          30        33           fcntl
> >   0.01    0.000999          15        67           dup2
> >   0.00    0.000488          98         5           rt_sigreturn
> >   0.00    0.000350           8        46        10 read
> >   0.00    0.000222           4        51           mprotect
> >   0.00    0.000167          42         4           openat
> >   0.00    0.000092           2        52           stat
> >   0.00    0.000084           2        45           statfs
> >   0.00    0.000074           4        21           mmap
> >   0.00    0.000000           0         9           munmap
> >   0.00    0.000000           0        26           rt_sigprocmask
> >   0.00    0.000000           0         3           ioctl
> >   0.00    0.000000           0         1           pipe
> >   0.00    0.000000           0         5           madvise
> >   0.00    0.000000           0         6           socket
> >   0.00    0.000000           0         6         4 connect
> >   0.00    0.000000           0         1           shutdown
> >   0.00    0.000000           0         3           getsockopt
> >   0.00    0.000000           0         7           clone
> >   0.00    0.000000           0         8           getdents
> >   0.00    0.000000           0         3           getrlimit
> >   0.00    0.000000           0         6           sysinfo
> >   0.00    0.000000           0         7           gettid
> >   0.00    0.000000           0        14           sched_getaffinity
> >   0.00    0.000000           0         1           epoll_create
> >   0.00    0.000000           0         7           set_robust_list
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00   17.057362                109518     53680 total
> >
> > On Jul 12, 2012, at 18:09, Asaf Mesika wrote:
> >
> > > Hi,
> > >
> > > I have a cluster of 3 DN/RS and another computer hosting NN/Master.
> > >
> > > For some reason, two of the DataNode nodes are showing a high load
> > > average (~17).
> > > In "top" I can see that the HDFS and HBase processes are the ones
> > > using most of the CPU (95%).
> > >
> > > When inspecting both the HDFS and HBase processes with JVisualVM on
> > > the problematic nodes, I can clearly see that CPU usage is high.
> > >
> > > Any ideas why this is happening on those two nodes (and why the third
> > > is resting happily)?
> > >
> > > All three machines have roughly the same hardware.
> > > The cluster (both HBase and HDFS) is not in use at the moment (during
> > > my inspection).
> > >
> > > Neither the HDFS nor the HBase logs show any particular activity.
> > >
> > >
> > > Any leads on where I should look next would be appreciated.
> > >
> > >
> > > Thanks!
> > >
> > > Asaf
> > >
> >
> >
>
Re: HDFS + HBASE process high cpu usage
Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Asaf,
By any chance, has this issue been going on in your boxes for the last
few days? I wouldn't be surprised by so many calls to futex from the JVM
itself, but since you are describing the same symptoms as the leap second
issue, it would be good to know which OS you are using, whether NTP is/was
running, and whether the boxes have been restarted since Jul 1. If the
leap second issue is the cause, then simply running date -s "`date`" as
root will lower the CPU usage.
regards,
esteban.
--
Cloudera, Inc.
On Thu, Jul 12, 2012 at 10:12 AM, Asaf Mesika <as...@gmail.com> wrote:
> Just adding more information.
> The following is the histogram output of 'strace -p <hdfs-pid> -f -C',
> which ran for 10 seconds. For some reason futex accounts for 65% of the time.
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  65.06   11.097387         103    108084     53662 futex
>  12.00    2.047692      170641        12         3 restart_syscall
>   8.73    1.488824       23263        64           accept
>   6.99    1.192192        5624       212           poll
>   6.60    1.125829       22517        50           epoll_wait
>   0.26    0.045039         506        89           close
>   0.19    0.031703         170       187           sendto
>   0.04    0.007508         110        68           setsockopt
>   0.03    0.005558          27       209           recvfrom
>   0.02    0.003000         375         8           sched_yield
>   0.02    0.002999         107        28         1 epoll_ctl
>   0.01    0.002000         125        16           open
>   0.01    0.001999         167        12           getsockname
>   0.01    0.001156          36        32           write
>   0.01    0.001000         100        10           fstat
>   0.01    0.001000          30        33           fcntl
>   0.01    0.000999          15        67           dup2
>   0.00    0.000488          98         5           rt_sigreturn
>   0.00    0.000350           8        46        10 read
>   0.00    0.000222           4        51           mprotect
>   0.00    0.000167          42         4           openat
>   0.00    0.000092           2        52           stat
>   0.00    0.000084           2        45           statfs
>   0.00    0.000074           4        21           mmap
>   0.00    0.000000           0         9           munmap
>   0.00    0.000000           0        26           rt_sigprocmask
>   0.00    0.000000           0         3           ioctl
>   0.00    0.000000           0         1           pipe
>   0.00    0.000000           0         5           madvise
>   0.00    0.000000           0         6           socket
>   0.00    0.000000           0         6         4 connect
>   0.00    0.000000           0         1           shutdown
>   0.00    0.000000           0         3           getsockopt
>   0.00    0.000000           0         7           clone
>   0.00    0.000000           0         8           getdents
>   0.00    0.000000           0         3           getrlimit
>   0.00    0.000000           0         6           sysinfo
>   0.00    0.000000           0         7           gettid
>   0.00    0.000000           0        14           sched_getaffinity
>   0.00    0.000000           0         1           epoll_create
>   0.00    0.000000           0         7           set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   17.057362                109518     53680 total
>
> On Jul 12, 2012, at 18:09, Asaf Mesika wrote:
>
> > Hi,
> >
> > I have a cluster of 3 DN/RS and another computer hosting NN/Master.
> >
> > For some reason, two of the DataNode nodes are showing a high load
> > average (~17).
> > In "top" I can see that the HDFS and HBase processes are the ones
> > using most of the CPU (95%).
> >
> > When inspecting both the HDFS and HBase processes with JVisualVM on
> > the problematic nodes, I can clearly see that CPU usage is high.
> >
> > Any ideas why this is happening on those two nodes (and why the third
> > is resting happily)?
> >
> > All three machines have roughly the same hardware.
> > The cluster (both HBase and HDFS) is not in use at the moment (during
> > my inspection).
> >
> > Neither the HDFS nor the HBase logs show any particular activity.
> >
> >
> > Any leads on where I should look next would be appreciated.
> >
> >
> > Thanks!
> >
> > Asaf
> >
>
>
Re: HDFS + HBASE process high cpu usage
Posted by Asaf Mesika <as...@gmail.com>.
Just adding more information.
The following is the histogram output of 'strace -p <hdfs-pid> -f -C', which ran for 10 seconds. For some reason futex accounts for 65% of the time.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.06   11.097387         103    108084     53662 futex
 12.00    2.047692      170641        12         3 restart_syscall
  8.73    1.488824       23263        64           accept
  6.99    1.192192        5624       212           poll
  6.60    1.125829       22517        50           epoll_wait
  0.26    0.045039         506        89           close
  0.19    0.031703         170       187           sendto
  0.04    0.007508         110        68           setsockopt
  0.03    0.005558          27       209           recvfrom
  0.02    0.003000         375         8           sched_yield
  0.02    0.002999         107        28         1 epoll_ctl
  0.01    0.002000         125        16           open
  0.01    0.001999         167        12           getsockname
  0.01    0.001156          36        32           write
  0.01    0.001000         100        10           fstat
  0.01    0.001000          30        33           fcntl
  0.01    0.000999          15        67           dup2
  0.00    0.000488          98         5           rt_sigreturn
  0.00    0.000350           8        46        10 read
  0.00    0.000222           4        51           mprotect
  0.00    0.000167          42         4           openat
  0.00    0.000092           2        52           stat
  0.00    0.000084           2        45           statfs
  0.00    0.000074           4        21           mmap
  0.00    0.000000           0         9           munmap
  0.00    0.000000           0        26           rt_sigprocmask
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0         5           madvise
  0.00    0.000000           0         6           socket
  0.00    0.000000           0         6         4 connect
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0         7           clone
  0.00    0.000000           0         8           getdents
  0.00    0.000000           0         3           getrlimit
  0.00    0.000000           0         6           sysinfo
  0.00    0.000000           0         7           gettid
  0.00    0.000000           0        14           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         7           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   17.057362                109518     53680 total
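As a quick sanity check on the table, the futex share can be recomputed from the seconds column (a sketch using awk; the two constants are copied from the futex and total rows):

```shell
# 11.097387 s of futex time out of 17.057362 s total syscall time
awk 'BEGIN { printf "%.2f%%\n", 100 * 11.097387 / 17.057362 }'
# → 65.06%
```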
On Jul 12, 2012, at 18:09, Asaf Mesika wrote:
> Hi,
>
> I have a cluster of 3 DN/RS and another computer hosting NN/Master.
>
> For some reason, two of the DataNode nodes are showing a high load average (~17).
> In "top" I can see that the HDFS and HBase processes are the ones using most of the CPU (95%).
>
> When inspecting both the HDFS and HBase processes with JVisualVM on the problematic nodes, I can clearly see that CPU usage is high.
>
> Any ideas why this is happening on those two nodes (and why the third is resting happily)?
>
> All three machines have roughly the same hardware.
> The cluster (both HBase and HDFS) is not in use at the moment (during my inspection).
>
> Neither the HDFS nor the HBase logs show any particular activity.
>
>
> Any leads on where I should look next would be appreciated.
>
>
> Thanks!
>
> Asaf
>