Posted to dev@hbase.apache.org by Doug Meil <do...@explorysmedical.com> on 2012/01/11 22:37:11 UTC

HBASE-5073 impact...

Hi dev-list,

With respect to HBASE-5073 and slowdowns caused by invoking the admin API,
was the workaround (without the patch) to restart the client, or the entire
cluster?  I see the patch has been back-ported to 0.90.6, but I wanted to
document this if warranted.

Also, regarding...

"As Lars mentioned, admin APIs like flush and compact will also slow down
the client."

... in terms of "slowing down the client", is this referring to the
increased activity on RegionServers (e.g., due to compaction and file
writing) that subsequent requests will have to contend with?  Or is there
something else going on?

Again, I wanted to document this if warranted.



On 12/27/11 9:20 PM, "Ramkrishna S Vasudevan"
<ra...@huawei.com> wrote:

>As Lars mentioned, admin APIs like flush and compact will also slow down
>the client.
>When you restart the HBase cluster, are the clients also restarted?
>
>Regards
>Ram
>
>-----Original Message-----
>From: Lars H [mailto:lhofhansl@yahoo.com]
>Sent: Tuesday, December 27, 2011 10:02 PM
>To: user@hbase.apache.org
>Cc: hbase-user@hadoop.apache.org
>Subject: Re: Read speed down after long running
>
>When you restart HBase, are you also restarting the client process?
>Are you using HBaseAdmin.tableExists?
>If so, you might be running into HBASE-5073.
>
>-- Lars
>
>Yi Liang <wh...@gmail.com> wrote:
>
>>Hi all,
>>
>>We're running HBase 0.90.3 for a read-intensive application.
>>
>>We find that after running for a long time (2 weeks, 1 month or longer),
>>read speed becomes much lower.
>>
>>For example, a Thrift get_rows operation fetching 20 rows (about 4 KB
>>per row) can take >2 seconds, sometimes even >5 seconds. When it
>>happens, we see cpu_wio stay at about 10.
>>
>>But if we restart HBase (only the master and regionservers) with
>>stop-hbase.sh and start-hbase.sh, read speed goes back to normal
>>immediately (<200 ms for every get_rows operation), and cpu_wio drops
>>to about 2.
>>
>>When the problem appears, there are no exceptions in the logs and no
>>flush/compaction; nothing is abnormal except occasional warnings like
>>the one below:
>>2011-12-27 15:50:20,307 WARN
>>org.apache.hadoop.hbase.regionserver.wal.HLog:
>>IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
>>editcount=1, len~=9.8k
>>
>>Our cluster has 10 region servers, each with a 25 GB heap, 64% of which
>>is used for cache. Some M/R jobs keep running in another cluster to
>>feed data into this HBase cluster. Every night we run a flush and a
>>major compaction; usually there is no flush or compaction during the day.
>>
>>Could anybody explain why read speed becomes lower after running for a
>>long time, and why it goes back to normal immediately after restarting
>>HBase?
>>
>>Any advice will be highly appreciated.
>>
>>Thanks,
>>Yi
>
>

RE: HBASE-5073 impact...

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.

Hi Doug

As you said, HBASE-5073 ensures that no client leaks happen.  It is a problem
in the client, not a reflection of RS activity.  So people using versions
prior to 0.90.6 need to apply a workaround on the client side, not on
the RS.
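To make "no client leaks" concrete for the docs, here is a minimal sketch of the cleanup pattern, assuming (as Lars describes in this thread) that the leak comes from listeners registered on a shared, client-side ZooKeeper watcher and never removed. SharedWatcher, ZkListener and ClosingAdmin are hypothetical stand-ins, not HBase's API, and this is not the actual HBASE-5073 patch; it only shows the idea that whatever registers a listener also unregisters it when closed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the cleanup pattern: a handle that registers a listener
// on a shared watcher removes it again in close(). Not the actual HBASE-5073 patch.
public class ListenerCleanupSketch {

    /** Callback run on every ZooKeeper event (hypothetical). */
    interface ZkListener {
        void nodeChanged(String path);
    }

    /** Long-lived watcher shared by the whole client process (hypothetical). */
    static class SharedWatcher {
        private final List<ZkListener> listeners = new CopyOnWriteArrayList<>();

        void register(ZkListener listener) {
            listeners.add(listener);
        }

        void unregister(ZkListener listener) {
            listeners.remove(listener);
        }

        int listenerCount() {
            return listeners.size();
        }
    }

    /** Admin-like handle that owns exactly one listener and removes it on close. */
    static class ClosingAdmin implements AutoCloseable {
        private final SharedWatcher watcher;
        // Capturing 'this' gives each handle its own listener instance.
        private final ZkListener listener = this::onEvent;
        private String lastPath;

        ClosingAdmin(SharedWatcher watcher) {
            this.watcher = watcher;
            watcher.register(listener);
        }

        private void onEvent(String path) {
            lastPath = path;
        }

        @Override
        public void close() {
            watcher.unregister(listener);
        }
    }

    public static void main(String[] args) {
        SharedWatcher watcher = new SharedWatcher();
        for (int call = 0; call < 1000; call++) {
            try (ClosingAdmin admin = new ClosingAdmin(watcher)) {
                // ... one admin operation per short-lived handle ...
            }
        }
        System.out.println("listeners still registered: " + watcher.listenerCount());
    }
}
```

Creating and closing 1000 handles this way leaves nothing behind on the shared watcher: the program prints "listeners still registered: 0".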

Hope I am answering your query.  Thanks for the follow-up.

Regards
Ram

-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Tuesday, January 17, 2012 7:06 PM
To: dev@hbase.apache.org
Subject: Re: HBASE-5073 impact...


Hi folks, I just want to follow up on this one more time.

Is there anything funky happening in the client that "slows things down"
when these methods are called, or is it a reflection of RS activity?




Re: HBASE-5073 impact...

Posted by Doug Meil <do...@explorysmedical.com>.

Ahhh...  This truly was a client-side problem.  Thanks for the
clarification!





On 1/17/12 4:16 PM, "lars hofhansl" <lh...@yahoo.com> wrote:

>Before HBASE-5073, in 0.90.x the ZK watcher (at the client) would pile on
>more and more listeners.
>On each ZK event these listeners are executed, eventually slowing down
>the client; in addition, the listeners are prevented from being garbage
>collected, creating a memory leak.
>
>
>So it's client only; the RSs are not affected by this.

Re: HBASE-5073 impact...

Posted by lars hofhansl <lh...@yahoo.com>.

Before HBASE-5073, in 0.90.x the ZK watcher (at the client) would pile on more and more listeners.
On each ZK event these listeners are executed, eventually slowing down the client; in addition, the listeners are prevented from being garbage collected, creating a memory leak.


So it's client only; the RSs are not affected by this.
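A minimal sketch of the leak pattern described above, assuming a single long-lived watcher per client process that holds strong references to every registered listener. SharedWatcher, ZkListener and LeakyAdmin are hypothetical, simplified stand-ins for illustration, not HBase's real classes. Each short-lived handle registers a listener and never removes it, so the list (and the memory it pins) grows with every call, and every later event runs all of the accumulated listeners.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified model of the leak; these classes are illustrative
// only and are not HBase's real client API.
public class ListenerLeakSketch {

    /** Callback run on every ZooKeeper event (hypothetical). */
    interface ZkListener {
        void nodeChanged(String path);
    }

    /** Long-lived watcher shared by the whole client process (hypothetical). */
    static class SharedWatcher {
        private final List<ZkListener> listeners = new CopyOnWriteArrayList<>();

        void register(ZkListener listener) {
            listeners.add(listener);
        }

        /** Runs every registered listener, so each event costs O(#listeners). */
        int fireEvent(String path) {
            int invoked = 0;
            for (ZkListener listener : listeners) {
                listener.nodeChanged(path);
                invoked++;
            }
            return invoked;
        }
    }

    /** Short-lived admin-like handle that registers a listener but never removes it. */
    static class LeakyAdmin {
        private final byte[] cachedState = new byte[1024]; // pinned by the watcher
        private String lastPath;

        LeakyAdmin(SharedWatcher watcher) {
            // this::onEvent captures 'this', so the watcher's list now holds a
            // strong reference to this handle (and cachedState) indefinitely.
            watcher.register(this::onEvent);
        }

        private void onEvent(String path) {
            lastPath = path;
        }
    }

    public static void main(String[] args) {
        SharedWatcher watcher = new SharedWatcher();
        // e.g. one new handle per admin call over a long-running client process
        for (int call = 0; call < 1000; call++) {
            new LeakyAdmin(watcher);
        }
        System.out.println(watcher.fireEvent("/hypothetical/znode") + " listeners ran for one event");
    }
}
```

Run as written, it prints "1000 listeners ran for one event": none of the 1000 handles can be garbage collected, and a single event does 1000 times the work it did after the first call.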


----- Original Message -----
From: Doug Meil <do...@explorysmedical.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Cc: 
Sent: Tuesday, January 17, 2012 5:36 AM
Subject: Re: HBASE-5073 impact...


Hi folks, I just want to follow up on this one more time.

Is there anything funky happening in the client that "slows things down"
when these methods are called, or is it a reflection of RS activity?




Re: HBASE-5073 impact...

Posted by Doug Meil <do...@explorysmedical.com>.

Hi folks, I just want to follow up on this one more time.

Is there anything funky happening in the client that "slows things down"
when these methods are called, or is it a reflection of RS activity?



On 1/11/12 4:37 PM, "Doug Meil" <do...@explorysmedical.com> wrote:

>
>Hi dev-list,
>
>With respect to HBASE-5073 and slowdowns caused by invoking the admin API,
>was the workaround (without the patch) to restart the client, or the entire
>cluster?  I see the patch has been back-ported to 0.90.6, but I wanted to
>document this if warranted.
>
>Also, regarding...
>
>"As Lars mentioned, admin APIs like flush and compact will also slow down
>the client."
>
>... in terms of "slowing down the client", is this referring to the
>increased activity on RegionServers (e.g., due to compaction and file
>writing) that subsequent requests will have to contend with?  Or is there
>something else going on?
>
>Again, I wanted to document this if warranted.