Posted to user@hbase.apache.org by Ted Tuttle <te...@mentacapital.com> on 2014/08/14 01:38:11 UTC

HBase client hangs after client-side OOM

Hello-

We are running HBase v0.94.16 on an 8 node cluster.

We have a recurring problem w/ HBase clients hanging.  In the latest occurrence, I observed the following sequence of events:

0) client plays w/ HBase for a long time w/o issue
1) client runs out of memory during HBase operation:

                http://pastebin.com/b5x44Lx7

2) Exception is thrown, memory is released
3) In some shutdown logic the client tries to access HBase again and hangs:

                http://pastebin.com/xU4MSq9k

Clearly I need to fix the OOM.  However, the fact that the client hangs is not nice.  Any ideas why?

BTW- I started by looking at the ZooKeeper log. Not much there but here you go:

                http://pastebin.com/wZvE0Fbv

Thanks,
Ted
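
To see from inside the client whether the ZooKeeper I/O thread is still alive at the moment of a hang, the shutdown logic could list live threads by name before touching HBase again. This is a minimal sketch, not part of HBase or ZooKeeper: the class and method names are hypothetical, and it assumes the ZooKeeper client thread has "SendThread" somewhere in its name, which should be checked against a real thread dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SendThreadCheck {
    /** Returns "name [STATE]" for every live thread whose name contains the marker. */
    static List<String> findThreads(String marker) {
        List<String> hits = new ArrayList<>();
        // getAllStackTraces() only reports threads that are currently alive.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getName().contains(marker)) {
                hits.add(t.getName() + " [" + t.getState() + "]");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "SendThread" is an assumed substring of the ZooKeeper client thread name.
        List<String> hits = findThreads("SendThread");
        if (hits.isEmpty()) {
            System.out.println("no thread with 'SendThread' in its name is alive");
        } else {
            for (String h : hits) {
                System.out.println(h);
            }
        }
    }
}
```

An empty result would suggest the thread has exited; a hit only shows that the thread exists, not that it is making progress.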


Re: HBase client hangs after client-side OOM

Posted by Qiang Tian <ti...@gmail.com>.
This is an interesting case. Since SendThread is still running, it
presumably caught the error and called cleanup() correctly? If so, it looks
like the packet is in neither outgoingQueue nor pendingQueue. ZooKeeper 3.4.5
might have the same issue as well. Could a timeout help here?
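
One way to apply a timeout on the caller's side, independent of any ZooKeeper fix, is to run the risky call on a separate thread and stop waiting after a deadline. The following is a minimal sketch under stated assumptions: `BoundedCall` and `callWithTimeout` are hypothetical names, the never-released latch merely stands in for a request that is never completed, and whether a real HBase or ZooKeeper call reacts to the interrupt is not confirmed anywhere in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    /** Runs the call on a daemon thread and gives up after timeoutMs. */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; helps only if the call is interruptible
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a request that is never answered.
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            callWithTimeout(() -> {
                neverReleased.await();
                return "done";
            }, 200);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting; continuing shutdown");
        }
    }
}
```

The design choice here is to bound the wait rather than fix the wait: the stuck worker may linger, but because it is a daemon thread the shutdown path can still finish.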



On Fri, Aug 15, 2014 at 6:30 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Hello Ted,
>
> ZooKeeper 3.4.5 is the recommended release to use with HBase 0.94.x.
> Regarding compatibility across ZooKeeper releases, I don't think there is
> any issue, but the ZK devs might be able to confirm.
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
>
> On Thu, Aug 14, 2014 at 3:19 PM, Ted Tuttle <te...@mentacapital.com> wrote:
>
> > Hello All-
> >
> > It sounds like upgrading our zookeeper client would be a good idea. Can
> > anyone provide some guidelines on compatibility of HBase 0.94.16 with ZK
> > 3.4.X? How about compatibility of ZK client 3.4.X w/ ZK server 3.3.4?
> > I've read a few contradictory things about ZK client/server compatibility
> > across 3.3/3.4 releases.
> >
> > Thanks,
> > Ted
> >
> > -----Original Message-----
> > From: Ted Tuttle [mailto:ted@mentacapital.com]
> > Sent: Thursday, August 14, 2014 12:43 PM
> > To: user@hbase.apache.org
> > Cc: dev@zookeeper.apache.org
> > Subject: RE: HBase client hangs after client-side OOM
> >
> > Hello Esteban-
> >
> > At the time of the ZK connection problems the client had an OOM event.
> > However, the client machine overall was in fine shape looking at ganglia
> > reports;  it certainly wasn't swapping or spending significant cycles on
> > I/O wait.
> >
> > Similarly, our zookeeper server was real chilled as it always is.
> >
> > Regarding client configuration:
> >
> > <property>
> >     <!--Loaded from hbase-default.xml-->
> >     <name>hbase.client.pause</name>
> >     <value>1000</value>
> > </property>
> >
> > Thanks,
> > Ted
> >
> > -----Original Message-----
> > From: Esteban Gutierrez [mailto:esteban@cloudera.com]
> > Sent: Thursday, August 14, 2014 10:47 AM
> > To: user@hbase.apache.org
> > Cc: dev@zookeeper.apache.org
> > Subject: Re: HBase client hangs after client-side OOM
> >
> > Hi Ted,
> >
> > I've seen this kind of client "hang" a few times when the underlying
> > environment is under heavy swapping and with older versions of ZK, as
> > Rakesh mentioned, and also when hbase.client.pause is set to 0. Do you
> > know if your environment is experiencing similar behavior, with heavy I/O
> > due to swapping? Can you also share your client configuration?
> >
> > cheers,
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> >
> >
> > On Thu, Aug 14, 2014 at 9:56 AM, Ted Tuttle <te...@mentacapital.com>
> wrote:
> >
> > > The client-side thread dump in here:
> > >
> > > http://pastebin.com/xU4MSq9k
> > >
> > > SendThread appears to be active.
> > >
> > > -----Original Message-----
> > > From: Rakesh R [mailto:rakeshr@huawei.com]
> > > Sent: Thursday, August 14, 2014 7:01 AM
> > > To: dev@zookeeper.apache.org; user@hbase.apache.org
> > > Subject: RE: HBase client hangs after client-side OOM
> > >
> > > Hi,
> > >
> > > >> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> > >
> > > The ZK version is quite old. As far as I can see, ClientCnxn only
> > > catches IOException, so when there is an OOME the SendThread will exit.
> > > I think that's the reason the client hangs. A client-side thread dump
> > > will help us see whether SendThread is still alive.
> > >
> > > Client-side exception handling has been modified in the 3.4 & 3.5
> > > branches. Can you check the possibility of upgrading to the latest
> > > release, 3.4.6?
> > >
> > > Regards,
> > > Rakesh
> > >
> > > -----Original Message-----
> > > From: Qiang Tian [mailto:tianq01@gmail.com]
> > > Sent: 14 August 2014 11:03
> > > To: user@hbase.apache.org; dev@zookeeper.apache.org
> > > Subject: Re: HBase client hangs after client-side OOM
> > >
> > > The SendThread stack trace does not look right. Do you have the client
> > > log (in case the ZK client code logged something there)? From the ZK
> > > code, it looks like ClientCnxn$SendThread.run should have caught it
> > > (Throwable) and done the cleanup work, e.g. notifying the main thread
> > > so that it can wake up from ClientCnxn.submitRequest.
> > >
> > > Forwarding to the ZooKeeper list for help.
> > > thanks.
> > >
> > >
> > >
> > > On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com>
> > wrote:
> > >
> > > > Hi Lars-
> > > >
> > > > We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> > > >
> > > > Thanks,
> > > > Ted
> > > >
> > > > > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org>
> > wrote:
> > > > >
> > > > > Hey Ted,
> > > > >
> > > > > So this is a problem with the ZK client: it seems not to clean
> > > > > itself up correctly upon receiving an exception at the wrong moment.
> > > > > Which version of ZK are you using?
> > > > >
> > > > >
> > > > > -- Lars
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Ted Tuttle <te...@mentacapital.com>
> > > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > > > Cc: Development <De...@mentacapital.com>
> > > > > Sent: Wednesday, August 13, 2014 4:38 PM
> > > > > Subject: HBase client hangs after client-side OOM
> > > > >
> > > > > Hello-
> > > > >
> > > > > We are running HBase v0.94.16 on an 8 node cluster.
> > > > >
> > > > > We have a recurring problem w/ HBase clients hanging.  In the latest
> > > > > occurrence, I observed the following sequence of events:
> > > > >
> > > > > 0) client plays w/ HBase for a long time w/o issue
> > > > > 1) client runs out of memory during HBase operation:
> > > > >
> > > > >                 http://pastebin.com/b5x44Lx7
> > > > >
> > > > > 2) Exception is thrown, memory is released
> > > > > 3) In some shutdown logic the client tries to access HBase again
> > > > > and hangs:
> > > > >
> > > > >                 http://pastebin.com/xU4MSq9k
> > > > >
> > > > > Clearly I need to fix the OOM.  However, the fact that the client
> > > > > hangs is not nice.  Any ideas why?
> > > > >
> > > > > BTW- I started by looking at the ZooKeeper log. Not much there but
> > > > > here you go:
> > > > >
> > > > >                 http://pastebin.com/wZvE0Fbv
> > > > >
> > > > > Thanks,
> > > > > Ted
> > > > >
> > > >
> > >
> >
>
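
The hypothesis raised in the quoted messages (a sender thread that handles only IOException dies on an OutOfMemoryError and never wakes the caller waiting on its request) can be illustrated with a toy model. This is a sketch of the hypothesis only, not ZooKeeper's actual code: `SenderModel`, `Packet`, `send`, `cleanup`, and `await` are hypothetical names, and the thread itself does not settle whether this is what happened in the reported hang.

```java
import java.io.IOException;

class SenderModel {
    /** Stand-in for a queued request; the caller waits until it is finished. */
    static final class Packet {
        boolean finished;
    }

    /** Simulates a send that fails with an Error rather than an IOException. */
    static void send() throws IOException {
        throw new OutOfMemoryError("simulated OOM while sending");
    }

    /** Marks the request finished and wakes any waiting caller. */
    static void cleanup(Packet p) {
        synchronized (p) {
            p.finished = true;
            p.notifyAll();
        }
    }

    /** Caller side: waits until the packet is finished or timeoutMs elapses. */
    static boolean await(Packet p, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (p) {
            while (!p.finished) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false; // nobody woke us; an unbounded wait would hang here
                }
                p.wait(left);
            }
            return true;
        }
    }

    /** Sender side. With catchAll == false, only IOException leads to cleanup. */
    static Thread startSender(Packet p, boolean catchAll) {
        Thread t = new Thread(() -> {
            try {
                send();
            } catch (IOException e) {
                cleanup(p);
            } catch (Error e) {
                if (catchAll) {
                    cleanup(p);
                } else {
                    throw e; // thread dies without notifying the caller
                }
            }
        }, "model-SendThread");
        t.setUncaughtExceptionHandler((th, ex) -> { /* dies quietly in this model */ });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        for (boolean catchAll : new boolean[] {false, true}) {
            Packet p = new Packet();
            startSender(p, catchAll).join();
            System.out.println("catchAll=" + catchAll + " -> caller woken: " + await(p, 200));
        }
    }
}
```

In this model the caller is woken only when the sender runs its cleanup for errors as well as for IOException; with the narrower handling, only the bounded wait keeps the caller from blocking indefinitely.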

Re: HBase client hangs after client-side OOM

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hello Ted,

ZooKeeper 3.4.5 is the recommended release to use with HBase 0.94.x.
Regarding compatibility across ZooKeeper releases, I don't think there is
any issue, but the ZK devs might be able to confirm.

cheers,
esteban.


--
Cloudera, Inc.




RE: HBase client hangs after client-side OOM

Posted by Ted Tuttle <te...@mentacapital.com>.
Hello All-

It sounds like upgrading our zookeeper client would be a good idea. Can anyone provide some guidelines on compatibility of HBase 0.94.16 with ZK 3.4.X? How about compatibility of ZK client 3.4.X w/ ZK server 3.3.4?  I've read a few contradictory things about ZK client/server compatibility across 3.3/3.4 releases.  

Thanks,
Ted 
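
Before and after swapping client jars, it can help to confirm which ZooKeeper client version the application actually loads. This is a minimal sketch, assuming the ZooKeeper jar exposes a static `getVersion()` on `org.apache.zookeeper.Version`; the lookup goes through reflection so the snippet runs even when no ZooKeeper jar is on the classpath. `ZkClientVersion` and `staticStringCall` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Optional;

class ZkClientVersion {
    /** Invokes a public static no-arg method by name; empty if anything is missing. */
    static Optional<String> staticStringCall(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            Object value = m.invoke(null); // null receiver: the method is expected to be static
            return Optional.ofNullable(value).map(Object::toString);
        } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed class and method name; verify against the ZooKeeper jar you deploy.
        Optional<String> version = staticStringCall("org.apache.zookeeper.Version", "getVersion");
        System.out.println(version
                .map(v -> "ZooKeeper client version on classpath: " + v)
                .orElse("ZooKeeper client classes not found on classpath"));
    }
}
```

Running this inside the client process, rather than inspecting the lib directory, guards against an older jar shadowing the upgraded one.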


RE: HBase client hangs after client-side OOM

Posted by Ted Tuttle <te...@mentacapital.com>.
Hello Esteban-

At the time of the ZK connection problems the client had an OOM event. However, the client machine overall was in fine shape looking at ganglia reports;  it certainly wasn't swapping or spending significant cycles on I/O wait.

Similarly, our zookeeper server was real chilled as it always is.

Regarding client configuration:

<property>
    <!--Loaded from hbase-default.xml-->
    <name>hbase.client.pause</name>
    <value>1000</value>
</property>

Thanks,
Ted
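
For context on what this setting controls: hbase.client.pause is the wait the client applies between retries of a failed operation. The sketch below is a generic retry loop, not HBase's implementation, showing how a pause value feeds into retries; `PausedRetry` and `retry` are hypothetical names. With a pause of 0 the attempts would run back-to-back.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class PausedRetry {
    /** Retries op up to maxAttempts times, sleeping pauseMs between failed attempts. */
    static <T> T retry(Callable<T> op, int maxAttempts, long pauseMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Only Exception is retried; an Error such as OutOfMemoryError propagates.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs); // pauseMs == 0 means no wait between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // 1000 ms mirrors the hbase.client.pause value quoted above.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient failure");
            }
            return "ok after " + calls[0] + " attempts";
        }, 5, 1000);
        System.out.println(result);
    }
}
```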


RE: HBase client hangs after client-side OOM

Posted by Ted Tuttle <te...@mentacapital.com>.
Hello Esteban-

At the time of the ZK connection problems the client had an OOM event. However, the client machine overall was in fine shape looking at ganglia reports;  it certainly wasn't swapping or spending significant cycles on I/O wait.

Similarly, our zookeeper server was real chilled as it always is.

Regarding client configuration:

<property>
    <!--Loaded from hbase-default.xml-->
    <name>hbase.client.pause</name>
    <value>1000</value>
</property>

Thanks,
Ted

-----Original Message-----
From: Esteban Gutierrez [mailto:esteban@cloudera.com] 
Sent: Thursday, August 14, 2014 10:47 AM
To: user@hbase.apache.org
Cc: dev@zookeeper.apache.org
Subject: Re: HBase client hangs after client-side OOM

Hi Ted,

I've seen this kind of client "hang" a few times when the underlying environment is under heavy swapping, with older versions of ZK as Rakesh mentioned, and also when hbase.client.pause is set to 0. Do you know if your environment is experiencing similar behavior, with heavy I/O due to swapping?
Can you also share your client configuration?

cheers,
esteban.

--
Cloudera, Inc.



On Thu, Aug 14, 2014 at 9:56 AM, Ted Tuttle <te...@mentacapital.com> wrote:

> The client-side thread dump in here:
>
> http://pastebin.com/xU4MSq9k
>
> SendThread appears to be active.
>
> -----Original Message-----
> From: Rakesh R [mailto:rakeshr@huawei.com]
> Sent: Thursday, August 14, 2014 7:01 AM
> To: dev@zookeeper.apache.org; user@hbase.apache.org
> Subject: RE: HBase client hangs after client-side OOM
>
> Hi,
>
> >> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
>
> ZK version is quite old. I could see ClientCnxn is only catching 
> IOException and when there is OOME it will exit SendThread.
> I think, thats the reason for client hanging. Client side threaddump 
> will help us to see the liveliness of SendThread.
>
> Client side exception handling has been modified in 3.4 & 3.5 branches.
> Can you check the possibility of upgrading to 3.4.6 latest release.
>
> Regards,
> Rakesh
>
> -----Original Message-----
> From: Qiang Tian [mailto:tianq01@gmail.com]
> Sent: 14 August 2014 11:03
> To: user@hbase.apache.org; dev@zookeeper.apache.org
> Subject: Re: HBase client hangs after client-side OOM
>
> the sendthread stacktrace looks not correct. Do you have the client log?
> (in case zk client code log sth there) from the zk code, it looks 
> ClientCnxn$SendThread.run should have caught
> it(throwable) and done the cleanup work, e.g. notify the main thread, 
> so that it can wake up from ClientCnxn.submitRequest..
>
> send to Zookeeper for help.
> thanks.
>
>
>
> On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com> wrote:
>
> > Hi Lars-
> >
> > We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> >
> > Thanks,
> > Ted
> >
> > > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> > >
> > > Hey Ted,
> > >
> > > so this is a problem with the ZK client, it seems to not clean 
> > > itself up
> > correctly upon receiving an exception at the wrong moment.
> > > Which version of ZK are you using?
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Ted Tuttle <te...@mentacapital.com>
> > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > Cc: Development <De...@mentacapital.com>
> > > Sent: Wednesday, August 13, 2014 4:38 PM
> > > Subject: HBase client hangs after client-side OOM
> > >
> > > Hello-
> > >
> > > We are running HBase v0.94.16 on an 8 node cluster.
> > >
> > > We have a recurring problem w/ HBase clients hanging.  In latest
> > occurrence, I observed the following sequence of events:
> > >
> > > 0) client plays w/ HBase for a long time w/o issue
> > > 1) client runs out of memory during HBase operation:
> > >
> > >                 http://pastebin.com/b5x44Lx7
> > >
> > > 3) Exception is thrown, memory is released
> > > 2) In some shutdown logic the client tries to access HBase again 
> > > and
> > hangs:
> > >
> > >                 http://pastebin.com/xU4MSq9k
> > >
> > > Clearly I need to fix OOM.  However, the fact that client hangs is 
> > > not
> > nice.  Any ideas why?
> > >
> > > BTW- I started by looking at zookeeper log. Not much there but 
> > > here you
> > go:
> > >
> > >                 http://pastebin.com/wZvE0Fbv
> > >
> > > Thanks,
> > > Ted
> > >
> >
>

Re: HBase client hangs after client-side OOM

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Ted,

I've seen this kind of client "hang" a few times when the underlying
environment is under heavy swapping, with older versions of ZK as Rakesh
mentioned, and also when hbase.client.pause is set to 0. Do you know if your
environment is experiencing similar behavior, with heavy I/O due to swapping?
Can you also share your client configuration?

cheers,
esteban.

--
Cloudera, Inc.



On Thu, Aug 14, 2014 at 9:56 AM, Ted Tuttle <te...@mentacapital.com> wrote:

> The client-side thread dump in here:
>
> http://pastebin.com/xU4MSq9k
>
> SendThread appears to be active.
>
> -----Original Message-----
> From: Rakesh R [mailto:rakeshr@huawei.com]
> Sent: Thursday, August 14, 2014 7:01 AM
> To: dev@zookeeper.apache.org; user@hbase.apache.org
> Subject: RE: HBase client hangs after client-side OOM
>
> Hi,
>
> >> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
>
> ZK version is quite old. I could see ClientCnxn is only catching
> IOException and when there is OOME it will exit SendThread.
> I think, thats the reason for client hanging. Client side threaddump will
> help us to see the liveliness of SendThread.
>
> Client side exception handling has been modified in 3.4 & 3.5 branches.
> Can you check the possibility of upgrading to 3.4.6 latest release.
>
> Regards,
> Rakesh
>
> -----Original Message-----
> From: Qiang Tian [mailto:tianq01@gmail.com]
> Sent: 14 August 2014 11:03
> To: user@hbase.apache.org; dev@zookeeper.apache.org
> Subject: Re: HBase client hangs after client-side OOM
>
> the sendthread stacktrace looks not correct. Do you have the client log?
> (in case zk client code log sth there)
> from the zk code, it looks ClientCnxn$SendThread.run should have caught
> it(throwable) and done the cleanup work, e.g. notify the main thread, so
> that it can wake up from ClientCnxn.submitRequest..
>
> send to Zookeeper for help.
> thanks.
>
>
>
> On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com> wrote:
>
> > Hi Lars-
> >
> > We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
> >
> > Thanks,
> > Ted
> >
> > > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> > >
> > > Hey Ted,
> > >
> > > so this is a problem with the ZK client, it seems to not clean
> > > itself up
> > correctly upon receiving an exception at the wrong moment.
> > > Which version of ZK are you using?
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Ted Tuttle <te...@mentacapital.com>
> > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > Cc: Development <De...@mentacapital.com>
> > > Sent: Wednesday, August 13, 2014 4:38 PM
> > > Subject: HBase client hangs after client-side OOM
> > >
> > > Hello-
> > >
> > > We are running HBase v0.94.16 on an 8 node cluster.
> > >
> > > We have a recurring problem w/ HBase clients hanging.  In latest
> > occurrence, I observed the following sequence of events:
> > >
> > > 0) client plays w/ HBase for a long time w/o issue
> > > 1) client runs out of memory during HBase operation:
> > >
> > >                 http://pastebin.com/b5x44Lx7
> > >
> > > 3) Exception is thrown, memory is released
> > > 2) In some shutdown logic the client tries to access HBase again and
> > hangs:
> > >
> > >                 http://pastebin.com/xU4MSq9k
> > >
> > > Clearly I need to fix OOM.  However, the fact that client hangs is
> > > not
> > nice.  Any ideas why?
> > >
> > > BTW- I started by looking at zookeeper log. Not much there but here
> > > you
> > go:
> > >
> > >                 http://pastebin.com/wZvE0Fbv
> > >
> > > Thanks,
> > > Ted
> > >
> >
>

RE: HBase client hangs after client-side OOM

Posted by Ted Tuttle <te...@mentacapital.com>.
The client-side thread dump is here:

http://pastebin.com/xU4MSq9k

SendThread appears to be active.
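
For anyone who wants to make the same check from inside the client JVM rather than from a dump, here is a minimal Java sketch. It assumes the ZK sender thread's name contains "SendThread" (an assumption; confirm against your own thread dump). SendThreadCheck, hasLiveThread, and the demo thread name are hypothetical and only illustrate the idea. Note that a thread being alive does not prove it is making progress.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical helper; not part of HBase or ZooKeeper.
class SendThreadCheck {

    // True if this JVM has a live thread whose name contains the given fragment.
    static boolean hasLiveThread(String nameFragment) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().contains(nameFragment)) {
                return true;
            }
        }
        return false;
    }

    // Returns {found while the thread runs, found after it has exited}.
    static boolean[] demo() {
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                stop.await(); // park until the demo releases the thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-SendThread(localhost:2181)"); // made-up name for the demo
        worker.start();
        boolean whileRunning = hasLiveThread("demo-SendThread");
        stop.countDown();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        boolean afterExit = hasLiveThread("demo-SendThread");
        return new boolean[] { whileRunning, afterExit };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("while running: " + result[0]);
        System.out.println("after exit: " + result[1]);
    }
}
```

In a real client one would call hasLiveThread("SendThread") from the shutdown path and log the result before touching HBase again.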

-----Original Message-----
From: Rakesh R [mailto:rakeshr@huawei.com] 
Sent: Thursday, August 14, 2014 7:01 AM
To: dev@zookeeper.apache.org; user@hbase.apache.org
Subject: RE: HBase client hangs after client-side OOM

Hi,

>> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.

ZK version is quite old. I could see ClientCnxn is only catching IOException and when there is OOME it will exit SendThread.
I think, thats the reason for client hanging. Client side threaddump will help us to see the liveliness of SendThread.

Client side exception handling has been modified in 3.4 & 3.5 branches.
Can you check the possibility of upgrading to 3.4.6 latest release.

Regards,
Rakesh

-----Original Message-----
From: Qiang Tian [mailto:tianq01@gmail.com]
Sent: 14 August 2014 11:03
To: user@hbase.apache.org; dev@zookeeper.apache.org
Subject: Re: HBase client hangs after client-side OOM

the sendthread stacktrace looks not correct. Do you have the client log?
(in case zk client code log sth there)
from the zk code, it looks ClientCnxn$SendThread.run should have caught
it(throwable) and done the cleanup work, e.g. notify the main thread, so that it can wake up from ClientCnxn.submitRequest..

send to Zookeeper for help.
thanks.



On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com> wrote:

> Hi Lars-
>
> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
>
> Thanks,
> Ted
>
> > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> >
> > Hey Ted,
> >
> > so this is a problem with the ZK client, it seems to not clean 
> > itself up
> correctly upon receiving an exception at the wrong moment.
> > Which version of ZK are you using?
> >
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Ted Tuttle <te...@mentacapital.com>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Cc: Development <De...@mentacapital.com>
> > Sent: Wednesday, August 13, 2014 4:38 PM
> > Subject: HBase client hangs after client-side OOM
> >
> > Hello-
> >
> > We are running HBase v0.94.16 on an 8 node cluster.
> >
> > We have a recurring problem w/ HBase clients hanging.  In latest
> occurrence, I observed the following sequence of events:
> >
> > 0) client plays w/ HBase for a long time w/o issue
> > 1) client runs out of memory during HBase operation:
> >
> >                 http://pastebin.com/b5x44Lx7
> >
> > 3) Exception is thrown, memory is released
> > 2) In some shutdown logic the client tries to access HBase again and
> hangs:
> >
> >                 http://pastebin.com/xU4MSq9k
> >
> > Clearly I need to fix OOM.  However, the fact that client hangs is 
> > not
> nice.  Any ideas why?
> >
> > BTW- I started by looking at zookeeper log. Not much there but here 
> > you
> go:
> >
> >                 http://pastebin.com/wZvE0Fbv
> >
> > Thanks,
> > Ted
> >
>

RE: HBase client hangs after client-side OOM

Posted by Rakesh R <ra...@huawei.com>.
Hi,

>> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.

That ZK version is quite old. As far as I can see, ClientCnxn catches only IOException there, so an OOME will make SendThread exit.
I think that's the reason the client hangs. A client-side thread dump will help us check the liveness of SendThread.

Client-side exception handling has been modified in the 3.4 and 3.5 branches.
Can you check whether upgrading to 3.4.6, the latest release, is possible?
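
To illustrate the failure mode described above, here is a minimal Java sketch. It is not the ZooKeeper source: NarrowCatchSketch, PendingRequest, and doIo are hypothetical names, and the OutOfMemoryError is simulated. It only shows that a worker thread which handles IOException alone exits on an Error without waking the caller.

```java
import java.io.IOException;

// Hypothetical sketch of the failure mode; these are not the real ZooKeeper classes.
class NarrowCatchSketch {

    // Stand-in for a request the calling thread blocks on.
    static final class PendingRequest {
        private boolean finished;

        synchronized void finish() {
            finished = true;
            notifyAll();
        }

        // Bounded wait so the demo terminates; an unbounded wait() would hang instead.
        synchronized boolean awaitFinished(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!finished) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        }
    }

    // Simulated I/O step that fails with an Error instead of an IOException.
    static void doIo() throws IOException {
        throw new OutOfMemoryError("simulated");
    }

    // Returns true if the caller was woken, false if it was left waiting.
    static boolean demo() {
        PendingRequest request = new PendingRequest();
        Thread sender = new Thread(() -> {
            try {
                doIo();
                request.finish();
            } catch (IOException e) {
                request.finish(); // cleanup happens only on this path
            }
            // An OutOfMemoryError bypasses the catch block and ends the thread.
        }, "sketch-SendThread");
        sender.setUncaughtExceptionHandler((t, e) -> { }); // keep the simulated error quiet
        sender.start();
        try {
            sender.join();
            return request.awaitFinished(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("caller woken: " + demo()); // prints "caller woken: false"
    }
}
```

With an unbounded wait() in place of the 200 ms timeout, the caller in this sketch would block forever, which matches the hang reported in this thread.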

Regards,
Rakesh

-----Original Message-----
From: Qiang Tian [mailto:tianq01@gmail.com] 
Sent: 14 August 2014 11:03
To: user@hbase.apache.org; dev@zookeeper.apache.org
Subject: Re: HBase client hangs after client-side OOM

the sendthread stacktrace looks not correct. Do you have the client log?
(in case zk client code log sth there)
from the zk code, it looks ClientCnxn$SendThread.run should have caught
it(throwable) and done the cleanup work, e.g. notify the main thread, so that it can wake up from ClientCnxn.submitRequest..

send to Zookeeper for help.
thanks.



On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com> wrote:

> Hi Lars-
>
> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
>
> Thanks,
> Ted
>
> > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> >
> > Hey Ted,
> >
> > so this is a problem with the ZK client, it seems to not clean 
> > itself up
> correctly upon receiving an exception at the wrong moment.
> > Which version of ZK are you using?
> >
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Ted Tuttle <te...@mentacapital.com>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Cc: Development <De...@mentacapital.com>
> > Sent: Wednesday, August 13, 2014 4:38 PM
> > Subject: HBase client hangs after client-side OOM
> >
> > Hello-
> >
> > We are running HBase v0.94.16 on an 8 node cluster.
> >
> > We have a recurring problem w/ HBase clients hanging.  In latest
> occurrence, I observed the following sequence of events:
> >
> > 0) client plays w/ HBase for a long time w/o issue
> > 1) client runs out of memory during HBase operation:
> >
> >                 http://pastebin.com/b5x44Lx7
> >
> > 3) Exception is thrown, memory is released
> > 2) In some shutdown logic the client tries to access HBase again and
> hangs:
> >
> >                 http://pastebin.com/xU4MSq9k
> >
> > Clearly I need to fix OOM.  However, the fact that client hangs is 
> > not
> nice.  Any ideas why?
> >
> > BTW- I started by looking at zookeeper log. Not much there but here 
> > you
> go:
> >
> >                 http://pastebin.com/wZvE0Fbv
> >
> > Thanks,
> > Ted
> >
>

Re: HBase client hangs after client-side OOM

Posted by Qiang Tian <ti...@gmail.com>.
The SendThread stack trace does not look right. Do you have the client log
(in case the ZK client code logged something there)?
From the ZK code, it looks like ClientCnxn$SendThread.run should have caught
it (Throwable) and done the cleanup work, e.g. notifying the main thread so
that it can wake up from ClientCnxn.submitRequest.
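
The cleanup behavior described above can be sketched as follows. This is a hedged illustration, not the ZooKeeper implementation: CleanupOnThrowableSketch, PendingRequest, and doIo are hypothetical names, and the OutOfMemoryError is simulated. The point is only that catching Throwable in the worker and notifying the waiter lets the blocked caller wake up with an error instead of hanging.

```java
// Hypothetical sketch; names are illustrative and not taken from the ZooKeeper source.
class CleanupOnThrowableSketch {

    // Stand-in for a request the calling thread blocks on.
    static final class PendingRequest {
        private boolean finished;
        private Throwable failure;

        synchronized void fail(Throwable cause) {
            failure = cause;
            finished = true;
            notifyAll(); // wake the thread blocked in awaitFinished
        }

        synchronized boolean awaitFinished(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!finished) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        }

        synchronized Throwable failure() {
            return failure;
        }
    }

    // Simulated I/O step that fails with an Error.
    static void doIo() {
        throw new OutOfMemoryError("simulated");
    }

    // Returns the failure handed to the caller, or null if the caller was never woken.
    static Throwable demo() {
        PendingRequest request = new PendingRequest();
        Thread sender = new Thread(() -> {
            try {
                doIo();
            } catch (Throwable t) {
                request.fail(t); // cleanup runs for Errors as well as exceptions
            }
        }, "sketch-SendThread");
        sender.start();
        try {
            sender.join();
            return request.awaitFinished(200) ? request.failure() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("caller woken with: " + demo());
    }
}
```

Doing work after a real OutOfMemoryError is not guaranteed to succeed, so a sketch like this reduces the chance of a silent hang rather than eliminating it.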

Sending this to the ZooKeeper list for help.
thanks.



On Thu, Aug 14, 2014 at 11:19 AM, Ted Tuttle <te...@mentacapital.com> wrote:

> Hi Lars-
>
> We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16.
>
> Thanks,
> Ted
>
> > On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> >
> > Hey Ted,
> >
> > so this is a problem with the ZK client, it seems to not clean itself up
> correctly upon receiving an exception at the wrong moment.
> > Which version of ZK are you using?
> >
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Ted Tuttle <te...@mentacapital.com>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Cc: Development <De...@mentacapital.com>
> > Sent: Wednesday, August 13, 2014 4:38 PM
> > Subject: HBase client hangs after client-side OOM
> >
> > Hello-
> >
> > We are running HBase v0.94.16 on an 8 node cluster.
> >
> > We have a recurring problem w/ HBase clients hanging.  In latest
> occurrence, I observed the following sequence of events:
> >
> > 0) client plays w/ HBase for a long time w/o issue
> > 1) client runs out of memory during HBase operation:
> >
> >                 http://pastebin.com/b5x44Lx7
> >
> > 3) Exception is thrown, memory is released
> > 2) In some shutdown logic the client tries to access HBase again and
> hangs:
> >
> >                 http://pastebin.com/xU4MSq9k
> >
> > Clearly I need to fix OOM.  However, the fact that client hangs is not
> nice.  Any ideas why?
> >
> > BTW- I started by looking at zookeeper log. Not much there but here you
> go:
> >
> >                 http://pastebin.com/wZvE0Fbv
> >
> > Thanks,
> > Ted
> >
>

Re: HBase client hangs after client-side OOM

Posted by Ted Tuttle <te...@mentacapital.com>.
Hi Lars-

We are running ZK 3.3.4, Cloudera cdh3u3, HBase 0.94.16. 

Thanks,
Ted

> On Aug 13, 2014, at 5:36 PM, "lars hofhansl" <la...@apache.org> wrote:
> 
> Hey Ted,
> 
> so this is a problem with the ZK client, it seems to not clean itself up correctly upon receiving an exception at the wrong moment.
> Which version of ZK are you using?
> 
> 
> -- Lars
> 
> 
> 
> ----- Original Message -----
> From: Ted Tuttle <te...@mentacapital.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Cc: Development <De...@mentacapital.com>
> Sent: Wednesday, August 13, 2014 4:38 PM
> Subject: HBase client hangs after client-side OOM
> 
> Hello-
> 
> We are running HBase v0.94.16 on an 8 node cluster.
> 
> We have a recurring problem w/ HBase clients hanging.  In latest occurrence, I observed the following sequence of events:
> 
> 0) client plays w/ HBase for a long time w/o issue
> 1) client runs out of memory during HBase operation:
> 
>                 http://pastebin.com/b5x44Lx7
> 
> 3) Exception is thrown, memory is released
> 2) In some shutdown logic the client tries to access HBase again and hangs:
> 
>                 http://pastebin.com/xU4MSq9k
> 
> Clearly I need to fix OOM.  However, the fact that client hangs is not nice.  Any ideas why?
> 
> BTW- I started by looking at zookeeper log. Not much there but here you go:
> 
>                 http://pastebin.com/wZvE0Fbv
> 
> Thanks,
> Ted
> 

Re: HBase client hangs after client-side OOM

Posted by lars hofhansl <la...@apache.org>.
Hey Ted,

So this is a problem with the ZK client: it seems not to clean itself up correctly upon receiving an exception at the wrong moment.
Which version of ZK are you using?


-- Lars



----- Original Message -----
From: Ted Tuttle <te...@mentacapital.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: Development <De...@mentacapital.com>
Sent: Wednesday, August 13, 2014 4:38 PM
Subject: HBase client hangs after client-side OOM

Hello-

We are running HBase v0.94.16 on an 8 node cluster.

We have a recurring problem w/ HBase clients hanging.  In latest occurrence, I observed the following sequence of events:

0) client plays w/ HBase for a long time w/o issue
1) client runs out of memory during HBase operation:

                http://pastebin.com/b5x44Lx7

2) Exception is thrown, memory is released
3) In some shutdown logic the client tries to access HBase again and hangs:

                http://pastebin.com/xU4MSq9k

Clearly I need to fix OOM.  However, the fact that client hangs is not nice.  Any ideas why?

BTW- I started by looking at zookeeper log. Not much there but here you go:

                http://pastebin.com/wZvE0Fbv

Thanks,
Ted