Posted to user@hbase.apache.org by "Baugher,Bryan" <Br...@Cerner.com> on 2012/12/28 18:40:31 UTC

HBase client locks application during major compactions

Hi everyone,

For the past month or so, we have noticed that some of our applications freeze about once a day and need to be restarted to recover. We eventually figured out that the freezes are caused by, or at least happen during, major compactions.

We have automated major compactions disabled and run them manually on each table, sequentially, each day starting at 4am. We are running CDH4.1.1 (HBase version 0.92.1-cdh4.1.1). Interestingly enough, this only happens in our dev environment, where each region server serves ~650 regions.

The HBase logs show the compactions occurring, and this warning appears repeatedly while they run:

WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call getHTableDescriptors(), rpc version=1, client version=29, methodsFingerPrint=400804878 from ***: output error

Looking at our application logs we often see this error or a variation[1].

I took a thread dump of our application while it was locked and saw that nearly all of the threads in the application were blocked by a single thread that was waiting on HBaseClient$Call[2].

[1] - http://pastebin.com/P4skndEg
[2] - http://pastebin.com/YLZn3SRK
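
A minimal sketch of how a thread dump like [2] could be summarized to find the one thread everything else is queued behind. It assumes HotSpot jstack-style output, where blocked threads print "- waiting to lock <address>" and the owner prints "- locked <address>". The contents of the pastebin are not reproduced here, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed jstack-style patterns; real dumps vary by JVM version.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
WAITING_TO_LOCK = re.compile(r'- waiting to lock <(?P<addr>0x[0-9a-f]+)>')
LOCKED = re.compile(r'- locked <(?P<addr>0x[0-9a-f]+)>')


def summarize_blockers(dump_text):
    """Return (holders, waiters).

    holders maps a monitor address to the first thread seen holding it;
    waiters counts how many threads are waiting to lock each address.
    """
    holders = {}
    waiters = Counter()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A quoted name at the start of a line begins a new thread entry.
            current = header.group("name")
            continue
        waiting = WAITING_TO_LOCK.search(line)
        if waiting:
            waiters[waiting.group("addr")] += 1
            continue
        locked = LOCKED.search(line)
        if locked and current is not None:
            holders.setdefault(locked.group("addr"), current)
    return holders, waiters


def most_contended(dump_text):
    """Return (address, holding thread or None, waiter count), or None if nothing is blocked."""
    holders, waiters = summarize_blockers(dump_text)
    if not waiters:
        return None
    addr, count = waiters.most_common(1)[0]
    return addr, holders.get(addr), count
```

Calling most_contended() on the text of a dump returns the most-waited-on monitor, the thread holding it, and the number of waiters. If the holder is itself parked in an RPC wait, that would match the pattern described above.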

CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation and are intended only for the addressee. The information contained in this message is confidential and may constitute inside or non-public information under international, federal, or state securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such information is strictly prohibited and may be unlawful. If you are not the addressee, please promptly delete this message and notify the sender of the delivery error by e-mail or you may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.

Re: HBase client locks application during major compactions

Posted by "Baugher,Bryan" <Br...@Cerner.com>.

I believe that is one of our region servers. I will have to wait until
tomorrow to check its GC logs.

On 12/28/12 12:45 PM, "Ted Yu" <yu...@gmail.com> wrote:

>I was talking about the server which was anonymized:
>***/***:60020
>
>Cheers

Re: HBase client locks application during major compactions

Posted by Ted Yu <yu...@gmail.com>.

I was talking about the server which was anonymized:
***/***:60020

Cheers

On Fri, Dec 28, 2012 at 10:41 AM, Baugher,Bryan <Br...@cerner.com> wrote:

>
>
> On 12/28/12 12:14 PM, "Ted Yu" <yu...@gmail.com> wrote:
>
> >Looks like there was a socket timeout:
> >
> >java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> >channel to be ready for read. ch :
> >java.nio.channels.SocketChannel[connected local=/***:39752
> >remote=***/***:60020]
> >
> >Have you collected or checked the GC log on the server referenced above?
>
> I am not sure exactly which server you are referring to. For the
> application server we don't currently collect GC logs. For HBase we do, but
> the GC logs were truncated recently and won't help.

Re: HBase client locks application during major compactions

Posted by "Baugher,Bryan" <Br...@Cerner.com>.


On 12/28/12 12:14 PM, "Ted Yu" <yu...@gmail.com> wrote:

>Looks like there was a socket timeout:
>
>java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>channel to be ready for read. ch :
>java.nio.channels.SocketChannel[connected local=/***:39752
>remote=***/***:60020]
>
>Have you collected or checked the GC log on the server referenced above?

I am not sure exactly which server you are referring to. For the
application server we don't currently collect GC logs. For HBase we do, but
the GC logs were truncated recently and won't help.

>
>BTW, have you considered deploying 0.92.2 in your cluster?

Not really. We have stuck with Cloudera's distribution for a couple of years
now and I don't really see us going down that track.

>
>Thanks, glad to see Cerner using HBase.

Re: HBase client locks application during major compactions

Posted by Ted Yu <yu...@gmail.com>.

Looks like there was a socket timeout:

java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/***:39752
remote=***/***:60020]

Have you collected or checked the GC log on the server referenced above?
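
One way to run that check, as a rough sketch: scan the region server's GC log for collections whose wall-clock time is large relative to the 60000 ms timeout in the exception. This assumes the log was written by HotSpot with -XX:+PrintGCDetails, which appends a "[Times: user=... sys=..., real=... secs]" field to each collection. The function name and the 10-second default threshold are hypothetical choices, not values from this thread.

```python
import re

# The trailing "real=1.23 secs" field printed with -XX:+PrintGCDetails (assumed log format).
REAL_TIME = re.compile(r'real=(?P<secs>\d+(?:\.\d+)?) secs')


def long_pauses(gc_log_lines, threshold_secs=10.0):
    """Yield (line_number, seconds) for each log line whose wall-clock time is at or above the threshold."""
    for number, line in enumerate(gc_log_lines, start=1):
        match = REAL_TIME.search(line)
        if match:
            secs = float(match.group("secs"))
            if secs >= threshold_secs:
                yield number, secs
```

Passing an open file object works, since the function only iterates over lines. Any pause approaching 60 seconds during the compaction window would be worth lining up against the client-side timeouts.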

BTW, have you considered deploying 0.92.2 in your cluster?

Thanks, glad to see Cerner using HBase.
