Posted to user@hbase.apache.org by Alexander Ilyin <al...@weborama.com> on 2016/10/24 12:44:45 UTC

Spikes in HBase response time

Hi,

I'm experiencing some spikes in HBase response time. The problem appeared
suddenly: previously all requests took 4-30 ms to complete, but now some of
them (several percent) take as long as 800 ms, which significantly increases
the average processing time.
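
For what it's worth, this is roughly how the latencies look from the client
side (a simplified sketch; the table name is guessed from the log path
further down and the row keys are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetLatencyProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("profile:DailyUserScores"))) {
      for (int i = 0; i < 1000; i++) {
        long start = System.nanoTime();
        table.get(new Get(Bytes.toBytes("row-" + i)));      // placeholder row keys
        long ms = (System.nanoTime() - start) / 1_000_000;
        if (ms > 100) {                                      // flag anything well above the usual 4-30 ms
          System.out.println("slow get #" + i + ": " + ms + " ms");
        }
      }
    }
  }
}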

This happens with requests to all the tables I have, so it doesn't seem to
be related to a problem in the data schema, which hasn't changed anyway.

I don't see any resource shortage: there is enough CPU and memory on the
datanodes.

Automatic splitting is off and no cluster balancing is taking place, so it
shouldn't be client retries either.
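
For context, this is roughly how splitting and balancing can be kept off
through the Java Admin API (an illustrative sketch rather than our exact
setup; the table name is a placeholder, and DisabledRegionSplitPolicy should
be checked against your HBase version):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DisableSplitAndBalance {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Turn the balancer off cluster-wide (same effect as `balance_switch false` in the shell).
      admin.setBalancerRunning(false, true);

      // Disable automatic splits for one table by switching its split policy.
      TableName tn = TableName.valueOf("profile:DailyUserScores");   // placeholder
      HTableDescriptor htd = admin.getTableDescriptor(tn);
      htd.setRegionSplitPolicyClassName(
          "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
      admin.modifyTable(tn, htd);
    }
  }
}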

There are some stack traces in the datanode log, but as far as I understand
these are normal, and in any case they existed before the problem appeared:
2016-10-24 05:42:19,186 INFO
[B.defaultRpcServer.handler=17,queue=2,port=16020]
shortcircuit.ShortCircuitCache: ShortCircuitCache(0x3a130ae2): could not
load 1075792477_BP-415105157-10.190.240.206-1468423545420 due to
InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control
error while attempting to set up short-circuit access to
/apps/hbase/data/data/profile/DailyUserScores/84105becaefcf071e43ebc4972d0249f/d/c7653cc98860451cb79e678e9c3f0e85
Block token with block_token_identifier (expiryDate=1477279957879,
keyId=2039256459, userId=hbase,
blockPoolId=BP-415105157-10.190.240.206-1468423545420, blockId=1075792477,
access modes=[READ]) is expired.
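
In case the expired block tokens and short-circuit reads turn out to be
related to the latency, here is a small sketch to dump the client-side
short-circuit settings the region servers would pick up (the config file
path is an assumption for our layout; the property names are the stock HDFS
ones):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PrintShortCircuitConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));   // assumed location
    String[] keys = {
        "dfs.client.read.shortcircuit",
        "dfs.domain.socket.path",
        "dfs.client.read.shortcircuit.streams.cache.size",
        "dfs.client.read.shortcircuit.streams.cache.expiry.ms",
        "dfs.block.access.token.enable"
    };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));   // null means "not set, default applies"
    }
  }
}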

At the moment I'm a bit stuck, so any advice on what to check next is
welcome.

PS. HBase version is 1.1.2

Re: Spikes in HBase response time

Posted by Alexander Ilyin <al...@weborama.com>.
I'm still struggling with the "slow" requests described above.

As a temporary solution, I want to limit request execution time and simply
drop the requests that take HBase this long to answer. Could you advise
which timeout fits this task best (a rough sketch of how I would set it on
the client side follows the list)? I see the following options:
* hbase.rpc.timeout
* hbase.client.operation.timeout
* something else I missed?
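
My current understanding of the two, as a client-side sketch (the
millisecond values are placeholders, and I may well be misreading the
semantics):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TimeoutConfigSketch {
  public static Connection connect() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cap for a single RPC to a region server.
    conf.setInt("hbase.rpc.timeout", 200);
    // Cap for a whole client operation, retries included.
    conf.setInt("hbase.client.operation.timeout", 400);
    return ConnectionFactory.createConnection(conf);
  }
}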


Re: Spikes in HBase response time

Posted by Alexander Ilyin <al...@weborama.com>.
Hi Ted, Stack,

thanks for the answers.

We're not using Phoenix, but the server process was running on some
datanodes. We stopped it to be sure; nothing changed.

Ted, I suppose you meant region server jstack traces; here they are:
http://pastebin.com/CHzyrBp9
http://pastebin.com/PdNczfVa

Stack, our application pulls messages from a RabbitMQ queue and processes
them using HBase, so there is no dependency on external spikes in load. At
least no obvious one.
It looks like random requests take much longer to complete than the others.
It has been happening continuously since Friday, not during specific time
windows; several percent of requests are just slow.
Region server GC pauses look normal -- less than 0.1 second every several
seconds.

The fact that most of the "long" requests have execution times in the range
of 800-900 ms makes me think that we're hitting some timeout which is not
logged properly. But this is just an idea.
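
One way to test the hidden-retry/timeout idea is to turn client retries off
and time a single Get; a request that would otherwise have been silently
retried should then either fail or stay fast. A rough sketch (the table and
row key are placeholders, and I'm assuming hbase.client.retries.number=1
effectively disables retries in 1.1):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RetryHypothesisProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 1);   // no silent client-side retries
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("profile:DailyUserScores"))) {   // placeholder
      long start = System.nanoTime();
      table.get(new Get(Bytes.toBytes("some-row")));   // placeholder row key
      System.out.println("get took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
  }
}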

Re: Spikes in HBase response time

Posted by Stack <st...@duboce.net>.
Any correlation with a spike in load -- more I/O, more traffic, other tasks
running on the box -- or a long GC? Is it random requests, or do all
requests suffer the 800ms penalty over a particular period?
Thanks,
St.Ack

Re: Spikes in HBase response time

Posted by Ted Yu <yu...@gmail.com>.
Can you take a few jstack traces when this happens (and pastebin the
traces)?

Do you use Phoenix in this cluster?

Thanks
