Posted to user@hbase.apache.org by Ameya Kantikar <am...@groupon.com> on 2013/06/05 22:47:22 UTC

Region servers going down under heavy write load

Hi,

We have heavy MapReduce write jobs running against our cluster. Every once
in a while, we see a region server going down.

We are on: 0.94.2-cdh4.2.0, r

We have done some tuning for heavy MapReduce jobs: we have increased
scanner timeouts and lease timeouts, and have also tuned the memstore as
follows:

hbase.hregion.memstore.block.multiplier: 4
hbase.hregion.memstore.flush.size: 134217728
hbase.hstore.blockingStoreFiles: 100

However, we are still facing issues. Looking at the logs, it appears to be
due to a ZooKeeper session timeout. We have tuned the ZooKeeper settings as
follows in hbase-site.xml:

zookeeper.session.timeout: 300000
hbase.zookeeper.property.tickTime: 6000
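
For reference, the same settings expressed as hbase-site.xml entries (just a
sketch using the values listed above):

<!-- memstore / store file tuning for heavy writes -->
<property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
</property>
<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
</property>
<property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>100</value>
</property>
<!-- ZooKeeper session settings -->
<property>
    <name>zookeeper.session.timeout</name>
    <value>300000</value>
</property>
<property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
</property>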


The actual log looks like:


2013-06-05 11:46:40,405 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":13468,"call":"next(6723331143689528698, 1000), rpc
version=1, client version=29, methodsFingerPrint=54742778","client":"
10.20.73.65:41721
","starttimems":1370432786933,"queuetimems":1,"class":"HRegionServer","responsesize":39611416,"method":"next"}

2013-06-05 11:46:54,988 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor [.snappy]

2013-06-05 11:48:03,017 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
BP-53741567-10.20.73.56-1351630463427:blk_9026156240355850298_8775246
java.io.EOFException: Premature EOF: no length prefix available
        at
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
        at
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)

2013-06-05 11:48:03,020 WARN org.apache.hadoop.hbase.util.Sleeper: *We
slept 48686ms instead of 3000ms*, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2013-06-05 11:48:03,094 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
smartdeals-hbase14-snc1.snc1,60020,1370373396890: Unhandled exception:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing smartdeals-hbase14-snc1.snc1,60020,1370373396890 as
dead server

(Not sure why it says 3000ms when we have the timeout set to 300000ms.)

We have done some GC tuning as well. Wondering what I can tune to keep the
RS from going down? Any ideas?
This is a batch-heavy cluster, and we care less about read latency. We can
increase RAM a bit more, but not much (each RS already has 20GB of memory).

Thanks in advance.

Ameya

Re: Region servers going down under heavy write load

Posted by Stack <st...@duboce.net>.
On Thu, Jun 6, 2013 at 8:15 AM, Stack <st...@duboce.net> wrote:

>
>>
>> bq.  increase tickTime in zoo.cfg?
>>
>> For shared zookeeper quorum, the above should be done.
>>
>>
> What?
>
>
What I meant to say is: how does this answer the question "Is this even
relevant anymore? hbase.zookeeper.property.tickTime?"?

St.Ack

Re: Region servers going down under heavy write load

Posted by Ted Yu <yu...@gmail.com>.
Thanks for the pointer to the HBase book.

I should have provided more background in my last email.

There is only one place in http://hbase.apache.org/book.html where
'hbase.zookeeper.property.tickTime' is mentioned - see the bottom of this email.

There was no mention of the HBASE_MANAGES_ZK variable in the vicinity of
this text. Some users would assume that the two referenced config
parameters, once set in hbase-site.xml, would take effect for future
ZooKeeper session timeouts. However, if the ZooKeeper quorum is shared
across multiple HBase clusters, that expectation wouldn't be met.

This is because ZooKeeper currently doesn't support different tickTimes for
different clusters. Without properly setting tickTime, the requested
session timeout wouldn't be satisfied.
There is a second, minor issue with the text below: 1200000, in
milliseconds, represents 20 minutes instead of the declared 120 seconds.

Considering the above two points, I suggested using a JIRA to refine
src/docbkx/troubleshooting.xml.

Is that okay?

Here is the text from section 12.9.2.7, 'ZooKeeper SessionExpired events', of
the HBase book:

If you wish to increase the session timeout, add the following to your
hbase-site.xml to increase the timeout from the default of 60 seconds to
120 seconds.

<property>
    <name>zookeeper.session.timeout</name>
    <value>1200000</value>
</property>
<property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
</property>
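
For comparison, a value matching the stated 120 seconds (and still within the
20 x tickTime cap when tickTime is 6000) would presumably be:

<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value> <!-- 120 seconds; 1200000 is 20 minutes -->
</property>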



Re: Region servers going down under heavy write load

Posted by Stack <st...@duboce.net>.
On Thu, Jun 6, 2013 at 4:57 AM, Ted Yu <yu...@gmail.com> wrote:

> bq.  I just dont find this "hbase.zookeeper.property.tickTime" anywhere in
> the code base.
>
> Neither do I. Mind filing a JIRA to correct this in troubleshooting.xml ?
>


It intentionally does not exist in the hbase code base.  Read the first
paragraph in our zookeeper chapter on how zk configs work:
http://hbase.apache.org/book.html#zookeeper
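
For context, a sketch of the mapping that chapter describes (it only applies
when HBase manages the quorum, i.e. HBASE_MANAGES_ZK=true; for an externally
managed quorum the option has to be set in zoo.cfg itself):

    # hbase-site.xml property                 ->  option in the generated zoo.cfg
    hbase.zookeeper.property.tickTime=6000    ->  tickTime=6000

The hbase.zookeeper.property. prefix is handled generically, which would
explain why the literal property name never appears in the code base.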



>
> bq.  increase tickTime in zoo.cfg?
>
> For shared zookeeper quorum, the above should be done.
>
>
What?

St.Ack

Re: Region servers going down under heavy write load

Posted by Ted Yu <yu...@gmail.com>.
bq.  I just don't find this "hbase.zookeeper.property.tickTime" anywhere in
the code base.

Neither do I. Mind filing a JIRA to correct this in troubleshooting.xml?

bq.  increase tickTime in zoo.cfg?

For a shared ZooKeeper quorum, the above should be done.


Re: Region servers going down under heavy write load

Posted by Ameya Kantikar <am...@groupon.com>.
One more thing: I just don't find this "hbase.zookeeper.property.tickTime"
anywhere in the code base.
Also, I could not find a ZooKeeper API that takes tickTime from the client:
http://zookeeper.apache.org/doc/r3.3.3/api/org/apache/zookeeper/ZooKeeper.html
It takes a session timeout value, but not tickTime.

Is this even relevant anymore? hbase.zookeeper.property.tickTime?

So what's the solution: increase tickTime in zoo.cfg (and not
hbase.zookeeper.property.tickTime in hbase-site.xml)?

Ameya



Re: Region servers going down under heavy write load

Posted by Ameya Kantikar <am...@groupon.com>.
Which tickTime is honored?

One in zoo.cfg or hbase.zookeeper.property.tickTime in hbase-site.xml?

My understanding now is that, whichever tickTime is honored, the session
timeout cannot be more than 20 times that value.

I think this is what's happening on my cluster:

My hbase.zookeeper.property.tickTime value is 6000 ms. However, my timeout
value is 300000 ms, which is outside of 20 times the tickTime. Hence ZooKeeper
uses its syncLimit of 5 to generate 6000*5 = 30000 as the timeout value for my
RS sessions.
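
For comparison, if the quorum's own zoo.cfg tickTime of 2000 ms is what
actually applies, the 20 x tickTime cap alone would give (a quick arithmetic
check, assuming that configuration):

    requested timeout (zookeeper.session.timeout)  = 300000 ms
    minimum negotiable  =  2 * 2000                =   4000 ms
    maximum negotiable  = 20 * 2000                =  40000 ms
    negotiated timeout  = min(300000, 40000)       =  40000 ms

Either way the sessions end up far below the requested 300000 ms.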

I'll try increasing the hbase.zookeeper.property.tickTime value in
hbase-site.xml and will monitor my cluster over the next few days.

Thanks Kevin & Ted for your help.

Ameya





Re: Region servers going down under heavy write load

Posted by Ted Yu <yu...@gmail.com>.
bq. I thought this property in hbase-site.xml takes care of that:
zookeeper.session.timeout

From
http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkSessions:

The client sends a requested timeout, the server responds with the timeout
that it can give the client. The current implementation requires that the
timeout be a minimum of 2 times the tickTime (as set in the server
configuration) and a maximum of 20 times the tickTime. The ZooKeeper client
API allows access to the negotiated timeout.
The above means the shared ZooKeeper quorum may return a timeout value
different from that of zookeeper.session.timeout.
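
A minimal sketch of how one could check what the quorum actually grants, using
the plain ZooKeeper Java client (the host:port is a placeholder; the negotiated
value is exposed via getSessionTimeout()):

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class NegotiatedTimeoutCheck {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Same value the region server requests via zookeeper.session.timeout.
    int requested = 300000;
    ZooKeeper zk = new ZooKeeper("zkhost:2181", requested, new Watcher() {
      public void process(WatchedEvent event) {
        // No-op watcher; we only care about the negotiated timeout.
      }
    });
    // The negotiated value is only meaningful once the client has connected.
    Thread.sleep(3000);
    // Per the paragraph above, this is clamped to [2*tickTime, 20*tickTime]
    // as configured on the server side.
    System.out.println("negotiated session timeout = "
        + zk.getSessionTimeout() + " ms");
    zk.close();
  }
}

With the quorum's tickTime of 2000 this would be expected to print 40000
rather than the requested 300000.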

Cheers


Re: Region servers going down under heavy write load

Posted by Ameya Kantikar <am...@groupon.com>.
In zoo.cfg I have not set up this value explicitly. My zoo.cfg looks like:

tickTime=2000
initLimit=10
syncLimit=5

We use a common ZooKeeper cluster for 2 of our HBase clusters. I'll try
increasing this value in zoo.cfg.
However, is it possible to set this value per cluster?
I thought this property in hbase-site.xml takes care of that:
zookeeper.session.timeout
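
One way to actually allow a larger session timeout on the shared quorum is to
raise the server-side cap in zoo.cfg rather than (or in addition to) tickTime;
this is a sketch assuming the quorum runs ZooKeeper 3.3+, where
maxSessionTimeout is supported:

# existing settings
tickTime=2000
initLimit=10
syncLimit=5
# default cap is 20 * tickTime = 40000 ms; raise it so a requested
# zookeeper.session.timeout of 300000 can actually be granted.
# Note: this applies to every client of this quorum, not per HBase cluster.
maxSessionTimeout=300000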



Re: Region servers going down under heavy write load

Posted by Kevin O'dell <ke...@cloudera.com>.
Ameya,

  What does your zoo.cfg say for your timeout value?


-- 
Kevin O'Dell
Systems Engineer, Cloudera