Posted to dev@hbase.apache.org by Salabhanjika S <sa...@gmail.com> on 2014/03/14 08:12:26 UTC

Region server slowdown

Devs,

We are using HBase version 0.90.6 (please don't complain about the old
version; we are in the process of upgrading) in production, and every
few weeks we notice a strange problem: a region server becomes
extremely slow. We have to restart the region server once this happens.
There is no clear pattern to the problem; it happens on different
region servers, different tables/regions, and at different times.

Here are observations & findings from our analysis.
- We are using LZO compression (0.4.10).

- [RS Dashboard] A flush has been running for more than 6 hours. It is in
"creating writer" status for a long time. Other previous flushes (600MB
to 1.5GB) takes

- [Thread dumps] No deadlocks. The flusher thread stack is below; even the
compactor thread is in the same state, inside Configuration.loadResource:
"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800
nid=0x35e9 runnable [0x00007efcad9c5000]
   java.lang.Thread.State: RUNNABLE
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
    - locked <0x00007f02ccc2ef78> (a
sun.net.www.protocol.file.FileURLConnection)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
    ... [cutting down some stack to keep mail compact. all this stack
is in com.sun.org.apache.xerces...]
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
    - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
    - locked <0x00007efe1b6e7af8> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
    at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
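
For context on the frames above: a Hadoop Configuration parses its XML
resources lazily, on the first property lookup, so any code path that ends
up querying a freshly constructed (or never-read) Configuration pays the
full discovery-and-parse cost seen in the loadResources frames. A minimal
sketch of that pattern (the property key is hypothetical, purely for
illustration):

{code}
import org.apache.hadoop.conf.Configuration;

public class LazyConfDemo {
  public static void main(String[] args) {
    // Constructing a Configuration is cheap: no XML is read here.
    Configuration conf = new Configuration();

    // The first lookup triggers getProps() -> loadResources(): the
    // *-site.xml resources are located and parsed with a
    // DocumentBuilder, exactly the frames in the dump above.
    String value = conf.get("some.example.key");  // hypothetical key

    System.out.println("value = " + value);
  }
}
{code}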

Any leads on this please?

-S

Re: Region server slowdown

Posted by Salabhanjika S <sa...@gmail.com>.
My bad.
> - I strongly feel this issue has something to do with HBase version. I
> verified the code paths of the stack I posted.

Read this as "I DON'T feel this issue has something to do with HBase version."

Re: Region server slowdown

Posted by Salabhanjika S <sa...@gmail.com>.
Thanks Rodionov & Enis for responding. I agree with you that we need to upgrade.

As I mentioned in my first mail, we are in the process of upgrading.
> We are using HBase version 0.90.6 (please don't complain about the old
> version; we are in the process of upgrading)

- The suboptimal (in my view) code snippets I posted in a follow-up mail
hold good for trunk as well.

- I strongly feel this issue has something to do with HBase version. I
verified the code paths of the stack I posted.
I don't see any significant changes in the current version of this code
(Flusher - getCompressor).

Re: Region server slowdown

Posted by Enis Söztutar <en...@gmail.com>.
Hi,

Agreed with Vladimir. I doubt anybody will spend the time to debug the
issue. It would be easier if you can upgrade your HBase cluster; you
will have to upgrade your Hadoop cluster as well. You should go with
0.96.x/0.98.x and either Hadoop 2.2 or Hadoop 2.3. Check out the HBase
book for the upgrade process.

Enis

RE: Region server slowdown

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
I think 0.90.6 reached EOL a couple of years ago. The best you can do right
now is start planning an upgrade to the latest stable 0.94 or 0.96.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

Re: Region server slowdown

Posted by Salabhanjika S <sa...@gmail.com>.
@Devs, please respond if you can provide me some hints on this problem.

Did some more analysis. While going through the code in the stack trace I
noticed something sub-optimal. This may not be the root cause of our
slowdown, but I felt it may be something worth optimizing/fixing.

HBase is making the call to get a Compressor *WITHOUT* a config object.
This results in a configuration reload on every call. Should it pass the
existing config object as a parameter so that the configuration reload
(resource discovery & XML parsing) does not happen so frequently?

http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    Compressor compressor = CodecPool.getCompressor(codec);
    if (compressor != null) {
{code}

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
{code}
public static Compressor getCompressor(CompressionCodec codec) {
  return getCompressor(codec, null);
}
{code}
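
If that reading is right, the fix might be as small as threading the
existing conf through. A minimal sketch of the idea (illustrative only,
not committed code), using the two-argument CodecPool overload visible in
the snippet above:

{code}
// Sketch: pass the algorithm's existing, already-loaded conf so that a
// pooled compressor's reinit(conf) does not receive null. With null,
// the LZO path appears to fall back to a fresh Configuration, which
// re-discovers and re-parses the XML resources on every flush.
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    // Two-argument overload shown in the CodecPool snippet above.
    return CodecPool.getCompressor(codec, conf);
  }
  return null;
}
{code}

The same one-line change would presumably apply anywhere HBase obtains
compressors/decompressors from the pool.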

Re: Region server slowdown

Posted by Salabhanjika S <sa...@gmail.com>.
Thanks for the quick response, Ted.

- Hadoop version is 0.20.2
- Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds

Re: Region server slowdown

Posted by Vladimir Rodionov <vl...@gmail.com>.
0.90.6? Win 95?


Re: Region server slowdown

Posted by Ted Yu <yu...@gmail.com>.
What Hadoop version are you using?

Btw, the sentence about previous flushes was incomplete.

Cheers
