Posted to user@hbase.apache.org by Viral Bajaria <vi...@gmail.com> on 2013/05/16 11:16:56 UTC

GET performance degrades over time

Hi,

My setup is as follows:
24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
hbase 0.94.4
5-7 regions per regionserver

I am doing an average of 4K-5K random gets per regionserver per second, and the
performance is acceptable in the beginning. I have also issued ~10K gets against
a single regionserver and got the results back in 600-800ms. After a while the
GET performance starts degrading: the same ~10K random gets start taking
upwards of 9-10s.
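
For context, the client-side load is roughly the following kind of batched
random GET (a minimal sketch against the 0.94 client API; the table name, key
generation, and batch size are placeholders, not my actual code):

====sketch: batched random GETs (illustrative only)====
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomGetSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    Random rnd = new Random();

    // Build a batch of ~10K gets against random row keys.
    List<Get> batch = new ArrayList<Get>();
    for (int i = 0; i < 10000; i++) {
      batch.add(new Get(Bytes.toBytes(rnd.nextLong())));
    }

    long start = System.currentTimeMillis();
    Result[] results = table.get(batch);          // multi-get, grouped per regionserver
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(results.length + " gets returned in " + elapsed + " ms");

    table.close();
  }
}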

As for the HBase settings I have modified: I have disabled major compaction,
increased the region size to 100GB, and bumped the handler count up to 100.
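
For reference, these are the overrides I mean, sketched programmatically (in
the actual cluster they live in hbase-site.xml; the property names are the
standard 0.94 ones as I understand them):

====sketch: the settings described above====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TuningSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Disable time-based major compactions (0 = never trigger them automatically).
    conf.setLong("hbase.hregion.majorcompaction", 0L);

    // Let regions grow to ~100GB before splitting.
    conf.setLong("hbase.hregion.max.filesize", 100L * 1024 * 1024 * 1024);

    // Bump the RPC handler count per regionserver to 100.
    conf.setInt("hbase.regionserver.handler.count", 100);

    System.out.println("handlers=" + conf.getInt("hbase.regionserver.handler.count", 10));
  }
}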

I monitored ganglia for metrics that change when the performance shifts from
good to bad and found that fsPreadLatency_avg_time is almost 25x higher on the
badly performing regionserver. fsReadLatency_avg_time is also higher, but not
by as much (around 2x).

I took a thread dump of the regionserver process and also monitored CPU
utilization. The CPU cycles were being spent in
org.apache.hadoop.hdfs.BlockReaderLocal.read; a stack trace for the threads
running that method is at the bottom of this email.

Any pointers on why positional reads degrade over time? Or is this just a disk
I/O issue that I should start looking into?

Thanks,
Viral

====stack trace for one of the handlers doing a block read====
"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:220)
at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
- locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
- locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
- locked <74d81ea7> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

Re: GET performance degrades over time

Posted by Anoop John <an...@gmail.com>.
>Yes bloom filters have been enabled: ROWCOL
Can you try with a ROW bloom filter?
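
Something along these lines, if you want to try it (a minimal sketch against
the 0.94 admin API; the table and family names are placeholders, and in a real
change you would start from the table's current descriptor so compression, TTL,
etc. are preserved):

====sketch: switching the bloom filter from ROWCOL to ROW====
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class SwitchBloomSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    String tableName = "mytable";   // placeholder
    String family = "f";            // placeholder

    admin.disableTable(tableName);

    HColumnDescriptor hcd = new HColumnDescriptor(family);
    hcd.setBloomFilterType(StoreFile.BloomType.ROW);   // was ROWCOL
    admin.modifyColumn(tableName, hcd);

    admin.enableTable(tableName);
    admin.close();

    // Existing HFiles keep their ROWCOL blooms until flushes/compactions rewrite
    // them; only newly written files carry the ROW bloom.
  }
}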

-Anoop-


On Fri, May 17, 2013 at 12:20 PM, Viral Bajaria <vi...@gmail.com> wrote:

> Thanks for all the help in advance!
>
> Answers inline..
>
> Hi Viral,
> >
> > some questions:
> >
> >
> > Are you adding new data or deleting data over time?
> >
>
> Yes I am continuously adding new data. The puts have not slowed down but
> that could also be an after effect of deferred log flush.
>
>
> > Do you have bloom filters enabled?
> >
>
> Yes bloom filters have been enabled: ROWCOL
>
>
> > Which version of Hadoop?
> >
>
> Using 1.0.4
>
>
> > Anything funny in the Datanode logs?
> >
>
> I haven't seen anything funny, not a lot of timeouts either but I will look
> into it more. For some reason my datanode metrics refused to show up in
> ganglia while regionserver metrics work fine.
>
> Thanks,
> Viral
>

Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll <ph...@gmail.com> wrote:

> Look at how much hard disk utilization you have (IOPS / svctm). You may
> just be under-scaled for the QPS you want for the combined read + write load.
> If you are performing random gets, you can expect roughly low-to-mid hundreds
> of IOPS per HDD. Use bonnie++ / IOZone / IOPing to verify.
>
> Also, you could check how efficient your cache is (saving disk IOPS).
>

Thanks for the tips, Jeremy. I have used bonnie++ to benchmark both the fast
and slow servers, and the outputs are very similar. I haven't tried running
bonnie++ while the load was high, but I can try that later today since I just
restarted my load test; it takes a few hours before the performance starts
degrading.

Regarding IOPS/svctm, I ran iostat for a while when performance was bad and
saw that the tps was pretty spiky. I have a striped RAID0 across my 4 disks and
see the tps hovering anywhere between 100 and 4000; each disk individually
maxes out at around 1000 tps.

I checked another regionserver which handles an almost equal amount of data,
but the rowkey on that box is 8 bytes bigger than on the slow box (fast server:
rowkey is 24 bytes, cf is 1 byte, cq is 6 bytes, value ranges from 25 bytes to
1.5KB). That box shows a max of about 200 tps, and the GETs sent to that
regionserver finish 10K requests in a second (not great, but acceptable).

Given that the region sizes are almost the same (off by ~300MB), I am still
not clear on what else to debug. Maybe I can try splitting the region and see
if that speeds things up, but I wanted to keep that as my last option.
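
If I do end up trying the split, the plan is roughly this (a sketch; the table
name and split point are placeholders, and split() only requests the split, the
regionserver performs it asynchronously):

====sketch: manually splitting the hot region====
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplitSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Request a split at an explicit row key; both arguments are placeholders.
    admin.split(Bytes.toBytes("mytable"), Bytes.toBytes("split-point-row-key"));

    // The split is carried out asynchronously on the regionserver, so wait or
    // poll the region list before re-running the GET benchmark.
    admin.close();
  }
}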

Thanks,
Viral

Re: GET performance degrades over time

Posted by Jeremy Carroll <ph...@gmail.com>.
Look at how much hard disk utilization you have (IOPS / svctm). You may just
be under-scaled for the QPS you want for the combined read + write load. If you
are performing random gets, you can expect roughly low-to-mid hundreds of IOPS
per HDD. Use bonnie++ / IOZone / IOPing to verify.

Also, you could check how efficient your cache is (saving disk IOPS).
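
As a rough back-of-the-envelope check (the numbers come from this thread plus
the IOPS rule of thumb above; they are illustrative only and assume the worst
case where every get misses the block cache):

====sketch: back-of-the-envelope IOPS budget====
public class IopsBudgetSketch {
  public static void main(String[] args) {
    int getsPerRsPerSec = 5000;  // upper end of the reported 4k-5k random gets/sec
    int disksPerRs = 4;          // the 4-disk RAID0 mentioned elsewhere in the thread
    int iopsPerDisk = 150;       // low-to-mid hundreds of IOPS per HDD

    int diskIops = disksPerRs * iopsPerDisk;

    // If every get costs one disk seek, demand far exceeds capacity unless the
    // block cache absorbs most of the reads.
    System.out.println("Worst-case demand : " + getsPerRsPerSec + " IOPS");
    System.out.println("Disk capacity     : " + diskIops + " IOPS");
    System.out.println("Cache hit ratio needed to break even : "
        + (1.0 - (double) diskIops / getsPerRsPerSec));
  }
}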


On Thu, May 16, 2013 at 11:50 PM, Viral Bajaria <vi...@gmail.com> wrote:

> Thanks for all the help in advance!
>
> Answers inline..
>
> Hi Viral,
> >
> > some questions:
> >
> >
> > Are you adding new data or deleting data over time?
> >
>
> Yes I am continuously adding new data. The puts have not slowed down but
> that could also be an after effect of deferred log flush.
>
>
> > Do you have bloom filters enabled?
> >
>
> Yes bloom filters have been enabled: ROWCOL
>
>
> > Which version of Hadoop?
> >
>
> Using 1.0.4
>
>
> > Anything funny in the Datanode logs?
> >
>
> I haven't seen anything funny, not a lot of timeouts either but I will look
> into it more. For some reason my datanode metrics refused to show up in
> ganglia while regionserver metrics work fine.
>
> Thanks,
> Viral
>

Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
Thanks for all the help in advance!

Answers inline..

Hi Viral,
>
> some questions:
>
>
> Are you adding new data or deleting data over time?
>

Yes I am continuously adding new data. The puts have not slowed down but
that could also be an after effect of deferred log flush.


> Do you have bloom filters enabled?
>

Yes bloom filters have been enabled: ROWCOL


> Which version of Hadoop?
>

Using 1.0.4


> Anything funny in the Datanode logs?
>

I haven't seen anything funny, not a lot of timeouts either but I will look
into it more. For some reason my datanode metrics refused to show up in
ganglia while regionserver metrics work fine.

Thanks,
Viral

Re: GET performance degrades over time

Posted by lars hofhansl <la...@apache.org>.
Hi Viral,

some questions:


Are you adding new data or deleting data over time?
Do you have bloom filters enabled?
Which version of Hadoop?


Anything funny in the Datanode logs?

-- Lars



----- Original Message -----
From: Viral Bajaria <vi...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Thursday, May 16, 2013 2:16 AM
Subject: GET performance degrades over time



Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
> If you're not swapping then don't worry about it.
>
> My comment was that even though you set the swap to 0, and I'm going from
> memory, it's possible for some swap to occur.
> (But I could be wrong. )
>

Thanks for sharing this info; I will remember it for future debugging too. I
checked vm.swappiness as suggested by Jean-Marc and it is definitely not set to
0. But since we are not swapping, I doubt that's the issue here.


>
> You really don't have a lot of memory, and you have a 5GB heap... MSLAB
> on? Could you be facing a GC pause?
>

MSLAB is on, or rather I have not modified it, and if I recall correctly it
should be ON by default in 0.94.x. I have GC logging on and don't see
stop-the-world GC pauses. The GC logs are filling up quickly, but I have
noticed that on my high-RAM instances too.
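
For what it's worth, this is roughly how I sanity-checked the MSLAB setting (a
sketch; the key and its default are the 0.94 ones as I understand them, and GC
logging itself is enabled separately via the JVM flags in hbase-env.sh):

====sketch: sanity-checking the MSLAB setting====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MslabCheckSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // MSLAB defaults to enabled in 0.94.x unless explicitly turned off.
    boolean mslab = conf.getBoolean("hbase.hregion.memstore.mslab.enabled", true);
    System.out.println("hbase.hregion.memstore.mslab.enabled=" + mslab);
  }
}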

Re: GET performance degrades over time

Posted by Michael Segel <mi...@hotmail.com>.
If you're not swapping then don't worry about it. 

My comment was that even though you set the swap to 0, and I'm going from memory, it's possible for some swap to occur.
(But I could be wrong. )

You really don't have a lot of memory, and you have a 5GB heap... MSLAB on? Could you be facing a GC pause?

On May 16, 2013, at 1:53 PM, Viral Bajaria <vi...@gmail.com> wrote:

>> Going from memory, the swap value setting to 0 is a suggestion. You may
>> still actually swap, but I think it's a 'last resort' type of thing.
>> 
>> When you look at top, at the top of the page, how much swap do you see?
>> 
> 
> When I look at top it says: 0K total, 0K used, 0K free (as expected). I can
> try and add some swap but will do it as a last resort as suggested by you.


Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
> Going from memory, the swap value setting to 0 is a suggestion. You may
> still actually swap, but I think it's a 'last resort' type of thing.
>
> When you look at top, at the top of the page, how much swap do you see?
>

When I look at top it says: 0K total, 0K used, 0K free (as expected). I can
try and add some swap but will do it as a last resort as suggested by you.

Re: GET performance degrades over time

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Michael is correct.

More information available about swap value on wikipedia:
http://en.wikipedia.org/wiki/Swappiness

2013/5/16 Michael Segel <mi...@hotmail.com>

> Going from memory, the swap value setting to 0 is a suggestion. You may
> still actually swap, but I think it's a 'last resort' type of thing.
>
> When you look at top, at the top of the page, how much swap do you see?
>
>
> On May 16, 2013, at 1:43 PM, Viral Bajaria <vi...@gmail.com>
> wrote:
>
> >> Have you checked your HBase environment? I think it perhaps comes from:
> >> 1) The system using swap more frequently as you continue to execute Get
> >> operations?
> >>
> >
> > I have set swap to 0. AFAIK, that's a recommended practice. Let me know
> if
> > that should not be followed for nodes running HBase.
> >
> >
> >> 2) check the setting "hfile.block.cache.size" in your hbase-site.xml.
> >>
> >
> > It's the default i.e. 0.25
>
>

Re: GET performance degrades over time

Posted by Michael Segel <mi...@hotmail.com>.
Going from memory, the swap value setting to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing.

When you look at top, at the top of the page, how much swap do you see? 


On May 16, 2013, at 1:43 PM, Viral Bajaria <vi...@gmail.com> wrote:

>> Have you checked your HBase environment? I think it perhaps comes from:
>> 1) The system using swap more frequently as you continue to execute Get
>> operations?
>> 
> 
> I have set swap to 0. AFAIK, that's a recommended practice. Let me know if
> that should not be followed for nodes running HBase.
> 
> 
>> 2) check the setting "hfile.block.cache.size" in your hbase-site.xml.
>> 
> 
> It's the default i.e. 0.25


Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
> Have you checked your HBase environment? I think it perhaps comes from:
> 1) The system using swap more frequently as you continue to execute Get
> operations?
>

I have set swap to 0. AFAIK, that's a recommended practice. Let me know if
that should not be followed for nodes running HBase.


> 2) check the setting "hfile.block.cache.size" in your hbase-site.xml.
>

It's the default i.e. 0.25
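
For context, with a 5GB heap that default works out to roughly a 1.25GB block
cache per regionserver. A sketch of the arithmetic (the 0.35 below is just an
example value, not something I have tested):

====sketch: block cache sizing with a 5GB heap====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheSizingSketch {
  public static void main(String[] args) {
    long heapBytes = 5L * 1024 * 1024 * 1024;  // 5GB regionserver heap
    float fraction = 0.25f;                    // default hfile.block.cache.size

    long cacheBytes = (long) (heapBytes * fraction);
    System.out.println("Block cache ~= " + (cacheBytes >> 20) + " MB");  // ~1280 MB

    // A larger fraction trades memstore/working heap for cache; in practice it is
    // set in hbase-site.xml and needs a regionserver restart to take effect.
    Configuration conf = HBaseConfiguration.create();
    conf.setFloat("hfile.block.cache.size", 0.35f);  // example value only
  }
}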

Re: GET performance degrades over time

Posted by Viral Bajaria <vi...@gmail.com>.
> This generally happens when the same block is accessed for the HFile.  Are
> you seeing any contention on the HDFS side?
>

When you say contention, what should I be looking for? Slow responses to data
block requests? Or some specific metric in ganglia?

-Viral

Re: GET performance degrades over time

Posted by ramkrishna vasudevan <ra...@gmail.com>.
This generally happens when the same block is accessed for the HFile.  Are
you seeing any contention on the HDFS side?

Regards
Ram


On Thu, May 16, 2013 at 4:19 PM, Bing Jiang <ji...@gmail.com> wrote:

> Have you checked your HBase environment? I think it perhaps comes from:
> 1) The system using swap more frequently as you continue to execute Get
> operations?
> 2) check the setting "hfile.block.cache.size" in your hbase-site.xml.
>
>
>

Re: GET performance degrades over time

Posted by Bing Jiang <ji...@gmail.com>.
Have you checked your HBase environment? I think it perhaps comes from:
1) The system using swap more frequently as you continue to execute Get
operations?
2) check the setting "hfile.block.cache.size" in your hbase-site.xml.