Posted to user@hbase.apache.org by Bryan Keller <br...@gmail.com> on 2013/06/04 19:01:52 UTC

Re: Poor HBase map-reduce scan performance

Thanks Enis, I'll see if I can backport this patch - it is exactly what I was going to try. This should solve my scan performance problems if I can get it to work.

On May 29, 2013, at 1:29 PM, Enis Söztutar <en...@hortonworks.com> wrote:

> Hi,
> 
> Regarding running raw scans on top of Hfiles, you can try a version of the
> patch attached at https://issues.apache.org/jira/browse/HBASE-8369, which
> enables exactly this. However, the patch is for trunk.
> 
> In that, we open one region from snapshot files in each record reader, and
> run a scan through it using an internal region scanner. Since this bypasses
> the client + rpc + server daemon layers, it should be able to give optimum
> scan performance.
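> 
> For illustration, a minimal sketch of how a job might be wired up
> against the snapshot-based input format (method names as in the trunk
> patch; a backport may differ, and the snapshot name and restore dir
> are placeholders):
> 
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.client.Scan;
>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>   import org.apache.hadoop.hbase.mapreduce.TableMapper;
>   import org.apache.hadoop.io.NullWritable;
>   import org.apache.hadoop.mapreduce.Job;
>   import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
> 
>   public class SnapshotScanJob {
>     // Counts rows; stands in for whatever per-row work the real job does.
>     static class RowCountMapper
>         extends TableMapper<ImmutableBytesWritable, NullWritable> {
>       @Override
>       protected void map(ImmutableBytesWritable key, Result row,
>           Context ctx) {
>         ctx.getCounter("scan", "rows").increment(1);
>       }
>     }
> 
>     public static void main(String[] args) throws Exception {
>       Job job = new Job(HBaseConfiguration.create(), "snapshot-scan");
>       job.setJarByClass(SnapshotScanJob.class);
>       Scan scan = new Scan();
>       scan.setCacheBlocks(false);
>       // Record readers open the snapshot's regions directly from HDFS,
>       // bypassing the client/rpc/regionserver layers described above.
>       TableMapReduceUtil.initTableSnapshotMapperJob(
>           "mysnapshot", scan, RowCountMapper.class,
>           ImmutableBytesWritable.class, NullWritable.class, job,
>           true, new Path("/tmp/snapshot-restore"));
>       job.setNumReduceTasks(0);
>       job.setOutputFormatClass(NullOutputFormat.class);
>       System.exit(job.waitForCompletion(true) ? 0 : 1);
>     }
>   }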
> 
> There is also a tool called HFilePerformanceBenchmark that intends to
> measure raw performance for HFiles. I've had to make a lot of changes to
> get it workable, but it might be worth taking a look to see whether there is
> any perf difference between scanning a sequence file from hdfs vs scanning
> an hfile.
> 
> Enis
> 
> 
> On Fri, May 24, 2013 at 10:50 PM, lars hofhansl <la...@apache.org> wrote:
> 
>> Sorry. Haven't gotten to this, yet.
>> 
>> Scanning in HBase being about 3x slower than straight HDFS is in the right
>> ballpark, though. It has to do a bit more work.
>> 
>> Generally, HBase is great at homing in on a subset (some 10-100m rows) of
>> the data. Raw scan performance is not (yet) a strength of HBase.
>> 
>> So with HDFS you get to 75% of the theoretical maximum read throughput;
>> hence with HBase you get to 25% of the theoretical cluster-wide maximum disk
>> throughput?
>> 
>> 
>> -- Lars
>> 
>> 
>> 
>> ----- Original Message -----
>> From: Bryan Keller <br...@gmail.com>
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Friday, May 10, 2013 8:46 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>> 
>> FYI, I ran tests with compression on and off.
>> 
>> With a plain HDFS sequence file and compression off, I am getting very
>> good I/O numbers, roughly 75% of theoretical max for reads. With snappy
>> compression enabled on the sequence file, I/O speed is about 3x slower. However
>> the file size is 3x smaller so it takes about the same time to scan.
>> 
>> With HBase, the results are equivalent (just much slower than a sequence
>> file). Scanning a compressed table is about 3x slower I/O than an
>> uncompressed table, but the table is 3x smaller, so the time to scan is
>> about the same. Scanning an HBase table takes about 3x as long as scanning
>> the sequence file export of the table, either compressed or uncompressed.
>> The sequence file export file size ends up being just barely larger than
>> the table, either compressed or uncompressed.
>> 
>> So in sum, compression slows down I/O 3x, but the file is 3x smaller so
>> the time to scan is about the same. Adding in HBase slows things down
>> another 3x. So I'm seeing 9x faster I/O scanning an uncompressed sequence
>> file vs scanning a compressed table.
>> 
>> 
>> On May 8, 2013, at 10:15 AM, Bryan Keller <br...@gmail.com> wrote:
>> 
>>> Thanks for the offer Lars! I haven't made much progress speeding things
>> up.
>>> 
>>> I finally put together a test program that populates a table that is
>> similar to my production dataset. I have a readme that should describe
>> things, hopefully enough to make it usable. There is code to populate a
>> test table, code to scan the table, and code to scan sequence files from an
>> export (to compare HBase w/ raw HDFS). I use a gradle build script.
>>> 
>>> You can find the code here:
>>> 
>>> https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip
>>> 
>>> 
>>> On May 4, 2013, at 6:33 PM, lars hofhansl <la...@apache.org> wrote:
>>> 
>>>> The block buffers are not reused, but that by itself should not be a
>> problem as they are all the same size (at least I have never identified
>> that as one in my profiling sessions).
>>>> 
>>>> My offer still stands to do some profiling myself if there is an easy
>> way to generate data of similar shape.
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Bryan Keller <br...@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Friday, May 3, 2013 3:44 AM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>> 
>>>> 
>>>> Actually I'm not too confident in my results regarding block size; they may
>> have been related to major compaction. I'm going to rerun before drawing
>> any conclusions.
>>>> 
>>>> On May 3, 2013, at 12:17 AM, Bryan Keller <br...@gmail.com> wrote:
>>>> 
>>>>> I finally made some progress. I tried a very large HBase block size
>> (16mb), and it significantly improved scan performance. I went from 45-50
>> min to 24 min. Not great but much better. Before I had it set to 128k.
>> Scanning an equivalent sequence file takes 10 min. My random read
>> performance will probably suffer with such a large block size
>> (theoretically), so I probably can't keep it this big. I care about random
>> read performance too. I've read that having a block size this big is not
>> recommended; is that correct?
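>>
>> For reference, the HFile block size is a per-column-family setting; a
>> minimal sketch of bumping it with the 0.94-era admin API ("mytable"
>> and "cf" are placeholders):
>>
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.HColumnDescriptor;
>>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class SetBlockSize {
>>     public static void main(String[] args) throws Exception {
>>       HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
>>       HColumnDescriptor family = admin.getTableDescriptor(
>>           Bytes.toBytes("mytable")).getFamily(Bytes.toBytes("cf"));
>>       family.setBlocksize(16 * 1024 * 1024);  // 16mb, vs. the 64k default
>>       admin.disableTable("mytable");  // schema change needs the table offline
>>       admin.modifyColumn("mytable", family);
>>       admin.enableTable("mytable");
>>       admin.close();
>>     }
>>   }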
>>>>> 
>>>>> I haven't dug too deeply into the code; are the block buffers reused
>> or is each new block read a new allocation? Perhaps a buffer pool could
>> help here if there isn't one already. When doing a scan, HBase could reuse
>> previously allocated block buffers instead of allocating a new one for each
>> block. Then block size shouldn't affect scan performance much.
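>>
>> A toy sketch of that buffer-pool idea (not HBase code, just the shape
>> of it, relying on all blocks being the same size):
>>
>>   import java.nio.ByteBuffer;
>>   import java.util.concurrent.ArrayBlockingQueue;
>>
>>   // Recycles fixed-size block buffers so a scan allocates each buffer
>>   // once instead of once per block read.
>>   public class BlockBufferPool {
>>     private final ArrayBlockingQueue<ByteBuffer> pool;
>>     private final int blockSize;
>>
>>     public BlockBufferPool(int blockSize, int capacity) {
>>       this.blockSize = blockSize;
>>       this.pool = new ArrayBlockingQueue<ByteBuffer>(capacity);
>>     }
>>
>>     public ByteBuffer take() {
>>       ByteBuffer buf = pool.poll();  // reuse a returned buffer if available
>>       return buf != null ? buf : ByteBuffer.allocate(blockSize);
>>     }
>>
>>     public void release(ByteBuffer buf) {
>>       buf.clear();
>>       pool.offer(buf);  // silently dropped if the pool is already full
>>     }
>>   }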
>>>>> 
>>>>> I'm not using a block encoder. Also, I'm still sifting through the
>> profiler results; I'll see if I can make more sense of them and run some more
>> experiments.
>>>>> 
>>>>> On May 2, 2013, at 5:46 PM, lars hofhansl <la...@apache.org> wrote:
>>>>> 
>>>>>> Interesting. If you can, try 0.94.7 (but it'll probably not have
>> changed that much from 0.94.4)
>>>>>> 
>>>>>> 
>>>>>> Have you enabled one of the block encoders (FAST_DIFF, etc)? If
>> so, try without. They currently need to reallocate a ByteBuffer for each
>> single KV.
>>>>>> (Since you see ScannerV2 rather than EncodedScannerV2 you probably
>> have not enabled encoding, just checking).
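>>
>> A quick way to confirm is to check the family descriptor; a minimal
>> sketch with the 0.94-era API ("mytable" and "cf" are placeholders):
>>
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.HColumnDescriptor;
>>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class CheckEncoding {
>>     public static void main(String[] args) throws Exception {
>>       HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
>>       HColumnDescriptor family = admin.getTableDescriptor(
>>           Bytes.toBytes("mytable")).getFamily(Bytes.toBytes("cf"));
>>       // NONE means no block encoder, i.e. the plain ScannerV2 path.
>>       System.out.println("encoding: " + family.getDataBlockEncoding());
>>       admin.close();
>>     }
>>   }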
>>>>>> 
>>>>>> 
>>>>>> And do you have a stack trace for the ByteBuffer.allocate()? That is
>> a strange one since it never came up in my profiling (unless you enabled
>> block encoding).
>>>>>> (You can get traces from VisualVM by creating a snapshot, but you'd
>> have to drill in to find the allocate()).
>>>>>> 
>>>>>> 
>>>>>> During normal scanning (again, without encoding) there should be no
>> allocation happening except for blocks read from disk (and they should all
>> be the same size, thus allocation should be cheap).
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Bryan Keller <br...@gmail.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Thursday, May 2, 2013 10:54 AM
>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>> 
>>>>>> 
>>>>>> I ran one of my regionservers through VisualVM. It looks like the top
>> hot spots are HFileReaderV2$ScannerV2.getKeyValue() and
>> ByteBuffer.allocate(). It appears at first glance that memory allocations
>> may be an issue. Decompression was next below that but less of an issue it
>> seems.
>>>>>> 
>>>>>> Would changing the block size, either HDFS or HBase, help here?
>>>>>> 
>>>>>> Also, if anyone has tips on how else to profile, that would be
>> appreciated. VisualVM can produce a lot of noise that is hard to sift
>> through.
>>>>>> 
>>>>>> 
>>>>>> On May 1, 2013, at 9:49 PM, Bryan Keller <br...@gmail.com> wrote:
>>>>>> 
>>>>>>> I used exactly 0.94.4, pulled from the tag in subversion.
>>>>>>> 
>>>>>>> On May 1, 2013, at 9:41 PM, lars hofhansl <la...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Hmm... Did you actually use exactly version 0.94.4, or the latest
>> 0.94.7?
>>>>>>>> I would be very curious to see profiling data.
>>>>>>>> 
>>>>>>>> -- Lars
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----- Original Message -----
>>>>>>>> From: Bryan Keller <br...@gmail.com>
>>>>>>>> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>>>>>>>> Cc:
>>>>>>>> Sent: Wednesday, May 1, 2013 6:01 PM
>>>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>>>> 
>>>>>>>> I tried running my test with 0.94.4; unfortunately performance was
>> about the same. I'm planning on profiling the regionserver and trying some
>> other things tonight and tomorrow and will report back.
>>>>>>>> 
>>>>>>>> On May 1, 2013, at 8:00 AM, Bryan Keller <br...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Yes I would like to try this, if you can point me to the pom.xml
>> patch that would save me some time.
>>>>>>>>> 
>>>>>>>>> On Tuesday, April 30, 2013, lars hofhansl wrote:
>>>>>>>>> If you can, try 0.94.4+; it should significantly reduce the amount
>> of bytes copied around in RAM during scanning, especially if you have wide
>> rows and/or large key portions. That in turn makes scans scale better
>> across cores, since RAM is a shared resource between cores (much like disk).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It's not hard to build the latest HBase against Cloudera's version
>> of Hadoop. I can send along a simple patch to pom.xml to do that.
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ________________________________
>>>>>>>>> From: Bryan Keller <br...@gmail.com>
>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>>>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The table has hashed keys so rows are evenly distributed amongst
>> the regionservers, and load on each regionserver is pretty much the same. I
>> also have per-table balancing turned on. I get mostly data local mappers
>> with only a few rack local (maybe 10 of the 250 mappers).
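>>
>> For illustration, one common way to build such hashed keys (a sketch
>> of the general pattern; the real schema differs in its details):
>>
>>   import java.security.MessageDigest;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class HashedKeys {
>>     // Prepends two bytes of the key's MD5 so rows spread evenly across
>>     // regions instead of hot-spotting on sequential keys.
>>     public static byte[] rowKey(String naturalKey) throws Exception {
>>       byte[] hash = MessageDigest.getInstance("MD5")
>>           .digest(Bytes.toBytes(naturalKey));
>>       return Bytes.add(Bytes.head(hash, 2), Bytes.toBytes(naturalKey));
>>     }
>>   }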
>>>>>>>>> 
>>>>>>>>> Currently the table is a wide table schema, with lists of data
>> structures stored as columns with column prefixes grouping the data
>> structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I
>> was thinking of moving those data structures to protobuf which would cut
>> down on the number of columns. The downside is I can't filter on one value
>> with that, but it is a tradeoff I would make for performance. I was also
>> considering restructuring the table into a tall table.
>>>>>>>>> 
>>>>>>>>> Something interesting is that my old regionserver machines had
>> five 15k SCSI drives instead of 2 SSDs, and performance was about the same.
>> Also, my old network was 1gbit, now it is 10gbit. So neither network nor
>> disk I/O appear to be the bottleneck. The CPU is rather high for the
>> regionserver so it seems like the best candidate to investigate. I will try
>> profiling it tomorrow and will report back. I may revisit compression on vs
>> off since that is adding load to the CPU.
>>>>>>>>> 
>>>>>>>>> I'll also come up with a sample program that generates data
>> similar to my table.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org>
>> wrote:
>>>>>>>>> 
>>>>>>>>>> Your average row is 35k, so scanner caching would not make a huge
>> difference, although I would have expected some improvements by setting it
>> to 10 or 50 since you have a wide 10ge pipe.
>>>>>>>>>> 
>>>>>>>>>> I assume your table is split sufficiently to touch all
>> RegionServers... Do you see the same load/IO on all region servers?
>>>>>>>>>> 
>>>>>>>>>> A bunch of scan improvements went into HBase since 0.94.2.
>>>>>>>>>> I blogged about some of these changes here:
>> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>>>>>>> 
>>>>>>>>>> In your case - since you have many columns, each of which carries
>> the rowkey - you might benefit a lot from HBASE-7279.
>>>>>>>>>> 
>>>>>>>>>> In the end HBase *is* slower than straight HDFS for full scans.
>> How could it not be?
>>>>>>>>>> So I would start by looking at HDFS first. Make sure Nagle's is
>> disabled in both HBase and HDFS.
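>>
>> A minimal sketch of the relevant settings (0.94-era key names; they
>> normally live in hbase-site.xml, shown here in code for brevity):
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>
>>   public class NoNagle {
>>     public static Configuration create() {
>>       Configuration conf = HBaseConfiguration.create();
>>       conf.setBoolean("hbase.ipc.client.tcpnodelay", true);  // client side
>>       conf.setBoolean("ipc.server.tcpnodelay", true);        // server side
>>       return conf;
>>     }
>>   }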
>>>>>>>>>> 
>>>>>>>>>> And lastly SSDs are somewhat new territory for HBase. Maybe Andy
>> Purtell is listening; I think he did some tests with HBase on SSDs.
>>>>>>>>>> With rotating media you typically see an improvement with
>> compression. With SSDs the added CPU needed for decompression might
>> outweigh the benefits.
>>>>>>>>>> 
>>>>>>>>>> At the risk of starting a larger discussion here, I would posit
>> that HBase's LSM-based design, which trades random IO for sequential IO,
>> might be a bit more questionable on SSDs.
>>>>>>>>>> 
>>>>>>>>>> If you can, it would be nice to run a profiler against one of the
>> RegionServers (or maybe do it with the single RS setup) and see where it is
>> bottlenecked.
>>>>>>>>>> (And if you send me a sample program to generate some data - not
>> 700g, though :) - I'll try to do a bit of profiling during the next days as
>> my day job permits, but I do not have any machines with SSDs).
>>>>>>>>>> 
>>>>>>>>>> -- Lars
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ________________________________
>>>>>>>>>> From: Bryan Keller <br...@gmail.com>
>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>>>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I have tried various settings for setCaching() and I have
>> setCacheBlocks(false).
>>>>>>>>>> 
>>>>>>>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>>>>>>> 
>>>>>>>>>>> scan.setCaching(500);        // 1 is the default in Scan,
>>>>>>>>>>>                              // which will be bad for MapReduce jobs
>>>>>>>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>>>>>>> 
>>>>>>>>>>> I guess you have used the above setting.
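>>>>>>>>>>> 
>>>>>>>>>>> For completeness, a minimal sketch of a full scan job wired up
>>>>>>>>>>> with those settings (0.94-era TableMapReduceUtil; "mytable" and
>>>>>>>>>>> the class names are placeholders):
>>>>>>>>>>> 
>>>>>>>>>>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>>>>>>   import org.apache.hadoop.hbase.client.Result;
>>>>>>>>>>>   import org.apache.hadoop.hbase.client.Scan;
>>>>>>>>>>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>>>>>>>>>>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>>>>>>>>>>>   import org.apache.hadoop.hbase.mapreduce.TableMapper;
>>>>>>>>>>>   import org.apache.hadoop.io.NullWritable;
>>>>>>>>>>>   import org.apache.hadoop.mapreduce.Job;
>>>>>>>>>>>   import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
>>>>>>>>>>> 
>>>>>>>>>>>   public class ScanJob {
>>>>>>>>>>>     static class ScanMapper
>>>>>>>>>>>         extends TableMapper<ImmutableBytesWritable, NullWritable> {
>>>>>>>>>>>       @Override
>>>>>>>>>>>       protected void map(ImmutableBytesWritable key, Result row,
>>>>>>>>>>>           Context ctx) {
>>>>>>>>>>>         ctx.getCounter("scan", "rows").increment(1);
>>>>>>>>>>>       }
>>>>>>>>>>>     }
>>>>>>>>>>> 
>>>>>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>>>>>       Job job = new Job(HBaseConfiguration.create(), "full-scan");
>>>>>>>>>>>       job.setJarByClass(ScanJob.class);
>>>>>>>>>>>       Scan scan = new Scan();
>>>>>>>>>>>       scan.setCaching(500);        // batch rows per RPC
>>>>>>>>>>>       scan.setCacheBlocks(false);  // don't churn the block cache
>>>>>>>>>>>       TableMapReduceUtil.initTableMapperJob("mytable", scan,
>>>>>>>>>>>           ScanMapper.class, ImmutableBytesWritable.class,
>>>>>>>>>>>           NullWritable.class, job);
>>>>>>>>>>>       job.setNumReduceTasks(0);
>>>>>>>>>>>       job.setOutputFormatClass(NullOutputFormat.class);
>>>>>>>>>>>       System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>>>>>>>>     }
>>>>>>>>>>>   }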
>>>>>>>>>>> 
>>>>>>>>>>> 0.94.x releases are compatible. Have you considered upgrading
>> to, say,
>>>>>>>>>>> 0.94.7, which was recently released?
>>>>>>>>>>> 
>>>>>>>>>>> Cheers
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm
>>>>>>>> 
>>> 
>> 
>>