Posted to user@hbase.apache.org by Steinmaurer Thomas <Th...@scch.at> on 2011/08/01 08:48:13 UTC

RE: Row-key design (was: GZ better than LZO?)

Hello Chris,

thanks a lot for your insights. Much appreciated!

In our test there were 1000 different vehicle_ids, inserted via our
multi-threaded client, so while they are sequential integers (basically
the iterator value of the for loop starting the threads), they were not
strictly inserted in that order.

Regarding padding: I thought that we need some sort of fixed width per
"element" (what's the correct term here?) in the row key to enable
range scans. The factor that grows in our system is the number of
vehicles that needs to be supported. Since we have the vehicle_id at
the beginning of the rowkey, you mean that moving the vehicle_id value
to the left (i.e. dropping the zero padding) will give a better
distribution? We still need to have range scans though.

In real life, the vehicle in the master data might not be uniquely
identified by an integer but by an alphanumeric serial number, so I
guess this will make a difference and should be included in our tests,
compared to sequential integers as part of the row key.

Still, for range scans I thought we need some sort of fixed-width row
keys, thus padding the row key data with "0".

Thanks!

Thomas

-----Original Message-----
From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris
Tarnas
Sent: Freitag, 29. Juli 2011 18:49
To: user@hbase.apache.org
Subject: Re: GZ better than LZO?

Your region distribution across the nodes is not great: in both cases
most of your data is going to one server. Spreading the regions out
across multiple servers would be best.

How many different vehicle_ids are being used, and are they all
sequential integers in your tests? HBase performs better when not doing
sequential inserts. You could try reversing the vehicle ids to get
around that (see the many discussions on the list about using reversed
timestamps as a rowkey).

Looking at your key construction, I would suggest, unless your app
requires it, not left-padding your ids with zeros and instead using a
delimiter between the key components. That will lead to smaller keys.
If you use a tab as your delimiter, that character falls before all
other alphanumeric and punctuation characters (other than LF, CR, etc. -
characters that should not be in your IDs), so the keys will sort the
same as left-padded ones.
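
As an illustration only, here is a minimal Java sketch (against the HBase client API of that era) of how such a tab-delimited key could be composed. The components (vehicle_id, device_id, reversed timestamp) follow what is discussed in this thread, but the helper itself is made up, not the poster's actual code:

    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeys {
        // Tab (0x09) sorts below '0'-'9' and 'A'-'z', so it is safe as a
        // separator as long as the IDs contain no control characters.
        private static final byte[] DELIM = new byte[] { 0x09 };

        /** Build "vehicleId TAB deviceId TAB (Long.MAX_VALUE - ts)" as a rowkey. */
        public static byte[] rowKey(String vehicleId, String deviceId, long ts) {
            byte[] reversedTs = Bytes.toBytes(Long.toString(Long.MAX_VALUE - ts));
            return Bytes.add(Bytes.add(Bytes.toBytes(vehicleId), DELIM),
                             Bytes.add(Bytes.toBytes(deviceId), DELIM),
                             reversedTs);
        }
    }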

I've had good luck with converting sequential numeric IDs to base 64 and
then reversing them - that leads to very good key distribution across
regions and shorter keys for any given number. Another option, if you
don't care whether your rowkeys are plaintext, is to convert the IDs to
binary numbers and then reverse the bytes - that would be the most
compact. If you do that, you would go back to not using delimiters and
just have fixed offsets for each component.
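
For what it's worth, a rough Java sketch of both variants; the radix-64 alphabet and helper names are made up for illustration and are not taken from the thread:

    import org.apache.hadoop.hbase.util.Bytes;

    public class IdSpread {
        // Any 64-character alphabet works; this one keeps keys printable.
        private static final char[] ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_".toCharArray();

        /** Option 1: render the id in radix 64, emitting digits least-significant
         *  first - the "convert to base 64 and reverse" trick in one pass. */
        public static String reversedBase64(long id) {
            StringBuilder sb = new StringBuilder();
            long n = id;
            do {
                sb.append(ALPHABET[(int) (n & 0x3F)]); // low 6 bits = next digit
                n >>>= 6;
            } while (n != 0);
            return sb.toString();
        }

        /** Option 2 (non-plaintext, most compact): the 8 id bytes in reversed order. */
        public static byte[] reversedBinary(long id) {
            byte[] b = Bytes.toBytes(id); // big-endian, 8 bytes
            for (int i = 0; i < b.length / 2; i++) {
                byte tmp = b[i];
                b[i] = b[b.length - 1 - i];
                b[b.length - 1 - i] = tmp;
            }
            return b;
        }
    }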

Once you have a rowkey design, you can then go ahead and create your
tables pre-split into multiple empty regions. That should perform much
better overall for inserts, especially when the DB is new and empty to
start.
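
A quick sketch of what that could look like with the Java admin API; the table name, column family and split points below are placeholders for illustration, not values from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("vehicle_data"); // placeholder name
            desc.addFamily(new HColumnDescriptor("d"));                   // placeholder family

            // One split point per expected leading key byte so inserts hit
            // several regions (and region servers) from the very first write.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("2"), Bytes.toBytes("4"),
                Bytes.toBytes("6"), Bytes.toBytes("8")
            };
            admin.createTable(desc, splits);
        }
    }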

How did the load with 4 million records perform?

-chris

On Jul 29, 2011, at 12:36 AM, Steinmaurer Thomas wrote:

> Hi Chris!
> 
> Your questions are somewhat hard to answer for me, because I'm not 
> really in charge of the test cluster from an administration/setup POV.
> 
> Basically, when running:
> http://xxx:60010/master.jsp
> 
> I see 7 region servers, each with a "maxHeap" value of 995 (MB).
> 
> When clicking on the different tables depending on the compression 
> type, I get the following information:
> 
> GZ compressed table: 3 regions, hosted by one region server.
> LZO compressed table: 8 regions, hosted by two region servers, where 
> the start region is hosted by one region server and the other 7 
> regions are hosted on the second region server.
> 
> Regarding the insert pattern etc., please have a look at my reply to 
> Chiku, where I describe the test data generator, the table layout 
> etc. a bit.
> 
> Thanks,
> Thomas
> 
> -----Original Message-----
> From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris 
> Tarnas
> Sent: Donnerstag, 28. Juli 2011 19:43
> To: user@hbase.apache.org
> Subject: Re: GZ better than LZO?
> 
> During the load did you add enough data to do a flush or compaction? 
> In our cluster that amount of data inserted would not necessarily 
> be enough to actually flush store files. Performance really depends on
> how the table's regions are laid out, the insert pattern, the number 
> of regionservers and the amount of RAM allocated to each regionserver.
> If you don't see any flushes or compactions in the log, try repeating 
> that test, then flushing the table and doing a compaction (or add more
> data so it happens automatically), and timing everything. It would be 
> interesting to see if the GZ benefit holds up.
> 
> -chris
> 
> On Jul 28, 2011, at 6:31 AM, Steinmaurer Thomas wrote:
> 
>> Hello,
>> 
>> 
>> 
>> we ran a test client generating data into a GZ and an LZO compressed
>> table. Equal data sets (number of rows: 1008000 and the same table
>> schema); ~ 7.78 GB of disk space uncompressed in HDFS. LZO is ~ 887 MB
>> whereas GZ is ~ 444 MB, so basically half of LZO.
>> 
>> Execution time of the data generating client was 1373 seconds into the
>> uncompressed table, 3374 sec. into LZO and 2198 sec. into GZ. The data
>> generation client is based on HTablePool and using batch operations.
>> 
>> So in our (simple) test, GZ beats LZO in both disk usage and
>> execution time of the client. We haven't tried reads yet.
>> 
>> Is this an expected result? I thought LZO is the recommended
>> compression algorithm? Or does LZO outperform GZ with a growing
>> amount of data or in read scenarios?
>> 
>> 
>> 
>> Regards,
>> 
>> Thomas
>> 
>> 
>> 
> 


Re: Row-key design (was: GZ better than LZO?)

Posted by Chris Tarnas <cf...@email.com>.
Glad to be able to help. You don't need fixed-width fields to do range scans, you just need delimiters that are lexicographically less than any valid character in your fields. If your fields are printable non-whitespace characters, then tab (ASCII 0x09) works very well. That will guarantee correct overall sorting. For example, if vehicle_id is your first field and device_id is your second field:

1\t1
1\t2
10\t1
10\t2
2\t1
2\t2
.
.

You can then do prefix scans for a particular vehicle_id; just be sure to append the delimiter to the vehicle_id and use that as your prefix.
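
A minimal sketch of such a prefix scan in Java; the table name is a placeholder, and the stop row simply uses the next byte after tab (0x0A) so that only keys starting with the vehicle_id plus tab fall inside the range:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VehiclePrefixScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "vehicle_data"); // placeholder table name

            String vehicleId = "1";
            byte[] startRow = Bytes.toBytes(vehicleId + "\t"); // prefix includes the delimiter
            byte[] stopRow  = Bytes.toBytes(vehicleId + "\n"); // 0x0A is the next byte after tab

            Scan scan = new Scan(startRow, stopRow);
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println(Bytes.toStringBinary(r.getRow()));
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }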

I have also used the null character (ASCII 0x00) as a delimiter, as well as combined null delimiters with fixed-width binary fields for more complex index-type keys.

-chris


