Posted to user@hbase.apache.org by "Pamecha, Abhishek" <ap...@x.com> on 2012/08/24 01:21:15 UTC

limit on number of blocks per HFile and files per region

Hi
I have a few questions on blocks/file and file/region.


1. Can there be multiple row keys per block, and then per HFile? Or is a block or HFile dedicated to a single row key?



I have a scenario where, for the same column family, some rowkeys will have very wide rows, say rowkey W, and some will have very narrow rows, say rowkey N. In my case, puts for rowkeys W and N are interleaved at a ratio of, say, 90 rowkey W puts to 10 rowkey N puts. On the get side, my app fetches data for a single rowkey at a time.



Does that mean the entries for rowkey N will be scattered across regions on the same region server, given the interleaved puts? Or is there a way I can enforce contiguous writes to a region/HFile reserved for rowkey N? That way, I could leverage the block cache and have all or most of rowkey N fit in there for that session.



2. Is there a limit on the number of HFiles that can exist per region? Basically, on what criteria does a row key's data get split into two regions [on the same region server]? I am assuming there can be many regions per region server, and that multiple regions for the same table can live on the same region server.


3. Also, is there a limit on the number of blocks that are created per HFile? What determines whether a split is required?



Thanks,
Abhishek


Re: limit on number of blocks per HFile and files per region

Posted by "Pamecha, Abhishek" <ap...@x.com>.
Thanks Jean-Daniel. I did go through the documentation, but there was no clear answer on interleaving puts from two or more row keys, or on whether there is a way to reserve contiguous blocks per rowkey. I made some inferences, but clearly I was incorrect in some of them, as you pointed out too. The questions were partly for validation and partly to clear up doubts. :)

Thanks
Abhishek 

Sent from my iPad with iMstakes


Re: limit on number of blocks per HFile and files per region

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Inline. In general I'd recommend you read the documentation more
closely and/or get the book.

J-D

On Thu, Aug 23, 2012 at 4:21 PM, Pamecha, Abhishek <ap...@x.com> wrote:
> 1. Can there be multiple row keys per block, and then per HFile? Or is a block or HFile dedicated to a single row key?

Multiple row keys per HFile block. Read
http://hbase.apache.org/book.html#hfilev2
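
A minimal sketch of why several row keys can share one block, assuming cells are written in sorted order and a block is closed once it reaches the configured block size (64 KB is the HBase default). The function name, the cell sizes, and the row keys are made up for illustration; real HFile blocks also carry headers and encoding that this toy model ignores.

```python
# Toy model only: real HFile blocks hold encoded KeyValues plus headers.
BLOCK_SIZE = 64 * 1024  # HBase's default data block size is 64 KB


def pack_into_blocks(cells, block_size=BLOCK_SIZE):
    """Group (row_key, size_in_bytes) cells, already sorted by row key,
    into blocks. A block is closed once it has reached block_size."""
    blocks, current, used = [], [], 0
    for row_key, size in cells:
        current.append(row_key)
        used += size
        if used >= block_size:
            blocks.append(current)
            current, used = [], 0
    if current:
        blocks.append(current)
    return blocks


# Three narrow rows (one small cell each), then one wide row with 200 cells.
cells = [(b"n1", 100), (b"n2", 100), (b"n3", 100)] + [(b"w1", 1024)] * 200
blocks = pack_into_blocks(cells)

print(len(blocks))              # how many blocks the cells needed
print(sorted(set(blocks[0])))   # distinct row keys sharing the first block
```

In this model the narrow rows share a block with the start of the wide row, while the wide row alone spills over into several further blocks.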

> I have a scenario where, for the same column family, some rowkeys will have very wide rows, say rowkey W, and some will have very narrow rows, say rowkey N. In my case, puts for rowkeys W and N are interleaved at a ratio of, say, 90 rowkey W puts to 10 rowkey N puts. On the get side, my app fetches data for a single rowkey at a time.
> Does that mean the entries for rowkey N will be scattered across regions on the same region server, given the interleaved puts? Or is there a way I can enforce contiguous writes to a region/HFile reserved for rowkey N? That way, I could leverage the block cache and have all or most of rowkey N fit in there for that session.

The row keys are sorted according to their lexicographical order. See
http://hbase.apache.org/book.html#row

If you don't want the big rows coexisting with the small rows, put
them in different column families or different tables.
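
A small sketch of what sorted storage means for interleaved puts, assuming a byte-wise comparison of row keys (Python's `bytes` ordering is used here as a stand-in). The row keys, column names, and the 9-to-1 arrival pattern are hypothetical. Note that this only models a single sorted file: each memstore flush writes a new HFile, so a row's cells can still sit in several HFiles of the same family until a compaction merges them.

```python
# Hypothetical puts in arrival order: 9 for wide row "W", then 1 for
# narrow row "N", repeated three times.
arrival = []
for batch in range(3):
    arrival += [(b"W", f"col{batch:02d}_{i}".encode()) for i in range(9)]
    arrival.append((b"N", f"col{batch:02d}".encode()))

# Storage order is by (row key, column), compared byte by byte,
# regardless of the order in which the puts arrived.
stored = sorted(arrival)
rows_in_order = [row for row, _ in stored]
print(rows_in_order[:4])  # all "N" cells come first, then the "W" cells

# Byte-wise order is not numeric order: "row10" sorts before "row2".
keys_sorted = sorted([b"row2", b"row10", b"row1"])
print(keys_sorted)
```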

> 2. Is there a limit on the number of HFiles that can exist per region?

I think a slight misunderstanding of HFiles prompted this question,
and my previous answers probably mean you no longer need this one,
but here it is just in case:

HFiles are compacted once a family reaches
hbase.hstore.compactionThreshold of them (default of 3), and a family
can have no more than hbase.hstore.blockingStoreFiles (default of 7);
beyond that, writes are held back until compaction catches up.
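
A toy model of how the two settings relate, using the defaults cited in this message. It is not HBase's actual compaction policy: the function name is made up, and the exact comparisons (reaching the threshold versus exceeding it) are assumptions that may differ between versions.

```python
# Toy model, not HBase's real compaction policy.
COMPACTION_THRESHOLD = 3  # hbase.hstore.compactionThreshold (default cited above)
BLOCKING_STORE_FILES = 7  # hbase.hstore.blockingStoreFiles (default cited above)


def store_status(hfile_count):
    """Classify one column family's store by its number of HFiles.
    The exact comparisons (>= vs >) are assumptions."""
    if hfile_count > BLOCKING_STORE_FILES:
        return "writes held back until compaction catches up"
    if hfile_count >= COMPACTION_THRESHOLD:
        return "eligible for compaction"
    return "nothing to do"


for count in (1, 3, 7, 8):
    print(count, store_status(count))
```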

> Basically, on what criteria does a rowkey data gets split in two
> regions [on the same region server]. I am assuming there can be many
> regions per region server. And multiple regions for the same table can
> belong in the same region server.

A row key only lives in a single region since the regions are split
based on row keys.
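
A minimal sketch of why a row key maps to exactly one region, assuming each region covers a contiguous key range from its start key (inclusive) up to the next region's start key (exclusive). The start keys and the function name are hypothetical, and a sorted list with `bisect` stands in for the real region lookup.

```python
import bisect

# Hypothetical start keys of three regions of one table. The first region's
# start key is empty, meaning "from the beginning of the key space".
region_start_keys = [b"", b"g", b"p"]


def region_index_for(row_key):
    """Return the index of the single region whose range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


for key in (b"apple", b"g", b"melon", b"zebra"):
    print(key, "-> region", region_index_for(key))
```

Because the ranges do not overlap, every cell of a given row lands in the same region, however wide the row is.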

> 3. Also, is there a limit on the number of blocks that are created per HFile?

No.

> What determines whether a split is required?

hbase.hregion.max.filesize. Also see
http://hbase.apache.org/book.html#disable.splitting if you want to
change that.
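
A simplified sketch of a size-based split check, assuming a region becomes a split candidate once any one column family's store grows past hbase.hregion.max.filesize. The function name, the 1 GB limit, and the store sizes are hypothetical; the real split policy and the default value of the setting vary between HBase versions.

```python
def should_split(store_sizes_bytes, max_filesize_bytes):
    """Return True if any store (one per column family) exceeds the limit."""
    return any(size > max_filesize_bytes for size in store_sizes_bytes)


GB = 1024 ** 3
max_filesize = 1 * GB  # stand-in value for hbase.hregion.max.filesize

print(should_split([200 * 1024 ** 2, 600 * 1024 ** 2], max_filesize))  # both stores under the limit
print(should_split([200 * 1024 ** 2, 2 * GB], max_filesize))           # one store over the limit
```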