Posted to user@hbase.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2014/04/15 11:45:50 UTC

Weird behavior splitting regions

I have a table in HBase of around 96 GB.

I created it with 4 regions of 30 GB. After some time, the table started to
split because the max region size is 1 GB (I just realized that; I'm going
to change it or create more pre-splits).
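
For a rough sense of the numbers above, here is a small sketch (plain
arithmetic, not something HBase itself runs; the function name is made up for
illustration) of the minimum number of regions a table needs for a given
maximum region size:

```python
def min_regions(table_bytes: int, max_region_bytes: int) -> int:
    """Smallest number of regions that keeps every region at or under the cap."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-table_bytes // max_region_bytes)


GB = 1024 ** 3

# Figures taken from the post: a table of about 96 GB and a 1 GB region cap.
print(min_regions(96 * GB, 1 * GB))
# A hypothetical larger cap of 10 GB, for comparison only.
print(min_regions(96 * GB, 10 * GB))
```

With a 1 GB cap the table needs at least 96 regions, so 4 pre-split regions
cannot stay as they are. With a 10 GB cap, 10 regions would be enough,
assuming the row keys are evenly distributed.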

There are two things that I don't understand. How is it creating the splits?
Right now I have 130 regions and the number is growing. The problem is the
size of the new regions:

1.7 M    /hbase/filters/4ddbc34a2242e44c03121ae4608788a2
1.6 G    /hbase/filters/548bdcec79cfe9a99fa57cb18f801be2
3.1 G    /hbase/filters/58b50df089bd9d4d1f079f53238e060d
2.5 M    /hbase/filters/5a0d6d5b3b8faf67889ac5f5c2947c4f
1.9 G    /hbase/filters/5b0a35b5735a473b7e804c4b045ce374
883.4 M  /hbase/filters/5b49c68e305b90d87b3c64a0eee60b8c
1.7 M    /hbase/filters/5d43fd7ea9808ab7d2f2134e80fbfae7
632.4 M  /hbase/filters/5f04c7cd450d144f88fb4c7cff0796a2

Some of the new regions are just a few KBytes! Why are they so small? And
when does HBase decide to split? It started splitting two hours after I
created the table.

I created the table and inserted the data once; I don't insert new data or
modify the existing data.


Another interesting point is why there are major compactions:
2014-04-15 11:33:47,400 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/.tmp/df90c260cb4e4256a153dd178244f04c to hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/d/df90c260cb4e4256a153dd178244f04c
2014-04-15 11:33:47,407 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded ROWCOL (CompoundBloomFilter) metadata for df90c260cb4e4256a153dd178244f04c
2014-04-15 11:33:47,416 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 1 file(s) in d of filters,51,1397554175140.ef994715505054299ede8c48c600cea4. into df90c260cb4e4256a153dd178244f04c, size=789.1 M; total size for store is 789.1 M
2014-04-15 11:33:47,416 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=filters,51,1397554175140.ef994715505054299ede8c48c600cea4., storeName=d, fileCount=1, fileSize=1.5 G, priority=6, time=414761474510060; duration=7sec

I thought major compactions happen just once a day and compact many files
per region. The data is always the same here; I don't insert new data.


I'm working with 0.94.6 (CDH 4.4). I'm going to change the region size,
but I would like to understand why these things happen.

Thank you.

Re: Weird behavior splitting regions

Posted by Guillermo Ortiz <ko...@gmail.com>.
I read the article; that's why I asked the question, because I didn't
understand the result I got.

Oh, yes, that's true. Silly of me.
I think some of the files are pretty small because the table has two column
families and one of them is much smaller than the other. So it has been
split many times. The big regions end up close to 1 GB, but the smaller
regions end up pretty small because they have been split many times.

What I don't know is why HBase decides to split the table so late: not when
I create the pre-split table, but two hours later or so. Anyway, that's my
mistake; I'm just curious about it.


2014-04-15 12:17 GMT+02:00 divye sheth <di...@gmail.com>:

> The default split policy in HBase 0.94.x is
> IncreasingToUpperBoundRegionSplitPolicy rather than
> ConstantSizeRegionSplitPolicy, which was the default in older versions of
> HBase.
>
> Please refer to the link below to understand how
> IncreasingToUpperBoundRegionSplitPolicy works:
> http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
> check the auto-splitting section
>
> Hope this answers your question
>
> Thanks
> Divye Sheth

Re: Weird behavior splitting regions

Posted by divye sheth <di...@gmail.com>.
The default split policy in HBase 0.94.x is
IncreasingToUpperBoundRegionSplitPolicy rather than
ConstantSizeRegionSplitPolicy, which was the default in older versions of
HBase.

Please refer to the link below to understand how
IncreasingToUpperBoundRegionSplitPolicy works:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
(see the auto-splitting section).
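
As a rough illustration of that policy, the sketch below computes the split
threshold the way the linked article describes it for 0.94, as I recall it:
the smaller of the configured maximum file size and the memstore flush size
multiplied by the square of the number of regions of the table on that region
server. The function name and the default values (128 MB flush size, 10 GB
maximum file size) are assumptions for illustration; check your own
hbase-site.xml for the real settings.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def split_threshold(regions_on_server: int,
                    flush_size: int = 128 * MB,      # assumed default
                    max_file_size: int = 10 * GB) -> int:  # assumed default
    """Store size (bytes) above which a region becomes a split candidate."""
    # The threshold grows with the square of the region count, up to the cap.
    return min(max_file_size, flush_size * regions_on_server ** 2)


# Using the 1 GB cap mentioned in this thread instead of the assumed default.
for count in (1, 2, 3, 4):
    print(count, split_threshold(count, max_file_size=1 * GB) // MB, "MB")
```

With the 1 GB cap from this thread, the threshold is 128 MB for the first
region, 512 MB for two regions, and is capped at 1 GB from three regions on,
so every region of about 30 GB is already over it.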

Hope this answers your question

Thanks
Divye Sheth



On Tue, Apr 15, 2014 at 3:36 PM, Bharath Vissapragada <bharathv@cloudera.com> wrote:

> > Some of the new regions are just a few KBytes! Why are they so small?
> > And when does HBase decide to split? It started splitting two hours
> > after I created the table.
>
> When HBase splits a region, it doesn't actually split the data at the
> disk/file level. It's just a metadata operation that creates new regions
> containing reference files that still point to the old HFiles. That is why
> you find KB-size regions.
>
> > I thought major compactions happen just once a day and compact many
> > files per region. The data is always the same here; I don't insert new
> > data.
>
> IIRC, minor compactions sometimes get promoted to major compactions based
> on some criteria, but I'll leave that for others to answer!

Re: Weird behavior splitting regions

Posted by Bharath Vissapragada <bh...@cloudera.com>.
> Some of the new regions are just a few KBytes! Why are they so small?
> And when does HBase decide to split? It started splitting two hours
> after I created the table.

When HBase splits a region, it doesn't actually split the data at the
disk/file level. It's just a metadata operation that creates new regions
containing reference files that still point to the old HFiles. That is why
you find KB-size regions.
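
A toy model of that idea (not HBase's real classes or on-disk format; every
name and the 100-byte reference size are made up for illustration): after a
split, each daughter region holds only small reference entries that point
back at the parent's HFile.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HFile:
    name: str
    size_bytes: int


@dataclass
class Reference:
    parent: HFile            # the old HFile that still holds the data
    split_key: bytes         # rows below/above this key belong to this half
    half: str                # "bottom" or "top"
    size_bytes: int = 100    # hypothetical size of the small reference file


def split_region(files: List[HFile],
                 split_key: bytes) -> Tuple[List[Reference], List[Reference]]:
    """Create the two daughters' contents without copying any data."""
    bottom = [Reference(f, split_key, "bottom") for f in files]
    top = [Reference(f, split_key, "top") for f in files]
    return bottom, top


def directory_size(entries) -> int:
    """Sum of the sizes of the entries, like a du on the region directory."""
    return sum(e.size_bytes for e in entries)


parent_files = [HFile("parent-hfile", 1_500_000_000)]
bottom, top = split_region(parent_files, b"51")
print(directory_size(parent_files), directory_size(bottom), directory_size(top))
```

In this model the daughters stay tiny until a later compaction rewrites the
referenced data into files of their own.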

> I thought major compactions happen just once a day and compact many files
> per region. The data is always the same here; I don't insert new data.

IIRC, minor compactions sometimes get promoted to major compactions based on
some criteria, but I'll leave that for others to answer!
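
One such criterion, as far as I know (treat this as an assumption about the
0.94 code, not a confirmed reading of it): a compaction that ends up
selecting every file in a store is reported as major. A minimal sketch, with
made-up function and file names:

```python
from typing import Sequence


def is_reported_major(selected: Sequence[str],
                      store_files: Sequence[str]) -> bool:
    """Assumed rule: compacting all files of a store counts as major."""
    return len(selected) > 0 and len(selected) == len(store_files)


# A store holding a single file: compacting it selects everything.
print(is_reported_major(["file-a"], ["file-a"]))
# A store with three files where only two are selected.
print(is_reported_major(["file-a", "file-b"], ["file-a", "file-b", "file-c"]))
```

Under that assumption, a store that holds a single file (for example one
reference file left behind by a split) would produce a log line like
"Completed major compaction of 1 file(s)" even though nothing was scheduled
as a daily major compaction.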



-- 
Bharath Vissapragada
<http://www.cloudera.com>