Posted to user@hbase.apache.org by Andy Sautins <an...@returnpath.net> on 2011/03/28 02:06:19 UTC

Performance of region merges...

    I have an issue I'm hoping to get some insight into.  We currently have a table with roughly 18k regions.  When we originally created the table we didn't realize we should make the regions bigger, and we have since changed MAX_FILESIZE to something larger.  We are no longer rapidly creating new regions, but we still have the large number of regions.  I've been investigating using the merge tool to reduce the number of regions to something more reasonable for our needs.  The issue I've run into is that the merge tool seems to run somewhat slowly.  On a test table holding a sample of the data in our main table, I have roughly 8MM rows of approximately 1 KB each across 48 regions.  By running the merge tool over pairs of regions I can reduce the number of regions to 24, and all seems to work well.  However, for those 48 regions it takes roughly 30 minutes.  That's not the end of the world for this table, but because the cluster needs to be offline when using the merge tool, merging has a larger impact than I'd like it to have.
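For scale, the test-table figures above can be turned into a quick back-of-the-envelope estimate. This is only a sketch under stated assumptions: "1k" is taken to mean 1,024 bytes per row, the 30 minutes are assumed to be spread evenly over the 24 pairwise merges, and the last line assumes the same per-merge time would hold on the 18k-region table. That last assumption is not supported by anything in the thread, since that table's region sizes aren't given.

```java
// Back-of-the-envelope arithmetic for the test table described above.
// Assumptions (not measured): "1k" = 1,024 bytes per row; the 30 minutes were
// spread evenly over the 24 pairwise merges; per-merge time stays constant.
public class MergeEstimate {
    static final long ROWS = 8_000_000L;      // "roughly 8MM rows"
    static final long BYTES_PER_ROW = 1_024L; // "approximately 1k" (assumed 1 KiB)
    static final int TEST_REGIONS = 48;
    static final double TEST_MINUTES = 30.0;

    static long totalBytes() {
        return ROWS * BYTES_PER_ROW;
    }

    static double mibPerRegion(int regions) {
        return totalBytes() / (double) regions / (1024.0 * 1024.0);
    }

    // One pairwise merge consumes two regions, so halving n regions takes n / 2 merges.
    static int mergesToHalve(int regions) {
        return regions / 2;
    }

    static double secondsPerMerge() {
        return TEST_MINUTES * 60.0 / mergesToHalve(TEST_REGIONS);
    }

    public static void main(String[] args) {
        System.out.printf("Total data: %.2f GiB%n", totalBytes() / (1024.0 * 1024.0 * 1024.0));
        System.out.printf("Per region: %.1f MiB at 48 regions, %.1f MiB at 24%n",
                mibPerRegion(48), mibPerRegion(24));
        System.out.printf("Per merge: %.0f s%n", secondsPerMerge());
        int merges = mergesToHalve(18_000);
        System.out.printf("Halving 18k regions: %d merges, ~%.1f h if run serially%n",
                merges, merges * secondsPerMerge() / 3600.0);
    }
}
```

Run as-is, it should report about 7.63 GiB of data, roughly 162.8 MiB per region at 48 regions (325.5 MiB at 24), 75 s per merge, and 9000 merges (about 187.5 hours) for one serial halving pass over 18k regions. The extrapolated figure is only an illustration of why a serial, two-regions-at-a-time approach scales poorly.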

    I guess the question I have is this: if I have a lot more regions than I want, is there a way to merge them down to a smaller number reasonably efficiently?  Can I run the merge tool on multiple regions at the same time?  Are there alternatives to the merge tool?  Could I export/import the data, or use some other method?

   We are currently running 0.90.1.

   Any insights would be much appreciated.

   Andy

Re: Performance of region merges...

Posted by Ted Yu <yu...@gmail.com>.
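It seems HMerge doesn't have a main() method, so it can't be launched directly as a command-line tool.
Refer to TestMergeTable.testMergeTable() for usage (there, c is the Configuration and desc is the table's HTableDescriptor):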
      HMerge.merge(c, FileSystem.get(c), desc.getName());
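HMerge works on a whole table rather than on two named regions. As I understand its 0.90 source, it walks the table's regions in row-key order and merges neighbors whose combined size is at most half of the configured maximum file size (hbase.hregion.max.filesize). Treat that rule as an assumption and check HMerge.java for your version. The sketch below does not call HBase. It only simulates a size-bounded pass over a list of region sizes so you can estimate how many regions would remain. The class name, method name, and sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planner, not HBase code. Assumed rule, modelled loosely on HMerge:
// fold the next region (in row-key order) into the current group while the
// group's total size stays at or below limitBytes.
public class AdjacentMergePlanner {
    // Returns groups of region indices; each group would end up as one region.
    static List<List<Integer>> plan(long[] regionSizes, long limitBytes) {
        List<List<Integer>> groups = new ArrayList<>();
        if (regionSizes.length == 0) {
            return groups;
        }
        List<Integer> current = new ArrayList<>(List.of(0));
        long currentSize = regionSizes[0];
        for (int i = 1; i < regionSizes.length; i++) {
            if (currentSize + regionSizes[i] <= limitBytes) {
                current.add(i);                  // merge region i into the current group
                currentSize += regionSizes[i];
            } else {
                groups.add(current);             // close this group; region i starts the next
                current = new ArrayList<>(List.of(i));
                currentSize = regionSizes[i];
            }
        }
        groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long[] sizes = new long[48];
        Arrays.fill(sizes, 160 * mib);          // hypothetical: 48 regions of 160 MiB each
        long maxFilesize = 1024 * mib;          // hypothetical max file size of 1 GiB
        List<List<Integer>> groups = plan(sizes, maxFilesize / 2);
        System.out.println(groups.size() + " regions after merging");
    }
}
```

With these hypothetical inputs the limit is 512 MiB, so regions are grouped three at a time and it should print "16 regions after merging". A group is closed only when the next region would push it past the limit, so no two neighboring groups in the result could be merged under that same limit.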

Cheers

On Sun, Mar 27, 2011 at 7:57 PM, Andy Sautins
<an...@returnpath.net> wrote:

>
>  Thank you Ted.  I had not heard of HMerge yet but will take a look.
>
>  I appreciate the help.
>
>   Andy

RE: Performance of region merges...

Posted by Andy Sautins <an...@returnpath.net>.
  Thank you Ted.  I had not heard of HMerge yet but will take a look.

  I appreciate the help.

   Andy

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Sunday, March 27, 2011 8:54 PM
To: user@hbase.apache.org
Subject: Re: Performance of region merges...

Merge.java currently only accepts two regions.

Have you looked at HMerge?
Its Javadoc states a condition that seems to satisfy your requirement:
   * When merging a normal table, the HBase instance must be online, but the
   * table must be disabled.

Re: Performance of region merges...

Posted by Ted Yu <yu...@gmail.com>.
Merge.java currently only accepts two regions.

Have you looked at HMerge?
Its Javadoc states a condition that seems to satisfy your requirement:
   * When merging a normal table, the HBase instance must be online, but the
   * table must be disabled.
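Because Merge takes exactly two regions per run, a pass that halves the region count (like 48 down to 24 in the original post) amounts to one Merge invocation per neighboring pair. Pairing neighbors in key order matters, because merging two non-adjacent regions would produce a region whose key range spans the one in between. Below is a minimal sketch that generates those invocations. It assumes the region names are already sorted by start key. The table and region names are hypothetical placeholders. The command form follows the Merge class's usage as I understand it (bin/hbase, the class name, the table name, then two region names), so check it against your version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs regions (already sorted by start key) for the
// two-region Merge tool and prints one command per pair.
public class PairwiseMergePlan {
    static List<String[]> pairs(List<String> regionsInKeyOrder) {
        List<String[]> result = new ArrayList<>();
        // Step by two: (0,1), (2,3), ...; an odd region left at the end is skipped.
        for (int i = 0; i + 1 < regionsInKeyOrder.size(); i += 2) {
            result.add(new String[] {regionsInKeyOrder.get(i), regionsInKeyOrder.get(i + 1)});
        }
        return result;
    }

    // Naive single-quoting; real region names containing quotes would need escaping.
    static String command(String table, String regionA, String regionB) {
        return "bin/hbase org.apache.hadoop.hbase.util.Merge '" + table + "' '"
                + regionA + "' '" + regionB + "'";
    }

    public static void main(String[] args) {
        String table = "mytable";                                      // hypothetical table name
        List<String> regions = List.of("r1", "r2", "r3", "r4", "r5");  // hypothetical region names
        for (String[] p : pairs(regions)) {
            System.out.println(command(table, p[0], p[1]));
        }
    }
}
```

With five regions this prints two commands (r1 with r2, r3 with r4), and r5 is left for a later pass. The commands still run one at a time against an offline cluster, so this only automates the bookkeeping, not the per-merge cost.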