Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/12/30 17:25:43 UTC

Best way to compact a region after a move?

Hi,

When I'm manually balancing the regions on my cluster, I want to
make sure they are local, so I want to major_compact them each time
I move them.

The balanceCluster method returns a list of regions to move, which
means they have not been moved yet, so I can't compact them there.

Is there a place where I should hook in to compact those regions?

So far, the only idea I have found is to start a thread in
balanceCluster, wait 1 minute, and compact all the regions I
returned. But I'm wondering if there is a better way to achieve that.
Is there a queue where I should place those regions to be compacted
instead? Also, I need to know (even if it's just in the logs) when
those compactions are done.

Thanks,

JM

Re: Best way to compact a region after a move?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Exactly what I was looking for ;)

Thanks a lot!

JM


Re: Best way to compact a region after a move?

Posted by Ted Yu <yu...@gmail.com>.
I guess you would want custom compaction only on user tables.
Take a look at the following config param in
http://hbase.apache.org/book.html:
hbase.coprocessor.region.classes

Cheers
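
For reference, a minimal sketch of what such a system-wide registration could look like in hbase-site.xml on the region servers. The class name com.example.CompactOnOpenObserver is a hypothetical placeholder for your own RegionObserver, which is assumed to be on the region server classpath already. A region server restart is normally needed before the setting takes effect.

```xml
<!-- hbase-site.xml on each region server.
     com.example.CompactOnOpenObserver is a hypothetical class name. -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.example.CompactOnOpenObserver</value>
</property>
```

The value is a comma-separated list, and the HBase book describes coprocessors listed under this property as loaded by default on all tables, so nothing has to be attached table by table.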


Re: Best way to compact a region after a move?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks for the hints. I will look there too.

Is there a way to attach it to ALL the tables rather than to specific
tables? Or should I attach it to the tables one by one?


Re: Best way to compact a region after a move?

Posted by Ted Yu <yu...@gmail.com>.
You can see how to dynamically load a coprocessor in
hbase-server/src/main/ruby/shell/commands/alter.rb

There are ample test cases showing how to use RegionObserver, e.g.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java

Yes, you can attach your coprocessor to the table(s) for which you want
custom compaction. The coprocessor would be deployed on the region servers.

Cheers


Re: Best way to compact a region after a move?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Ted,

Thanks for your reply.

I looked at RegionObserver and will dig in that direction. I think I
found what I need in it.

How can I attach it to HBase? Should I do that on all the servers? Or on
the master only, and it will replicate? Should I attach it to each
region? Or directly to the table?

Thanks,

JM




Re: Best way to compact a region after a move?

Posted by Ted Yu <yu...@gmail.com>.
balanceCluster() executes on the master, whereas compaction is a region
server activity, so they don't pair naturally.

I answered the first part of the question in the thread titled 'How to know
it's time for a major compaction?':

In RegionObserver, we already have the following hook:

  /**
   * Called after the region is reported as open to the master.
   * @param c the environment provided by the region server
   */
  void postOpen(final ObserverContext<RegionCoprocessorEnvironment> c);

Auto-compaction logic can be triggered through the above hook.

Take a look at the following hook for the second part of your question:

  void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
      final HStore store, StoreFile resultFile) throws IOException;

Cheers
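
To make the flow concrete, here is a plain-Java model of it: the region is opened on its new server, postOpen fires and queues a major compaction, and postCompact later reports completion. It is only a sketch under stated assumptions. RegionHooks, FakeRegionServer and CompactOnOpen are hypothetical stand-ins, not HBase classes. A real coprocessor would implement RegionObserver and use the region server's own compaction machinery instead.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the flow discussed above. None of these types are
// HBase classes: they are hypothetical stand-ins so that the sketch runs
// without HBase on the classpath.
class CompactAfterMoveSketch {

    /** Stand-in for the two RegionObserver hooks quoted in this thread. */
    interface RegionHooks {
        void postOpen(FakeRegionServer server, String region);
        void postCompact(String region, String resultFile);
    }

    /** Stand-in for a region server that invokes the hooks. */
    static class FakeRegionServer {
        private final RegionHooks hooks;
        private final List<String> compactionQueue = new ArrayList<>();

        FakeRegionServer(RegionHooks hooks) {
            this.hooks = hooks;
        }

        /** Models a region being opened here after the master moved it. */
        void openRegion(String region) {
            hooks.postOpen(this, region);
        }

        /** Queues a major compaction request instead of running it inline. */
        void requestMajorCompaction(String region) {
            compactionQueue.add(region);
        }

        /** Runs every queued compaction and reports each one via postCompact. */
        void runQueuedCompactions() {
            for (String region : compactionQueue) {
                hooks.postCompact(region, region + "/compacted-file");
            }
            compactionQueue.clear();
        }
    }

    /** Observer that compacts each region it sees opening and logs completions. */
    static class CompactOnOpen implements RegionHooks {
        final List<String> log = new ArrayList<>();

        @Override
        public void postOpen(FakeRegionServer server, String region) {
            log.add("open " + region);
            server.requestMajorCompaction(region);
        }

        @Override
        public void postCompact(String region, String resultFile) {
            log.add("compacted " + region + " -> " + resultFile);
        }
    }

    public static void main(String[] args) {
        CompactOnOpen observer = new CompactOnOpen();
        FakeRegionServer server = new FakeRegionServer(observer);
        server.openRegion("regionA");   // the balancer moved regionA here
        server.runQueuedCompactions();  // the compaction completes later
        observer.log.forEach(System.out::println);
    }
}
```

The point of this shape is that the compaction request is made on the region server that has just opened the region, so no timer on the master is needed, and postCompact is the natural place to log that a compaction has finished.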
