You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org> on 2011/05/01 22:42:03 UTC

[jira] [Created] (HBASE-3842) Refactor Coprocessor Compaction API

Refactor Coprocessor Compaction API
-----------------------------------

                 Key: HBASE-3842
                 URL: https://issues.apache.org/jira/browse/HBASE-3842
             Project: HBase
          Issue Type: Improvement
          Components: coprocessors, regionserver
    Affects Versions: 0.92.0
            Reporter: Nicolas Spiegelberg
            Assignee: Nicolas Spiegelberg
            Priority: Minor
             Fix For: 0.92.0


After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027498#comment-13027498 ] 

Nicolas Spiegelberg commented on HBASE-3842:
--------------------------------------------

Talked a little with Gary Helmling about this.  The main points are:

1. Selection of files for compaction will be decoupled from the actual compaction/merge itself
2. Splits will be completely decoupled from compactions
3. Since we're making all these changes, it would be nice to add the ability to short-circuit & instead use custom compaction algorithms.

Here is an initial proposal for the new API.

{code}
/* Called before selecting what files to compact.  An sorted candidate set for compaction is given and the compaction algorithm will use the remaining contents of the candidate set (after coprocessor) as input to the compaction algorithm.
 *
 * @param candidates : a mutable copy of StoreFile candidates for this Store.  add/remove within the coprocessor will affect the candidate set
 * @return true to skip default selection algorithm and use all files still in candidates
 */
boolean preCompactSelection(byte[] store, List<StoreFile> candidates);  

// @param candidates : StoreFiles in this compaction
// @return if not null, assumes a custom compaction algorithm was used and adds resulting StoreFiles to the Store
List<StoreFile> preCompaction(byte[] store, ImmutableList<StoreFile> candidates);

// @param newFiles : StoreFiles made by this compaction
void postCompaction(byte[] store, List<StoreFile> newFiles);

{code}

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082719#comment-13082719 ] 

stack commented on HBASE-3842:
------------------------------

Allowing scanner wrapping with the 'clunky' null return means done seems fine to me.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095084#comment-13095084 ] 

jiraposter@reviews.apache.org commented on HBASE-3842:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1686/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

This patch adds two new hooks to wrap the selection process for store files to compact:

  void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Store store, final List<StoreFile> candidates);

  void postCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Store store, final ImmutableList<StoreFile> selected);

In addition, the existing preCompact and postCompact methods have been refactored as described in JIRA:

  InternalScanner preCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
    final Store store, final InternalScanner scanner);

  void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
    final Store store, StoreFile resultFile);

Coprocessors that wish to override compaction behavior can wrap the provided InternalScanner in preCompact and return their own implementation.  They can then apply custom policy on the fly before returning KeyValues from the scanner.  Alternately, the coprocessor could set the "bypass" flag in preCompact, which will skip the normal process of writing out a new store file.  In this case, the coprocessor is indicating that it will handle the store file writing itself.  Once the coprocessor has written a new store file, it will need to explicitly tell the Store instance to load it using bulkLoadHFile().


This addresses bug HBASE-3842.
    https://issues.apache.org/jira/browse/HBASE-3842


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java d473ba7 
  src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 008d027 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 30c9d69 
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 53645ce 
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 655db7d 
  src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java c0b7267 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java c2af6a1 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java a8edb42 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b370ff0 

Diff: https://reviews.apache.org/r/1686/diff


Testing
-------

Added a new test case -- TestRegionObserverInterface#testCompactionOverride -- to verify custom compaction handling using an InternalScanner implementation.

All additional coprocessor related tests pass.


Thanks,

Gary



> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095692#comment-13095692 ] 

jiraposter@reviews.apache.org commented on HBASE-3842:
------------------------------------------------------



bq.  On 2011-09-01 04:21:24, Andrew Purtell wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java, line 146
bq.  > <https://reviews.apache.org/r/1686/diff/2/?file=37163#file37163line146>
bq.  >
bq.  >     Should this be in its own change set?
bq.  
bq.  Gary Helmling wrote:
bq.      Adding ImmutableList from Google Guava to the RegionObserver.postCompactSelection() signature broke the CP class compilation being done internally here, so I wanted to include a fix.  This seemed the cleanest way.  Alternately, we could change the method signature to use just List, but I do like making the immutability explicit.  It's more self-documenting that way.

Makes sense, thanks.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1686/#review1713
-----------------------------------------------------------


On 2011-09-01 02:35:36, Gary Helmling wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1686/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-01 02:35:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch adds two new hooks to wrap the selection process for store files to compact:
bq.  
bq.    void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final List<StoreFile> candidates);
bq.  
bq.    void postCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final ImmutableList<StoreFile> selected);
bq.  
bq.  In addition, the existing preCompact and postCompact methods have been refactored as described in JIRA:
bq.  
bq.    InternalScanner preCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, final InternalScanner scanner);
bq.  
bq.    void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, StoreFile resultFile);
bq.  
bq.  Coprocessors that wish to override compaction behavior can wrap the provided InternalScanner in preCompact and return their own implementation.  They can then apply custom policy on the fly before returning KeyValues from the scanner.  Alternately, the coprocessor could set the "bypass" flag in preCompact, which will skip the normal process of writing out a new store file.  In this case, the coprocessor is indicating that it will handle the store file writing itself.  Once the coprocessor has written a new store file, it will need to explicitly tell the Store instance to load it using bulkLoadHFile().
bq.  
bq.  
bq.  This addresses bug HBASE-3842.
bq.      https://issues.apache.org/jira/browse/HBASE-3842
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java d473ba7 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 008d027 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 30c9d69 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 53645ce 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 655db7d 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java c0b7267 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java c2af6a1 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java a8edb42 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b370ff0 
bq.  
bq.  Diff: https://reviews.apache.org/r/1686/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test case -- TestRegionObserverInterface#testCompactionOverride -- to verify custom compaction handling using an InternalScanner implementation.
bq.  
bq.  All additional coprocessor related tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.



> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13075985#comment-13075985 ] 

Gary Helmling edited comment on HBASE-3842 at 8/2/11 12:57 AM:
---------------------------------------------------------------

I think the stacking issue is key here:  are we expecting the common case to be loading a single "CompactionObserver" that overrides the compaction implementation, or loading multiple that each override/customize compaction policy but not necessarily behavior?

I agree on the one hand that having a {{KeyValue}} oriented interface for {{preCompactWrite()}} and {{postCompactWrite()}} may not be sufficient.  At the same time, I don't think we want to force the implementations to write their own {{StoreFiles}} though, as that will be massively inefficient -- for N loaded coprocessors this becomes N compactions being written (assuming we bypass the core compaction code at the end of chaining).

One alternative would be to have {{preCompact}} take the scanner to be used as a parameter, as suggested, and return a scanner instance that would allow overriding policy and mutating KVs, while still relying on the core writer functionality.  This would allow wrapping the store scanner with a custom scanner that inspects and emits KVs as needed on the fly.  In this case, {{preCompact}} would look like:

{code}
StoreScanner preCompact(ObserverContext<~> context, Store store, StoreScanner scanner);
{code}

Wrapping the scanner seems much easier for chaining multiple observers.  On the other hand we lose the clean {{boolean}} return to indicate that core compaction processing should be skipped.  Are there cases that would still want to handling the store file writing portion of the implementation entirely in the coprocessor?  If so, can we still emit a flag to skip normal processing another way?  We could skip normal processing if {{null}} is returned.  Seems a little clunky, but it could work with appropriate documentation.

      was (Author: ghelmling):
    I think the stacking issue is key here:  are we expecting the common case to be loading a single "CompactionObserver" that overrides the compaction implementation, or loading multiple that each override/customize compaction policy but not necessarily behavior?

I agree on the one hand that having a {{KeyValue}} oriented interface for {{preCompactWrite()}} and {{postCompactWrite()}} may not be sufficient.  At the same time, I don't think we want to force the implementations to write their own {{StoreFiles}} though, as that will be massively inefficient -- for N loaded coprocessors this becomes N compactions being written (assuming we bypass the core compaction code at the end of chaining).

One alternative would be to have {{preCompact}} take the scanner to be used as a parameter, as suggested, and return a scanner instance that would allow overriding policy and mutating KVs, while still relying on the core writer functionality.  This would allow wrapping the store scanner with a custom scanner that inspects and emits KVs as needed on the fly.  In this case, {{preCompact}} would look like:

{{code}}
StoreScanner preCompact(ObserverContext<~> context, Store store, StoreScanner scanner);
{{code}}

Wrapping the scanner seems much easier for chaining multiple observers.  On the other hand we lose the clean {{boolean}} return to indicate that core compaction processing should be skipped.  Are there cases that would still want to handling the store file writing portion of the implementation entirely in the coprocessor?  If so, can we still emit a flag to skip normal processing another way?  We could skip normal processing if {{null}} is returned.  Seems a little clunky, but it could work with appropriate documentation.
  
> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076296#comment-13076296 ] 

Andrew Purtell commented on HBASE-3842:
---------------------------------------

bq. are we expecting the common case to be loading a single "CompactionObserver" that overrides the compaction implementation, or loading multiple that each override/customize compaction policy but not necessarily behavior?

I believe in general Coprocessors will not usually be used to wholesale replace anything. People have that option, but the design rationale is about supporting point changes and enhancements. If we do it right, the interception points are designed so the CP implementer needs do only the minimum work necessary to achieve their aims. So I think we will see extensions that in this case here filter out or rewrite specific sets of KVs during compaction, as part of a larger set of smallish changes to other functions that also do something different for specific sets of KVs, or a particular key pattern meaningful to the CP application, etc. We need an API that allows the CP implementer to remain concerned only with their application/extension, not be required to reimplement compaction and understand all the hairy details if they want to alter it even slightly.

bq. One alternative would be to have preCompact take the scanner to be used as a parameter, as suggested, and return a scanner instance that would allow overriding policy and mutating KVs, while still relying on the core writer functionality. 

+1

But we must not have behind the scenes a stacked coprocessor configuration resulting in rewriting store files over and over, if three stacked coprocessors running compaction three (or four!) times over.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095109#comment-13095109 ] 

jiraposter@reviews.apache.org commented on HBASE-3842:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1686/#review1713
-----------------------------------------------------------

Ship it!



src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java
<https://reviews.apache.org/r/1686/#comment3887>

    Should this be in its own change set?



src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
<https://reviews.apache.org/r/1686/#comment3888>

    Nice test.


- Andrew


On 2011-09-01 02:35:36, Gary Helmling wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1686/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-01 02:35:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch adds two new hooks to wrap the selection process for store files to compact:
bq.  
bq.    void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final List<StoreFile> candidates);
bq.  
bq.    void postCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final ImmutableList<StoreFile> selected);
bq.  
bq.  In addition, the existing preCompact and postCompact methods have been refactored as described in JIRA:
bq.  
bq.    InternalScanner preCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, final InternalScanner scanner);
bq.  
bq.    void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, StoreFile resultFile);
bq.  
bq.  Coprocessors that wish to override compaction behavior can wrap the provided InternalScanner in preCompact and return their own implementation.  They can then apply custom policy on the fly before returning KeyValues from the scanner.  Alternately, the coprocessor could set the "bypass" flag in preCompact, which will skip the normal process of writing out a new store file.  In this case, the coprocessor is indicating that it will handle the store file writing itself.  Once the coprocessor has written a new store file, it will need to explicitly tell the Store instance to load it using bulkLoadHFile().
bq.  
bq.  
bq.  This addresses bug HBASE-3842.
bq.      https://issues.apache.org/jira/browse/HBASE-3842
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java d473ba7 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 008d027 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 30c9d69 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 53645ce 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 655db7d 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java c0b7267 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java c2af6a1 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java a8edb42 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b370ff0 
bq.  
bq.  Diff: https://reviews.apache.org/r/1686/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test case -- TestRegionObserverInterface#testCompactionOverride -- to verify custom compaction handling using an InternalScanner implementation.
bq.  
bq.  All additional coprocessor related tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.



> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027794#comment-13027794 ] 

Andrew Purtell commented on HBASE-3842:
---------------------------------------

bq. If I passed the CompactionRequest object - which contains a Store - to the user, then the coprocessor client could access the MemStore through some Store API?

+1 passing Store instead of merely the byte[] store name

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027498#comment-13027498 ] 

Nicolas Spiegelberg edited comment on HBASE-3842 at 5/1/11 8:57 PM:
--------------------------------------------------------------------

Talked a little with Gary Helmling about this.  The main points are:

1. Selection of files for compaction will be decoupled from the actual compaction/merge itself
2. Splits will be completely decoupled from compactions
3. Since we're making all these changes, it would be nice to add the ability to short-circuit & instead use custom compaction algorithms.

Here is an initial proposal for the new API.

{code}
/* Called before selecting what files to compact.  An sorted candidate set for compaction is given 
and the compaction algorithm will use the remaining contents of the candidate set (after coprocessor) 
as input to the compaction algorithm.
 * @param candidates : a mutable copy of StoreFile candidates for this Store.  add/remove within the 
coprocessor will affect the candidate set
 * @return true to skip default selection algorithm and use all files still in candidates
 *         false will use the default selection algorithm, using the current content of candidates as input
 */
boolean preCompactSelection(byte[] store, List<StoreFile> candidates);  

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param candidates : StoreFiles selected for compaction
 */
void postCompactSelection(byte[] store, ImmutableList<StoreFile> selected);
{code}

{code}
/* Called before the compaction merge algorithm should commence.  Note that the coprocessor user can 
implement his custom merge algorithm at this point.  If you wish to alter the candidate set, this should 
be done in {@link preCompactSelection}.
 * @param toCompact : StoreFiles that will be compacted.
 * @return null == continue with default compaction algorithm
 * @return if not null, assumes a custom compaction algorithm was used and adds resulting StoreFiles to the Store
 */
List<StoreFile> preCompaction(byte[] store, ImmutableList<StoreFile> toCompact);

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param newFiles : StoreFiles made by this compaction
 */
void postCompaction(byte[] store, ImmutableList<StoreFile> newFiles);

{code}

      was (Author: nspiegelberg):
    Talked a little with Gary Helmling about this.  The main points are:

1. Selection of files for compaction will be decoupled from the actual compaction/merge itself
2. Splits will be completely decoupled from compactions
3. Since we're making all these changes, it would be nice to add the ability to short-circuit & instead use custom compaction algorithms.

Here is an initial proposal for the new API.

{code}
/* Called before selecting what files to compact.  An sorted candidate set for compaction is given 
and the compaction algorithm will use the remaining contents of the candidate set (after coprocessor) 
as input to the compaction algorithm.
 * @param candidates : a mutable copy of StoreFile candidates for this Store.  add/remove within the 
coprocessor will affect the candidate set
 * @return true to skip default selection algorithm and use all files still in candidates
 *         false will use the default selection algorithm, using the current content of candidates as input
 */
boolean preCompactSelection(byte[] store, List<StoreFile> candidates);  

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param candidates : StoreFiles selected for compaction
 */
void postCompactSelection(byte[] store, ImmutableList<StoreFile> selected);
{code}

{code}
/* Called before the compaction merge algorithm should commence.  Note that the coprocessor user can 
implement his custom merge algorithm at this point.  If you wish to alter the candidate set, this should 
be done in {@link preCompactSelection}.
 * @param toCompact : StoreFiles that will be compacted.
 * @return null == continue with default compaction algorithm
 * @return if not null, assumes a custom compaction algorithm was used and adds resulting StoreFiles to the Store
 */
List<StoreFile> preCompaction(byte[] store, ImmutableList<StoreFile> toCompact);

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param newFiles : StoreFiles made by this compaction
 */
void postCompaction(byte[] store, List<StoreFile> newFiles);

{code}
  
> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047747#comment-13047747 ] 

stack commented on HBASE-3842:
------------------------------

Do we have a patch for this?  Would like to get 0.92.0 soon.  I'd think that API change should be done before we make our first release with coprocessors.  Let me raise this to critical for 0.92.0  We can change it back down later if it does not look like its going to happen.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061614#comment-13061614 ] 

Andrew Purtell commented on HBASE-3842:
---------------------------------------

Additionally I propose the addition of a new coprocessor observer type, {{CompactionObserver}}, with the following interface:

{code}
/**
 * Notification that compaction is about to begin.
 * @param context Observer context
 * @param store the Store that is about to be compacted
 * @return false to skip compaction
 */
boolean preCompact(ObserverContext<~> context, Store store);

/**
 * Notification that compaction completed
 * @param context Observer context
 * @param store the Store that was compacted
 */
void postCompact(ObserverContext<~> context, Store store);

/**
 * Called before a KV is written to the new store file.
 * <p>
 * WARNING: It is only safe, unless you <i>really</i> know what you are doing,
 * to signal the KV should be dropped by returning {{null}} or to return a KV
 * with only the value changed somehow.
 * @param context Observer context
 * @param store the Store that is being compacted
 * @param kv the KeyValue
 * @return the KeyValue to write, or null if none
 */
KeyValue preCompactWrite(ObserverContext<~> context, Store store, KeyValue kv);

/**
 * Called after a KV is written to the new store file.
 * @param context Observer context
 * @param store the Store that is being compacted
 * @param kv the KeyValue
 */
void postCompactWrite(ObserverContext<~> context, Store store, KeyValue kv);
{code}

If the coprocessor implements this interface, it is hooked into the compaction process such that it can operate on every KV. This is the same model we use for WAL observers.


> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061624#comment-13061624 ] 

Andrew Purtell commented on HBASE-3842:
---------------------------------------

@Nicholas: So then stacking would work like: The first coprocessor in the chain writes its own StoreFile entirely from the preCompact() hook and then swaps out the scanner for one over the result, the next coprocessor in the chain writes its own StoreFile and then swaps out the scanner for one over the result, and so on. Correct?

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-3842:
---------------------------------

    Attachment: HBASE-3842_final.patch

Final patch committed to trunk.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3842_final.patch
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095464#comment-13095464 ] 

jiraposter@reviews.apache.org commented on HBASE-3842:
------------------------------------------------------



bq.  On 2011-09-01 04:21:24, Andrew Purtell wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java, line 146
bq.  > <https://reviews.apache.org/r/1686/diff/2/?file=37163#file37163line146>
bq.  >
bq.  >     Should this be in its own change set?

Adding ImmutableList from Google Guava to the RegionObserver.postCompactSelection() signature broke the CP class compilation being done internally here, so I wanted to include a fix.  This seemed the cleanest way.  Alternately, we could change the method signature to use just List, but I do like making the immutability explicit.  It's more self-documenting that way.


- Gary


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1686/#review1713
-----------------------------------------------------------


On 2011-09-01 02:35:36, Gary Helmling wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1686/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-01 02:35:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch adds two new hooks to wrap the selection process for store files to compact:
bq.  
bq.    void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final List<StoreFile> candidates);
bq.  
bq.    void postCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final ImmutableList<StoreFile> selected);
bq.  
bq.  In addition, the existing preCompact and postCompact methods have been refactored as described in JIRA:
bq.  
bq.    InternalScanner preCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, final InternalScanner scanner);
bq.  
bq.    void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, StoreFile resultFile);
bq.  
bq.  Coprocessors that wish to override compaction behavior can wrap the provided InternalScanner in preCompact and return their own implementation.  They can then apply custom policy on the fly before returning KeyValues from the scanner.  Alternately, the coprocessor could set the "bypass" flag in preCompact, which will skip the normal process of writing out a new store file.  In this case, the coprocessor is indicating that it will handle the store file writing itself.  Once the coprocessor has written a new store file, it will need to explicitly tell the Store instance to load it using bulkLoadHFile().
bq.  
bq.  
bq.  This addresses bug HBASE-3842.
bq.      https://issues.apache.org/jira/browse/HBASE-3842
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java d473ba7 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 008d027 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 30c9d69 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 53645ce 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 655db7d 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java c0b7267 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java c2af6a1 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java a8edb42 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b370ff0 
bq.  
bq.  Diff: https://reviews.apache.org/r/1686/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test case -- TestRegionObserverInterface#testCompactionOverride -- to verify custom compaction handling using an InternalScanner implementation.
bq.  
bq.  All additional coprocessor related tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.



> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027795#comment-13027795 ] 

stack commented on HBASE-3842:
------------------------------

I like passing CompactionRequest notion (and it gives you access to Store if you need it).

You could do an Interface to narrow what CPs can access.  There may be concurrency/perf reasons for not letting CPs have direct access. 

I have no strong opinion on it N.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling reassigned HBASE-3842:
------------------------------------

    Assignee: Gary Helmling  (was: Nicolas Spiegelberg)

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098407#comment-13098407 ] 

Hudson commented on HBASE-3842:
-------------------------------

Integrated in HBase-TRUNK #2178 (See [https://builds.apache.org/job/HBase-TRUNK/2178/])
    HBASE-3842  Refactor RegionObserver compaction API for easier overriding of policy

garyh : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java


> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3842_final.patch
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling resolved HBASE-3842.
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to trunk.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3842_final.patch
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3842:
-------------------------

         Priority: Major  (was: Minor)
    Fix Version/s: 0.92.0

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13075985#comment-13075985 ] 

Gary Helmling commented on HBASE-3842:
--------------------------------------

I think the stacking issue is key here:  are we expecting the common case to be loading a single "CompactionObserver" that overrides the compaction implementation, or loading multiple that each override/customize compaction policy but not necessarily behavior?

I agree on the one hand that having a {{KeyValue}} oriented interface for {{preCompactWrite()}} and {{postCompactWrite()}} may not be sufficient.  At the same time, I don't think we want to force the implementations to write their own {{StoreFiles}} though, as that will be massively inefficient -- for N loaded coprocessors this becomes N compactions being written (assuming we bypass the core compaction code at the end of chaining).

One alternative would be to have {{preCompact}} take the scanner to be used as a parameter, as suggested, and return a scanner instance that would allow overriding policy and mutating KVs, while still relying on the core writer functionality.  This would allow wrapping the store scanner with a custom scanner that inspects and emits KVs as needed on the fly.  In this case, {{preCompact}} would look like:

{{code}}
StoreScanner preCompact(ObserverContext<~> context, Store store, StoreScanner scanner);
{{code}}

Wrapping the scanner seems much easier for chaining multiple observers.  On the other hand we lose the clean {{boolean}} return to indicate that core compaction processing should be skipped.  Are there cases that would still want to handling the store file writing portion of the implementation entirely in the coprocessor?  If so, can we still emit a flag to skip normal processing another way?  We could skip normal processing if {{null}} is returned.  Seems a little clunky, but it could work with appropriate documentation.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027498#comment-13027498 ] 

Nicolas Spiegelberg edited comment on HBASE-3842 at 5/1/11 8:56 PM:
--------------------------------------------------------------------

Talked a little with Gary Helmling about this.  The main points are:

1. Selection of files for compaction will be decoupled from the actual compaction/merge itself
2. Splits will be completely decoupled from compactions
3. Since we're making all these changes, it would be nice to add the ability to short-circuit & instead use custom compaction algorithms.

Here is an initial proposal for the new API.

{code}
/* Called before selecting what files to compact.  An sorted candidate set for compaction is given 
and the compaction algorithm will use the remaining contents of the candidate set (after coprocessor) 
as input to the compaction algorithm.
 * @param candidates : a mutable copy of StoreFile candidates for this Store.  add/remove within the 
coprocessor will affect the candidate set
 * @return true to skip default selection algorithm and use all files still in candidates
 *         false will use the default selection algorithm, using the current content of candidates as input
 */
boolean preCompactSelection(byte[] store, List<StoreFile> candidates);  

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param candidates : StoreFiles selected for compaction
 */
void postCompactSelection(byte[] store, ImmutableList<StoreFile> selected);
{code}

{code}
/* Called before the compaction merge algorithm should commence.  Note that the coprocessor user can 
implement his custom merge algorithm at this point.  If you wish to alter the candidate set, this should 
be done in {@link preCompactSelection}.
 * @param toCompact : StoreFiles that will be compacted.
 * @return null == continue with default compaction algorithm
 * @return if not null, assumes a custom compaction algorithm was used and adds resulting StoreFiles to the Store
 */
List<StoreFile> preCompaction(byte[] store, ImmutableList<StoreFile> toCompact);

/* Called after selecting what files to compact.  Mostly for notification purposes
 * @param newFiles : StoreFiles made by this compaction
 */
void postCompaction(byte[] store, List<StoreFile> newFiles);

{code}

      was (Author: nspiegelberg):
    Talked a little with Gary Helmling about this.  The main points are:

1. Selection of files for compaction will be decoupled from the actual compaction/merge itself
2. Splits will be completely decoupled from compactions
3. Since we're making all these changes, it would be nice to add the ability to short-circuit & instead use custom compaction algorithms.

Here is an initial proposal for the new API.

{code}
/* Called before selecting what files to compact.  An sorted candidate set for compaction is given and the compaction algorithm will use the remaining contents of the candidate set (after coprocessor) as input to the compaction algorithm.
 *
 * @param candidates : a mutable copy of StoreFile candidates for this Store.  add/remove within the coprocessor will affect the candidate set
 * @return true to skip default selection algorithm and use all files still in candidates
 */
boolean preCompactSelection(byte[] store, List<StoreFile> candidates);  

// @param candidates : StoreFiles in this compaction
// @return if not null, assumes a custom compaction algorithm was used and adds resulting StoreFiles to the Store
List<StoreFile> preCompaction(byte[] store, ImmutableList<StoreFile> candidates);

// @param newFiles : StoreFiles made by this compaction
void postCompaction(byte[] store, List<StoreFile> newFiles);

{code}
  
> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027792#comment-13027792 ] 

Nicolas Spiegelberg commented on HBASE-3842:
--------------------------------------------

@stack: That's a good point about major compaction.  Maybe I should have the input param be CompactionRequest, which contains the Store, File list, and isMajor.  Would there be a problem with giving the user direct access to the Store?  I was a little worried about giving that unless someone else concurred.  Maybe make a separate ICompactionRequest & IStore API for coprocessor contracts?

If I passed the CompactionRequest object - which contains a Store - to the user, then the coprocessor client could access the MemStore through some Store API?

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095124#comment-13095124 ] 

jiraposter@reviews.apache.org commented on HBASE-3842:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1686/#review1714
-----------------------------------------------------------

Ship it!


API looks good to me.  Ship it.


src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
<https://reviews.apache.org/r/1686/#comment3889>

    Nice doc.


- Michael


On 2011-09-01 02:35:36, Gary Helmling wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1686/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-01 02:35:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch adds two new hooks to wrap the selection process for store files to compact:
bq.  
bq.    void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final List<StoreFile> candidates);
bq.  
bq.    void postCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.        final Store store, final ImmutableList<StoreFile> selected);
bq.  
bq.  In addition, the existing preCompact and postCompact methods have been refactored as described in JIRA:
bq.  
bq.    InternalScanner preCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, final InternalScanner scanner);
bq.  
bq.    void postCompact(final ObserverContext<RegionCoprocessorEnvironment> c,
bq.      final Store store, StoreFile resultFile);
bq.  
bq.  Coprocessors that wish to override compaction behavior can wrap the provided InternalScanner in preCompact and return their own implementation.  They can then apply custom policy on the fly before returning KeyValues from the scanner.  Alternately, the coprocessor could set the "bypass" flag in preCompact, which will skip the normal process of writing out a new store file.  In this case, the coprocessor is indicating that it will handle the store file writing itself.  Once the coprocessor has written a new store file, it will need to explicitly tell the Store instance to load it using bulkLoadHFile().
bq.  
bq.  
bq.  This addresses bug HBASE-3842.
bq.      https://issues.apache.org/jira/browse/HBASE-3842
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java d473ba7 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 008d027 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 30c9d69 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 53645ce 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 655db7d 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java c0b7267 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java c2af6a1 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java a8edb42 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b370ff0 
bq.  
bq.  Diff: https://reviews.apache.org/r/1686/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test case -- TestRegionObserverInterface#testCompactionOverride -- to verify custom compaction handling using an InternalScanner implementation.
bq.  
bq.  All additional coprocessor related tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.



> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Gary Helmling
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061619#comment-13061619 ] 

Nicolas Spiegelberg commented on HBASE-3842:
--------------------------------------------

@Andrew: do we really want to call this in an inner loop?  Wouldn't you accomplish the same thing by:

1. passing in the scanner that we'll use for compaction preCompact() so you can write your own StoreFile and mutate KVs
2. synchronously reading the StoreFile postCompact()

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027746#comment-13027746 ] 

stack commented on HBASE-3842:
------------------------------

+1

In postCompactSelection, do you need to know if you have all files or not (So you can figure if its a major or not)? 

Also, don't you want to pass the memstore to the preCompaction so we can implement the flushing compaction where we weave a flush into a compaction result so we don't always create new file on flush?


> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3842) Refactor Coprocessor Compaction API

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-3842:
---------------------------------

    Priority: Critical  (was: Major)

We have a need for this internally, so I'll be working up a patch this week.  As part of the coprocessor API, I think it's critical to get this in 0.92.

> Refactor Coprocessor Compaction API
> -----------------------------------
>
>                 Key: HBASE-3842
>                 URL: https://issues.apache.org/jira/browse/HBASE-3842
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.92.0
>
>
> After HBASE-3797, the compaction logic flow has been significantly altered.  Because of this, the current compaction coprocessor API is insufficient for gaining full insight into compaction requests/results.  Refactor coprocessor API after HBASE-3797 is committed to be more extensible and increase visibility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira