You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by keith-turner <gi...@git.apache.org> on 2015/09/22 02:39:40 UTC

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

GitHub user keith-turner opened a pull request:

    https://github.com/apache/accumulo/pull/47

    ACCUMULO-2232 Added options to Combiner for handling deletes

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keith-turner/accumulo ACCUMULO-2232

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #47
    
----
commit 0ca928603078ed1698186515325b20eb98b4515a
Author: Keith Turner <kt...@apache.org>
Date:   2015-09-21T19:23:18Z

    ACCUMULO-2232 Added options to Combiner for handling deletes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052506
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -227,10 +271,25 @@ public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> op
           throw new IllegalArgumentException("The " + COLUMNS_OPTION + " must not be empty");
     
         combiners = new ColumnSet(Lists.newArrayList(Splitter.on(",").split(encodedColumns)));
    +
    +    isPartialCompaction = ((env.getIteratorScope() == IteratorScope.majc) && !env.isFullMajorCompaction());
    +    if (options.containsKey(DELETE_HANDLING_ACTION_OPTION)) {
    +      deleteHandlingAction = DeleteHandlingAction.valueOf(options.get(DELETE_HANDLING_ACTION_OPTION));
    --- End diff --
    
    A warning message with the actual property being checked (DELETE_HANDLING_ACTION_OPTION) and the value that we have no enum-value for would be useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/47#issuecomment-142362740
  
    In 31fcf00 I changed the behaviour slightly.
    
    I dropped the option to throw an exception.  I was thinking this option could prevent problems, but it can not.  It can still only detect problems.  For detecting problems I think logging an error is sufficient.
    
    I also changed logging to log on partial and full major compactions.   If a full major compaction sees a delete, then its evidence that a problem could have occurred in the past with a partial compaction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/47#issuecomment-142326951
  
    @joshelser thanks for the feedback, it was very useful.   I think I fixed all the issues you found.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40442327
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    +   */
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a a delete is observed by a combiner during a major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a a delete is observed by a combiner during a major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a major compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    +   * marker exists in one of the files that is not being compacted, then data that should be deleted may be combined. See
    +   * <a href="https://issues.apache.org/jira/browse/ACCUMULO-2232">ACCUMULO-2232</a> for more information.
    +   *
    +   * <p>
    +   * This method allows users to configure how they want to handle the combination of delete markers, combiners, and major compactions. The default behavior is
    +   * {@link DeleteHandlingAction#LOG_ERROR}. See the javadoc on each {@link DeleteHandlingAction} enum for a description of each option.
    +   *
    +   * <p>
    +   * For correctness deletes should not be used with columns that are combined OR the {@link DeleteHandlingAction#REDUCE_ON_FULL_COMPACTION_ONLY} option should
    +   * be used. Only reducing on full major compactions may have negative performance implications.
    +   *
    --- End diff --
    
    I don't think this should be coupled with this fix.  Also not sure about doing this in 1.6.  I opened a new issue [ACCUMULO-4011](https://issues.apache.org/jira/browse/ACCUMULO-4011)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052439
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +159,37 @@ public void next() throws IOException {
     
       private Key workKey = new Key();
     
    +  private static Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build();
    --- End diff --
    
    +final


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/47#issuecomment-142327147
  
    Great. I'll try to take another peek today, but if you think it's ready, don't feel obligated to wait on me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40147270
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    --- End diff --
    
    Instead of offering the options `IGNORE` and `LOG_ERROR`, we should probably just always log a warning (if `REDUCE_ON_FULL_COMPACTION_ONLY` is not set) and let users disable it in the log4j config, rather than in code.
    
    The `REDUCE_ON_FULL_COMPACTION_ONLY` should be treated as a distinct boolean value which can be set to `true` or `false` as a normal iterator option (similar to `Filter`'s `negate` option), with a default value of `false` to preserve existing behavior.
    
    That would also minimize API changes, especially for 1.6.x and 1.7.x. While this isn't technically public API, it'd still be good to minimize API impact on users using those versions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052699
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +159,37 @@ public void next() throws IOException {
     
       private Key workKey = new Key();
     
    +  private static Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build();
    --- End diff --
    
    Would be safer to include a high-watermark on the size of the cache. Maybe 500 elements just to prevent catastrophic failure if we somehow create a bug that causes things to be added to this cache incorrectly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052565
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java ---
    @@ -786,4 +816,107 @@ public void testAdds() {
         assertEquals(LongCombiner.safeAdd(Long.MAX_VALUE - 5, 5), Long.MAX_VALUE);
       }
     
    +  private TreeMap<Key,Value> readAll(SortedKeyValueIterator<Key,Value> combiner) throws Exception {
    +    TreeMap<Key,Value> ret = new TreeMap<Key,Value>();
    +
    +    combiner.seek(new Range(), EMPTY_COL_FAMS, false);
    +
    +    while (combiner.hasTop()) {
    +      ret.put(new Key(combiner.getTopKey()), new Value(combiner.getTopValue()));
    +      combiner.next();
    +    }
    +
    +    return ret;
    +  }
    +
    +  private void runDeleteHandlingTest(TreeMap<Key,Value> input, TreeMap<Key,Value> expected, DeleteHandlingAction dha, IteratorEnvironment env)
    +      throws Exception {
    +    runDeleteHandlingTest(input, expected, dha, env, null);
    +  }
    +
    +  private void runDeleteHandlingTest(TreeMap<Key,Value> input, TreeMap<Key,Value> expected, DeleteHandlingAction dha, IteratorEnvironment env,
    +      String expectedLog) throws Exception {
    +    boolean deepCopy = expected == null;
    +
    +    StringWriter writer = new StringWriter();
    +    WriterAppender appender = new WriterAppender(new PatternLayout("%p, %m%n"), writer);
    +    Logger logger = Logger.getLogger(Combiner.class);
    +    boolean additivity = logger.getAdditivity();
    +    try {
    +      logger.addAppender(appender);
    +      logger.setAdditivity(false);
    +
    +      Combiner ai = new SummingCombiner();
    +
    +      IteratorSetting is = new IteratorSetting(1, SummingCombiner.class);
    +      SummingCombiner.setEncodingType(is, LongCombiner.StringEncoder.class);
    +      Combiner.setColumns(is, Collections.singletonList(new IteratorSetting.Column("cf001")));
    +      if (dha != null) {
    +        Combiner.setDeleteHandlingAction(is, dha);
    +      }
    +
    +      ai.init(new SortedMapIterator(input), is.getOptions(), env);
    +
    +      if (deepCopy)
    +        assertEquals(expected, readAll(ai.deepCopy(env)));
    +      assertEquals(expected, readAll(ai));
    +
    +    } finally {
    +      logger.removeAppender(appender);
    --- End diff --
    
    Glad to see the try/finally logger reset.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052599
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java ---
    @@ -786,4 +816,107 @@ public void testAdds() {
         assertEquals(LongCombiner.safeAdd(Long.MAX_VALUE - 5, 5), Long.MAX_VALUE);
       }
     
    +  private TreeMap<Key,Value> readAll(SortedKeyValueIterator<Key,Value> combiner) throws Exception {
    +    TreeMap<Key,Value> ret = new TreeMap<Key,Value>();
    +
    +    combiner.seek(new Range(), EMPTY_COL_FAMS, false);
    +
    +    while (combiner.hasTop()) {
    +      ret.put(new Key(combiner.getTopKey()), new Value(combiner.getTopValue()));
    +      combiner.next();
    +    }
    +
    +    return ret;
    +  }
    +
    +  private void runDeleteHandlingTest(TreeMap<Key,Value> input, TreeMap<Key,Value> expected, DeleteHandlingAction dha, IteratorEnvironment env)
    +      throws Exception {
    +    runDeleteHandlingTest(input, expected, dha, env, null);
    +  }
    +
    +  private void runDeleteHandlingTest(TreeMap<Key,Value> input, TreeMap<Key,Value> expected, DeleteHandlingAction dha, IteratorEnvironment env,
    +      String expectedLog) throws Exception {
    +    boolean deepCopy = expected == null;
    +
    +    StringWriter writer = new StringWriter();
    +    WriterAppender appender = new WriterAppender(new PatternLayout("%p, %m%n"), writer);
    +    Logger logger = Logger.getLogger(Combiner.class);
    +    boolean additivity = logger.getAdditivity();
    +    try {
    +      logger.addAppender(appender);
    +      logger.setAdditivity(false);
    +
    +      Combiner ai = new SummingCombiner();
    +
    +      IteratorSetting is = new IteratorSetting(1, SummingCombiner.class);
    +      SummingCombiner.setEncodingType(is, LongCombiner.StringEncoder.class);
    +      Combiner.setColumns(is, Collections.singletonList(new IteratorSetting.Column("cf001")));
    +      if (dha != null) {
    +        Combiner.setDeleteHandlingAction(is, dha);
    +      }
    +
    +      ai.init(new SortedMapIterator(input), is.getOptions(), env);
    +
    +      if (deepCopy)
    +        assertEquals(expected, readAll(ai.deepCopy(env)));
    +      assertEquals(expected, readAll(ai));
    +
    +    } finally {
    +      logger.removeAppender(appender);
    +      logger.setAdditivity(additivity);
    +    }
    +
    +    String logMsgs = writer.toString();
    +    if (expectedLog == null) {
    +      Assert.assertTrue(logMsgs, logMsgs.length() == 0);
    +    } else {
    +      logMsgs = logMsgs.replace('\n', ' ');
    +      Assert.assertTrue(logMsgs, logMsgs.matches(expectedLog));
    +    }
    +  }
    +
    +  @Test
    +  public void testDeleteHandling() throws Exception {
    +    Encoder<Long> encoder = LongCombiner.STRING_ENCODER;
    +
    +    TreeMap<Key,Value> input = new TreeMap<Key,Value>();
    +
    +    IteratorEnvironment paritalMajcIe = new CombinerIteratorEnvironment(IteratorScope.majc, false);
    +    IteratorEnvironment fullMajcIe = new CombinerIteratorEnvironment(IteratorScope.majc, true);
    +
    +    // keys that aggregate
    +    nkv(input, 1, 1, 1, 1, false, 4l, encoder);
    +    nkv(input, 1, 1, 1, 2, true, 0l, encoder);
    +    nkv(input, 1, 1, 1, 3, false, 2l, encoder);
    +    nkv(input, 1, 1, 1, 4, false, 9l, encoder);
    +
    +    TreeMap<Key,Value> expected = new TreeMap<Key,Value>();
    +    nkv(expected, 1, 1, 1, 1, false, 4l, encoder);
    +    nkv(expected, 1, 1, 1, 2, true, 0l, encoder);
    +    nkv(expected, 1, 1, 1, 4, false, 11l, encoder);
    +
    +    try {
    +      runDeleteHandlingTest(input, expected, DeleteHandlingAction.THROW_EXCEPTION, paritalMajcIe);
    +      Assert.fail();
    +    } catch (IllegalStateException ise) {
    +      Assert.assertTrue(ise.getMessage().contains("Saw a delete during a partial compaction"));
    --- End diff --
    
    Try to include a meaningful error message if the assertion fails. We should be able to explain the exact reason the test failed just by looking at surefire output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner closed the pull request at:

    https://github.com/apache/accumulo/pull/47


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40442099
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    --- End diff --
    
    I like this approach because it simplified things.  In bbea55d I got rid of the enum and swithced to a boolean.  There is still an addition of a public method to Combiner.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052524
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -275,6 +337,10 @@ public boolean validateOptions(Map<String,String> options) {
             throw new IllegalArgumentException("invalid column encoding " + encodedColumns);
         }
     
    +    if (options.containsKey(DELETE_HANDLING_ACTION_OPTION)) {
    +      DeleteHandlingAction.valueOf(options.get(DELETE_HANDLING_ACTION_OPTION));
    --- End diff --
    
    Same suggestion about a meaningful error message here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/47#issuecomment-142363719
  
    :+1:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40111853
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +160,38 @@ public void next() throws IOException {
     
       private Key workKey = new Key();
     
    +  private static final Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).maximumSize(10000).build();
    +
    +  private void sawDelete() {
    +    if (isPartialCompaction) {
    +      switch (deleteHandlingAction) {
    +        case LOG_ERROR:
    +          try {
    +            loggedMsgCache.get(this.getClass().getName(), new Callable<Long>() {
    --- End diff --
    
    At first, I thought "Callable<Void>" might be better, but apparently CacheBuilder doesn't like that, so might be better to use "Callable<Boolean>", because it's more readable to see "Boolean.TRUE" inserted into the cache to denote that a log message has recently occurred. (Unless you were going to use the integer to denote how many times it has been logged before suppressing repeats after some threshold, like 5.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40147937
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    +   */
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a a delete is observed by a combiner during a major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a a delete is observed by a combiner during a major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a major compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    +   * marker exists in one of the files that is not being compacted, then data that should be deleted may be combined. See
    +   * <a href="https://issues.apache.org/jira/browse/ACCUMULO-2232">ACCUMULO-2232</a> for more information.
    +   *
    +   * <p>
    +   * This method allows users to configure how they want to handle the combination of delete markers, combiners, and major compactions. The default behavior is
    +   * {@link DeleteHandlingAction#LOG_ERROR}. See the javadoc on each {@link DeleteHandlingAction} enum for a description of each option.
    +   *
    +   * <p>
    +   * For correctness deletes should not be used with columns that are combined OR the {@link DeleteHandlingAction#REDUCE_ON_FULL_COMPACTION_ONLY} option should
    +   * be used. Only reducing on full major compactions may have negative performance implications.
    +   *
    --- End diff --
    
    Another option to give users the ability to "delete" without using problematic deletes is to give them the ability to return a `null` value, with the semantics of `null` being "drop them all". This, combined with the ability to detect which iterator scope and and major compaction mode is being run, could be used to drop data in a sensible way, suitable to the user's schema, without inserting problematic delete markers.
    
    If we did that, it'd be good to add that here, with the recommendation to avoid using deletes entirely.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052540
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +378,46 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a a delete is observed by a combiner during a partial major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a a delete is observed by a combiner during a partial major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a partial compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Throw an exception when a a delete is observed by a combiner during a partial major compaction.
    +     */
    +    THROW_EXCEPTION,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    --- End diff --
    
    Great writeup.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052512
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -240,15 +299,18 @@ public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> op
         newInstance.setSource(getSource().deepCopy(env));
         newInstance.combiners = combiners;
         newInstance.combineAllColumns = combineAllColumns;
    +    newInstance.isPartialCompaction = isPartialCompaction;
    +    newInstance.deleteHandlingAction = deleteHandlingAction;
         return newInstance;
       }
     
       @Override
       public IteratorOptions describeOptions() {
         IteratorOptions io = new IteratorOptions("comb", "Combiners apply reduce functions to multiple versions of values with otherwise equal keys", null, null);
    -    io.addNamedOption(ALL_OPTION, "set to true to apply Combiner to every column, otherwise leave blank. if true, " + COLUMNS_OPTION
    -        + " option will be ignored.");
    +    io.addNamedOption(ALL_OPTION,
    --- End diff --
    
    Nit: unrelated formatting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052520
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -240,15 +299,18 @@ public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> op
         newInstance.setSource(getSource().deepCopy(env));
         newInstance.combiners = combiners;
         newInstance.combineAllColumns = combineAllColumns;
    +    newInstance.isPartialCompaction = isPartialCompaction;
    +    newInstance.deleteHandlingAction = deleteHandlingAction;
         return newInstance;
       }
     
       @Override
       public IteratorOptions describeOptions() {
         IteratorOptions io = new IteratorOptions("comb", "Combiners apply reduce functions to multiple versions of values with otherwise equal keys", null, null);
    -    io.addNamedOption(ALL_OPTION, "set to true to apply Combiner to every column, otherwise leave blank. if true, " + COLUMNS_OPTION
    -        + " option will be ignored.");
    +    io.addNamedOption(ALL_OPTION,
    +        "set to true to apply Combiner to every column, otherwise leave blank. if true, " + COLUMNS_OPTION + " option will be ignored.");
         io.addNamedOption(COLUMNS_OPTION, "<col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non-alphanum chars using %<hex>.");
    +    // TODO
    --- End diff --
    
    Did you forget todo something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/47#issuecomment-142180658
  
    As commented on JIRA before, approach looks good. Pointed out what I see as room for some improvements.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-2232 Added options to Combiner for...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052742
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +159,37 @@ public void next() throws IOException {
     
       private Key workKey = new Key();
     
    +  private static Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build();
    +
    +  private void sawDelete() {
    +    if (isPartialCompaction) {
    +      switch (deleteHandlingAction) {
    +        case LOG_ERROR:
    +          try {
    +            loggedMsgCache.get(this.getClass().getName(), new Callable<Long>() {
    +              @Override
    +              public Long call() throws Exception {
    +                log.error("Combiner of type " + this.getClass().getSimpleName()
    +                    + " saw a delete during a partial compaction.  This could cause undesired results.  See ACCUMULO-2232.  Will not log subsequent occurences for at least 1 hour.");
    +                return System.currentTimeMillis();
    --- End diff --
    
    Do we really need the current time? Isn't `0` sufficient since we don't inspect the value of the cache entries? A comment that the value returned by the Callable is meaningless would also be nice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---