You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/02/22 13:37:32 UTC

[GitHub] [accumulo] Manno15 opened a new pull request #1944: Closes #1929. Add property verification for Sampler Options

Manno15 opened a new pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944


   Closes #1929. 
   
   Sampler options didn't have proper validity checks like other properties (it only made sure the prefix was valid but not the options part). This PR adds that verification so the case the issue specified cannot occur in the future.  It does stand to reason that we might want to adjust the shell API in the future so something cannot hang when an error occurs but with proper validity checks that might not be necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582183378



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       The default implementation should probably be a no-op. It is assumed that the map of all the options will be non-null, because Accumulo would be responsible for preparing that map itself, but it's fine if you want to check for that, to be sure.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 edited a comment on pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 edited a comment on pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#issuecomment-784199344


   > Another way you might provide early validation of these properties is to add some sort of options validation to samplers, either by adding a validateOptions method or similar to the interface as a default method, and then call it when validating the config. Or, the hanging issue could be addressed by ensuring any exception thrown in the sampler's init method will trigger an error in the logs and skip the sampler, rather than trigger a retry behavior that hangs.
   
   Anything that utilizes the `AbstractHashSampler` does have validity checks already. Which is where the original error came from. Seems the best way to tackle this is to handle the hanging issue. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582202305



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Nevermind. You are correct. I just realized the null gets thrown before reaching my validation check.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582978083



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/RowColumnSampler.java
##########
@@ -80,8 +83,11 @@ private boolean hashField(SamplerConfiguration config, String field) {
   }
 
   @Override
-  protected boolean isValidOption(String option) {
-    return super.isValidOption(option) || VALID_OPTIONS.contains(option);
+  public void validateOptions(Map<String,String> config) {
+    super.validateOptions(config);
+    for (Map.Entry<String,String> entry : config.entrySet()) {

Review comment:
       Good catch




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#issuecomment-783468646


   Another way you might provide early validation of these properties is to add some sort of options validation to samplers, either by adding a validateOptions method or similar to the interface as a default method, and then call it when validating the config.
   
   Or, the hanging issue could be addressed by ensuring any exception thrown in the sampler's init method will trigger an error in the logs and skip the sampler, rather than trigger a retry behavior that hangs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582964295



##########
File path: core/src/main/java/org/apache/accumulo/core/sample/impl/SamplerFactory.java
##########
@@ -38,13 +42,16 @@ public static Sampler newSampler(SamplerConfigurationImpl config, AccumuloConfig
         clazz = ClassLoaderUtil.loadClass(context, config.getClassName(), Sampler.class);
 
       Sampler sampler = clazz.getDeclaredConstructor().newInstance();
-
+      sampler.validateOptions(config.getOptions());
       sampler.init(config.toSamplerConfiguration());
 
       return sampler;
 
     } catch (ReflectiveOperationException e) {
       throw new RuntimeException(e);
+    } catch (IllegalArgumentException | NullPointerException e) {
+      log.error("Cannot init sampler, invalid configuration: {}", e.getMessage(), e);
+      return null;

Review comment:
       It's a little weird that if the class can't be loaded, then it throws a runtime exception, but if it can't be started because of invalid config or some other problem, then it logs and returns null. I'm thinking these should probably be consistent. I would do something like:
   
   ```suggestion
       } catch (ReflectiveOperationException | RuntimeException e) {
         log.error("Cannot initialize sampler {}: {}", config.getClassName(), e.getMessage(), e);
         return null;
   ```
   

##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/RowColumnSampler.java
##########
@@ -80,8 +83,11 @@ private boolean hashField(SamplerConfiguration config, String field) {
   }
 
   @Override
-  protected boolean isValidOption(String option) {
-    return super.isValidOption(option) || VALID_OPTIONS.contains(option);
+  public void validateOptions(Map<String,String> config) {
+    super.validateOptions(config);
+    for (Map.Entry<String,String> entry : config.entrySet()) {

Review comment:
       Since you're only using the key, and not validating any values in this implementation, you could just loop over the `keySet()` rather than the `entrySet()`.

##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,14 +58,28 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
    */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(isValid(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(isValid(entry.getValue()), "Unknown value for hasher: %s", entry.getValue());
 
-  protected boolean isValidOption(String option) {
-    return VALID_OPTIONS.contains(option);
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
+
+  protected boolean isValid(String entry) {
+    return VALID_OPTIONS.contains(entry) || VALID_VALUES_HASHER.contains(entry);

Review comment:
       Each half of this method's implementation is checking for different things. The left side of the `||` checks keys, whereas the right side checks values. Each case should probably just be inlined into their respective areas of the code, and this method can be deleted.

##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,14 +58,28 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
    */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(isValid(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(isValid(entry.getValue()), "Unknown value for hasher: %s", entry.getValue());
 
-  protected boolean isValidOption(String option) {
-    return VALID_OPTIONS.contains(option);
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
+
+  protected boolean isValid(String entry) {
+    return VALID_OPTIONS.contains(entry) || VALID_VALUES_HASHER.contains(entry);

Review comment:
       Or, since this method is protected (public API), it should be deprecated, and its original implementation left intact.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583054303



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       I guess it was something on my end to create extra white space. I had what you committed before and the build failed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582191475



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       I went ahead and kept the check. If the user forgets to even mention the option before compacting, it will cause the shell to hang.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583007789



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,14 +58,28 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
    */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(isValid(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(isValid(entry.getValue()), "Unknown value for hasher: %s", entry.getValue());
 
-  protected boolean isValidOption(String option) {
-    return VALID_OPTIONS.contains(option);
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
+
+  protected boolean isValid(String entry) {
+    return VALID_OPTIONS.contains(entry) || VALID_VALUES_HASHER.contains(entry);

Review comment:
       My lasted commit includes these changes. I did revert that function and deprecated it. Let me know if I need to do something else for it or not. Will add comments above to explain the deprecation (might not be necessary). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583038288



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       That is what caused the other build error. I couldn't win with either.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583074204



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       When the formatter automatically adds a line, it leaves a space after the `*` to indent the section. That causes checkstyle to complain. However, if you delete that extra space, the formatter is fine with it and will not reformat it to add it again. My change added the line back without the extra space that caused it to fail previously. It's just an annoying quirk of the formatter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583076857



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       > When the formatter automatically adds a line, it leaves a space after the * to indent the section. That causes checkstyle to complain. However, if you delete that extra space, the formatter is fine with it and will not reformat it to add it again. My change added the line back without the extra space that caused it to fail previously. It's just an annoying quirk of the formatted.
   
   ah, that makes sense. Thanks for fixing it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#issuecomment-785050818


   @ctubbsii I went ahead and added the validateOptions in the sampler interface. With what I have now, it will log an error message, and the sampler returns `null` so the rest of the compaction can complete as normal. The shell thread now returns and does not hang.  However, the shell still doesn't see the error so the user will have to realize that their sampler options didn't go through and make those corrections by looking at the monitor/logs. I marked this as a draft since I am still working on some things to hopefully improve functionality but I wanted to get the base idea out there for you to look at when you get the chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582136481



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -77,13 +74,6 @@ public void init(SamplerConfiguration config) {
     String hasherOpt = config.getOptions().get("hasher");
     String modulusOpt = config.getOptions().get("modulus");
 
-    requireNonNull(hasherOpt, "Hasher not specified");
-    requireNonNull(modulusOpt, "Modulus not specified");

Review comment:
       Ah, you are correct. I misread that. My future changes should handle this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583075566



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       I actually think it might be useful to add the trailing space removal as a feature to https://github.com/revelc/formatter-maven-plugin since the Eclipse formatter library it uses doesn't have a feature to remove the trailing whitespace itself.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582211646



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Made these changes in my recent commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#issuecomment-785980727


   Marking this back to ready for review. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582134602



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -77,13 +74,6 @@ public void init(SamplerConfiguration config) {
     String hasherOpt = config.getOptions().get("hasher");
     String modulusOpt = config.getOptions().get("modulus");
 
-    requireNonNull(hasherOpt, "Hasher not specified");
-    requireNonNull(modulusOpt, "Modulus not specified");

Review comment:
       Not quite. What you added was a check that the key was not null, which might be redundant, since the fact that it exists implies it is not null. This code here was checking that the values (not the keys) that correspond to these option keys was not null.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r580353529



##########
File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
##########
@@ -1352,6 +1352,17 @@ public static boolean isValidZooPropertyKey(String key) {
         || key.startsWith(REPLICATION_PREFIX.getKey());
   }
 
+  /**
+   * Checks if the given sampler option is valid
+   *
+   * @param option
+   *          property option
+   * @return true if option is a valid sampler option
+   */
+  public static boolean isValidSamplerOption(String option) {
+    return option.endsWith("hasher") || option.endsWith("modules");
+  }
+

Review comment:
       That is what I was afraid of. Guess there isn't an easy way to guarantee validity. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583007789



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,14 +58,28 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
    */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(isValid(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(isValid(entry.getValue()), "Unknown value for hasher: %s", entry.getValue());
 
-  protected boolean isValidOption(String option) {
-    return VALID_OPTIONS.contains(option);
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
+
+  protected boolean isValid(String entry) {
+    return VALID_OPTIONS.contains(entry) || VALID_VALUES_HASHER.contains(entry);

Review comment:
       My lasted commit includes these changes. I did revert that function and deprecated it. Let me know if I need to do something else for it or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582083168



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Since this is public API and changes the interface, it should be a default method, so that way it won't break custom implementations users have written against the interface that don't have the new method.
   
   The original version of this method was previously only used to throw an exception for unrecognized options, but didn't actually do any validation of the option's values. With the addition of this public API, we can improve that situation. This new method should be able to see the keys as well as their corresponding values. To validate all options' values and detect missing required values, this new method should accept all options at the same time (probably as a `Map<String,String>`).

##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -77,13 +74,6 @@ public void init(SamplerConfiguration config) {
     String hasherOpt = config.getOptions().get("hasher");
     String modulusOpt = config.getOptions().get("modulus");
 
-    requireNonNull(hasherOpt, "Hasher not specified");
-    requireNonNull(modulusOpt, "Modulus not specified");

Review comment:
       This was removed, but was not replaced with any kind of validation that the required options were set with actual values.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582116995



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Good idea, I will look into adding that improvement.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 merged pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 merged pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582101165



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -77,13 +74,6 @@ public void init(SamplerConfiguration config) {
     String hasherOpt = config.getOptions().get("hasher");
     String modulusOpt = config.getOptions().get("modulus");
 
-    requireNonNull(hasherOpt, "Hasher not specified");
-    requireNonNull(modulusOpt, "Modulus not specified");

Review comment:
       My recent commit moved the null check to where the other argument is checked. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583007789



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,14 +58,28 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
    */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(isValid(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(isValid(entry.getValue()), "Unknown value for hasher: %s", entry.getValue());
 
-  protected boolean isValidOption(String option) {
-    return VALID_OPTIONS.contains(option);
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
+
+  protected boolean isValid(String entry) {
+    return VALID_OPTIONS.contains(entry) || VALID_VALUES_HASHER.contains(entry);

Review comment:
       My lasted commit includes these changes. I did revert that function and deprecated it. Let me know if I need to do something else for it or not. Will add comments above to explain the deprecation. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582211646



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Made these changes in my recent commit. Still need to add a value check for `modulus` in `AbstractHashSampler`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r580350037



##########
File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
##########
@@ -1352,6 +1352,17 @@ public static boolean isValidZooPropertyKey(String key) {
         || key.startsWith(REPLICATION_PREFIX.getKey());
   }
 
+  /**
+   * Checks if the given sampler option is valid
+   *
+   * @param option
+   *          property option
+   * @return true if option is a valid sampler option
+   */
+  public static boolean isValidSamplerOption(String option) {
+    return option.endsWith("hasher") || option.endsWith("modules");
+  }
+

Review comment:
       Sampler options are arbitrary, and implementation-specific. The available options depend on the user's provided sampler implementation. They do not have to only be one of these options. These are merely example properties placed in the docs that an implementation might have.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r583037568



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/AbstractHashSampler.java
##########
@@ -57,12 +58,33 @@
   private int modulus;
 
   private static final Set<String> VALID_OPTIONS = Set.of("hasher", "modulus");
+  private static final Set<String> VALID_VALUES_HASHER = Set.of("murmur3_32", "md5", "sha1");
+
+  /**
+   * Subclasses with options should override this method to validate subclass options while also
+   * calling {@code super.validateOptions(config)} to validate base class options.
+   */
+  @Override
+  public void validateOptions(Map<String,String> config) {
+    for (Map.Entry<String,String> entry : config.entrySet()) {
+      checkArgument(VALID_OPTIONS.contains(entry.getKey()), "Unknown option: %s", entry.getKey());
+
+      if (entry.getKey().equals("hasher"))
+        checkArgument(VALID_VALUES_HASHER.contains(entry.getValue()),
+            "Unknown value for hasher: %s", entry.getValue());
+
+      if (entry.getKey().equals("modulus"))
+        checkArgument(Integer.parseInt(entry.getValue()) > 0,
+            "Improper Integer value for modulus: %s", entry.getValue());
+    }
+  }
 
   /**
    * Subclasses with options should override this method and return true if the option is valid for
    * the subclass or if {@code super.isValidOption(opt)} returns true.
+   * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}

Review comment:
       ```suggestion
      *
      * @deprecated since 2.1.0, replaced by {@link #validateOptions(Map)}
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582988352



##########
File path: core/src/main/java/org/apache/accumulo/core/sample/impl/SamplerFactory.java
##########
@@ -38,13 +42,16 @@ public static Sampler newSampler(SamplerConfigurationImpl config, AccumuloConfig
         clazz = ClassLoaderUtil.loadClass(context, config.getClassName(), Sampler.class);
 
       Sampler sampler = clazz.getDeclaredConstructor().newInstance();
-
+      sampler.validateOptions(config.getOptions());
       sampler.init(config.toSamplerConfiguration());
 
       return sampler;
 
     } catch (ReflectiveOperationException e) {
       throw new RuntimeException(e);
+    } catch (IllegalArgumentException | NullPointerException e) {
+      log.error("Cannot init sampler, invalid configuration: {}", e.getMessage(), e);
+      return null;

Review comment:
       Good idea, I like this change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on a change in pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#discussion_r582166728



##########
File path: core/src/main/java/org/apache/accumulo/core/client/sample/Sampler.java
##########
@@ -61,4 +61,11 @@
    *         Return false if it should not be included.
    */
   boolean accept(Key k);
+
+  /**
+   * @param option
+   *          Option to validate.
+   * @return True if option is a valid sampler option. False otherwise.
+   */
+  boolean isValidOption(String option);

Review comment:
       Is there anything we should do for the default case? I currently have it verified to not be null. Not sure what else to add to the interface itself. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1944: Closes #1929. Add property verification for Sampler Options

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1944:
URL: https://github.com/apache/accumulo/pull/1944#issuecomment-784199344


   > Another way you might provide early validation of these properties is to add some sort of options validation to samplers, either by adding a validateOptions method or similar to the interface as a default method, and then call it when validating the config. Or, the hanging issue could be addressed by ensuring any exception thrown in the sampler's init method will trigger an error in the logs and skip the sampler, rather than trigger a retry behavior that hangs.
   
   Do you know if users have to implement `AbstractHashSampler` when creating a new sampler? Most of the already configured samplers have validity checks already. Which is where the original error came from. Seems the best way to tackle this is to handle the hanging issue. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org