You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/14 13:22:30 UTC

[GitHub] [ozone] elek opened a new pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

elek opened a new pull request #2331:
URL: https://github.com/apache/ozone/pull/2331


   ## What changes were proposed in this pull request?
   
   This patch introduces a new string contsructor for `ECReplicationConfig` to make it possible to define EC replication scheme for any new key from CLI.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5247
   
   ## How was this patch tested?
   
   New unit test is added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on a change in pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#discussion_r652453883



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -36,6 +43,29 @@ public ECReplicationConfig(int data, int parity) {
     this.parity = parity;
   }
 
+  public ECReplicationConfig(String string) {
+    final Pattern pattern = Pattern.compile("(\\d+)-(\\d+)");
+    final Matcher matcher = pattern.matcher(string);

Review comment:
       Good point. Let me move it to a static final constant.
   
   Answering the question is slightly harder. Compilation time I couldn't see any optimization:
   
   
   ```
   javap -v org.apache.hadoop.hdds.client.ECReplicationConfig
   
   ...
   
     public org.apache.hadoop.hdds.client.ECReplicationConfig(java.lang.String);
       descriptor: (Ljava/lang/String;)V
       flags: ACC_PUBLIC
       Code:
         stack=3, locals=4, args_size=2
            0: aload_0
            1: invokespecial #1                  // Method java/lang/Object."<init>":()V
            4: ldc           #4                  // String (\d+)-(\d+)
            6: invokestatic  #5                  // Method java/util/regex/Pattern.compile:(Ljava/lang/String;)Ljava/util/regex/Pattern;
            9: astore_2
   ```
   
   The call is really compiled as is to the method.
   
   Runtime optimization it's another question, but I don't think that it's possible easily as hostpost JVM would require understanding about the idempotency of `Pattern.compile` (second call on `Pattern.compile` returns with a value which is equivalent to the first one.)
   
   --> looks better to move the compile to a static field...
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#discussion_r651515758



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -36,6 +43,29 @@ public ECReplicationConfig(int data, int parity) {
     this.parity = parity;
   }
 
+  public ECReplicationConfig(String string) {
+    final Pattern pattern = Pattern.compile("(\\d+)-(\\d+)");
+    final Matcher matcher = pattern.matcher(string);
+    if (!matcher.matches()) {
+      throw new IllegalArgumentException("EC replication config should be " +
+          "defined in the form 3-2, 6-3 or 10-4");
+    }
+
+    data = Integer.parseInt(matcher.group(1));
+    parity = Integer.parseInt(matcher.group(2));
+    if (data <= 0 || parity <= 0) {
+      throw new IllegalArgumentException("Data and parity part in EC " +
+          "replication config supposed to be positive numbers");
+    }
+    if ((data == 3 && parity == 2)
+        || (data == 6 && parity == 3)
+        || (data == 10 && parity == 4)) {
+      return;
+    }
+    LOG.warn("We recommend to use EC 3-2, 6-3 or 10-4. Other combinations " +

Review comment:
       I prefer to restrict to allow only supported combinations. So, +1 to throw exception here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged pull request #2331:
URL: https://github.com/apache/ozone/pull/2331


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] arp7 commented on pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
arp7 commented on pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#issuecomment-860775270


   The change looks good, minor comment - I think we need to throw exception for an unsupported scheme E.g. 10+10, and add a test case for that.
   
   In the future we could make the list of supported policies pluggable/configurable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] arp7 commented on a change in pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#discussion_r651035209



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -36,6 +43,29 @@ public ECReplicationConfig(int data, int parity) {
     this.parity = parity;
   }
 
+  public ECReplicationConfig(String string) {
+    final Pattern pattern = Pattern.compile("(\\d+)-(\\d+)");
+    final Matcher matcher = pattern.matcher(string);
+    if (!matcher.matches()) {
+      throw new IllegalArgumentException("EC replication config should be " +
+          "defined in the form 3-2, 6-3 or 10-4");
+    }
+
+    data = Integer.parseInt(matcher.group(1));
+    parity = Integer.parseInt(matcher.group(2));
+    if (data <= 0 || parity <= 0) {
+      throw new IllegalArgumentException("Data and parity part in EC " +
+          "replication config supposed to be positive numbers");
+    }
+    if ((data == 3 && parity == 2)
+        || (data == 6 && parity == 3)
+        || (data == 10 && parity == 4)) {
+      return;
+    }
+    LOG.warn("We recommend to use EC 3-2, 6-3 or 10-4. Other combinations " +

Review comment:
       Should we throw an exception here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#discussion_r651138621



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -36,6 +43,29 @@ public ECReplicationConfig(int data, int parity) {
     this.parity = parity;
   }
 
+  public ECReplicationConfig(String string) {
+    final Pattern pattern = Pattern.compile("(\\d+)-(\\d+)");
+    final Matcher matcher = pattern.matcher(string);

Review comment:
       This is a question more from my own interest point of view, as I know this is not a performance critical piece of code.
   
   These two variables are effectively constants, but they are defined at the method level. Do you know if Java is smart enough to realise these are constants and evaluate the expression only once? Compared to defining these as "private static final" at the class level, will this perform the same or worse?
   
   I will try to answer this myself with a quick benchmark if I get some time.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #2331:
URL: https://github.com/apache/ozone/pull/2331#issuecomment-862142935


   Thanks for the question/suggestion @arp7 and @umamaheswararao. I also considered throwing an exception instead of printing out a warning, but finally, I decided to put only a warning here. Let me share my view here:
   
    1. First of all it's a client-side change. Any serious validation should be placed on the server/OM side. (from this point of view even warning can be removed)
   
    2. It's hard to predict all the possible use cases. I agree that we should introduce a configuration option to make it possible for vendors to limit the available ReplicationConfig/ECReplicationConfig set. But the Apache Ozone code should be as much flexible as possible (IMHO). Example: earlier we hard-coded support of `ONE`/`THREE` replication scheme for closed containers, but it turned out that there is a demand to use other values (like `TWO`)
   
   3. There or other meaningful combinations: for example if dozens of nodes are available and the key sizes are bigger, it may be reasonable to use `6-4` instead of `3-2`. The write amplification is the same, but `6-4` can have better durability (in exchange for more padding + memory usage).
   
   4. I can imagine even 10-10 as a useful option with very specific/smart placement policies (one may require more parity to place them to other racks). But I admit it would be a very rare case.
    
   My suggestion:
   
   Remove the warning from here and put a server-side validation based on the configuration value. I am fine to restrict it to 3-2,6-3,10-4 by default, but power users can modify the defaults if they require it.
   
   Will create this follow-up jira if no objection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #2331: HDDS-5247. EC: Create ECReplicationConfig on client side based on input string

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged pull request #2331:
URL: https://github.com/apache/ozone/pull/2331


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org