You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/09/22 15:03:33 UTC

[GitHub] [ozone] sodonnel opened a new pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

sodonnel opened a new pull request #2674:
URL: https://github.com/apache/ozone/pull/2674


   ## What changes were proposed in this pull request?
   
   We need to store the codes (rs / xor / etc) and the EC stripe size in ECReplicationConfig.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5775
   
   ## How was this patch tested?
   
   New and existing tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2674:
URL: https://github.com/apache/ozone/pull/2674#discussion_r715105268



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -20,45 +20,117 @@
 
 import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
 
+import java.util.EnumSet;
 import java.util.Objects;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 /**
  * Replication configuration for EC replication.
  */
 public class ECReplicationConfig implements ReplicationConfig {
-  
-  private static final Pattern STRING_FORMAT = Pattern.compile("(\\d+)-(\\d+)");
-  
+
+  // TODO - should this enum be defined in the protobuf rather than here? Right
+  //        the proto will carry a string. It might be more flexible for the
+  //        constants to be defined in code rather than proto?
+
+  /**
+   * Enum defining the allowed list of ECCodecs.
+   */
+  public enum EcCodec {
+    RS, XOR;
+
+    public static String allValuesAsString() {
+      return EnumSet.allOf(EcCodec.class)
+          .stream()
+          .map(Enum::toString)
+          .collect(Collectors.joining(","));
+    }
+  }
+
+  // Acceptable patterns are like:
+  //   rs-3-2-1024k
+  //   RS-3-2-2048
+  //   XOR-10-4-4096K
+  private static final Pattern STRING_FORMAT
+      = Pattern.compile("([a-zA-Z]+)-(\\d+)-(\\d+)-(\\d+)((?:k|K))?");
+
   private int data;
 
   private int parity;
 
+  // TODO - the default chunk size is 4MB - is EC defaulting to 1MB or 4MB
+  //        stripe width? Should we default this to the chunk size setting?
+  private int ecChunkSize = 1024 * 1024;

Review comment:
       is that to reduce number of putblock calls? 
   one concern to have 4MB is, we would do putBlock once we write 12MB (if data=3). Until then data is not committed at DN. with data=10, we would do putBlocks at 40MB. And also we have to cache 40MB in memory at client. Other side, keeping 1MB would increase number putBlock calls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2674:
URL: https://github.com/apache/ozone/pull/2674


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2674:
URL: https://github.com/apache/ozone/pull/2674#discussion_r716821900



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -20,45 +20,117 @@
 
 import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
 
+import java.util.EnumSet;
 import java.util.Objects;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 /**
  * Replication configuration for EC replication.
  */
 public class ECReplicationConfig implements ReplicationConfig {
-  
-  private static final Pattern STRING_FORMAT = Pattern.compile("(\\d+)-(\\d+)");
-  
+
+  // TODO - should this enum be defined in the protobuf rather than here? Right
+  //        the proto will carry a string. It might be more flexible for the
+  //        constants to be defined in code rather than proto?
+
+  /**
+   * Enum defining the allowed list of ECCodecs.
+   */
+  public enum EcCodec {
+    RS, XOR;
+
+    public static String allValuesAsString() {
+      return EnumSet.allOf(EcCodec.class)
+          .stream()
+          .map(Enum::toString)
+          .collect(Collectors.joining(","));
+    }
+  }
+
+  // Acceptable patterns are like:
+  //   rs-3-2-1024k
+  //   RS-3-2-2048
+  //   XOR-10-4-4096K
+  private static final Pattern STRING_FORMAT
+      = Pattern.compile("([a-zA-Z]+)-(\\d+)-(\\d+)-(\\d+)((?:k|K))?");
+
   private int data;
 
   private int parity;
 
+  // TODO - the default chunk size is 4MB - is EC defaulting to 1MB or 4MB
+  //        stripe width? Should we default this to the chunk size setting?
+  private int ecChunkSize = 1024 * 1024;

Review comment:
       I don't have an opinion on it either way. I was really just asking the question if the default EC chunksize should be 1MB or 4MB. I am fine with it being 1MB. Infact it probably makes sense to be 1MB for EC rather than 4MB so the smaller files are stripped across the containers better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2674:
URL: https://github.com/apache/ozone/pull/2674


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org