You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/09/27 15:39:46 UTC

[GitHub] [ozone] umamaheswararao commented on a change in pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

umamaheswararao commented on a change in pull request #2674:
URL: https://github.com/apache/ozone/pull/2674#discussion_r715105268



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -20,45 +20,117 @@
 
 import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
 
+import java.util.EnumSet;
 import java.util.Objects;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 /**
  * Replication configuration for EC replication.
  */
 public class ECReplicationConfig implements ReplicationConfig {
-  
-  private static final Pattern STRING_FORMAT = Pattern.compile("(\\d+)-(\\d+)");
-  
+
+  // TODO - should this enum be defined in the protobuf rather than here? Right
+  //        the proto will carry a string. It might be more flexible for the
+  //        constants to be defined in code rather than proto?
+
+  /**
+   * Enum defining the allowed list of ECCodecs.
+   */
+  public enum EcCodec {
+    RS, XOR;
+
+    public static String allValuesAsString() {
+      return EnumSet.allOf(EcCodec.class)
+          .stream()
+          .map(Enum::toString)
+          .collect(Collectors.joining(","));
+    }
+  }
+
+  // Acceptable patterns are like:
+  //   rs-3-2-1024k
+  //   RS-3-2-2048
+  //   XOR-10-4-4096K
+  private static final Pattern STRING_FORMAT
+      = Pattern.compile("([a-zA-Z]+)-(\\d+)-(\\d+)-(\\d+)((?:k|K))?");
+
   private int data;
 
   private int parity;
 
+  // TODO - the default chunk size is 4MB - is EC defaulting to 1MB or 4MB
+  //        stripe width? Should we default this to the chunk size setting?
+  private int ecChunkSize = 1024 * 1024;

Review comment:
       is that to reduce number of putblock calls? 
   one concern to have 4MB is, we would do putBlock once we write 12MB (if data=3). Until then data is not committed at DN. with data=10, we would do putBlocks at 40MB. And also we have to cache 40MB in memory at client. Other side, keeping 1MB would increase number putBlock calls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org