You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/09/27 15:47:46 UTC

[GitHub] [ozone] sodonnel commented on a change in pull request #2674: EC: Add Codec and Stripe Size to ECReplicationConfig

sodonnel commented on a change in pull request #2674:
URL: https://github.com/apache/ozone/pull/2674#discussion_r716821900



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/ECReplicationConfig.java
##########
@@ -20,45 +20,117 @@
 
 import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
 
+import java.util.EnumSet;
 import java.util.Objects;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 /**
  * Replication configuration for EC replication.
  */
 public class ECReplicationConfig implements ReplicationConfig {
-  
-  private static final Pattern STRING_FORMAT = Pattern.compile("(\\d+)-(\\d+)");
-  
+
+  // TODO - should this enum be defined in the protobuf rather than here? Right
+  //        the proto will carry a string. It might be more flexible for the
+  //        constants to be defined in code rather than proto?
+
+  /**
+   * Enum defining the allowed list of ECCodecs.
+   */
+  public enum EcCodec {
+    RS, XOR;
+
+    public static String allValuesAsString() {
+      return EnumSet.allOf(EcCodec.class)
+          .stream()
+          .map(Enum::toString)
+          .collect(Collectors.joining(","));
+    }
+  }
+
+  // Acceptable patterns are like:
+  //   rs-3-2-1024k
+  //   RS-3-2-2048
+  //   XOR-10-4-4096K
+  private static final Pattern STRING_FORMAT
+      = Pattern.compile("([a-zA-Z]+)-(\\d+)-(\\d+)-(\\d+)((?:k|K))?");
+
   private int data;
 
   private int parity;
 
+  // TODO - the default chunk size is 4MB - is EC defaulting to 1MB or 4MB
+  //        stripe width? Should we default this to the chunk size setting?
+  private int ecChunkSize = 1024 * 1024;

Review comment:
       I don't have an opinion on it either way. I was really just asking the question if the default EC chunksize should be 1MB or 4MB. I am fine with it being 1MB. Infact it probably makes sense to be 1MB for EC rather than 4MB so the smaller files are stripped across the containers better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org