You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/30 02:20:38 UTC

[GitHub] [iceberg] jackye1995 commented on a change in pull request #3181: Parquet: make row group check (min and max record count) configurable.

jackye1995 commented on a change in pull request #3181:
URL: https://github.com/apache/iceberg/pull/3181#discussion_r719007937



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -107,6 +107,18 @@ private TableProperties() {
   public static final String DELETE_PARQUET_COMPRESSION_LEVEL = "write.delete.parquet.compression-level";
   public static final String PARQUET_COMPRESSION_LEVEL_DEFAULT = null;
 
+  public static final String PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT =
+      "write.parquet.row-group-check-min-record-count";
+  public static final String DELETE_PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT =
+      "write.delete.parquet.row-group-check-min-record-count";
+  public static final String PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT_DEFAULT = "100";
+
+  public static final String PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT =
+      "write.parquet.row-group-check-max-record-count";
+  public static final String DELETE_PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT =
+      "write.delete.parquet.row-group-check-max-record-count";
+  public static final String PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT_DEFAULT = "10000";

Review comment:
       this can be integer

##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -316,7 +331,13 @@ static Context dataContext(Map<String, String> config) {
 
         String compressionLevel = config.getOrDefault(PARQUET_COMPRESSION_LEVEL, PARQUET_COMPRESSION_LEVEL_DEFAULT);
 
-        return new Context(rowGroupSize, pageSize, dictionaryPageSize, codec, compressionLevel);
+        int rowGroupCheckMinRecordCount = Integer.parseInt(config.getOrDefault(
+            PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT, PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT_DEFAULT));
+        int rowGroupCheckMaxRecordCount = Integer.parseInt(config.getOrDefault(
+            PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT, PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT_DEFAULT));
+

Review comment:
       I think we need to add some basic validations such as the config value is positive, max is greater than min.

##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -316,7 +331,13 @@ static Context dataContext(Map<String, String> config) {
 
         String compressionLevel = config.getOrDefault(PARQUET_COMPRESSION_LEVEL, PARQUET_COMPRESSION_LEVEL_DEFAULT);
 
-        return new Context(rowGroupSize, pageSize, dictionaryPageSize, codec, compressionLevel);
+        int rowGroupCheckMinRecordCount = Integer.parseInt(config.getOrDefault(

Review comment:
       can use `PropertyUtil.propertyAsInt`

##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -46,7 +46,7 @@
 
 class ParquetWriter<T> implements FileAppender<T>, Closeable {
 
-  private static DynConstructors.Ctor<PageWriteStore> pageStoreCtorParquet = DynConstructors
+  private static final DynConstructors.Ctor<PageWriteStore> pageStoreCtorParquet = DynConstructors

Review comment:
       Thanks for working on that! I think we are accumulating more and more of these warnings, better to fix them as much as possible along the way.

##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -107,6 +107,18 @@ private TableProperties() {
   public static final String DELETE_PARQUET_COMPRESSION_LEVEL = "write.delete.parquet.compression-level";
   public static final String PARQUET_COMPRESSION_LEVEL_DEFAULT = null;
 
+  public static final String PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT =
+      "write.parquet.row-group-check-min-record-count";
+  public static final String DELETE_PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT =
+      "write.delete.parquet.row-group-check-min-record-count";
+  public static final String PARQUET_ROW_GROUP_CHECK_MIN_RECORD_COUNT_DEFAULT = "100";

Review comment:
       this can be integer




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org