You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/24 19:59:40 UTC

[GitHub] [iceberg] szehon-ho opened a new pull request #3959: Disable metrics for very wide tables to reduce metadata size

szehon-ho opened a new pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959


   Giving a try to the suggestion in the last community sync to address very-wide-table.
   
   This changes the default MetricsMode from "truncate(16)" to "none" if there are over 100 columns (should it be counts?).  This is overriden if the user explicitly sets a default, and of course if a user explicitly sets metrics on a particular column.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1039391805


   Thanks @rdblue for merge and all the reviewers!  Sorry didn't see the last style comments before merge, i can prepare another pr for those


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1026330456


   Hi @rdblue @danielcweeks , any thought on this?  Thanks in advance


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1033356504


   I am not sure how useful a table property is or whether we should even make it configurable. Specifically, I am worried about adding a rarely used table property. We are usually pretty conservative when it comes to adding new table properties. Is there a good use case for this to be configurable? It seems that if anyone needs to tune the metrics config, the existing framework is already good enough and allows us to set default and per-column modes. I think we should encourage users to set that.
   
   Also, I think 100 is too big. I'd prefer it to be around 24-32. I've seen tables with 100 string columns and the metadata size already had an impact on the performance. Any thoughts on that? Keep in mind we always collect stats for sort columns and will collect stats for identity columns in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1021490735


   @danielcweeks When we discussed this were you thinking about no metrics for tables with more than 100 columns? or no metrics for anything other than the first 100 columns.
   
   Obviously in both cases we keep sort column metrics by default


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1030399188


   > @danielcweeks When we discussed this were you thinking about no metrics for tables with more than 100 columns? or no metrics for anything other than the first 100 columns.
   > 
   > Obviously in both cases we keep sort column metrics by default
   
   @RussellSpitzer I would think it would be either all or nothing.  I'm not sure we can speculate that the first 100 columns are somehow more meaningful than those that follow (though I guess you could make that argument).
   
   For now it seems like just taking a simple approach would be better.  There are some edge cases like what happens in the event that someone is incrementally adding columns and it crosses that threshold, but I wouldn't worry about it too much at this point.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791301333



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -194,7 +194,8 @@ private TableProperties() {
 
   public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
   public static final String DEFAULT_WRITE_METRICS_MODE = "write.metadata.metrics.default";
-  public static final String DEFAULT_WRITE_METRICS_MODE_DEFAULT = "truncate(16)";

Review comment:
       This is changing a public API and to a new type. May need to reconsider this. Would probably be safer to just add the new variable below with a slightly different name?
   
   Although if other folks don't mind, I think that's fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791301333



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -194,7 +194,8 @@ private TableProperties() {
 
   public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
   public static final String DEFAULT_WRITE_METRICS_MODE = "write.metadata.metrics.default";
-  public static final String DEFAULT_WRITE_METRICS_MODE_DEFAULT = "truncate(16)";

Review comment:
       This is changing a public API and to a new type. May need to reconsider this. Would probably be safer to just add the new variable below with a slightly different name?
   
   Although if other folks don't mind, I think that's fine.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       I'm not sure why we had this as a log warn before, seems like this should just fail ... but I don't think we need to change the behavior yet

##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -194,7 +194,8 @@ private TableProperties() {
 
   public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
   public static final String DEFAULT_WRITE_METRICS_MODE = "write.metadata.metrics.default";
-  public static final String DEFAULT_WRITE_METRICS_MODE_DEFAULT = "truncate(16)";

Review comment:
       This is changing a public API and to a new type. May need to reconsider this. Would probably be safer to just add the new variable below with a slightly different name?
   
   Although if other folks don't mind, I think leaving this as is is also fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791983562



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -194,7 +194,8 @@ private TableProperties() {
 
   public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
   public static final String DEFAULT_WRITE_METRICS_MODE = "write.metadata.metrics.default";
-  public static final String DEFAULT_WRITE_METRICS_MODE_DEFAULT = "truncate(16)";

Review comment:
       Sure, changed it back, it made the diff smaller




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r799859124



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       Is there a way to override this behavior?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r806041604



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       The idea here was to not fail writes just because a user messed up a tangentially related config property. I think it's better to allow pipelines to continue.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r799869297



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       Good point, I can make a table config , does that sound reasonable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r799873909



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       Was initially thinking like 'write.metadata.metrics.max.columns'='100'?  Yea good point probably catalog level prop is cleaner

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       Was initially thinking like 'write.metadata.metrics.max.columns'='100'?  Yea good point maybe catalog level prop is cleaner




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791985017



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       I might be mistaken, but I think it was the same behavior, warn log is still there. but in any case after changing the default from enum back to a string the diff now looks less.  Now in this method only the original variable name "defaultMode" should be changed to avoid conflict.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r799928644



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       Took another look, and actually it's a bit messy to make this a catalog property in the current code (the Table is serialized and sent to writers, but Catalog is not..).  
   
   Was thinking it makes sense as a table property as well (ie, if i know this table is worth optimizing, set the max columns to be higher).  May be a bit awkward to have a different catalog just for this.  And if user really wants to have one global setting, this change is coming: https://github.com/apache/iceberg/pull/4011




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r805435895



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       I think this is okay for now. Tables will still respect whatever is set as `write.metadata.metrics.default` so this really just changes Iceberg's default in a reasonable way. It is also good to note that metrics for sort columns are automatically promoted to at least `truncate[16]` so it isn't as though we're losing _all_ stats.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1040547652


   @szehon-ho, don't worry about it. I didn't want to keep this outstanding for minor things like that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1039315177






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791305264



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       I'm not sure why we had this as a log warn before, seems like this should just fail ... but I don't think we need to change the behavior yet




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r792007163



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       Yeah I just don't know why we originally had this behavior ... seems odd to me :shrug:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1039315177


   Merging this. Thanks, @szehon-ho and everyone for the reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r799872514



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       "defaultMetricCollection : all" or something? Don't we already have that?
   
   Also this feels like it should probably be a catalog level prop :shrug:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1035572551


   Hi @aokolnychyi @RussellSpitzer @rdblue @danielcweeks if you guys want another look after the last comments, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r805436120



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,35 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, String defaultMode) {

Review comment:
       Instead of renaming the variable that is used everywhere, I'd just rename the incoming argument. There would be fewer changes if this used `defaultDefaultMode` or something.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#issuecomment-1039391805


   Thanks @rdblue for merge and all the reviewers!  Sorry didn't see the last style comments before merge, i can prepare another pr for those


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r791301333



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -194,7 +194,8 @@ private TableProperties() {
 
   public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
   public static final String DEFAULT_WRITE_METRICS_MODE = "write.metadata.metrics.default";
-  public static final String DEFAULT_WRITE_METRICS_MODE_DEFAULT = "truncate(16)";

Review comment:
       This is changing a public API and to a new type. May need to reconsider this. Would probably be safer to just add the new variable below with a slightly different name?
   
   Although if other folks don't mind, I think leaving this as is is also fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r806041604



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,39 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode MetricsConfig default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, MetricsMode defaultMode) {
     Map<String, MetricsMode> columnModes = Maps.newHashMap();
-    MetricsMode defaultMode;
-    String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
-    try {
-      defaultMode = MetricsModes.fromString(defaultModeAsString);
-    } catch (IllegalArgumentException err) {
-      // Mode was invalid, log the error and use the default
-      LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err);
-      defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT);
+
+    // Handle user override of default mode
+    MetricsMode finalDefaultMode;
+    String defaultModeProp = props.get(DEFAULT_WRITE_METRICS_MODE);
+    if (defaultModeProp != null) {
+      try {
+        finalDefaultMode = MetricsModes.fromString(defaultModeProp);
+      } catch (IllegalArgumentException err) {
+        // User override was invalid, log the error and use the default
+        LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeProp, err);

Review comment:
       The idea here was to not fail writes just because a user messed up a tangentially related config property. I think it's better to allow pipelines to continue.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r805435487



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       I think we may already have what we need to do this. Right now, the default metrics mode is set by `write.metadata.metrics.default`. I'd propose the following:
   * If `write.metadata.metrics.default` explicitly set in table properties, always use it
   * If `write.metadata.metrics.default` is not set, check the number of columns
       * If `numCols < 32` then use `truncate[16]`, the current default
       * If `numCols >= 32` then use `none`
   
   That seems like a reasonable way to make this customizable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r805436161



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {
+      defaultMode = DEFAULT_WRITE_METRICS_MODE_DEFAULT;
+    } else {
+      defaultMode = MetricsModes.None.get().toString();
+    }
+    return from(table.properties(), table.sortOrder(), defaultMode);

Review comment:
       Nit: missing newline after control flow block and before `return`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3959: Disable metrics for very wide tables to reduce metadata size

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3959:
URL: https://github.com/apache/iceberg/pull/3959#discussion_r805435487



##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       I think we may already have what we need to do this. Right now, the default metrics mode is set by `write.metadata.metrics.default`. I'd propose the following:
   * If `write.metadata.metrics.default` explicitly set in table properties, always use it
   * If `write.metadata.metrics.default` is not set, check the number of columns
       * If `numCols < 32` then use `truncate[16]`, the current default
       * If `numCols >= 32` then use `none`
   
   That seems like a reasonable way to make this customizable.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {

Review comment:
       I think this is okay for now. Tables will still respect whatever is set as `write.metadata.metrics.default` so this really just changes Iceberg's default in a reasonable way. It is also good to note that metrics for sort columns are automatically promoted to at least `truncate[16]` so it isn't as though we're losing _all_ stats.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -127,24 +135,35 @@ public static MetricsConfig forPositionDelete(Table table) {
     return new MetricsConfig(columnModes.build(), defaultMode);
   }
 
-  private static MetricsConfig from(Map<String, String> props, SortOrder order) {
+  /**
+   * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode.
+   * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default
+   *              (write.metadata.metrics.default)
+   * @param order sort order columns, will be promoted to truncate(16)
+   * @param defaultMode default, if not set by user property
+   * @return metrics configuration
+   */
+  private static MetricsConfig from(Map<String, String> props, SortOrder order, String defaultMode) {

Review comment:
       Instead of renaming the variable that is used everywhere, I'd just rename the incoming argument. There would be fewer changes if this used `defaultDefaultMode` or something.

##########
File path: core/src/main/java/org/apache/iceberg/MetricsConfig.java
##########
@@ -94,15 +96,21 @@ public static MetricsConfig getDefault() {
    **/
   @Deprecated
   public static MetricsConfig fromProperties(Map<String, String> props) {
-    return from(props, null);
+    return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
   }
 
   /**
    * Creates a metrics config from a table.
    * @param table iceberg table
    */
   public static MetricsConfig forTable(Table table) {
-    return from(table.properties(), table.sortOrder());
+    String defaultMode;
+    if (table.schema().columns().size() <= MAX_COLUMNS) {
+      defaultMode = DEFAULT_WRITE_METRICS_MODE_DEFAULT;
+    } else {
+      defaultMode = MetricsModes.None.get().toString();
+    }
+    return from(table.properties(), table.sortOrder(), defaultMode);

Review comment:
       Nit: missing newline after control flow block and before `return`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org