You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/29 06:14:57 UTC

[GitHub] [iceberg] ConeyLiu opened a new pull request, #5659: Core: Support set system level properties with environmental variable

ConeyLiu opened a new pull request, #5659:
URL: https://github.com/apache/iceberg/pull/5659

   This add supports setting system level properties with environmental variables. For example: we could control the number of threads for the default Iceberg thread pool with java system property: `iceberg.worker.num-threads` or env: `ICEBERG_WORKER_NUM_THREADS`. This is indeed useful when we want to change those properties while the java system properties have some default settings and we do not want to change them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1159190765


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   This class is now a bit confusing (called SystemProperties but it handles env...), should we just make another class?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1562213139

   Thanks @szehon-ho @zinking @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1561721706

   Merged, thanks @ConeyLiu !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1160509216


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   What about adding a new class `RuntimeInfo` to handle both properties and env? And deprecate `SystemProperties`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1185990876


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);
+
+  /** Whether to use the shared worker pool when planning table scans. */
+  public static final ConfigEntry<Boolean> SCAN_THREAD_POOL_ENABLED =
+      new ConfigEntry<>(
+          "iceberg.scan.plan-in-worker-pool",
+          "ICEBERG_SCAN_PLAN_IN_WORKER_POOL",
+          true,
+          Boolean::parseBoolean);
+
+  /**
+   * Maximum number of distinct {@link org.apache.iceberg.io.FileIO} that is allowed to have
+   * associated {@link org.apache.iceberg.io.ContentCache} in memory at a time.
+   */
+  public static final ConfigEntry<Integer> IO_MANIFEST_CACHE_MAX_FILEIO =
+      new ConfigEntry<>(
+          "iceberg.io.manifest.cache.fileio-max",
+          "ICEBERG_IO_MANIFEST_CACHE_FILEIO_MAX",
+          8,
+          Integer::parseUnsignedInt);
+
+  public static class ConfigEntry<T> {
+    private final String propertyKey;
+    private final String envKey;
+    private final T defaultValue;
+    private final Function<String, T> parseFunc;
+
+    private ConfigEntry(
+        String propertyKey, String envKey, T defaultValue, Function<String, T> parseFunc) {
+      this.propertyKey = propertyKey;
+      this.envKey = envKey;
+      this.defaultValue = defaultValue;
+      this.parseFunc = parseFunc;
+    }
+
+    public final String propertyKey() {
+      return propertyKey;
+    }
+
+    public final String envKey() {
+      return envKey;
+    }
+
+    public final T defaultValue() {
+      return defaultValue;
+    }
+
+    public final T value() {
+      String value = System.getProperty(propertyKey);

Review Comment:
   Updated with a cache value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1161054994


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   Using both system property and environment variable to control some settings are very common usage, for example: `ARROW_ENABLE_NULL_CHECK_FOR_GET` and `arrow.enable_null_check_for_get`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1160509216


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   What about adding a new class `SystemConfigs` to handle both properties and env? And deprecate `SystemProperties`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r997256590


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -30,14 +32,23 @@ private SystemProperties() {}
    */
   public static final String WORKER_THREAD_POOL_SIZE_PROP = "iceberg.worker.num-threads";
 
+  public static final String WORKER_THREAD_POOL_SIZE_ENV = "ICEBERG_WORKER_NUM_THREADS";
+
   /** Whether to use the shared worker pool when planning table scans. */
   public static final String SCAN_THREAD_POOL_ENABLED = "iceberg.scan.plan-in-worker-pool";
 
-  static boolean getBoolean(String systemProperty, boolean defaultValue) {
+  public static final String SCAN_THREAD_POOL_ENABLED_ENV = "ICEBERG_SCAN_PLAN_IN_WORKER_POOL";

Review Comment:
   Does this need to be explosed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1159190765


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   This class is now a bit confusing (called SystemProperties but it handles env...), should we just make another class?  
   
   I dont think there's that many places that need this, that we need to always call these utils?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1185990563


##########
core/src/main/java/org/apache/iceberg/util/ThreadPools.java:
##########
@@ -33,11 +33,9 @@ public class ThreadPools {
   private ThreadPools() {}
 
   public static final String WORKER_THREAD_POOL_SIZE_PROP =
-      SystemProperties.WORKER_THREAD_POOL_SIZE_PROP;
+      SystemConfigs.WORKER_THREAD_POOL_SIZE.propertyKey();

Review Comment:
   Deprecated it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zinking commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "zinking (via GitHub)" <gi...@apache.org>.
zinking commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1484988613

   @szehon-ho  @rdblue can we take another look and merge if possible ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1560698555

   gentle ping @szehon-ho


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho merged pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #5659:
URL: https://github.com/apache/iceberg/pull/5659


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #5659: Core: Support set system level properties with environmental variable

Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1229825390

   cc @rdblue @kbendick @szehon-ho, this is a small update, could you take a look when you are free? Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1195014775


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);
+
+  /** Whether to use the shared worker pool when planning table scans. */
+  public static final ConfigEntry<Boolean> SCAN_THREAD_POOL_ENABLED =
+      new ConfigEntry<>(
+          "iceberg.scan.plan-in-worker-pool",
+          "ICEBERG_SCAN_PLAN_IN_WORKER_POOL",
+          true,
+          Boolean::parseBoolean);
+
+  /**
+   * Maximum number of distinct {@link org.apache.iceberg.io.FileIO} that is allowed to have
+   * associated {@link org.apache.iceberg.io.ContentCache} in memory at a time.
+   */
+  public static final ConfigEntry<Integer> IO_MANIFEST_CACHE_MAX_FILEIO =
+      new ConfigEntry<>(
+          "iceberg.io.manifest.cache.fileio-max",
+          "ICEBERG_IO_MANIFEST_CACHE_FILEIO_MAX",
+          8,
+          Integer::parseUnsignedInt);
+
+  public static class ConfigEntry<T> {
+    private final String propertyKey;
+    private final String envKey;
+    private final T defaultValue;
+    private final Function<String, T> parseFunc;
+    private T lazyValue = null;
+
+    private ConfigEntry(
+        String propertyKey, String envKey, T defaultValue, Function<String, T> parseFunc) {
+      this.propertyKey = propertyKey;
+      this.envKey = envKey;
+      this.defaultValue = defaultValue;
+      this.parseFunc = parseFunc;
+    }
+
+    public final String propertyKey() {
+      return propertyKey;
+    }
+
+    public final String envKey() {
+      return envKey;
+    }
+
+    public final T defaultValue() {
+      return defaultValue;
+    }
+
+    public final T value() {
+      if (lazyValue == null) {
+        lazyValue = getValue();
+      }
+
+      return lazyValue;
+    }
+
+    private T getValue() {
+      String value = System.getProperty(propertyKey);
+      if (value == null) {
+        value = System.getenv(envKey);
+      }
+
+      if (value != null) {
+        try {
+          return parseFunc.apply(value);
+        } catch (NumberFormatException e) {

Review Comment:
   Updated and add the log.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1160836266


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   Yea that sounds cleaner to me.  Curious will there ever be a case we don't want env variables but just rely on system properties?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1486308618

   @szehon-ho In our production environments, there are some default parameter settings for spark driver/executor java options (eg: gc parameters). And those parameters are concated into one string. This makes it difficult to overwrite or modify those parameters to set iceberg system-level properties. While with ENV settings, we could set those system-level properties separately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1536136704

   Thanks @szehon-ho for the reviewing, and sorry got held up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1161054666


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   I think there shouldn't be that case, we can don't set the env variable and also the property variable has higher priority.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1163370556


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);

Review Comment:
   I think previously it would throw an exception if invalid, and return default.  But now it may not do so?



##########
core/src/main/java/org/apache/iceberg/util/ThreadPools.java:
##########
@@ -33,11 +33,9 @@ public class ThreadPools {
   private ThreadPools() {}
 
   public static final String WORKER_THREAD_POOL_SIZE_PROP =
-      SystemProperties.WORKER_THREAD_POOL_SIZE_PROP;
+      SystemConfigs.WORKER_THREAD_POOL_SIZE.propertyKey();

Review Comment:
   Wondering should we deprecate this variable for SystemConfigs one?  Not sure any real use case to keep it around.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1159190765


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -18,7 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Configuration properties that are controlled by Java system properties. */
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable

Review Comment:
   This class is now a bit confusing, should we just make another class?  
   
   I dont think there's that many places that need this, that we need to always call these utils?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1007825452


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -30,14 +32,23 @@ private SystemProperties() {}
    */
   public static final String WORKER_THREAD_POOL_SIZE_PROP = "iceberg.worker.num-threads";
 
+  public static final String WORKER_THREAD_POOL_SIZE_ENV = "ICEBERG_WORKER_NUM_THREADS";
+
   /** Whether to use the shared worker pool when planning table scans. */
   public static final String SCAN_THREAD_POOL_ENABLED = "iceberg.scan.plan-in-worker-pool";
 
-  static boolean getBoolean(String systemProperty, boolean defaultValue) {
+  public static final String SCAN_THREAD_POOL_ENABLED_ENV = "ICEBERG_SCAN_PLAN_IN_WORKER_POOL";

Review Comment:
   It is used by `core/src/main/java/org/apache/iceberg/util/ThreadPools.java` in a different package. I updated other envs to the default access level.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1163372408


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);
+
+  /** Whether to use the shared worker pool when planning table scans. */
+  public static final ConfigEntry<Boolean> SCAN_THREAD_POOL_ENABLED =
+      new ConfigEntry<>(
+          "iceberg.scan.plan-in-worker-pool",
+          "ICEBERG_SCAN_PLAN_IN_WORKER_POOL",
+          true,
+          Boolean::parseBoolean);
+
+  /**
+   * Maximum number of distinct {@link org.apache.iceberg.io.FileIO} that is allowed to have
+   * associated {@link org.apache.iceberg.io.ContentCache} in memory at a time.
+   */
+  public static final ConfigEntry<Integer> IO_MANIFEST_CACHE_MAX_FILEIO =
+      new ConfigEntry<>(
+          "iceberg.io.manifest.cache.fileio-max",
+          "ICEBERG_IO_MANIFEST_CACHE_FILEIO_MAX",
+          8,
+          Integer::parseUnsignedInt);
+
+  public static class ConfigEntry<T> {
+    private final String propertyKey;
+    private final String envKey;
+    private final T defaultValue;
+    private final Function<String, T> parseFunc;
+
+    private ConfigEntry(
+        String propertyKey, String envKey, T defaultValue, Function<String, T> parseFunc) {
+      this.propertyKey = propertyKey;
+      this.envKey = envKey;
+      this.defaultValue = defaultValue;
+      this.parseFunc = parseFunc;
+    }
+
+    public final String propertyKey() {
+      return propertyKey;
+    }
+
+    public final String envKey() {
+      return envKey;
+    }
+
+    public final T defaultValue() {
+      return defaultValue;
+    }
+
+    public final T value() {
+      String value = System.getProperty(propertyKey);

Review Comment:
   Previously, it would get calculated once, now every time it will parse the properties.  Should we cache the value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1194425009


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);
+
+  /** Whether to use the shared worker pool when planning table scans. */
+  public static final ConfigEntry<Boolean> SCAN_THREAD_POOL_ENABLED =
+      new ConfigEntry<>(
+          "iceberg.scan.plan-in-worker-pool",
+          "ICEBERG_SCAN_PLAN_IN_WORKER_POOL",
+          true,
+          Boolean::parseBoolean);
+
+  /**
+   * Maximum number of distinct {@link org.apache.iceberg.io.FileIO} that is allowed to have
+   * associated {@link org.apache.iceberg.io.ContentCache} in memory at a time.
+   */
+  public static final ConfigEntry<Integer> IO_MANIFEST_CACHE_MAX_FILEIO =
+      new ConfigEntry<>(
+          "iceberg.io.manifest.cache.fileio-max",
+          "ICEBERG_IO_MANIFEST_CACHE_FILEIO_MAX",
+          8,
+          Integer::parseUnsignedInt);
+
+  public static class ConfigEntry<T> {
+    private final String propertyKey;
+    private final String envKey;
+    private final T defaultValue;
+    private final Function<String, T> parseFunc;
+    private T lazyValue = null;
+
+    private ConfigEntry(
+        String propertyKey, String envKey, T defaultValue, Function<String, T> parseFunc) {
+      this.propertyKey = propertyKey;
+      this.envKey = envKey;
+      this.defaultValue = defaultValue;
+      this.parseFunc = parseFunc;
+    }
+
+    public final String propertyKey() {
+      return propertyKey;
+    }
+
+    public final String envKey() {
+      return envKey;
+    }
+
+    public final T defaultValue() {
+      return defaultValue;
+    }
+
+    public final T value() {
+      if (lazyValue == null) {
+        lazyValue = getValue();
+      }
+
+      return lazyValue;
+    }
+
+    private T getValue() {
+      String value = System.getProperty(propertyKey);
+      if (value == null) {
+        value = System.getenv(envKey);
+      }
+
+      if (value != null) {
+        try {
+          return parseFunc.apply(value);
+        } catch (NumberFormatException e) {

Review Comment:
   Nit: how about catch Exception , as we don't know what kind of Function we get?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #5659: Core: Support set system level properties with environmental variables

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#issuecomment-1485741149

   Hi guys, can you explain the motivation of the change?  Not opposed, but reading the description but didn't quite understand it.  To me, both seem to be more or less equivalent in ease-of-use (they are passed to the java process when it is spawned..)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #5659: Core: Support set system level properties with environmental variables

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #5659:
URL: https://github.com/apache/iceberg/pull/5659#discussion_r1185990116


##########
core/src/main/java/org/apache/iceberg/SystemConfigs.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.function.Function;
+
+/**
+ * Configuration properties that are controlled by Java system properties or environmental variable.
+ */
+public class SystemConfigs {
+
+  private SystemConfigs() {}
+
+  /**
+   * Sets the size of the worker pool. The worker pool limits the number of tasks concurrently
+   * processing manifests in the base table implementation across all concurrent planning or commit
+   * operations.
+   */
+  public static final ConfigEntry<Integer> WORKER_THREAD_POOL_SIZE =
+      new ConfigEntry<>(
+          "iceberg.worker.num-threads",
+          "ICEBERG_WORKER_NUM_THREADS",
+          Math.max(2, Runtime.getRuntime().availableProcessors()),
+          Integer::parseUnsignedInt);

Review Comment:
   Good catch, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org