You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/15 01:47:36 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

jackye1995 opened a new pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333


   @aokolnychyi this should probably go to 0.11.1
   
   HadoopFileIO currently does not have a no-arg constructor and could not be loaded by the dynamic FileIO loader. But our documentation claims it can be loaded, so we need to fix it asap. There are 2 ways to fix it,
   1. add a no-arg constructor to HadoopFileIO, let it implement Configurable, and set configuration in `setConf` method.
   2. (current approach in PR) allow the loader to load an implementation that has a constructor taking Hadoop configuration.
   
   To do approach 1, we need to change HadoopFileIO class variable to be not final, and I am a bit hesitated to do that for performance concerns, so I went with approach 2 for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r595790212



##########
File path: core/src/test/java/org/apache/iceberg/TestCatalogUtil.java
##########
@@ -123,6 +123,16 @@ public void loadCustomFileIO_noArg() {
     Assert.assertEquals(properties, ((TestFileIONoArg) fileIO).map);
   }
 
+  @Test
+  public void loadCustomFileIO_hadoopConfigConstructor() {
+    Configuration configuration = new Configuration();
+    configuration.set("key", "val");
+    FileIO fileIO = CatalogUtil.loadFileIO(
+        TestFileIOHadoopConstructor.class.getName(), Maps.newHashMap(), configuration);

Review comment:
       Why not just  load the `HadoopFileIO` directly ?  Here we are trying to load an `TestFileIOHadoopConstructor`,  I'm concerning that we could not detect the breaking case if someone change the `HadoopFileIO` constructor....




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594407742



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);

Review comment:
       try -> will attempt to use
   
   Just because the current wording makes it sound like the user needs to try to do something




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-799643210


   @RussellSpitzer updated based on all the comments, please take a look when you have time, thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r599983663



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -197,10 +197,17 @@ public static Catalog buildIcebergCatalog(String name, Map<String, String> optio
   /**
    * Load a custom {@link FileIO} implementation.
    * <p>
-   * The implementation must have a no-arg constructor.
+   * The implementation should have a no-arg constructor.
    * If the class implements {@link Configurable},
    * a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}.
-   * {@link FileIO#initialize(Map properties)} is called to complete the initialization.
+   *
+   * For compatibility with HadoopFileIO,

Review comment:
       Yes I also noticed this is a bit annoying to see a stack trace printed when trying to load it, let me just not print the error.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yyanyy commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
yyanyy commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594818223



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -197,10 +197,17 @@ public static Catalog buildIcebergCatalog(String name, Map<String, String> optio
   /**
    * Load a custom {@link FileIO} implementation.
    * <p>
-   * The implementation must have a no-arg constructor.
+   * The implementation should have a no-arg constructor.
    * If the class implements {@link Configurable},
    * a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}.
-   * {@link FileIO#initialize(Map properties)} is called to complete the initialization.
+   *
+   * For compatibility with HadoopFileIO,

Review comment:
       I wonder if we starts to support 1-arg constructor as a valid case in the code and prints error messages about it, should we mention this as a fully supported case instead of a backward compatibility support? Especially that the reason for you to choose this option is performance concern. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-806131975


   @jackye1995, can you elaborate on your performance concerns if the configuration isn't final? I'm not sure that would be a problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594409801



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);

Review comment:
       Also shouldn't this be "info"? It's not necessarily an error right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594558557



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);

Review comment:
       Thanks, updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-807263898


   @jackye1995, let's go with the no-arg constructor and remove `final`. Configuration is primarily accessed to create `FileIO`, which can then have a `final` reference. So I don't think this is in a critical path and we shouldn't worry about it. Probably not worth making the contract for dynamic loading more complicated and difficult to maintain.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r599983125



##########
File path: core/src/test/java/org/apache/iceberg/TestCatalogUtil.java
##########
@@ -123,6 +123,16 @@ public void loadCustomFileIO_noArg() {
     Assert.assertEquals(properties, ((TestFileIONoArg) fileIO).map);
   }
 
+  @Test
+  public void loadCustomFileIO_hadoopConfigConstructor() {
+    Configuration configuration = new Configuration();
+    configuration.set("key", "val");
+    FileIO fileIO = CatalogUtil.loadFileIO(
+        TestFileIOHadoopConstructor.class.getName(), Maps.newHashMap(), configuration);

Review comment:
       I don't think that would be possible, since HadoopFileIO is public and the constructor is also public, we cannot simply change it. But this is a good point, I can just load it instead of having this dummy test class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-807310913


   @rdblue I was just going to reply, I did some benchmarking, as expected there is a difference but it was at microsecond level, so I think it should be fine to just remove the final. I think I was overly conservative on this, thanks for the suggestion!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594413509



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);
+      try {
+        ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked();
+        useNoArgConstructor = false;

Review comment:
       One additional question, do we always want the 0 arg constructor to get priority over a 1-arg? Or should we throw an error if you provide both?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594410343



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);
+      try {
+        ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked();
+        useNoArgConstructor = false;

Review comment:
       Could we just invoke the constructor where we load it and skip the state variable passing through here?
   
   Something like
   ```java
   Try {
      ctor = ..
      fileIO = ctor.newInstance
   } catch {
      ctor = ... single are ctor
      fileIo = ctor.newInstance(hadoopConf)
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r600826241



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -197,10 +197,17 @@ public static Catalog buildIcebergCatalog(String name, Map<String, String> optio
   /**
    * Load a custom {@link FileIO} implementation.
    * <p>
-   * The implementation must have a no-arg constructor.
+   * The implementation should have a no-arg constructor.
    * If the class implements {@link Configurable},
    * a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}.
-   * {@link FileIO#initialize(Map properties)} is called to complete the initialization.
+   *

Review comment:
       Paragraphs should be separated by `<p>`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-806190054


   > Sorry, I missed this PR. I can hold 0.11.1 RC for a little longer.
   
   Thank you Anton!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-806164297


   > I think this can be collapsed into a single DynConstructors call that lists both implementations. 
   
   The issue with that approach is that I still need a boolean to check if I need to call the `setConf` for no-arg constructor, and there was no way to get that boolean. I guess I can bypass that by checking `instanceof Configurable` for all FileIOs, so if we decide to go with that route I can make the change accordingly.
   
   > can you elaborate on your performance concerns if the configuration isn't final?
   
   I would actually prefer to use this approach of adding no-arg consturctor for `HadoopFileIO` which makes the whole concept of FIleIO loading much simpler. But I see `conf()` is called in all read and write paths to create a serializable configuration to avoid Kryo serialization issue. There is definitely a small difference for using final versus not final, but the difference might not be significant from a broader view, I do not have a definitive answer for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594410343



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);
+      try {
+        ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked();
+        useNoArgConstructor = false;

Review comment:
       Could we just invoke the constructor where we load it and skip the state variable passing through here?
   
   Something like
   ```java
   try {
      ctor = ..
      fileIO = ctor.newInstance
   } catch {
      ctor = ... single are ctor
      fileIo = ctor.newInstance(hadoopConf)
   }
   ```
   
   To avoid try nesting we could extract that to a helper function and then have something like
   ```java
   try {
     fileIO = loadFileIO(impl)
   } catch {
     Cannot find a 0-arg or 1-arg constructor to initalize FileImplemenation
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 edited a comment on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 edited a comment on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-806164297


   @rdblue 
   
   > I think this can be collapsed into a single DynConstructors call that lists both implementations. 
   
   The issue with that approach is that I still need a boolean to check if I need to call the `setConf` for no-arg constructor, and there was no way to get that boolean. I guess I can bypass that by checking `instanceof Configurable` for all FileIOs, so if we decide to go with that route I can make the change accordingly.
   
   > can you elaborate on your performance concerns if the configuration isn't final?
   
   I would actually prefer to use this approach of adding no-arg consturctor for `HadoopFileIO` which makes the whole concept of FIleIO loading much simpler. But I see `conf()` is called in all read and write paths to create a serializable configuration to avoid Kryo serialization issue. There is definitely a small difference for using final versus not final, but the difference might not be significant from a broader view, I do not have a definitive answer for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r600827578



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -214,24 +221,32 @@ public static FileIO loadFileIO(
       Map<String, String> properties,
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
-    DynConstructors.Ctor<FileIO> ctor;
     try {
-      ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
+      return loadFileIO(impl, hadoopConf, properties);
     } catch (NoSuchMethodException e) {
       throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+          "Cannot find a 0-arg or 1-arg constructor to initialize FileIO implementation %s", impl), e);
+    } catch (ClassCastException e) {
+      throw new IllegalArgumentException(
+          String.format("Cannot initialize FileIO, %s does not implement FileIO interface.", impl), e);
     }
+  }
 
+  @SuppressWarnings("CatchBlockLogException")
+  private static FileIO loadFileIO(String impl, Configuration hadoopConf, Map<String, String> properties)
+      throws NoSuchMethodException {
+    DynConstructors.Ctor<FileIO> ctor;
     FileIO fileIO;
     try {
+      ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
       fileIO = ctor.newInstance();
-    } catch (ClassCastException e) {
-      throw new IllegalArgumentException(
-          String.format("Cannot initialize FileIO, %s does not implement FileIO.", impl), e);
-    }
-
-    if (fileIO instanceof Configurable) {
-      ((Configurable) fileIO).setConf(hadoopConf);
+      if (fileIO instanceof Configurable) {
+        ((Configurable) fileIO).setConf(hadoopConf);
+      }
+    } catch (NoSuchMethodException e) {
+      LOG.info("0-arg constructor not found for {}, will attempt to use Hadoop Configuration constructor", impl);
+      ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked();

Review comment:
       I think this can be collapsed into a single DynConstructors call that lists both implementations. The first one that is found will be selected. And the `newInstance` method should discard arguments that are not needed so you can use the same call for both.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-809740239


   Thanks everyone! I am going to include this change in 0.11.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#discussion_r594571232



##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -215,16 +216,27 @@ public static FileIO loadFileIO(
       Configuration hadoopConf) {
     LOG.info("Loading custom FileIO implementation: {}", impl);
     DynConstructors.Ctor<FileIO> ctor;
+    boolean useNoArgConstructor = true;
     try {
       ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
     } catch (NoSuchMethodException e) {
-      throw new IllegalArgumentException(String.format(
-          "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e);
+      LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e);
+      try {
+        ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked();
+        useNoArgConstructor = false;

Review comment:
       > do we always want the 0 arg constructor to get priority over a 1-arg
   
   I think it is better to let 0 arg constructor to get priority, so it does not block any FileIO that might have both constructors.
   
   > To avoid try nesting we could extract that to a helper function
   
   Yeah I tried about that but there was duplicated logic for finding ClassCastException. Using a helper function is probably a way to solve it, let me update with that, thank you!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2333: Core: fix bug that HadoopFileIO cannot be dynamically loaded

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2333:
URL: https://github.com/apache/iceberg/pull/2333#issuecomment-806156032


   Sorry, I missed this PR. I can hold 0.11.1 RC for a little longer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org