Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/08 09:35:06 UTC

[GitHub] [iceberg] kbendick commented on a change in pull request #1558: load hive-site.xml for flink catalog

kbendick commented on a change in pull request #1558:
URL: https://github.com/apache/iceberg/pull/1558#discussion_r501581069



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java
##########
@@ -121,4 +139,58 @@ protected Catalog createCatalog(String name, Map<String, String> properties, Con
   public static Configuration clusterHadoopConf() {
     return HadoopUtils.getHadoopConfiguration(GlobalConfiguration.loadConfiguration());
   }
+
+  private void loadHiveConf(Configuration configuration, Map<String, String> properties) {
+    String hiveConfPath = properties.get(HIVE_SITE_PATH);
+    Path path = new Path(hiveConfPath);
+    String scheme = getScheme(path);
+    // We can add more storage support later,like s3
+    switch (scheme) {
+      case HIVE_SITE_SCHEME_HDFS:
+        downloadFromHdfs(configuration, path);
+        break;
+      case HIVE_SITE_SCHEME_FILE:
+        loadLocalHiveConf(configuration, hiveConfPath);
+        break;
+      default:
+        throw new UnsupportedOperationException(
+            "Unsupported FileSystem for scheme :" + scheme);
+    }
+  }
+
+  private String getScheme(Path path) {
+    String scheme = path.toUri().getScheme();
+    if (scheme == null) {
+      // for case :  /tmp/hive-site.xml
+      return HIVE_SITE_SCHEME_FILE;

Review comment:
       In most projects I've worked on, it's possible to set a default scheme. Flink supports this as well: `fs.default-scheme` is applied to paths that don't specify a scheme.
   
   Given that this code is specific to Flink, would it make more sense to fall back to that configuration value when it's available? Perhaps one of the more regular Flink contributors could chime in here. I'd be more inclined to default to `file` than to HDFS, but I run all of my Flink deployments in containers, so I'm likely biased.
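
   A rough sketch of the fallback I have in mind, not a drop-in patch. It assumes the Flink settings are available as a plain `Map<String, String>` and uses `java.net.URI` in place of Hadoop's `Path` so it stands alone; `DefaultSchemeFallback` and `resolveScheme` are hypothetical names, and only the `fs.default-scheme` key comes from Flink. Since that option holds a URI (for example `hdfs://namenode:9000/`), the scheme has to be parsed out of it:

```java
import java.net.URI;
import java.util.Map;

// Hypothetical helper, not part of the PR: illustrates the lookup order only.
final class DefaultSchemeFallback {
  // Flink option that supplies the scheme for paths without one.
  static final String DEFAULT_SCHEME_KEY = "fs.default-scheme";
  // Last-resort scheme, matching what the PR currently hard-codes.
  static final String FILE_SCHEME = "file";

  private DefaultSchemeFallback() {
  }

  // flinkConf is a stand-in for the real Flink configuration object (assumption).
  static String resolveScheme(String path, Map<String, String> flinkConf) {
    // 1. A scheme written in the path itself always wins.
    String scheme = URI.create(path).getScheme();
    if (scheme != null) {
      return scheme;
    }

    // 2. Otherwise use the scheme of fs.default-scheme, if it is set and has one.
    String defaultFs = flinkConf.get(DEFAULT_SCHEME_KEY);
    if (defaultFs != null) {
      String defaultScheme = URI.create(defaultFs).getScheme();
      if (defaultScheme != null) {
        return defaultScheme;
      }
    }

    // 3. Nothing configured: treat the path as local, e.g. /tmp/hive-site.xml.
    return FILE_SCHEME;
  }
}
```

   With this ordering, `/tmp/hive-site.xml` still resolves to `file` on a cluster that leaves `fs.default-scheme` unset, so the behaviour in this PR would not change for those users.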



