You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/24 21:33:23 UTC

[GitHub] [iceberg] guilload opened a new pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

guilload opened a new pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243


   This PR adds support for custom catalog to Iceberg StorageHandler as well as Hadoop and Hive catalogs and removes the duplicated logic in `mapreduce.IcebergInputFormat.findTable` and `mapred.TableResolver`.
   
   @cmathiesen @massdosage @rdsr 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r464367200



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -66,7 +73,12 @@ public HiveAuthorizationProvider getAuthorizationProvider() throws HiveException
 
   @Override
   public void configureInputJobProperties(TableDesc tableDesc, Map<String, String> map) {
+    Properties props = tableDesc.getProperties();
+    Table table = Catalogs.loadTable(conf, props);
 
+    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(NAME));
+    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
+    map.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(table.schema()));

Review comment:
       Yes, and ideally we'd have a unit test which uses multiple tables which are set up differently and checks that the properties that are unique to each table are honoured properly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460626104



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -95,15 +93,12 @@ public ConfigBuilder schema(Schema schema) {
     }
 
     public ConfigBuilder readFrom(TableIdentifier identifier) {
-      return readFrom(identifier.toString());
-    }
-
-    public ConfigBuilder readFrom(File path) {
-      return readFrom(path.toString());
+      conf.set(TABLE_IDENTIFIER, identifier.toString());
+      return this;
     }
 
-    public ConfigBuilder readFrom(String path) {
-      conf.set(TABLE_PATH, path);
+    public ConfigBuilder readFrom(File location) {

Review comment:
       Yes, you are right! I just wanted this to not be a `java.io.File` type




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
massdosage commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-669772790


   > > > > `HiveRunner` is rather slow (I believe each test sets up a new local HMS from scratch) and `TestHiveIcebergInputFormat` takes about 4 minutes to run on my machine (compile time included).
   > > > 
   > > > 
   > > > In most of our test cases we spin up HMS once per test file and clean it up in setUp/tearDown. Not an ideal solution, but saves as time/money.
   > > 
   > > 
   > > HiveRunner doesn't currently support that but it has been discussed before - [klarna/HiveRunner#69](https://github.com/klarna/HiveRunner/issues/69)
   > 
   > Hive does this here:
   > https://github.com/apache/hive/blob/6267520a50ba8c12ae5cee8fb27a6746376a21c6/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L444
   > 
   > I think the only part we are interested in here is:
   > https://github.com/apache/hive/blob/6267520a50ba8c12ae5cee8fb27a6746376a21c6/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L336
   > 
   > Which is very similar than what was done in [klarna/HiveRunner#69](https://github.com/klarna/HiveRunner/issues/69). We manually went through and deleted the tables in the databases (I think there was some issue previously which should be fixed now). The only other difference is that we clean up the files as well which is needed for EXTERNAL tables. Since the Iceberg tables will be EXTERNAL, I think we will need that as well here.
   
   That sounds promising. I guess the best place to do this is in HiveRunner itself. That issue referenced above has a PR against it from one of the original committers on HiveRunner but he makes it clear that it's very experimental and it hasn't been worked on in a long time. We could either build on that or raise a new one. I'd be happy to work with you on it if you want?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-663779407


   It's not ideal to have slow tests, but better to have slow tests than leave things untested, I think.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470370131



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -155,11 +157,31 @@ public void testJoinTables() {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", 102L, 33.33d}, rows.get(2));
   }
 
-  private void createHiveTable(String table, String location) {
+  protected void createTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    Table table = createIcebergTable(tableName, schema, records);
+    createHiveTable(tableName, table.location());
+  }
+
+  protected Table createIcebergTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    String identifier = testTables.identifier("default." + tableName);
+    TestHelper helper = new TestHelper(
+            metastore.hiveConf(), testTables.tables(), identifier, schema, SPEC, FileFormat.PARQUET, temp);
+    Table table = helper.createTable();
+
+    if (!records.isEmpty()) {
+      helper.appendToTable(helper.writeFile(null, records));
+    }
+
+    return table;
+  }
+
+  protected void createHiveTable(String tableName, String location) {
     shell.execute(String.format(

Review comment:
       shouldn't we set the  custom catalog loader  by setting a table property `catalog.loader.class` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460587306



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -95,15 +93,12 @@ public ConfigBuilder schema(Schema schema) {
     }
 
     public ConfigBuilder readFrom(TableIdentifier identifier) {
-      return readFrom(identifier.toString());
-    }
-
-    public ConfigBuilder readFrom(File path) {
-      return readFrom(path.toString());
+      conf.set(TABLE_IDENTIFIER, identifier.toString());
+      return this;
     }
 
-    public ConfigBuilder readFrom(String path) {
-      conf.set(TABLE_PATH, path);
+    public ConfigBuilder readFrom(File location) {

Review comment:
       I don't think so. I think it should be `String location` because it identifies a table by location (like HadoopTables) right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463902741



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));

Review comment:
       We can ignore this in light of further comments below.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463784313



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));
+  }
+
+  // For use in HiveIcebergSerDe and HiveIcebergStorageHandler
+  public static Table loadTable(Configuration conf, Properties props) {
+    return loadTable(conf, props.getProperty(NAME), props.getProperty(LOCATION));
+  }
+
+  private static Table loadTable(Configuration conf, String tableIdentifier, String tableLocation) {

Review comment:
       Does it make sense to do something similar to what spark does?
   ```
   protected Table findTable(Map<String, String> options, Configuration conf) {
       Preconditions.checkArgument(options.containsKey("path"), "Cannot open table: path is not set");
       String path = options.get("path");
   
       if (path.contains("/")) {
         HadoopTables tables = new HadoopTables(conf);
         return tables.load(path);
       } else {
         HiveCatalog hiveCatalog = HiveCatalogs.loadCatalog(conf);
         TableIdentifier tableIdentifier = TableIdentifier.parse(path);
         return hiveCatalog.loadTable(tableIdentifier);
       }
     }
   ```
   use TABLE_PATH to determine whether we need to use Hadoop or Hive catalog first and only then try custom catalog in the end. Seems like custom catalog would be used very rarely, if at **all.** 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-666708753


   Hey,
   
   I've just pushed a commit that adds the changes requested during the first round of review and additionally integrates the modifications suggested by @rdsr in #1267, which clean up the input format nicely and solve the issue reported in #1267.
   
   I should be able to add the tests for each catalog today or tomorrow.
   
   cc @massdosage 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463854512



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -66,7 +73,12 @@ public HiveAuthorizationProvider getAuthorizationProvider() throws HiveException
 
   @Override
   public void configureInputJobProperties(TableDesc tableDesc, Map<String, String> map) {
+    Properties props = tableDesc.getProperties();
+    Table table = Catalogs.loadTable(conf, props);
 
+    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(NAME));
+    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
+    map.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(table.schema()));

Review comment:
       Depends upon where we set the catalog loader class.   If we set it in HiveConf, then it will always be available by default in configuration. If we want to expose catalog loader class as a table property, e.g to load different tables differently, then we can forward it here.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-663765581


   @guilload, you can use an abstract class for the test cases and then multiple concrete classes for the environments you want to run with.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463867095



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -66,7 +73,12 @@ public HiveAuthorizationProvider getAuthorizationProvider() throws HiveException
 
   @Override
   public void configureInputJobProperties(TableDesc tableDesc, Map<String, String> map) {
+    Properties props = tableDesc.getProperties();
+    Table table = Catalogs.loadTable(conf, props);
 
+    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(NAME));
+    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
+    map.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(table.schema()));

Review comment:
       It seems like it would always comes from table properties to me, since we might have tables from different Iceberg catalogs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-672320969


   I've just added 2500efb, which run the Hive test suite against multiple catalog implementations.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r468197899



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));
+  }
+
+  // For use in HiveIcebergSerDe and HiveIcebergStorageHandler
+  public static Table loadTable(Configuration conf, Properties props) {
+    return loadTable(conf, props.getProperty(NAME), props.getProperty(LOCATION));
+  }
+
+  private static Table loadTable(Configuration conf, String tableIdentifier, String tableLocation) {
+    Optional<Catalog> catalog = loadCatalog(conf);
+
+    if (catalog.isPresent()) {
+      Preconditions.checkArgument(tableIdentifier != null, "Table identifier not set");
+      return catalog.get().loadTable(TableIdentifier.parse(tableIdentifier));
+    }
+
+    Preconditions.checkArgument(tableLocation != null, "Table location not set");
+    return new HadoopTables(conf).load(tableLocation);
+  }
+
+  @VisibleForTesting
+  static Optional<Catalog> loadCatalog(Configuration conf) {
+    String catalogLoaderClass = conf.get(InputFormatConfig.CATALOG_LOADER_CLASS);
+
+    if (catalogLoaderClass != null) {
+      CatalogLoader loader = (CatalogLoader) DynConstructors.builder(CatalogLoader.class)
+              .impl(catalogLoaderClass)
+              .build()
+              .newInstance();
+      return Optional.of(loader.load(conf));
+    }
+
+    String catalogName = conf.get(InputFormatConfig.CATALOG);
+
+    if (catalogName != null) {
+      switch (catalogName.toLowerCase()) {
+        case HADOOP:
+          return Optional.of(new HadoopCatalog(conf));

Review comment:
       Yes, that would be useful. I've just pushed 90ccd39b8e349b3506c9866e179112a408447bfa for that purpose.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-663773025


   Yeah, if we don't mind having as many test classes as catalog implementations then I can do that. 
   
   Also, is the running time of the Iceberg test suite something we care about? `HiveRunner` is rather slow (I believe each test sets up a new local HMS from scratch) and `TestHiveIcebergInputFormat` takes about 4 minutes to run on my machine (compile time included).  Running that test with more catalog implementations is only going to increase that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460586088



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    // A bit of a hack to make this function work transparently with Hive without having to remap the "name" and
+    // "location" properties.
+    if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID).length() > 0) {
+      return loadTable(conf, conf.get(NAME), conf.get(LOCATION));

Review comment:
       How's the name property used to specify a table ?

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -95,15 +93,12 @@ public ConfigBuilder schema(Schema schema) {
     }
 
     public ConfigBuilder readFrom(TableIdentifier identifier) {
-      return readFrom(identifier.toString());
-    }
-
-    public ConfigBuilder readFrom(File path) {
-      return readFrom(path.toString());
+      conf.set(TABLE_IDENTIFIER, identifier.toString());
+      return this;
     }
 
-    public ConfigBuilder readFrom(String path) {
-      conf.set(TABLE_PATH, path);
+    public ConfigBuilder readFrom(File location) {

Review comment:
       Shouldn't this be `InputFile`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470367437



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -45,9 +47,9 @@
 import static org.apache.iceberg.types.Types.NestedField.required;
 
 @RunWith(StandaloneHiveRunner.class)
-public class TestHiveIcebergInputFormat {
+public abstract class TestHiveIcebergInputFormat {

Review comment:
       it is kinda weird that we are calling this class `TestHiveIcebergInputFormat` instead of say `TestHiveStorageHandler` or something similar.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470767997



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -155,11 +157,31 @@ public void testJoinTables() {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", 102L, 33.33d}, rows.get(2));
   }
 
-  private void createHiveTable(String table, String location) {
+  protected void createTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    Table table = createIcebergTable(tableName, schema, records);
+    createHiveTable(tableName, table.location());
+  }
+
+  protected Table createIcebergTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    String identifier = testTables.identifier("default." + tableName);
+    TestHelper helper = new TestHelper(
+            metastore.hiveConf(), testTables.tables(), identifier, schema, SPEC, FileFormat.PARQUET, temp);
+    Table table = helper.createTable();
+
+    if (!records.isEmpty()) {
+      helper.appendToTable(helper.writeFile(null, records));
+    }
+
+    return table;
+  }
+
+  protected void createHiveTable(String tableName, String location) {
     shell.execute(String.format(

Review comment:
       Yeah, I think that will work once the catalog loader handles catalog settings defined at the table properties level. I'll add this refactor in the follow up PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463782035



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));
+  }
+
+  // For use in HiveIcebergSerDe and HiveIcebergStorageHandler
+  public static Table loadTable(Configuration conf, Properties props) {
+    return loadTable(conf, props.getProperty(NAME), props.getProperty(LOCATION));
+  }
+
+  private static Table loadTable(Configuration conf, String tableIdentifier, String tableLocation) {
+    Optional<Catalog> catalog = loadCatalog(conf);

Review comment:
       Can we do something like what Spark does
   ```
   protected Table findTable(Map<String, String> options, Configuration conf) {
       Preconditions.checkArgument(options.containsKey("path"), "Cannot open table: path is not set");
       String path = options.get("path");
   
       if (path.contains("/")) {
         HadoopTables tables = new HadoopTables(conf);
         return tables.load(path);
       } else {
         HiveCatalog hiveCatalog = HiveCatalogs.loadCatalog(conf);
         TableIdentifier tableIdentifier = TableIdentifier.parse(path);
         return hiveCatalog.loadTable(tableIdentifier);
       }
     }
   ```
   use TABLE_PATH to determine whether we need to use Hadoop or Hive catalog first and only then try custom catalog in the end. Seems like custom catalog would be used very rarely, if at all.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
massdosage commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-667982867


   I've managed to run this branch successfully on Hive in distributed mode (just with some packaging changes to build an uber jar and do some relocations) and it works fine. I'll keep checking when there are major changes but so far it looks good.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463838000



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -66,7 +73,12 @@ public HiveAuthorizationProvider getAuthorizationProvider() throws HiveException
 
   @Override
   public void configureInputJobProperties(TableDesc tableDesc, Map<String, String> map) {
+    Properties props = tableDesc.getProperties();
+    Table table = Catalogs.loadTable(conf, props);
 
+    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(NAME));
+    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
+    map.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(table.schema()));

Review comment:
       Doesn't this also need to forward the catalog loader class?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463781404



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));

Review comment:
       can't we replace TABLE_LOCATION, TABLE_IDENTIFIER and TABLE_PATH with just TABLE_PATH?
   
   e.g see org.apache.iceberg.spark.source.IcebergSource#findTable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-675491544


   Thanks @guilload lets improve the test in follow ups.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-669145112


   > > > `HiveRunner` is rather slow (I believe each test sets up a new local HMS from scratch) and `TestHiveIcebergInputFormat` takes about 4 minutes to run on my machine (compile time included).
   > > 
   > > 
   > > In most of our test cases we spin up HMS once per test file and clean it up in setUp/tearDown. Not an ideal solution, but saves as time/money.
   > 
   > HiveRunner doesn't currently support that but it has been discussed before - [klarna/HiveRunner#69](https://github.com/klarna/HiveRunner/issues/69)
   
   Hive does this here:
   https://github.com/apache/hive/blob/6267520a50ba8c12ae5cee8fb27a6746376a21c6/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L444
   
   I think the only part we are interested in here is:
   https://github.com/apache/hive/blob/6267520a50ba8c12ae5cee8fb27a6746376a21c6/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L336
   
   Which is very similar than what was done in klarna/HiveRunner#69. We manually went through and deleted the tables in the databases (I think there was some issue previously which should be fixed now). The only other difference is that we clean up the files as well which is needed for EXTERNAL tables. Since the Iceberg tables will be EXTERNAL, I think we will need that as well here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463785255



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -39,13 +38,7 @@
 
   @Override
   public void initialize(@Nullable Configuration configuration, Properties serDeProperties) throws SerDeException {
-    final Table table;
-
-    try {
-      table = TableResolver.resolveTableFromConfiguration(configuration, serDeProperties);
-    } catch (IOException e) {
-      throw new UncheckedIOException("Unable to resolve table from configuration: ", e);
-    }
+    Table table = Catalogs.loadTable(configuration, serDeProperties);

Review comment:
       might require a rebase




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460646992



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    // A bit of a hack to make this function work transparently with Hive without having to remap the "name" and
+    // "location" properties.
+    if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID).length() > 0) {
+      return loadTable(conf, conf.get(NAME), conf.get(LOCATION));

Review comment:
       Where is this conf coming from? Is this a query conf? Or this is some feature of Hive I am not aware of? Then this does not seem too future proof. What happens if there are multiple Iceberg tables are in a single query?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-673846140


   generally LG! I'm ok apart from some comments on the tests. We can even address these comments on the tests later if we like.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-664120611


   > `HiveRunner` is rather slow (I believe each test sets up a new local HMS from scratch) and `TestHiveIcebergInputFormat` takes about 4 minutes to run on my machine (compile time included).
   
   In most of our test cases we spin up HMS once per test file and clean it up in setUp/tearDown. Not an ideal solution, but saves as time/money.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-667356074


   I like the approach, it mostly looks good to me. Would be great to have @massdosage check that the behavior is what they expect, too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463836992



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));
+  }
+
+  // For use in HiveIcebergSerDe and HiveIcebergStorageHandler
+  public static Table loadTable(Configuration conf, Properties props) {
+    return loadTable(conf, props.getProperty(NAME), props.getProperty(LOCATION));
+  }
+
+  private static Table loadTable(Configuration conf, String tableIdentifier, String tableLocation) {

Review comment:
       I think what's happening here makes sense because configuration is done separately using either `readFrom(location)` or `readFrom(identifier)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470761552



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -45,9 +47,9 @@
 import static org.apache.iceberg.types.Types.NestedField.required;
 
 @RunWith(StandaloneHiveRunner.class)
-public class TestHiveIcebergInputFormat {
+public abstract class TestHiveIcebergInputFormat {

Review comment:
       That's because this test class was added before the storage handler. I'll rename it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460586335



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -95,15 +93,12 @@ public ConfigBuilder schema(Schema schema) {
     }
 
     public ConfigBuilder readFrom(TableIdentifier identifier) {
-      return readFrom(identifier.toString());
-    }
-
-    public ConfigBuilder readFrom(File path) {
-      return readFrom(path.toString());
+      conf.set(TABLE_IDENTIFIER, identifier.toString());
+      return this;
     }
 
-    public ConfigBuilder readFrom(String path) {
-      conf.set(TABLE_PATH, path);
+    public ConfigBuilder readFrom(File location) {

Review comment:
       Shouldn't this be `InputFile`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460299142



##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -95,15 +93,12 @@ public ConfigBuilder schema(Schema schema) {
     }
 
     public ConfigBuilder readFrom(TableIdentifier identifier) {
-      return readFrom(identifier.toString());
-    }
-
-    public ConfigBuilder readFrom(File path) {
-      return readFrom(path.toString());
+      conf.set(TABLE_IDENTIFIER, identifier.toString());
+      return this;
     }
 
-    public ConfigBuilder readFrom(String path) {

Review comment:
       I have decided to remove `readFrom(String ...)` since the string can be ambiguously an identifier or a location.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470371366



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -155,11 +157,31 @@ public void testJoinTables() {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", 102L, 33.33d}, rows.get(2));
   }
 
-  private void createHiveTable(String table, String location) {
+  protected void createTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    Table table = createIcebergTable(tableName, schema, records);
+    createHiveTable(tableName, table.location());
+  }
+
+  protected Table createIcebergTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    String identifier = testTables.identifier("default." + tableName);
+    TestHelper helper = new TestHelper(
+            metastore.hiveConf(), testTables.tables(), identifier, schema, SPEC, FileFormat.PARQUET, temp);
+    Table table = helper.createTable();
+
+    if (!records.isEmpty()) {
+      helper.appendToTable(helper.writeFile(null, records));
+    }
+
+    return table;
+  }
+
+  protected void createHiveTable(String tableName, String location) {
     shell.execute(String.format(

Review comment:
       If that works [maybe we also have to abstract `Configuration creation`] then I think we can drop the `TestTables` class as well




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-671613075


   @rdblue I've rebased the PR, added a commit for customizing the warehouse location for the Hadoop catalog and reviewed the comment about multiples catalogs. However, I'm still working on running the Hive test suite once per catalog implementation. I've made good progress but still need more time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r463836423



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    return loadTable(conf, conf.get(InputFormatConfig.TABLE_IDENTIFIER), conf.get(InputFormatConfig.TABLE_LOCATION));
+  }
+
+  // For use in HiveIcebergSerDe and HiveIcebergStorageHandler
+  public static Table loadTable(Configuration conf, Properties props) {
+    return loadTable(conf, props.getProperty(NAME), props.getProperty(LOCATION));
+  }
+
+  private static Table loadTable(Configuration conf, String tableIdentifier, String tableLocation) {
+    Optional<Catalog> catalog = loadCatalog(conf);
+
+    if (catalog.isPresent()) {
+      Preconditions.checkArgument(tableIdentifier != null, "Table identifier not set");
+      return catalog.get().loadTable(TableIdentifier.parse(tableIdentifier));
+    }
+
+    Preconditions.checkArgument(tableLocation != null, "Table location not set");
+    return new HadoopTables(conf).load(tableLocation);
+  }
+
+  @VisibleForTesting
+  static Optional<Catalog> loadCatalog(Configuration conf) {
+    String catalogLoaderClass = conf.get(InputFormatConfig.CATALOG_LOADER_CLASS);
+
+    if (catalogLoaderClass != null) {
+      CatalogLoader loader = (CatalogLoader) DynConstructors.builder(CatalogLoader.class)
+              .impl(catalogLoaderClass)
+              .build()
+              .newInstance();
+      return Optional.of(loader.load(conf));
+    }
+
+    String catalogName = conf.get(InputFormatConfig.CATALOG);
+
+    if (catalogName != null) {
+      switch (catalogName.toLowerCase()) {
+        case HADOOP:
+          return Optional.of(new HadoopCatalog(conf));

Review comment:
       There are two relevant configs in addition to the type: `uri` for Hive and `warehouse` for Hadoop. Hive defaults to the value in configuration of `hive.metastore.uris` from `hive-site.xml`, so it makes sense to not pass it explicitly here.
   
   But, the Hadoop catalog doesn't really have a standard warehouse path. It just has a default. Should we add a configuration option to set up the warehouse when using a `HadoopCatalog`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-673544582


   thanks @guilload @rdblue I plan to spend time on this today


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr merged pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr merged pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r470370881



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java
##########
@@ -155,11 +157,31 @@ public void testJoinTables() {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", 102L, 33.33d}, rows.get(2));
   }
 
-  private void createHiveTable(String table, String location) {
+  protected void createTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    Table table = createIcebergTable(tableName, schema, records);
+    createHiveTable(tableName, table.location());
+  }
+
+  protected Table createIcebergTable(String tableName, Schema schema, List<Record> records)
+          throws IOException {
+    String identifier = testTables.identifier("default." + tableName);
+    TestHelper helper = new TestHelper(
+            metastore.hiveConf(), testTables.tables(), identifier, schema, SPEC, FileFormat.PARQUET, temp);
+    Table table = helper.createTable();
+
+    if (!records.isEmpty()) {
+      helper.appendToTable(helper.writeFile(null, records));
+    }
+
+    return table;
+  }
+
+  protected void createHiveTable(String tableName, String location) {
     shell.execute(String.format(

Review comment:
       I think if we make `createHiveTable` [and maybe `createIcebergTable`] abstract, each implementation can set the right configuration in table properties. We can then test custom catalog and other configs without explicitly passing as properties from `TestTables` . What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] guilload commented on a change in pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
guilload commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r468202092



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -66,7 +73,12 @@ public HiveAuthorizationProvider getAuthorizationProvider() throws HiveException
 
   @Override
   public void configureInputJobProperties(TableDesc tableDesc, Map<String, String> map) {
+    Properties props = tableDesc.getProperties();
+    Table table = Catalogs.loadTable(conf, props);
 
+    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(NAME));
+    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
+    map.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(table.schema()));

Review comment:
       The current implement of the catalog loader reads directly from the HiveConf and ignores the table properties, that why the catalog loader class is not forwarded here.
   
   I'm planning to add the ability to declare the catalog in the table properties but I'd rather do so in a follow-up PR because that requires a refactor of the current test suite. I'll make sure to add a test that reads from multiples tables set up with different catalogs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460739962



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import java.util.Optional;
+import java.util.Properties;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.hive.HiveCatalogs;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public final class Catalogs {
+
+  private static final String HADOOP = "hadoop";
+  private static final String HIVE = "hive";
+
+  private static final String NAME = "name";
+  private static final String LOCATION = "location";
+
+  private Catalogs() {}
+
+  /**
+   * Load an Iceberg table using the catalog and table identifier (or table path) specified by the configuration.
+   * Catalog resolution happens in this order:
+   * 1. Custom catalog if specified by {@link InputFormatConfig#CATALOG_LOADER_CLASS}
+   * 2. Hadoop or Hive catalog if specified by {@link InputFormatConfig#CATALOG}
+   * 3. Hadoop Tables
+   * @param conf a Hadoop conf
+   * @return an Iceberg table
+   */
+  public static Table loadTable(Configuration conf) {
+    // A bit of a hack to make this function work transparently with Hive without having to remap the "name" and
+    // "location" properties.
+    if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID).length() > 0) {
+      return loadTable(conf, conf.get(NAME), conf.get(LOCATION));

Review comment:
       A useful test could be to add a HiveRunner test which does a join between multiple tables, each one loaded with a different catalog. We definitely found some issues in our earlier implementation of this that didn't hold once more than one table was involved in the query.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1243: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#discussion_r460585102



##########
File path: mr/src/main/java/org/apache/iceberg/mr/CatalogLoader.java
##########
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.catalog.Catalog;
+
+public interface CatalogLoader {
+
+  Catalog load(Configuration conf);
+

Review comment:
       nit: blank line. Also javadoc on this method
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1243: Hive: Add support for custom catalog to Iceberg StorageHandler (#1155)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1243:
URL: https://github.com/apache/iceberg/pull/1243#issuecomment-670604455


   @guilload, could you rebase and take a look at the [review comment about multiple catalogs](https://github.com/apache/iceberg/pull/1243#discussion_r463838000)?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org