You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by "youngxinler (via GitHub)" <gi...@apache.org> on 2023/02/20 11:34:16 UTC

[GitHub] [iceberg] youngxinler opened a new pull request, #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

youngxinler opened a new pull request, #6886:
URL: https://github.com/apache/iceberg/pull/6886

   background: #6863 
   
   Currently we only allow setting warehouse dir for Hadoop Catalogs, this PR supports `HiveCatalog` default warehouse path by configuring `spark.sql.warehouse.dir` or `spark.sql.catalog.$catalogName.warehouse`.
   
   Closes #6863 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1176953144


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.apache.spark.sql.SparkSession;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();
+    SparkSession.clearDefaultSession();

Review Comment:
   This doesn't actually change any settings. It just changes what is returned by SparkSession. Since all the usages in the tests are based on "spark" which is a set val, call won't actually change anything.
   
   So what I was suggesting if you wanted to clear things out what you would do is something like
   
   
   tempSession = spark.newSession()
   ```java
     /**
      * Start a new session with isolated SQL configurations, temporary tables, registered
      * functions are isolated, but sharing the underlying `SparkContext` and cached data.
      *
      * @note Other than the `SparkContext`, all shared state is initialized lazily.
      * This method will force the initialization of the shared state to ensure that parent
      * and child sessions are set up with the same shared state. If the underlying catalog
      * implementation is Hive, this will initialize the metastore, which may take some time.
      *
      * @since 2.0.0
      */
      ```
   // Change Config of tempSession
   // You can reset catalogManager here too if you like
   // Do test
   
   Basically every time you do this it makes a new sessionState object which means you get a brand new runtimeConfig as well as a brand new catalog manager.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1172167494


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+  }
+
+  @Test
+  public void testCatalogWithSparkSqlWarehouseDir() {
+    spark.conf().unset("spark.sql.catalog." + catalogName + ".warehouse");

Review Comment:
   Thanks! I am also troubled by this problem, but it is not supported to start two SparkContexts in the same JVM, SparkSession getOrCreate method will return the existing spark instance. 
   So this is why I rewrite the code and change this The reason why the test code is placed in a newly created `TestSparkWarehouseLocation`, because if it is placed in `TestCreateTable` to reset in different catalog warehouse config, once it fails, it will cause unit tests to affect each other, and pre-control Catalog state changes and recovery will complicate the overall `TestCreateTable` code, and these changes are unnecessary for other tests. The latest changes have been committed, what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164360552


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());

Review Comment:
   See https://iceberg.apache.org/contribute/#testing
   For advice on asserts, we are using only AssertJ now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166961966


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);

Review Comment:
   If the default always has a URI then by all means lets do this, also let's bump this into a "before all" method for the test suite, I don't think we need to call it for every test. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1172173067


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();

Review Comment:
   The reset here is to refresh the latest Catalog configuration for other tests, SparkTestBaseWithCatalog will be responsible for resetting the default Catalog config each time it is initialized



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164372121


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+  }
+
+  @Test
+  public void testCatalogWithSparkSqlWarehouseDir() {
+    spark.conf().unset("spark.sql.catalog." + catalogName + ".warehouse");

Review Comment:
   Still not sure this is the write thing to do, if the sessionstate already has a catalog manager with the catalog init'd it would be used and this would have no effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164362162


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));

Review Comment:
   Not sure what you are mentioning here since we aren't actually Spark here. That said this should also be checking that the database name and table name are also present. The current test would pass if the location == warehouse.dir. We want to make sure that we are never setting the location to the warehouse directory. It should always be a subdirectory with the db and table name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164372121


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+  }
+
+  @Test
+  public void testCatalogWithSparkSqlWarehouseDir() {
+    spark.conf().unset("spark.sql.catalog." + catalogName + ".warehouse");

Review Comment:
   Still not sure this is the right thing to do, if the sessionstate already has a catalog manager with the catalog init'd it would be used and this would have no effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#issuecomment-1519592072

   > I think this is almost done, we just need to create new sessions when we are working with new config to avoid contamination of other unit tests either that or just clearing out the config of added parameters after each test.
   > 
   > I also have one more note, we could clean up the `if` tree in the checks to be something like String expectedPath = (is a Hadoop catalog) ? hadoopPath : hivePath;
   > 
   > Then it's a single assert
   
   thanks for review,  I have changed this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#issuecomment-1436838847

   I checked the test code in the spark test directory, and I couldn't find a suitable place to add the test, so I may need a little hint on how to add the test for this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1142304868


##########
docs/spark-configuration.md:
##########
@@ -37,6 +37,7 @@ This creates an Iceberg catalog named `hive_prod` that loads tables from a Hive
 spark.sql.catalog.hive_prod = org.apache.iceberg.spark.SparkCatalog
 spark.sql.catalog.hive_prod.type = hive
 spark.sql.catalog.hive_prod.uri = thrift://metastore-host:port
+spark.sql.catalog.hive_prod.warehouse = hdfs://nn:8020/warehouse/path

Review Comment:
   Not sure we should add this here as it's not required. I think it's probably best to stick to the basics here and document the new capability in Catalog options



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166570817


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());

Review Comment:
   This does seem unnecessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164369908


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);

Review Comment:
   We probably need to reset the catalogs (I think we can do this by just making a new session) after setting this property. otherwise other tests will hav init'd these catalogs in the catalogManager. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1136050274


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +134,14 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    // if spark.sql.catalog.$catalogName.warehouse no setting, will try use
+    // spark.sql.warehouse.dir as catalog warehouse.
+    final String warehouse =
+        Optional.ofNullable(conf.get(CatalogProperties.WAREHOUSE_LOCATION))
+            .orElseGet(() -> optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()));

Review Comment:
   I don't believe this would be in the options map, would it be in the sql conf?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166532251


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);

Review Comment:
   because `defaultWarehouseLocation(TableIdentifier tableIdentifier)` method will return exist namespace locationUri as warehouseLocation for HiveCatalog,  we can't change exist namespace location when set warehouseLocation, so i create a new namespace.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1176953144


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.apache.spark.sql.SparkSession;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();
+    SparkSession.clearDefaultSession();

Review Comment:
   This doesn't actually change any settings. It just changes what is returned by SparkSession but since all the usages in the tests are based on "spark" which is a set val.
   
   So what I was suggesting if you wanted to clear things out what you would do is something like
   
   
   tempSession = spark.newSession()
   ```java
     /**
      * Start a new session with isolated SQL configurations, temporary tables, registered
      * functions are isolated, but sharing the underlying `SparkContext` and cached data.
      *
      * @note Other than the `SparkContext`, all shared state is initialized lazily.
      * This method will force the initialization of the shared state to ensure that parent
      * and child sessions are set up with the same shared state. If the underlying catalog
      * implementation is Hive, this will initialize the metastore, which may take some time.
      *
      * @since 2.0.0
      */
      ```
   // Change Config of tempSession
   // You can reset catalogManager here too if you like
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] KarlManong commented on pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "KarlManong (via GitHub)" <gi...@apache.org>.

KarlManong commented on PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#issuecomment-1526829330

   What will happen when using multiple catalogs with different warehouses?
   example:
   ```java
           conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions");
           conf.set("spark.sql.catalog.first", "org.apache.iceberg.spark.SparkCatalog");
           conf.set("spark.sql.catalog.first.warehouse", "hdfs:///cluster_first/first_warehouse");
           conf.set("spark.sql.catalog.second", "org.apache.iceberg.spark.SparkCatalog");
           conf.set("spark.sql.catalog.second.warehouse", "hdfs:///cluster_second/second_warehouse");
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] clesaec commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "clesaec (via GitHub)" <gi...@apache.org>.

clesaec commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1112094529


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -127,6 +128,16 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    if (conf.get(CatalogProperties.WAREHOUSE_LOCATION) != null) {
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION, conf.get(CatalogProperties.WAREHOUSE_LOCATION));
+    } else if (optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()) != null) {
+      // if spark.sql.catalog.$catalogName.warehouse no setting, will try use
+      // spark.sql.warehouse.dir as catalog warehouse.
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION,
+          optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()));
+    }

Review Comment:
   Would be easier to read with Optional ?
   ```java
       final String warehouse = Optional.ofNullable(conf.get(CatalogProperties.WAREHOUSE_LOCATION))
               .orElseGet(() -> optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()));
       if (warehouse != null) {
         optionsMap.put(CatalogProperties.WAREHOUSE_LOCATION, warehouse);
       }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1145687170


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   of course, I agree that for HadoopCatalog accessed separately, if no warehouse is set, an error should be thrown.
   but i think `spark.sql.warehouse.dir` should have same behavior for all catalogs in spark,  so this behavior i think is ok.
   
   if you are still concerned about the possibility of breaking the previous behavior, I can change this logic to only take effect for HiveCatalog. I would like to hear your opinion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166960617


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   Shouldn't the property key be ```HiveConf.ConfVars.METASTOREWAREHOUSE.varname``` ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1174057983


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+  }
+
+  @Test
+  public void testCatalogWithSparkSqlWarehouseDir() {
+    spark.conf().unset("spark.sql.catalog." + catalogName + ".warehouse");

Review Comment:
   Looks like you figured it out. Like I was mentioning what you want to do in this case is make a brand new session (not a new context). You can have multiple sessions per spark context, each with different runtime configuration. So when we want to do something within a test to configure a session differently you can just call "newSession" at the beginning and use that new session in the test.



##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+  }
+
+  @Test
+  public void testCatalogWithSparkSqlWarehouseDir() {
+    spark.conf().unset("spark.sql.catalog." + catalogName + ".warehouse");

Review Comment:
    Like I was mentioning what you want to do in this case is make a brand new session (not a new context). You can have multiple sessions per spark context, each with different runtime configuration. So when we want to do something within a test to configure a session differently you can just call "newSession" at the beginning and use that new session in the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166556814


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));

Review Comment:
   thanks for review.
   1. i found SparkSessionCatalog will away use `spark.sql.warehouse.dir`,  so I mistakenly thought that SparkSesssionCatalog was only implemented using SparkInnerHive even for the iceberg table, and now the behavior in the code is indeed the same, which confuses me,  So it seems that this is not in line with expectations, what do you think is the reason for this happening?
   ![image](https://user-images.githubusercontent.com/36697727/232000160-551ab282-55f7-46eb-bb36-c91bb46bbdca.png)
   
   2. I will change the judgment of table location, thank you



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164364409


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));

Review Comment:
   So two things here
   1. I don't follow why "spark_catalog" should be any different here since we are still using our SparkSessionCatalog class.
   2. We need to make sure that the prefix and suffix are correct in the path. It should start with our specified location and end with our db + tb



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1111837244


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -127,6 +128,16 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    if (conf.get(CatalogProperties.WAREHOUSE_LOCATION) != null) {
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION, conf.get(CatalogProperties.WAREHOUSE_LOCATION));
+    } else if (optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()) != null) {
+      // if spark.sql.catalog.$catalogName.warehouse no setting, will try use
+      // spark.sql.warehouse.dir as catalog warehouse.
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION,
+          optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()));
+    }

Review Comment:
   Supports `spark.sql.warehouse.dir` and assumes that `spark.sql.catalog.$catalogName.warehouse` is preferred.
   
   I think it is more general to pass the warehouse path through `CatalogProperties.WAREHOUSE_LOCATION`? 
   Because both `HadoopCatalog` and `HiveCatalog` are adapted to `CatalogProperties.WAREHOUSE_LOCATION`.
   
   > StaticSQLConf.WAREHOUSE_PATH().key() = spark.sql.warehouse.dir
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1112465709


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -127,6 +128,16 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    if (conf.get(CatalogProperties.WAREHOUSE_LOCATION) != null) {
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION, conf.get(CatalogProperties.WAREHOUSE_LOCATION));
+    } else if (optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()) != null) {
+      // if spark.sql.catalog.$catalogName.warehouse no setting, will try use
+      // spark.sql.warehouse.dir as catalog warehouse.
+      optionsMap.put(
+          CatalogProperties.WAREHOUSE_LOCATION,
+          optionsMap.get(StaticSQLConf.WAREHOUSE_PATH().key()));
+    }

Review Comment:
   good hint,   I have modified.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1146731902


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   I think it's fine to use that behavior everywhere, I don't really think we would break anyone's old flows since they are already are specifying the location at the catalog property level. Allowing this as a fall back seems pretty reasonable to me.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164366370


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());

Review Comment:
   Also why do we need to delete the location here? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1136049243


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +134,14 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    // if spark.sql.catalog.$catalogName.warehouse no setting, will try use

Review Comment:
   I'm not sure this comment adds any thing to the code here. I would probably remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1180954110


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.apache.spark.sql.SparkSession;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();
+    SparkSession.clearDefaultSession();

Review Comment:
   thanks for your guidance~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#issuecomment-1475718099

   > This needs some testing of the two new behaviors.
   
   Thanks for the review @RussellSpitzer , i add tests for the two new behaviors and the modification above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1145687170


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   of course, I agree that for HadoopCatalog accessed separately, if no warehouse is set, an error should be thrown.
   but i think `spark.sql.warehouse.dir` should have same behavior for all catalogs in spark,  so this behavior i think is ok.
   
   if you are still concerned about the possibility of breaking the previous behavior, I can change `spark.sql.warehouse.dir` default warehouse to only take effect for HiveCatalog. I would like to hear your opinion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1142340365


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkCatalogWarehouse.java:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import java.util.Map;
+import org.apache.iceberg.hive.HiveCatalog;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestSparkCatalogWarehouse extends SparkCatalogTestBase {
+
+  public TestSparkCatalogWarehouse(

Review Comment:
   A few notes here
   
   1. These tests probably belong in org.apache.iceberg.spark.TestCreateTable
   2. We Probably don't need to actually parameterize these tests since we can just make new catalogs in the tests by setting new parameters. This would also make it clear what tests were doing what.
   3. I think we need to cover the following cases, (TestCreateTable is already parameterized so we just need to write these once to hit all catalog types)
      a. Create a table with an explicit location (we should have this already)
      b. Create a table with a catalog specific warehouse location
      c. Create a table with out any locations and use the Spark warehouse directory
   4. We prefer to use AssertJ for assertions, we rarely use naked asserts. See #7131 which has some new guidance
      
   For (a and b) the table location which is created must use the database name and table name in the location. For example if I make a table "foo.bar" and specify the catalog warehouse as "fs:///bazz", I should get table location "fs:///bazz/foo/bar"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1145687633


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkCatalogWarehouse.java:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import java.util.Map;
+import org.apache.iceberg.hive.HiveCatalog;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestSparkCatalogWarehouse extends SparkCatalogTestBase {
+
+  public TestSparkCatalogWarehouse(

Review Comment:
   I agree with your points, there is indeed something I haven't considered, thank you for review. i will change it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1142324504


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   Does this change the behavior of HadoopCatalogs? I think it means that while previously creating a Hadoop Catalog without a warehouse location would throw an error, now it won't. I think this behavior may not be good ... I don't feel as strongly about how Hadoop Tables behave so maybe this is fine.  What do you think? I think we do need to have a test though for this behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1136050938


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +134,14 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    // if spark.sql.catalog.$catalogName.warehouse no setting, will try use
+    // spark.sql.warehouse.dir as catalog warehouse.
+    final String warehouse =
+        Optional.ofNullable(conf.get(CatalogProperties.WAREHOUSE_LOCATION))

Review Comment:
   This on the other hand would be in the options



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1180760109


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(

Review Comment:
   Does this change the default behavior? What was the default value prior to this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1166952642


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);

Review Comment:
   ? I don't follow. Does the current namespace have a location URI? I didn't think it had one. Does default just always have a uri?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1176953144


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.apache.spark.sql.SparkSession;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();
+    SparkSession.clearDefaultSession();

Review Comment:
   This doesn't actually change any settings. It just changes what is returned by SparkSession but since all the usages in the tests are based on "spark" which is a set val.
   
   So what I was suggesting if you wanted to clear things out what you would do is something like
   
   
   tempSession = spark.newSession()
   ```java
     /**
      * Start a new session with isolated SQL configurations, temporary tables, registered
      * functions are isolated, but sharing the underlying `SparkContext` and cached data.
      *
      * @note Other than the `SparkContext`, all shared state is initialized lazily.
      * This method will force the initialization of the shared state to ensure that parent
      * and child sessions are set up with the same shared state. If the underlying catalog
      * implementation is Hive, this will initialize the metastore, which may take some time.
      *
      * @since 2.0.0
      */
      ```
   // Change Config of tempSession
   // You can reset catalogManager here too if you like
   
   Basically every time you do this it makes a new sessionState object which means you get a brand new runtimeConfig as well as a brand new catalog manager.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1174053501


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkWarehouseLocation.java:
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.Map;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.spark.Spark3Util;
+import org.apache.iceberg.spark.SparkCatalogConfig;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class TestSparkWarehouseLocation extends SparkCatalogTestBase {
+
+  private String warehouseLocation;
+  private final String testNameSpace;
+  private final String testTableName;
+  private final TableIdentifier testTableIdentifier;
+
+  public TestSparkWarehouseLocation(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+    testNameSpace = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    testTableName = testNameSpace + ".table";
+    testTableIdentifier = TableIdentifier.of("default1", "table");
+  }
+
+  @Before
+  public void createTestWarehouseLocation() throws Exception {
+    this.warehouseLocation = "file:" + temp.newFolder(catalogName).getPath();
+  }
+
+  @After
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", testTableName);
+    sql("DROP NAMESPACE IF EXISTS %s", testNameSpace);
+    spark.sessionState().catalogManager().reset();

Review Comment:
   I would probably add this to the Before method as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1180954866


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(

Review Comment:
   If it is HadoopCatalog, warehouse must be set; if it is HiveCatalog, the default value is HiveConf.ConfVars.METASTOREWAREHOUSE.varname; if it is SparkCatalog, the default value is spark.sql.warehouse.dir. Now if spark.sql.catalog is not set. $catalogName.warehouse, then spark.sql.warehouse.dir will be used. For HiveCatalog, if a namespace has been created, then warehouse will be a certain value. 
   So I think this will have no impact on previous performance .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#issuecomment-1528797306

   I have seen the latest error, and I will investigate it after my vacation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164370874


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));
+    } else {
+      Assert.assertTrue(table.location().startsWith(location));
+    }
+    sql("DROP TABLE IF EXISTS %s", testTableName);

Review Comment:
   I think this is covered in a "@After" method. Otherwise this would still leave a mess behind if the test failed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164368723


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);

Review Comment:
   Why do we need a custom namespace here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1164369908


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);

Review Comment:
   We probably need to reset the catalogs after setting this property. otherwise other tests will hav init'd these catalogs in the catalogManager. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1172155596


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -132,6 +132,9 @@ protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap opti
     optionsMap.putAll(options.asCaseSensitiveMap());
     optionsMap.put(CatalogProperties.APP_ID, SparkSession.active().sparkContext().applicationId());
     optionsMap.put(CatalogProperties.USER, SparkSession.active().sparkContext().sparkUser());
+    optionsMap.putIfAbsent(
+        CatalogProperties.WAREHOUSE_LOCATION,
+        SparkSession.active().sqlContext().conf().warehousePath());

Review Comment:
   HiveCatalog will use CatalogProperties.WAREHOUSE_LOCATION to set to HiveConf.ConfVars.METASTOREWAREHOUSE.varname during initialization, so I think it is better to use CatalogProperties.WAREHOUSE_LOCATION here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6886: Spark: allows catalog.warehouse for Spark Hive Catalogs #6863

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.

youngxinler commented on code in PR #6886:
URL: https://github.com/apache/iceberg/pull/6886#discussion_r1148496671


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java:
##########
@@ -341,4 +343,42 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCatalogSpecificWarehouseLocation() throws Exception {
+    File warehouseLocation = temp.newFolder(catalogName);
+    Assert.assertTrue(warehouseLocation.delete());
+    String location = "file:" + warehouseLocation.getAbsolutePath();
+    spark.conf().set("spark.sql.catalog." + catalogName + ".warehouse", location);
+    String testNameSpace =
+        (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default1";
+    String testTableName = testNameSpace + ".table";
+    TableIdentifier newTableIdentifier = TableIdentifier.of("default1", "table");
+
+    sql("CREATE NAMESPACE IF NOT EXISTS %s", testNameSpace);
+    sql("CREATE TABLE %s " + "(id BIGINT NOT NULL, data STRING) " + "USING iceberg", testTableName);
+
+    Table table = Spark3Util.loadIcebergCatalog(spark, catalogName).loadTable(newTableIdentifier);
+    if ("spark_catalog".equals(catalogName)) {
+      Assert.assertTrue(table.location().startsWith(spark.sqlContext().conf().warehousePath()));

Review Comment:
   SparkInnerSessionCatalog will awalys use `spark.sql.warehouse.dir` as create table locations.
   <img width="1212" alt="image" src="https://user-images.githubusercontent.com/36697727/227761770-7d6a00f2-4986-4a50-9920-3726dc1f9949.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org