Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/01 03:53:33 UTC

[GitHub] [iceberg] SinghAsDev opened a new pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

SinghAsDev opened a new pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018


   Without this change, users accustomed to Hive's external and managed table behavior may accidentally delete data.




[GitHub] [iceberg] rdblue commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1029266880


   I'm -1 for changing this in the Iceberg library. The right way to do this is to customize your catalog.
   
   Iceberg has no concept of an "external" table, and although it does define `gc.enabled`, I don't think it is a good idea to conflate the two. The `gc.enabled` property is turned off when it isn't safe to drop table files. For example, when files are shared across branches in Nessie, it isn't safe to delete files using the metadata of any single branch. Imported tables also turn off this flag because the files may be "owned" by another table. These are narrow cases and `gc.enabled` is a safety valve.
   
   In normal operation, you never want users accidentally turning off `gc.enabled`, because it will prevent old snapshots from getting cleaned up and will effectively keep all versions of the table forever. That's really bad.
   
   If the goal here is to avoid purging table data on delete, then there is another way to do that. `gc.enabled` is not the right mechanism to do it.
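   
   A minimal sketch, for illustration only, of the "customize your catalog" route mentioned above: subclass `HiveCatalog` and ignore the requested purge flag so that dropping a table never deletes its files. The class name is hypothetical and not part of Iceberg or this PR.
   
   ```java
   import org.apache.iceberg.catalog.TableIdentifier;
   import org.apache.iceberg.hive.HiveCatalog;
   
   // Hypothetical catalog that never purges data, regardless of what the engine asks for.
   public class NoPurgeHiveCatalog extends HiveCatalog {
     @Override
     public boolean dropTable(TableIdentifier identifier, boolean purge) {
       // Drop only the catalog entry; keep the table's data and metadata files.
       return super.dropTable(identifier, false /* purge */);
     }
   }
   ```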




[GitHub] [iceberg] marton-bod commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1026761547


   In newer versions of Hive, there is a table property `external.table.purge`; when it is true, the table data is deleted upon dropping the table, just like for managed tables. For Iceberg tables created by the Hive engine, we automatically append `external.table.purge=true` (and set `gc.enabled` to true as well, accordingly).
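   
   A rough, illustrative sketch of the property alignment described above; the class and method names are made up, and this is not the actual `HiveIcebergMetaHook` code:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   class PurgePropertySync {
     static final String EXTERNAL_TABLE_PURGE = "external.table.purge"; // Hive table property
     static final String GC_ENABLED = "gc.enabled";                     // Iceberg table property
   
     // Append external.table.purge=true for Iceberg tables created through Hive and keep
     // gc.enabled in sync with it, so dropping the table also purges its data files.
     static Map<String, String> alignPurgeWithGc(Map<String, String> tableProps) {
       Map<String, String> props = new HashMap<>(tableProps);
       props.putIfAbsent(EXTERNAL_TABLE_PURGE, "true");
       props.put(GC_ENABLED, props.get(EXTERNAL_TABLE_PURGE));
       return props;
     }
   }
   ```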




[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1034124842


   It looks like #3056 actually intends to change the existing drop table behavior. In that case, I agree that is a better way to handle this. Closing this. Thanks, all, for the great discussion.




[GitHub] [iceberg] marton-bod edited a comment on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1026761547


   In newer versions of Hive, there is a table property `external.table.purge`; when it is true, the external table data is deleted upon dropping the table, just like for managed tables. For Iceberg tables created by the Hive engine, we automatically append `external.table.purge=true` (and set `gc.enabled` to true as well, accordingly).




[GitHub] [iceberg] SinghAsDev closed pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev closed pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018


   




[GitHub] [iceberg] szehon-ho commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1027466195


   Hi @SinghAsDev , I see now. If I understand correctly, this makes gc.enabled synonymous with another table property, 'external', but only in the case of HiveCatalog.
   
   It sounds reasonable to me, but is it too specific to the Spark -> HiveCatalog setup? As there are several clients (Spark, Hive, Flink, Trino) and several catalogs (Hive, Glue, JDBC, REST?), I wonder if we have to make 'external' a recognized Iceberg table property across all catalogs to make it standard.
   
   Just thinking aloud, would an easier fix be for Spark's catalog to just do the mapping from 'external' => 'gc.enabled', like Hive is doing?
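   
   A hedged sketch of that Spark-side mapping, assuming Spark forwards the reserved `external` property (`TableCatalog.PROP_EXTERNAL`) in the create-table properties; the helper below is illustrative only, not existing Iceberg code:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   class ExternalToGcMapping {
     // Translate Spark's 'external' table flag into Iceberg's gc.enabled before creating the table.
     static Map<String, String> withGcFlag(Map<String, String> sparkProps) {
       Map<String, String> icebergProps = new HashMap<>(sparkProps);
       if ("true".equalsIgnoreCase(sparkProps.get("external"))) {
         // External table: dropping it should not purge the (possibly shared) data files.
         icebergProps.putIfAbsent("gc.enabled", "false");
       }
       return icebergProps;
     }
   }
   ```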




[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1034007518


   Thanks @rdblue for the detailed context. I think it is fine to leave the behavior as is in Iceberg, especially because it sounds like existing users are not concerned about this?
   
   Correct me if I am wrong, but #3056 adds a way to explicitly delete (purge) all data along with the table. However, it does not change the behavior we are discussing in this thread, where dropping an `external` table would delete data (which may not be the intention of the external table user).




[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1027165582


   Hi @marton-bod , thanks for the comment. A couple of follow-ups.
   
   1. What you said is correct: we translate `gc.enabled` behavior to Hive's `external.table.purge` behavior while creating the HMS table in the Hive catalog. However, the issue I am pointing to is that `external` table behavior is not translated to `gc.enabled` behavior. The issue should be evident from the unit tests added in this PR, but let me also add a bit of clarification to the PR summary.
   2. `external.table.purge` is only supported on Hive >= 4.x.




[GitHub] [iceberg] SinghAsDev edited a comment on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev edited a comment on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1027165582


   Hi @marton-bod , thanks for the comment. A few follow-ups.
   
   1. What you said is correct: we translate `gc.enabled` behavior to Hive's `external.table.purge` behavior while creating the HMS table in the Hive catalog. However, the issue I am pointing to is that `external` table behavior is not translated to `gc.enabled` behavior. The issue should be evident from the unit tests added in this PR, but let me also add a bit of clarification to the PR summary.
   2. `external.table.purge` is only supported on Hive >= 4.x.
   3. I think I should update the diff to also check for the `external.table.purge` property before turning gc.enabled on or off. What do you think, @marton-bod?




[GitHub] [iceberg] szehon-ho commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1028204110


   @SinghAsDev sure, I was actually thinking more about the other side (catalogs): what if a Spark user wants to use a non-Hive catalog, like hadoop-catalog or glue-catalog? Then they would get a different behavior from the Iceberg table when trying to drop it.




[GitHub] [iceberg] marton-bod commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1027976504


   @SinghAsDev Thanks for your answer, I think you're raising a valid point here.
   As I see it, there are two separate issues:
   - When creating external tables using the Hive engine, Hive appends `external.table.purge=true` to the table properties behind the scenes in `HiveIcebergMetaHook`. This goes against user expectations even in Hive 4.0+. I'll prepare a PR to fix that up on the Hive engine side (which should presumably fix your Hive test failures).
   - At the HiveCatalog level, how do we want to handle the different combinations of external/managed tables with the gc.enabled flag? I think it makes sense to disallow the MANAGED + `gc.enabled=false` combination, but I would not disallow the EXTERNAL + `gc.enabled=true` combination, exactly because Hive 4.0 makes this combination valid via the `external.table.purge=true` flag (a rough sketch of this check follows below).
   
   Please let me know what you think, thanks!
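   
   For illustration, a rough sketch of the combination check described in the second point above; the class and method names are assumptions, not Iceberg code:
   
   ```java
   final class GcFlagValidation {
     private GcFlagValidation() {
     }
   
     // Reject MANAGED + gc.enabled=false; leave EXTERNAL tables free to enable or disable GC,
     // mirroring Hive 4.0's external.table.purge behavior.
     static void validate(boolean isExternal, boolean gcEnabled) {
       if (!isExternal && !gcEnabled) {
         throw new IllegalArgumentException("Managed Iceberg tables cannot set gc.enabled=false");
       }
     }
   }
   ```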




[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1028175781


   @szehon-ho that is a good point, and I was also thinking about it. However, AFAIK, among all the clients Iceberg is used from, only Spark and Hive support the external table concept, right? As such, I feel it should be OK to just limit this to the Hive catalog. What do you think?
   
   
   > When creating external tables using the Hive engine, Hive appends external.table.purge=true to the table properties behind the scenes in HiveIcebergMetaHook. This goes against user expectations even in Hive 4.0+. I'll prepare a PR to fix that up on the Hive engine side (which should fix your Hive test failures presumably).
   
   @marton-bod this is exactly what I was planning to do for the test failures here as well. Let me know if you fix it in a separate PR, and I can rebase.
   
   > On the HiveCatalog-level, how do we want to handle different combinations of external/managed tables with the gc.enabled flag. I think it makes sense to disallow the MANAGED + gc.enabled=false combination. But I would not disallow the EXTERNAL + gc.enabled=true combination, exactly because Hive 4.0 makes this combination valid via the external.table.purge=true flag
   
   @marton-bod on allowing EXTERNAL + gc.enabled=true, I think it makes sense for Hive 4.0+, but below that I think it is expected not to allow it, right? Maybe we can make this configurable. Would that be OK with you?
   
   @szehon-ho @marton-bod @pvary @rdblue I think we are at least aligned on the issue and the need for changing the behavior? If not, please let me know if I can provide more info. Since this is a behavior change, I am curious whether there is a set of steps we usually follow, like user communication, or whether we restrict such changes to dot releases.




[GitHub] [iceberg] kbendick commented on a change in pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#discussion_r796463941



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java
##########
@@ -279,4 +284,88 @@ public void testDowngradeTableToFormatV1ThroughTablePropertyFails() {
         "Cannot downgrade v2 table to v1",
         () -> sql("ALTER TABLE %s SET TBLPROPERTIES ('format-version'='1')", tableName));
   }
+
+  @Test
+  public void testCreateExternalTableDisablesGC() throws IOException {
+    Assume.assumeTrue("Only test hive catalog", catalogName.equals(SparkCatalogConfig.HIVE.catalogName()));
+    Assert.assertFalse("Table should not already exist", validationCatalog.tableExists(tableIdent));
+
+    sql("CREATE EXTERNAL TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertNotNull("Should load the new table", table);
+
+    File dataLocation = new File(table.location().replace("file:", "") + "/data");
+    Assert.assertFalse("No data should be there.", dataLocation.exists());
+
+    try {
+      sql("INSERT INTO %s VALUES (123456789, \"test string\")", tableName);
+      List<Object[]> expected = sql(String.format("SELECT * FROM %s", tableName));
+      Assert.assertEquals(1, expected.size());
+      Assert.assertEquals("Data should be there.", 2,
+          Objects.requireNonNull(dataLocation.listFiles()).length);
+
+      sql("DROP TABLE %s", tableName);
+      Assert.assertEquals("Data should not have been deleted.", 2,
+          Objects.requireNonNull(dataLocation.listFiles()).length);
+    } finally {
+      FileUtils.deleteDirectory(dataLocation.getParentFile());
+    }
+  }
+
+  @Test
+  public void testCreateExternalTableThrowsWhenGcSet() {
+    Assume.assumeTrue("Only test hive catalog", catalogName.equals(SparkCatalogConfig.HIVE.catalogName()));
+    Assert.assertFalse("Table should not already exist", validationCatalog.tableExists(tableIdent));
+
+    try {
+      sql("CREATE EXTERNAL TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg tblproperties(\"gc" +
+          ".enabled\"=\"true\")", tableName);
+      Assert.fail("Runtime exception should have been thrown.");
+    } catch (RuntimeException e) {
+      // expected, do nothing
+    }
+  }
+
+  @Test
+  public void testCreateManagedTableThrowsWhenGcNotSet() {
+    Assume.assumeTrue("Only test hive catalog", catalogName.equals(SparkCatalogConfig.HIVE.catalogName()));
+    Assert.assertFalse("Table should not already exist", validationCatalog.tableExists(tableIdent));
+
+    try {
+      sql("CREATE TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg tblproperties(\"gc" +
+          ".enabled\"=\"false\")", tableName);
+      Assert.fail("Runtime exception should have been thrown.");
+    } catch (RuntimeException e) {
+      // expected, do nothing
+    }
+  }
+
+  @Test
+  public void testCreateManagedTableEnablesGC() throws IOException {
+    Assume.assumeTrue("Only test hive catalog", catalogName.equals(SparkCatalogConfig.HIVE.catalogName()));
+    Assert.assertFalse("Table should not already exist", validationCatalog.tableExists(tableIdent));
+
+    sql("CREATE TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertNotNull("Should load the new table", table);
+
+    File dataLocation = new File(table.location().replace("file:", "") + "/data");
+    Assert.assertFalse("No data should be there.", dataLocation.exists());
+
+    try {
+      sql("INSERT INTO %s VALUES (123456789, \"test string\")", tableName);
+      List<Object[]> expected = sql(String.format("SELECT * FROM %s", tableName));
+      Assert.assertEquals(1, expected.size());
+      Assert.assertEquals("Data should be there.", 2,
+          Objects.requireNonNull(dataLocation.listFiles()).length);
+
+      sql("DROP TABLE %s", tableName);
+      Assert.assertEquals("Data should not have been deleted.", 0,
+          Objects.requireNonNull(dataLocation.listFiles()).length);

Review comment:
       Can you please add a test, or a section to this test, that then rebuilds the dropped table on top of the existing files?






[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1030073667


   Thanks for the thoughts @rdblue . I think what you said about Iceberg not having a concept of 'external' is definitely a good point. However, Iceberg's Spark catalog does allow Spark SQL users to create Iceberg tables, and Spark SQL supports the 'external' concept. This leads to situations where data is dropped unexpectedly. I think we could make the lack of support for external tables explicit by throwing an informative exception during create table in the Spark catalog. @rdblue would that be a reasonable approach?




[GitHub] [iceberg] rdblue commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1030907123


   I don't think it is a good idea to throw an exception and disallow `EXTERNAL`. That will break existing pipelines that use `EXTERNAL` in DDL, and Iceberg really doesn't fit cleanly into the idea of `MANAGED` or `EXTERNAL`: the concept no longer fits, and we can't choose one part of the definition (delete behavior) over another (overwrite behavior).
   
   I think there are two better ways to address this. First, https://github.com/apache/iceberg/pull/3056 adds `PURGE` and makes whether or not to delete data explicit at drop time. That would solve this problem because you're not trying to carry state about whether to drop data through the lifetime of the table. Second, you can use your own catalog implementation and customize this behavior for your environment. Catalogs can be easily plugged in and are the intended way to change policy like this.
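   
   For reference, a small illustrative sketch of how the explicit-purge route maps onto the existing `Catalog.dropTable(identifier, purge)` API; the wrapper class and method names below are made up:
   
   ```java
   import org.apache.iceberg.catalog.Catalog;
   import org.apache.iceberg.catalog.TableIdentifier;
   
   final class DropExamples {
     private DropExamples() {
     }
   
     // Plain DROP TABLE: remove the table from the catalog but keep its data and metadata files.
     static boolean dropKeepingFiles(Catalog catalog, TableIdentifier ident) {
       return catalog.dropTable(ident, false /* purge */);
     }
   
     // DROP TABLE ... PURGE: also delete the table's data and metadata files.
     static boolean dropAndPurge(Catalog catalog, TableIdentifier ident) {
       return catalog.dropTable(ident, true /* purge */);
     }
   }
   ```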




[GitHub] [iceberg] SinghAsDev commented on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
SinghAsDev commented on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1028220335


   @szehon-ho I see your point now, thanks. I think it makes sense to do this in the Spark catalog.




[GitHub] [iceberg] marton-bod edited a comment on pull request #4018: [HiveCatalog] Make Hive's external table behavior match with Iceberg table's gc.enabled behavior.

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #4018:
URL: https://github.com/apache/iceberg/pull/4018#issuecomment-1027976504


   @SinghAsDev Thanks for your answer, I think you're raising a valid point here.
   As I see it, there are two separate issues:
   - When creating external tables using the Hive engine, Hive appends `external.table.purge=true` to the table properties behind the scenes in `HiveIcebergMetaHook`. This goes against user expectations even in Hive 4.0+. I'll prepare a PR to fix that up on the Hive engine side (which should presumably fix your Hive test failures).
   - At the HiveCatalog level, how do we want to handle the different combinations of external/managed tables with the gc.enabled flag? I think it makes sense to disallow the MANAGED + `gc.enabled=false` combination, but I would not disallow the EXTERNAL + `gc.enabled=true` combination, exactly because Hive 4.0 makes this combination valid via the `external.table.purge=true` flag.
   
   Please let me know what you think, thanks!
   cc @pvary 

