You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/02 08:10:59 UTC

[GitHub] [iceberg] zhangdove opened a new pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

zhangdove opened a new pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161


   I fixed the problem by doing some debugging and modifying some of the code.
   Isuues url:[1160](https://github.com/apache/iceberg/issues/1160)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-653936310


   Looks great, I'll merge this. Thanks @zhangdove!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449655771



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;
+    df.select("c1", "c2", "c3")
+            .write()

Review comment:
       Nit: Indentation is off. It should be 2 indents (4 spaces) from the indent of the line that is being continued, `df.select(...)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449656603



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;
+    df.select("c1", "c2", "c3")
+            .write()
+            .format("iceberg")
+            .mode("append")
+            .save(tableFileSystemPath);
+
+    df.write().mode("append").parquet(tableFileSystemPath + "/data");
+
+    Thread.sleep(1000);
+
+    long timestamp = System.currentTimeMillis();

Review comment:
       Minor: The other tests don't use a separate variable for this. Could this be embedded in the `olderThan` call?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r448850172



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'
+      return ((BaseTable) table()).operations().current().location() + "#" + type;

Review comment:
       Is possible to provide a unit test addressing your changes ? 
   Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449656410



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;
+    df.select("c1", "c2", "c3")
+            .write()
+            .format("iceberg")
+            .mode("append")
+            .save(tableFileSystemPath);
+
+    df.write().mode("append").parquet(tableFileSystemPath + "/data");
+
+    Thread.sleep(1000);
+
+    long timestamp = System.currentTimeMillis();
+
+    Actions actions = Actions.forTable(table);
+
+    List<String> result = actions.removeOrphanFiles()

Review comment:
       You can combine this with the previous line:
   
   ```java
   List<String> deletedFiles = Actions.forTable(table)
       .removeOrphanFiles()
       .olderThan(timestamp)
       .execute();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449655404



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'
+      return ((BaseTable) table()).operations().current().location() + "#" + type;

Review comment:
       Why not use `table().location()` instead? There's no need to cast to `BaseTable` and access `TableOperations`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-653817409


   Looks almost there. Just one more thing to fix and the comment to update. Thanks for fixing this, @zhangdove!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-653620001


   Thanks for fixing this, @zhangdove. I think you have the solution right. I'd just like for the test to be a bit cleaner.
   
   FYI @aokolnychyi.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
zhangdove commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449743798



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'
+      return ((BaseTable) table()).operations().current().location() + "#" + type;

Review comment:
       That sounds like a good idea to me. I'll fix it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449656810



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;

Review comment:
       Does this work using `table.location` instead of building the path here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
zhangdove commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-653833498


   Thanks for your review. I have fixed it. @rdblue 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449808662



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +30,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // Load a path by HadoopCatalog or HadoopTables
+      return table().location() + "#" + type;
+    } else if (tableName.startsWith("hive.")) {
+      // HiveCatalog prepend a logical name which we need to drop for Spark 2.4
       return tableName.replaceFirst("(hadoop\\.)|(hive\\.)", "") + "." + type;

Review comment:
       Looks like this needs to be updated. There is no need to remove `hadoop.` if Hadoop tables don't use this code path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449656298



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;
+    df.select("c1", "c2", "c3")
+            .write()
+            .format("iceberg")
+            .mode("append")
+            .save(tableFileSystemPath);
+
+    df.write().mode("append").parquet(tableFileSystemPath + "/data");
+
+    Thread.sleep(1000);

Review comment:
       Can you add the comment that explains this line from the other test cases?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449808875



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'

Review comment:
       I made a suggestion. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449655939



##########
File path: spark2/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction.java
##########
@@ -542,4 +545,41 @@ public void testRemoveOrphanFilesWithRelativeFilePath() throws IOException, Inte
     Assert.assertEquals("Action should find 1 file", invalidFiles, result);
     Assert.assertTrue("Invalid file should be present", fs.exists(new Path(invalidFiles.get(0))));
   }
+
+  @Test
+  public void testRemoveOrphanFilesWithHadoopCatalog() throws InterruptedException {
+    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), tableLocation);
+    String namespaceName = "testDb";
+    String tableName = "testTb";
+
+    Namespace namespace = Namespace.of(namespaceName);
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+    Table table = catalog.createTable(tableIdentifier, SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap());
+
+    List<ThreeColumnRecord> records = Lists.newArrayList(
+            new ThreeColumnRecord(1, "AAAAAAAAAA", "AAAA")
+    );
+    Dataset<Row> df = spark.createDataFrame(records, ThreeColumnRecord.class).coalesce(1);
+
+    String tableFileSystemPath = tableLocation + "/" + namespaceName + "/" + tableName;
+    df.select("c1", "c2", "c3")
+            .write()
+            .format("iceberg")
+            .mode("append")
+            .save(tableFileSystemPath);
+
+    df.write().mode("append").parquet(tableFileSystemPath + "/data");
+
+    Thread.sleep(1000);

Review comment:
       Why is there a call to sleep? This shouldn't be necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
zhangdove commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-653732552


   Would you mind going through it again at your convenience? @rdblue 
   Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
zhangdove commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449747864



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'

Review comment:
       Yes, just as you described.I have modified the comment, I wonder if you have a better one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449655254



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +31,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // HadoopCatalog tableName style is 'hadoop.ns.tb'

Review comment:
       I don't think this comment is correct because the solution is to convert to a path table reference.
   
   If i understand correctly, the problem in Spark 2.4 is that IcebergSource uses the Hive catalog or it uses HadoopTables to load a path. The table you're passing in was loaded by HadoopCatalog, so it doesn't work because that catalog is not available. I agree that the solution is to convert a Hadoop table to a path reference, but then this comment should explain what's happening: a HadoopCatalog was used to load the table, so convert to the path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
zhangdove commented on pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#issuecomment-652914516


   I added some unit test. Could you review it at your convenience? @openinx 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1161: Fix errors when use removeOrphanFiles By HadoopCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1161:
URL: https://github.com/apache/iceberg/pull/1161#discussion_r449808864



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseAction.java
##########
@@ -30,8 +30,11 @@ protected String metadataTableName(MetadataTableType type) {
     String tableName = table().toString();
     if (tableName.contains("/")) {
       return tableName + "#" + type;
-    } else if (tableName.startsWith("hadoop.") || tableName.startsWith("hive.")) {
-      // HiveCatalog and HadoopCatalog prepend a logical name which we need to drop for Spark 2.4
+    } else if (tableName.startsWith("hadoop.")) {
+      // Load a path by HadoopCatalog or HadoopTables

Review comment:
       How about "for HadoopCatalog tables, use the table location to load the metadata table because IcebergCatalog uses HiveCatalog when the table is identified by name".




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org