You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/19 20:24:39 UTC

[GitHub] [iceberg] kbendick opened a new pull request #2617: Refine error message on most incremental scans on metadata tables

kbendick opened a new pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617


   Refines the error messages for metadata table scans that derive from `StaticTableScan` to be more user friendly, by putting the name of the metadata table in the error message
   
   This affects all scans of tables of most `MetadataTableType`s except for ` FilesTableScan` and `EntriesTableScan` (which could probably be refactored to also derive from `StaticTableScan` but currently derive from BaseTableScan directly). It also does not cover the aggregated ALL_* metadata table types, which derive rom `BaseAllMetadataTableScan` and for the most part implement an error message in the final derived classes themselves.
   
   This is to assist users who are testing incremental scans on a table `foo`, and might switch back and forth between querying `foo` and for example `foo.history` (to debug their development). Currently, the error message is just `Incremental scan is not supported`, which is not very clear when switching back and forth during development.
   
   This closes https://github.com/apache/iceberg/issues/2599
   
   cc @RussellSpitzer @aokolnychyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       **TLDR**
   
   It seems to me that the inheritance is mostly because of the time at which some of this code was written, but I believe it could (and should) be unified such that for each `MetadataTableType`:
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans:
   ```
   MetadataTableType  Current Scan Class Super Class         Refined Time Based Scan Specific Overrides
     ENTRIES                       BaseTableScan                                       None
     FILES                             BaseTableScan                                       None
     HISTORY                       StaticTableScan                                     None
     SNAPSHOTS                StaticTableScan                                      None
     MANIFESTS                  StaticTableScan                                     None
     PARTITIONS                 StaticTableScan                                     None
     ALL_DATA_FILES        BaseAllMetadataTableScan                   [asOfTime, useSnapshot](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
     ALL_MANIFESTS         BaseAllMetadataTableScan                  [asOfTime, useSnapshot](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
     ALL_ENTRIES               BaseAllMetadataTableScan                  None
     ```
   
   All of the above that have overrides are implemented in the final implementing class, as shown in the links.
   
   I think it would be much better to pass the actually scanned table name (and not just the table object, which references the Iceberg table and not its metadata table) to the two base scan classes, and then have consistent error messages for all which state the table name attempting to be scanned for users (as users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table).
   
   So I propose:
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends `StaticTableScan**`                                  None
     FILES                 **FilesTableScan extends StaticTableScan**                                 None
     HISTORY                 StaticTableScan                                     None
     SNAPSHOTS               StaticTableScan                                      None
     MANIFESTS               StaticTableScan                                     None
     PARTITIONS              StaticTableScan                                     None
     ALL_DATA_FILES        BaseAllMetadataTableScan                   _None -> Moved to BaseAllMetadataTableScan_
     ALL_MANIFESTS         BaseAllMetadataTableScan                  _None -> Moved to BaseAllMetadataTableScan_
     ALL_ENTRIES               BaseAllMetadataTableScan                  None
     ```
   
   
   
   If we decide to implement the functionality, then the final class can override it. This would cut down on code duplication and, imho, make the code much more readable.
   
   Passing the additional metadata (actual table name, as well as the metadata table type ideally) would not affect the serialization done on static table scans, as the information is only used when building the query (e.g. prior to distributing the information across the cluster for spark and likey all engines).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655764931



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Coming back to this now. I think that we probably don't want to make the `entries` and `files` tables use `StaticTableScan` because `StaticTableScan` assumes that there will be just one task created. The `entries` table is designed to actually create lots of tasks that can run in parallel and read when `rows` is called, in contrast to the static tables that read all necessary files on the driver and create a task that contains the serialized rows.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655764931



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Coming back to this now. I think that we probably don't want to make the `entries` and `files` tables use `StaticTableScan` because `StaticTableScan` assumes that there will be just one task created. The `entries` table is designed to actually create lots of tasks that can run in parallel and read when `rows` is called, in contrast to the static tables that read all necessary files on the driver and create a task that contains the serialized rows.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -26,7 +26,8 @@
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,
+                  Function<StaticTableScan, DataTask> buildTask) {

Review comment:
       Looks like this didn't need to change.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";

Review comment:
       Rather than overriding this several places and using a generic string here to avoid making this class abstract, I suggest passing the `MetadataTableType` in through the constructor so this just returns `tableType.name()`.

##########
File path: core/src/test/java/org/apache/iceberg/hadoop/TestStaticTable.java
##########
@@ -60,6 +60,19 @@ public void testCannotBeDeletedFrom() {
         () -> staticTable.newDelete().deleteFile(FILE_A).commit());
   }
 
+  @Test
+  public void testCannotDoIncrementalScanOnMetadataTable() {
+    table.newAppend().appendFile(FILE_A).commit();
+
+    for (MetadataTableType type : MetadataTableType.values()) {
+      Table staticTable = getStaticTable(type);
+      AssertHelpers.assertThrows("Static tables do not currently support incremental scans",

Review comment:
       I think that a better place for this test is now `TestMetadataTableScans` since it supports all metadata tables. Also, the context message still says "Static tables . . .".

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       This error message makes it sound like it is a combination of the table and the metadata table type, but no metadata tables support incremental scan (yet?).
   
   I think this should state that the metadata table does not support incremental scan: `"Cannot incrementally scan metadata table: %s"`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637799928



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I think so. Same thing for `EntriesTableScan`.
   
   Currently, `FilesTableScan` and `EntriesTableScan` inherit directly from `BaseTableScan`. However, they could easily be made to inherit from `StaticTableScan` and then just override the methods that they need.
   
   I wanted to keep this PR small so that it would be easy to digest, but I think that if we refactor, we could easily make `StaticTableScan` into the base class for all metadata table scans other than the aggregate metadata table scans (the ones that derive from `BaseAllMetadataTableScan`) - essentially making it into `BaseMetadataTableScan` to go along with `BaseAllMetadataTableScan`.
   
   Currently, `FilesTableScan` and `EntriesTableScan` inherit directly from `BaseTableScan`. However, they could easily be made to inherit from `StaticTableScan` and then just override the methods that they need. This way, the overrides like `useSnapshot` and `asOfTime` which are currently handled in a few places could all be handled in one place.
   
   Then, if we chose to implement that functionality for a particular metadata table type, we could just override in the actual implementing scan class.
   
   I can update this PR so that `FilesTableScan` and `EntiresTableScan` inherit from `StaticTableScan`, or I can add a similar update to `FilesTableScan` (and possibly `EntriesTableScan`) or handle them in a follow up PR.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r638298738



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Or if we prefer, I can just keep the structure as is for the other ones and override methods there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r649475404



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Yeah. We can do that. But `$table.name` will also need to be another abstract method. As the `table` object that is passed in doesn't reference the `#history` table or the specific metadata table (for the use cases where it's a metadata table).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r641260265



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;

Review comment:
       I went with `scannedTableName` as opposed to `entriesTableName` to match the same field name from `StaticScanClass`, as this class could be simplified by extending `StaticTableScan` in the future.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       **TLDR**
   
   It seems to me that the inheritance is mostly because of the time at which some of this code was written, but I believe it could (and should) be unified such that for each `MetadataTableType`:
   1. The overrides occur in the base class for that metadata table type scan (either the aggregate `BaseAllMetadataTableScan` or the current `StaticTableScan`, which is only used for metadata table scans except ENTRIES and FILES).
   2. Metadata table scans that inherit from BaseTableScan are refactored to use `StaticTableScan` (which is feasible).
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   All of the above that have overrides are implemented in the final implementing class, as shown in the links.
   
   I think it would be much better to pass the actually scanned metadata table name (and not just the table object, which references the Iceberg table and not its metadata table) as well as the metadata table type (from the enum) to the two base scan classes, and then have consistent, enriched error messages for all metadata table types which state the table name attempting to be scanned for users (as users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 HistoryTableScan extends StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   
   
   If we decide to implement the snapshot specific functionality, then the actual concrete scan class can override it. This would cut down on code duplication and, imho, make the code much more readable and provide much more user friendly error messages.
   
   I also checked, and passing the additional metadata (actual table name, as well as the metadata table type) would not affect the serialization done on static table scans for Spark (or really any engine), as the information is only used when building the query (e.g. prior to distributing the information across the cluster which requires the serialization).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r635647971



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Should this PR also add a similar update to `FilesTableScan`? It also looks like we may need to update `BaseAllMetadataTableScan`. I see that `AllManifestsTableScan` overrides `useSnapshot` and `asOfTime` but not the incremental methods that also don't work.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#issuecomment-866107409






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#issuecomment-861010834


   I've run the tests locally on Java 8 and Java 11 and I think the test failure is unrelated. It seems to be some sort of HMS connection failure.
   
   I'm also going to rebase to master locally and see if the current state of the tests is an issue, but I think this is just a plain old HMS transient failure.
   
   ```
   > Task :iceberg-hive-metastore:test
   org.apache.iceberg.hive.TestHiveCommitLocks > testLockAcquisitionAfterRetries FAILED
       java.lang.RuntimeException: Metastore operation failed for hivedb.tbl
           Caused by:
           NoSuchObjectException(message:hivedb: No current connection.)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637799928



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I think so. Same thing for `EntriesTableScan`.
   
   Currently, `FilesTableScan` and `EntriesTableScan` inherit directly from `BaseTableScan`. However, they could easily be made to inherit from `StaticTableScan` and then just override the specific methods that they need.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r641257413



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I hit all of the existing places you mentioned @rdblue. I'm going to resolve this conversation.
   
   I chose not to refactor the current scan classes that extend `BaseScanClass` to use `StaticScanClass` in hopes of making this PR more digestible.
   
   I can do that in a follow up if needed be.
   
   But this now passes the unit test for the refined error message for every metadata table.
   
   Feel free to open this conversation back up if you have any concerns. 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r650303265



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Yep that's what I was suggesting. If we really wanted it you could just do table
   
   History scan of table $tableName.$metadataIdentifier (which would only be wrong for hadoop tables)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655790772



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       I moved the argument to the constructor to get rid of the default of `"static"`.
   
   As `BaseAllMetadataTable`'s `tableType` method is already abstract, I left it as is (but I'm happy to change it to be consistent).
   
   I changed the error message universally to `Cannot incrementally scan table of type $tableType`. Please let me know if there's anything else you'd like me to update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655767282



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";

Review comment:
       Rather than overriding this several places and using a generic string here to avoid making this class abstract, I suggest passing the `MetadataTableType` in through the constructor so this just returns `tableType.name()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637831166



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I can update this PR so that `FilesTableScan` and `EntiresTableScan` inherit from `StaticTableScan`, or I can add a specific override without touching the inheritance further (or at all in this PR) for the ones you mentioned, or handle some of it in a follow up PR.
   
   I know that's a lot, but I've looked over it quite a bit and I think this would make things a lot cleaner. There's a decent amount of code that could be refactored and slimmed down to be more DRY.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655821558



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";

Review comment:
       That makes sense. We're going to see a `table type static` in a notebook eventually this way and it's going to be not very comprehensible.
   
   Unfortunately this method can't be abstract, so the constructor it is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655766844



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -26,7 +26,8 @@
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,
+                  Function<StaticTableScan, DataTask> buildTask) {

Review comment:
       Looks like this didn't need to change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655699018



##########
File path: core/src/main/java/org/apache/iceberg/AllDataFilesTable.java
##########
@@ -84,6 +84,11 @@ private AllDataFilesTableScan(TableOperations ops, Table table, Schema schema, S
       this.fileSchema = fileSchema;
     }
 
+    @Override
+    protected String tableType() {
+      return String.valueOf(MetadataTableType.ALL_DATA_FILES);

Review comment:
       I think these can all be .name() to get the value we are looking for




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r649492287



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Why do we need that when the subclass knows what kind of metadata table it is?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       **TLDR**
   
   It seems to me that the inheritance is mostly because of the time at which some of this code was written, but I believe it could (and should) be unified such that for each `MetadataTableType`:
   1. The overrides occur in the base class for that metadata table type scan (either the aggregate `BaseAllMetadataTableScan` or the current `StaticTableScan`, which is only used for metadata table scans except ENTRIES and FILES).
   2. Metadata table scans that inherit from BaseTableScan are refactored to use `StaticTableScan` (which is feasible).
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   All of the above that have overrides are implemented in the final implementing class, as shown in the links.
   
   I think it would be much better to pass more info to the two base scan classes for metadata tables, to have consistent, enriched and user-friendly error messages:
   1- the actually scanned metadata table name (and not just the table object, which references the Iceberg table and not its metadata table)
   2 - the metadata table type (from the enum)
   
   Users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table, and then reach out for help or spend a few minutes realizing where they went wrong and it would be a big quality of life upgrade (as well as making the code much more intentional).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   
   
   If we decide to implement the snapshot specific functionality, then the actual concrete scan class can override it. This would cut down on code duplication and, imho, make the code much more readable and provide much more user friendly error messages.
   
   I also checked, and passing the additional metadata (actual table name, as well as the metadata table type) would not affect the serialization done on static table scans for Spark (or really any engine), as the information is only used when building the query (e.g. prior to distributing the information across the cluster which requires the serialization).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick removed a comment on pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick removed a comment on pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#issuecomment-861010834


   I've run the tests locally on Java 8 and Java 11 and I think the test failure is unrelated. It seems to be some sort of HMS connection failure.
   
   I'm also going to rebase to master locally and see if the current state of the tests is an issue, but I think this is just a plain old HMS transient failure.
   
   ```
   > Task :iceberg-hive-metastore:test
   org.apache.iceberg.hive.TestHiveCommitLocks > testLockAcquisitionAfterRetries FAILED
       java.lang.RuntimeException: Metastore operation failed for hivedb.tbl
           Caused by:
           NoSuchObjectException(message:hivedb: No current connection.)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r650294316



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I guess we don't, especially if we want it to say `Incremental scan is not supported for HISTORY scan of table $table.name", where "$table.name" would be "foo" and not "foo#history".
   
   I would be ok with that.
   
   I think I misunderstood your original request for the change in the wording (it will no longer say "table foo#history", but the "for HISTORY scan of table foo" will suffice) 👍 

##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       So that makes sense.
   
   I got it to work with `Incremental scan is not supported for %s scan of table %s` with the first one being all possible values in `MetadataTableType.values()` and the table always being `table().name()` (e.g. without the $metadataIdentifier).
   
   However, not having the actual scanned table name still leaves us without a solution to this issue, that the events emitted reference the table itself and are potentially incorrect. https://github.com/apache/iceberg/issues/2635
   
   Maybe we can meet Monday or Tuesday and discuss this further? Its possible https://github.com/apache/iceberg/issues/2635 is not a bug. Maybe you can help me better understand the use case for the events, such as `ScanEvent`?
   
   It would seem to me that if we emit a `ScanEvent` when doing an scan on one of the AllMetadataTables, we'd want the emitted scan event to use the scanned table name (e.g. $tableName[.|#]$metadataIdentifier), to distinguish from a scan event on the table's data itself. But possibly that's not necessary?
   
   I can push the updated code and then roll back if we decide not to go with it after we talk.

##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Actually, I think that this is fine.
   
   I'm not too sure I fully understand the use case of `ScanEvent`, so I'm not going to worry about whether or not it's a bug at this moment in time.
   
   I agree that it's more consistent with the rest of the code to be using a method that's overridden vs a constructor parameter.
   
   I have updated to do that, and for the few classes inherit directly from `BaseTableScan`, I've used the same named helper function to keep things somewhat consistent.

##########
File path: core/src/main/java/org/apache/iceberg/HistoryTable.java
##########
@@ -82,6 +82,11 @@ private DataTask task(TableScan scan) {
       // override planFiles to avoid the check for a current snapshot because this metadata table is for all snapshots
       return CloseableIterable.withNoopClose(HistoryTable.this.task(this));
     }
+
+    @Override
+    protected String tableType() {
+      return String.valueOf(HistoryTable.this.metadataTableType());

Review comment:
       For scans that inherit from `StaticTableScan`, it's possible to get the table type from the surrounding class's method.
   
   For the entries that extend `BaseAllMetadataTableScan`, it's not possible to call methods from the surrounding class so I used the metadata table type enum values directly (such as [here](https://github.com/apache/iceberg/pull/2617/files#diff-c106422c51344f58d260f5578bfc8e720106dcd441f8348660155bb0b331062bR87-R91)).
   
   Not really in love with one style or another, though this one here would be harder to mess up in a refactor as it's more DRY (at the expense of being a bit more verbose).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655768624



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       This error message makes it sound like it is a combination of the table and the metadata table type, but no metadata tables support incremental scan (yet?).
   
   I think this should state that the metadata table does not support incremental scan: `"Cannot incrementally scan metadata table: %s"`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r648658993



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;
+
+    EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName) {
       super(ops, table, schema);
+      this.scannedTableName = scannedTableName;
     }
 
-    private EntriesTableScan(TableOperations ops, Table table, Schema schema, TableScanContext context) {
+    private EntriesTableScan(TableOperations ops, Table table, Schema schema, String scannedTableName,
+                             TableScanContext context) {
       super(ops, table, schema, context);
+      this.scannedTableName = scannedTableName;
+    }
+
+    @Override
+    public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+    }
+
+    @Override
+    public TableScan appendsAfter(long fromSnapshotId) {
+      throw new UnsupportedOperationException(
+          String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       Could we switch the constructor change to a abstract method? Then every subclass could just add in their own table type something like?
   
   String.format("Incremental scan is not supported for $tableType scan of table $table.name")
   
   Where tableType is defined by subclasses and table is just the table




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655706127



##########
File path: core/src/main/java/org/apache/iceberg/AllDataFilesTable.java
##########
@@ -84,6 +84,11 @@ private AllDataFilesTableScan(TableOperations ops, Table table, Schema schema, S
       this.fileSchema = fileSchema;
     }
 
+    @Override
+    protected String tableType() {
+      return String.valueOf(MetadataTableType.ALL_DATA_FILES);

Review comment:
       That's a great call. The `String.valueOf` was making me pretty sick tbh.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       Correct. No metadata table supports incremental scan.
   
   And it would be unlikely that they would as they all typically have snapshotId in them and time columns, though possibly as a convenience.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       I moved the argument to the constructor to get rid of the default of `"static"`.
   
   As `BaseAllMetadataTable`'s `tableType` method is already abstract, I left it as is.
   
   I changed the error message universally to `Cannot incrementally scan table of type $tableType`. Please let me know if there's anything else you'd like me to update.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -26,7 +26,8 @@
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,
+                  Function<StaticTableScan, DataTask> buildTask) {

Review comment:
       It didn't (was an artifact from a previous refactor), but now with the addition of the `tableType` parameter, it does.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";

Review comment:
       That makes sense. We're going to see a `table type static` in a notebook eventually this way and it's going to be not very comprehensible.
   
   Unfortunately this method can't be abstract, so the constructor it is.

##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       I moved the argument to the constructor to get rid of the default of `"static"`.
   
   As `BaseAllMetadataTable`'s `tableType` method is already abstract, I left it as is (but I'm happy to change it to be consistent).
   
   I changed the error message universally to `Cannot incrementally scan table of type $tableType`. Please let me know if there's anything else you'd like me to update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655790772



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       I moved the argument to the constructor to get rid of the default of `"static"`.
   
   As `BaseAllMetadataTable`'s `tableType` method is already abstract, I left it as is.
   
   I changed the error message universally to `Cannot incrementally scan table of type $tableType`. Please let me know if there's anything else you'd like me to update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I believe the `MetadataTableType` scans could be unified into just `StaticTableScan` and `BaseAllMetadataTableScan`, so that the overrides could be done in two places, with the names and metadata types provided to the user for the error message.
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot / Time / Incremental Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   The two that have overrides are implemented in the final implementing scan classes.
   
   To have consistent, enriched, user-friendly error messages and to ensure we catch all cases, we'd need to pass the scanned metadata table name (as it can technically be overridden - and is in some tests) and the MetadataTableType.
   
   Users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table, and then reach out for help or spend a few minutes realizing where they went wrong and it would be a big quality of life upgrade (as well as making the code much more intentional).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   If we decide to implement any of the snapshot / time / incremental scan specific functionality, then the actual concrete scan class can override it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655781667



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -37,6 +38,28 @@ private StaticTableScan(TableOperations ops, Table table, Schema schema,
     this.buildTask = buildTask;
   }
 
+  /**
+   * Type of scan being performed by the buildTask, such as {@link MetadataTableType#HISTORY} when scanning
+   * a table's {@link org.apache.iceberg.HistoryTable}.
+   * <p>
+   * Used for logging and error messages.
+   */
+  protected String tableType() {
+    return "static";
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for %s scan of table %s", tableType(), table().name()));

Review comment:
       Correct. No metadata table supports incremental scan.
   
   And it would be unlikely that they would as they all typically have snapshotId in them and time columns, though possibly as a convenience.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       **TLDR**
   
   It seems to me that the inheritance is mostly because of the time at which some of this code was written, but I believe it could (and should) be unified such that for each `MetadataTableType`:
   1. The overrides occur in the base class for that metadata table type scan (either the aggregate `BaseAllMetadataTableScan` or the current `StaticTableScan`, which is only used for metadata table scans except ENTRIES and FILES).
   2. Metadata table scans that inherit from BaseTableScan are refactored to use `StaticTableScan` (which is feasible).
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   All of the above that have overrides are implemented in the final implementing class, as shown in the links.
   
   I think it would be much better to pass the actually scanned metadata table name (and not just the table object, which references the Iceberg table and not its metadata table) as well as the metadata table type (from the enum) to the two base scan classes, and then have consistent, enriched error messages for all metadata table types which state the table name attempting to be scanned for users (as users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   
   
   If we decide to implement the snapshot specific functionality, then the actual concrete scan class can override it. This would cut down on code duplication and, imho, make the code much more readable and provide much more user friendly error messages.
   
   I also checked, and passing the additional metadata (actual table name, as well as the metadata table type) would not affect the serialization done on static table scans for Spark (or really any engine), as the information is only used when building the query (e.g. prior to distributing the information across the cluster which requires the serialization).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I believe the `MetadataTableType` scans could be unified into just `StaticTableScan` and `BaseAllMetadataTableScan`, so that the overrides could be done in two places, with the names and metadata types provided to the user for the error message.
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot / Time / Incremental Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   The two that have overrides are implemented in the final implementing scan classes.
   
   To have consistent, enriched, user-friendly error messages and to ensure we catch all cases, we'd need to pass the scanned metadata table name (as it can technically be overridden - and is in some tests) and the MetadataTableType.
   
   Users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table, and then reach out for help or spend a few minutes realizing where they went wrong and it would be a big quality of life upgrade (as well as making the code much more intentional).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   If we decide to implement the snapshot specific functionality, then the actual concrete scan class can override it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I believe the `MetadataTableType` scans could be unified into just `StaticTableScan` and `BaseAllMetadataTableScan`, so that the overrides could be done in two places, with the names and metadata types provided to the user for the error message.
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   The two that have overrides are implemented in the final implementing scan classes.
   
   To have consistent, enriched, user-friendly error messages and to ensure we catch all cases, we'd need to pass the scanned metadata table name (as it can technically be overridden - and is in some tests) and the MetadataTableType.
   
   Users encounter this problem often enough when switching back and forth between editing a snapshot specific query and inspecting a metadata table, and then reach out for help or spend a few minutes realizing where they went wrong and it would be a big quality of life upgrade (as well as making the code much more intentional).
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   If we decide to implement the snapshot specific functionality, then the actual concrete scan class can override it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I believe the `MetadataTableType` scans could be unified into just `StaticTableScan` and `BaseAllMetadataTableScan`, so that the overrides could be done in two places, with the names and metadata types provided to the user for the error message (which would need to be added as parameters, like I've done here for the table name).
   
   The current layout of refined error messages for time based / snapshot specific / incremental metadata table scans: 
   ```
   MetadataTableType  Current Scan Class Base        Snapshot / Time / Incremental Scan Overrides
     ENTRIES             BaseTableScan                        None
     FILES               BaseTableScan                        None
     HISTORY             StaticTableScan                      None
     SNAPSHOTS           StaticTableScan                      None
     MANIFESTS           StaticTableScan                      None
     PARTITIONS          StaticTableScan                      None
     ALL_DATA_FILES      BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_MANIFESTS       BaseAllMetadataTableScan     [asOfTime, useSnapshot]
     ALL_ENTRIES         BaseAllMetadataTableScan             None
     ```
     
     [AllDataFilesTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
   [AllManifestsTableScan specific overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
   
   The two that have overrides are implemented in the final implementing scan classes.
   
   So I propose changing the inheritance of these classes to match their `MetadataTableType` counterparts
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends StaticTableScan**
     FILES                 **FilesTableScan extends StaticTableScan**
     HISTORY                 StaticTableScan
     SNAPSHOTS               StaticTableScan
     MANIFESTS               StaticTableScan
     PARTITIONS              StaticTableScan
     ALL_DATA_FILES          BaseAllMetadataTableScan
     ALL_MANIFESTS           BaseAllMetadataTableScan
     ALL_ENTRIES             BaseAllMetadataTableScan
     ```
   
   If we decide to implement any of the snapshot / time / incremental scan specific functionality, then the actual concrete scan class can override it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655699018



##########
File path: core/src/main/java/org/apache/iceberg/AllDataFilesTable.java
##########
@@ -84,6 +84,11 @@ private AllDataFilesTableScan(TableOperations ops, Table table, Schema schema, S
       this.fileSchema = fileSchema;
     }
 
+    @Override
+    protected String tableType() {
+      return String.valueOf(MetadataTableType.ALL_DATA_FILES);

Review comment:
       I think these can all be .name() to get the value we are looking for




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655791424



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -26,7 +26,8 @@
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,
+                  Function<StaticTableScan, DataTask> buildTask) {

Review comment:
       It didn't (was an artifact from a previous refactor), but now with the addition of the `tableType` parameter, it does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655767760



##########
File path: core/src/test/java/org/apache/iceberg/hadoop/TestStaticTable.java
##########
@@ -60,6 +60,19 @@ public void testCannotBeDeletedFrom() {
         () -> staticTable.newDelete().deleteFile(FILE_A).commit());
   }
 
+  @Test
+  public void testCannotDoIncrementalScanOnMetadataTable() {
+    table.newAppend().appendFile(FILE_A).commit();
+
+    for (MetadataTableType type : MetadataTableType.values()) {
+      Table staticTable = getStaticTable(type);
+      AssertHelpers.assertThrows("Static tables do not currently support incremental scans",

Review comment:
       I think that a better place for this test is now `TestMetadataTableScans` since it supports all metadata tables. Also, the context message still says "Static tables . . .".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r641260265



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java
##########
@@ -69,18 +69,35 @@ MetadataTableType metadataTableType() {
 
   private static class EntriesTableScan extends BaseTableScan {
 
-    EntriesTableScan(TableOperations ops, Table table, Schema schema) {
+    private final String scannedTableName;

Review comment:
       I went with `scannedTableName` as opposed to `entriesTableName` to match `StaticScanClass`, as this class could be simplified by extending `StaticTableScan` which uses that name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r639331400



##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table %s", scannedTableName));

Review comment:
       I implemented the refined error message in all metadata table classes for `appendsBetween` and `appendsAfter`, and updated the test to test every single metadata table type.
   
   I did not know if I should go for `asOfTime` and `useSnapshot`, though I could pick those up too if you think that's a good idea @rdblue.
   
   I've also uncovered a bug in the emitted `ScanEvent` that will be very easy to fix once this is merged: https://github.com/apache/iceberg/issues/2635




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message when performing incremental scans on metadata tables

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r655706127



##########
File path: core/src/main/java/org/apache/iceberg/AllDataFilesTable.java
##########
@@ -84,6 +84,11 @@ private AllDataFilesTableScan(TableOperations ops, Table table, Schema schema, S
       this.fileSchema = fileSchema;
     }
 
+    @Override
+    protected String tableType() {
+      return String.valueOf(MetadataTableType.ALL_DATA_FILES);

Review comment:
       That's a great call. The `String.valueOf` was making me pretty sick tbh.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org