Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/04 21:37:21 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #2297: Spark: Refactor BaseSparkAction

aokolnychyi opened a new pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297


   This PR refactors `BaseSparkAction` and prepares it for upcoming changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587845489



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
     return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestFileDF(SparkSession spark, String tableName) {
-    return loadMetadataTable(spark, tableName, table().location(), ALL_MANIFESTS).selectExpr("path as file_path");
+  protected Dataset<Row> buildManifestFileDF(Table table) {
+    return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+  protected Dataset<Row> buildManifestListDF(Table table) {
     List<String> manifestLists = getManifestListPaths(table.snapshots());
     return spark.createDataset(manifestLists, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, String metadataFileLocation) {
-    StaticTableOperations ops = new StaticTableOperations(metadataFileLocation, table().io());
-    return buildManifestListDF(spark, new BaseTable(ops, table().name()));
-  }
-
-  protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark, TableOperations ops) {
+  protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
     List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
     return spark.createDataset(otherMetadataFiles, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table table, TableOperations ops) {
-    Dataset<Row> manifestDF = buildManifestFileDF(spark, table.name());
-    Dataset<Row> manifestListDF = buildManifestListDF(spark, table);
-    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(spark, ops);
+  protected Dataset<Row> buildValidMetadataFileDF(Table table, TableOperations ops) {
+    Dataset<Row> manifestDF = buildManifestFileDF(table);
+    Dataset<Row> manifestListDF = buildManifestListDF(table);
+    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(ops);
 
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
 
   // Attempt to use Spark3 Catalog resolution if available on the path
   private static final DynMethods.UnboundMethod LOAD_CATALOG = DynMethods.builder("loadCatalogMetadataTable")
-      .hiddenImpl("org.apache.iceberg.spark.Spark3Util",
-          SparkSession.class, String.class, MetadataTableType.class)
+      .hiddenImpl("org.apache.iceberg.spark.Spark3Util", SparkSession.class, String.class, MetadataTableType.class)
       .orNoop()
       .build();
 
-  private static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, String tableName, MetadataTableType type) {
+  private Dataset<Row> loadCatalogMetadataTable(String tableName, MetadataTableType type) {
     Preconditions.checkArgument(!LOAD_CATALOG.isNoop(), "Cannot find Spark3Util class but Spark3 is in use");
     return LOAD_CATALOG.asStatic().invoke(spark, tableName, type);
   }
 
-  protected static Dataset<Row> loadMetadataTable(SparkSession spark, String tableName, String tableLocation,
-                                                  MetadataTableType type) {
+  protected Dataset<Row> loadMetadataTable(Table table, MetadataTableType type) {

Review comment:
       Previously, there was an inconsistency in what values we passed to this method.
   
   For example, `tableName` could actually hold the metadata file location while `tableLocation` held the root table location. I changed the method to take the `Table` object directly to avoid any surprises.
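   As a hypothetical, self-contained illustration of that change (the names below are simplified stand-ins, not the real Iceberg or Spark signatures): deriving both values from a single `Table` object removes the chance of callers swapping them.

```java
// Simplified, hypothetical sketch of the signature change discussed above.
// Before: load(spark, tableName, tableLocation, type) -- callers sometimes
// passed a metadata location as "tableName", so the two strings could drift.
// After: a single Table argument is the one source of truth for both values.
interface Table {
  String name();
  String location();
}

class MetadataTables {
  static String describe(Table table, String type) {
    // Both values are derived from the same object; they cannot disagree.
    return table.name() + "." + type + " @ " + table.location();
  }
}
```

   With this shape, a caller can no longer pass the two strings in the wrong order, because there is only one argument to get right.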





[GitHub] [iceberg] aokolnychyi commented on pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#issuecomment-793241289


   Thanks for reviewing, @rymurr and @RussellSpitzer!




[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587844230



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -48,7 +48,21 @@
 
 abstract class BaseSparkAction<ThisT, R> implements Action<ThisT, R> {
 
-  protected abstract Table table();
+  private final SparkSession spark;

Review comment:
       Previously, we always assumed there would be a table to operate on. That's not the case for actions like snapshot and migrate. Plus, it does not make sense to initialize the Java context in every action; we can do that once in the parent class.

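A minimal, hypothetical sketch of that structure (plain strings stand in for the Spark session and Java context; these are not the real classes):

```java
// Hypothetical sketch: the base action owns the session and the derived
// Java context, initialized once; subclasses read them through accessors
// instead of declaring their own fields or re-initializing the context.
abstract class BaseAction<R> {
  private final String spark;        // stand-in for SparkSession
  private final String sparkContext; // stand-in for JavaSparkContext

  protected BaseAction(String spark) {
    this.spark = spark;
    this.sparkContext = "ctx(" + spark + ")"; // done once, in the parent
  }

  protected String spark() { return spark; }
  protected String sparkContext() { return sparkContext; }

  public abstract R execute();
}

// An action with no table (snapshot/migrate style) still gets the session,
// without the base class forcing a table() method on it.
class ExampleAction extends BaseAction<String> {
  ExampleAction(String spark) { super(spark); }

  @Override
  public String execute() { return spark() + ": done"; }
}
```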






[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587846233



##########
File path: spark/src/main/java/org/apache/iceberg/actions/RewriteManifestsAction.java
##########
@@ -199,11 +191,11 @@ public RewriteManifestsActionResult execute() {
   }
 
   private Dataset<Row> buildManifestEntryDF(List<ManifestFile> manifests) {
-    Dataset<Row> manifestDF = spark
+    Dataset<Row> manifestDF = spark()
         .createDataset(Lists.transform(manifests, ManifestFile::path), Encoders.STRING())
         .toDF("manifest");
 
-    Dataset<Row> manifestEntryDF = BaseSparkAction.loadMetadataTable(spark, table.name(), table().location(), ENTRIES)

Review comment:
       This no longer calls a static method in `BaseSparkAction`.






[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r589795403



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
     return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestFileDF(SparkSession spark, String tableName) {
-    return loadMetadataTable(spark, tableName, table().location(), ALL_MANIFESTS).selectExpr("path as file_path");
+  protected Dataset<Row> buildManifestFileDF(Table table) {
+    return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+  protected Dataset<Row> buildManifestListDF(Table table) {
     List<String> manifestLists = getManifestListPaths(table.snapshots());
     return spark.createDataset(manifestLists, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, String metadataFileLocation) {
-    StaticTableOperations ops = new StaticTableOperations(metadataFileLocation, table().io());
-    return buildManifestListDF(spark, new BaseTable(ops, table().name()));
-  }
-
-  protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark, TableOperations ops) {
+  protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
     List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
     return spark.createDataset(otherMetadataFiles, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table table, TableOperations ops) {
-    Dataset<Row> manifestDF = buildManifestFileDF(spark, table.name());
-    Dataset<Row> manifestListDF = buildManifestListDF(spark, table);
-    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(spark, ops);
+  protected Dataset<Row> buildValidMetadataFileDF(Table table, TableOperations ops) {
+    Dataset<Row> manifestDF = buildManifestFileDF(table);
+    Dataset<Row> manifestListDF = buildManifestListDF(table);
+    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(ops);
 
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
 
   // Attempt to use Spark3 Catalog resolution if available on the path
   private static final DynMethods.UnboundMethod LOAD_CATALOG = DynMethods.builder("loadCatalogMetadataTable")
-      .hiddenImpl("org.apache.iceberg.spark.Spark3Util",
-          SparkSession.class, String.class, MetadataTableType.class)
+      .hiddenImpl("org.apache.iceberg.spark.Spark3Util", SparkSession.class, String.class, MetadataTableType.class)
       .orNoop()
       .build();
 
-  private static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, String tableName, MetadataTableType type) {
+  private Dataset<Row> loadCatalogMetadataTable(String tableName, MetadataTableType type) {

Review comment:
       I think this is OK. I'm trying to imagine whether a non-subclass will eventually need access to these kinds of functions, e.g. in distributed planning, but I think this is good for now.






[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587845948



##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -223,9 +218,10 @@ public ExpireSnapshotsActionResult execute() {
   }
 
   private Dataset<Row> buildValidFileDF(TableMetadata metadata) {
-    return appendTypeString(buildValidDataFileDF(spark, metadata.metadataFileLocation()), DATA_FILE)
-        .union(appendTypeString(buildManifestFileDF(spark, metadata.metadataFileLocation()), MANIFEST))
-        .union(appendTypeString(buildManifestListDF(spark, metadata.metadataFileLocation()), MANIFEST_LIST));
+    Table staticTable = newStaticTable(metadata, table.io());

Review comment:
       This now uses a static table.
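       The "static table" idea can be sketched as follows (a hypothetical, self-contained illustration; the actual code builds a table over operations pinned to one metadata file, as in the `StaticTableOperations` helper removed elsewhere in this PR):

```java
import java.util.List;

// Hypothetical illustration: a static table pins its state at construction,
// so file listings derived from it stay consistent even if the live table
// gains new snapshots while the action runs.
final class StaticTableView {
  private final List<String> manifestLists;

  StaticTableView(List<String> liveManifestLists) {
    // Immutable copy: later changes to the live list are not observed here.
    this.manifestLists = List.copyOf(liveManifestLists);
  }

  List<String> manifestLists() { return manifestLists; }
}
```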






[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587844798



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
     return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestFileDF(SparkSession spark, String tableName) {
-    return loadMetadataTable(spark, tableName, table().location(), ALL_MANIFESTS).selectExpr("path as file_path");
+  protected Dataset<Row> buildManifestFileDF(Table table) {
+    return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+  protected Dataset<Row> buildManifestListDF(Table table) {
     List<String> manifestLists = getManifestListPaths(table.snapshots());
     return spark.createDataset(manifestLists, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, String metadataFileLocation) {
-    StaticTableOperations ops = new StaticTableOperations(metadataFileLocation, table().io());
-    return buildManifestListDF(spark, new BaseTable(ops, table().name()));
-  }
-
-  protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark, TableOperations ops) {
+  protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
     List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
     return spark.createDataset(otherMetadataFiles, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table table, TableOperations ops) {
-    Dataset<Row> manifestDF = buildManifestFileDF(spark, table.name());
-    Dataset<Row> manifestListDF = buildManifestListDF(spark, table);
-    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(spark, ops);
+  protected Dataset<Row> buildValidMetadataFileDF(Table table, TableOperations ops) {
+    Dataset<Row> manifestDF = buildManifestFileDF(table);
+    Dataset<Row> manifestListDF = buildManifestListDF(table);
+    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(ops);
 
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
 
   // Attempt to use Spark3 Catalog resolution if available on the path
   private static final DynMethods.UnboundMethod LOAD_CATALOG = DynMethods.builder("loadCatalogMetadataTable")
-      .hiddenImpl("org.apache.iceberg.spark.Spark3Util",
-          SparkSession.class, String.class, MetadataTableType.class)
+      .hiddenImpl("org.apache.iceberg.spark.Spark3Util", SparkSession.class, String.class, MetadataTableType.class)
       .orNoop()
       .build();
 
-  private static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, String tableName, MetadataTableType type) {
+  private Dataset<Row> loadCatalogMetadataTable(String tableName, MetadataTableType type) {

Review comment:
       This does not have to be static anymore.






[GitHub] [iceberg] aokolnychyi merged pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297


   




[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r587845046



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
     return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestFileDF(SparkSession spark, String tableName) {
-    return loadMetadataTable(spark, tableName, table().location(), ALL_MANIFESTS).selectExpr("path as file_path");
+  protected Dataset<Row> buildManifestFileDF(Table table) {
+    return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+  protected Dataset<Row> buildManifestListDF(Table table) {
     List<String> manifestLists = getManifestListPaths(table.snapshots());
     return spark.createDataset(manifestLists, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, String metadataFileLocation) {
-    StaticTableOperations ops = new StaticTableOperations(metadataFileLocation, table().io());
-    return buildManifestListDF(spark, new BaseTable(ops, table().name()));
-  }
-
-  protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark, TableOperations ops) {
+  protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
     List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
     return spark.createDataset(otherMetadataFiles, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table table, TableOperations ops) {

Review comment:
       I removed `SparkSession` from the arguments of all these methods to simplify them.






[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#discussion_r589793614



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -101,47 +117,43 @@
     return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestFileDF(SparkSession spark, String tableName) {
-    return loadMetadataTable(spark, tableName, table().location(), ALL_MANIFESTS).selectExpr("path as file_path");
+  protected Dataset<Row> buildManifestFileDF(Table table) {
+    return loadMetadataTable(table, ALL_MANIFESTS).selectExpr("path as file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, Table table) {
+  protected Dataset<Row> buildManifestListDF(Table table) {
     List<String> manifestLists = getManifestListPaths(table.snapshots());
     return spark.createDataset(manifestLists, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildManifestListDF(SparkSession spark, String metadataFileLocation) {
-    StaticTableOperations ops = new StaticTableOperations(metadataFileLocation, table().io());
-    return buildManifestListDF(spark, new BaseTable(ops, table().name()));
-  }
-
-  protected Dataset<Row> buildOtherMetadataFileDF(SparkSession spark, TableOperations ops) {
+  protected Dataset<Row> buildOtherMetadataFileDF(TableOperations ops) {
     List<String> otherMetadataFiles = getOtherMetadataFilePaths(ops);
     return spark.createDataset(otherMetadataFiles, Encoders.STRING()).toDF("file_path");
   }
 
-  protected Dataset<Row> buildValidMetadataFileDF(SparkSession spark, Table table, TableOperations ops) {
-    Dataset<Row> manifestDF = buildManifestFileDF(spark, table.name());
-    Dataset<Row> manifestListDF = buildManifestListDF(spark, table);
-    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(spark, ops);
+  protected Dataset<Row> buildValidMetadataFileDF(Table table, TableOperations ops) {
+    Dataset<Row> manifestDF = buildManifestFileDF(table);
+    Dataset<Row> manifestListDF = buildManifestListDF(table);
+    Dataset<Row> otherMetadataFileDF = buildOtherMetadataFileDF(ops);
 
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
 
   // Attempt to use Spark3 Catalog resolution if available on the path
   private static final DynMethods.UnboundMethod LOAD_CATALOG = DynMethods.builder("loadCatalogMetadataTable")
-      .hiddenImpl("org.apache.iceberg.spark.Spark3Util",
-          SparkSession.class, String.class, MetadataTableType.class)
+      .hiddenImpl("org.apache.iceberg.spark.Spark3Util", SparkSession.class, String.class, MetadataTableType.class)
       .orNoop()
       .build();
 
-  private static Dataset<Row> loadCatalogMetadataTable(SparkSession spark, String tableName, MetadataTableType type) {
+  private Dataset<Row> loadCatalogMetadataTable(String tableName, MetadataTableType type) {
     Preconditions.checkArgument(!LOAD_CATALOG.isNoop(), "Cannot find Spark3Util class but Spark3 is in use");
     return LOAD_CATALOG.asStatic().invoke(spark, tableName, type);
   }
 
-  protected static Dataset<Row> loadMetadataTable(SparkSession spark, String tableName, String tableLocation,
-                                                  MetadataTableType type) {
+  protected Dataset<Row> loadMetadataTable(Table table, MetadataTableType type) {

Review comment:
       I think this is a good call






[GitHub] [iceberg] aokolnychyi commented on pull request #2297: Spark: Refactor BaseSparkAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2297:
URL: https://github.com/apache/iceberg/pull/2297#issuecomment-790965238


   cc @RussellSpitzer @karuppayya @rdblue @rdsr @shardulm94 @rymurr 



