Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/18 20:38:32 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #5301: Core: Support building custom tasks in ManifestGroup

aokolnychyi commented on code in PR #5301:
URL: https://github.com/apache/iceberg/pull/5301#discussion_r923837639


##########
core/src/main/java/org/apache/iceberg/ManifestGroup.java:
##########
@@ -279,4 +287,84 @@ public void close() throws IOException {
           }
         });
   }
+
+  abstract static class ScanTaskFactory<T extends ScanTask> {
+    private final String schemaAsString;
+    private final String specAsString;
+    private final DeleteFileIndex deletes;
+    private final ResidualEvaluator residuals;
+    private final boolean dropStats;
+
+    ScanTaskFactory(PartitionSpec spec, DeleteFileIndex deletes, ResidualEvaluator residuals, boolean dropStats) {
+      this.schemaAsString = SchemaParser.toJson(spec.schema());
+      this.specAsString = PartitionSpecParser.toJson(spec);
+      this.deletes = deletes;
+      this.residuals = residuals;
+      this.dropStats = dropStats;
+    }
+
+    abstract CloseableIterable<T> createTasks(CloseableIterable<ManifestEntry<DataFile>> entries);
+
+    String schemaAsString() {
+      return schemaAsString;
+    }
+
+    String specAsString() {
+      return specAsString;
+    }
+
+    DeleteFileIndex deletes() {
+      return deletes;
+    }
+
+    ResidualEvaluator residuals() {
+      return residuals;
+    }
+
+    boolean shouldKeepStats() {
+      return !dropStats;
+    }
+
+    abstract static class Builder<T extends ScanTask> {

Review Comment:
   I am not thrilled about having a builder, as it adds complexity. However, I did it this way so that we can keep a loading cache of task factories per spec in `ManifestGroup`. Right now, we convert the schema and spec to their JSON representations for each manifest, which is unnecessary. Since those JSON strings can get pretty large, I think doing the conversion once per spec is an important optimization.
   
   If we want to get rid of the builder, I'll have to implement per-spec caching in each task factory.
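
To illustrate the per-spec caching idea, here is a minimal, self-contained sketch. It is not the code in this PR: `PerSpecCache`, its `loader`, and `loadCount()` are hypothetical names, and it uses a plain `ConcurrentHashMap` rather than whatever cache implementation `ManifestGroup` ends up using. The assumption is that the expensive work (building the JSON strings and the factory) can be keyed by spec ID and done once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical per-spec cache: the loader runs at most once per spec ID,
// so expensive work such as JSON conversion is not repeated per manifest.
final class PerSpecCache<V> {
  private final Map<Integer, V> cache = new ConcurrentHashMap<>();
  private final IntFunction<V> loader;
  // Counts loader invocations; only here to make the caching observable.
  private final AtomicInteger loads = new AtomicInteger();

  PerSpecCache(IntFunction<V> loader) {
    this.loader = loader;
  }

  // Returns the cached value for the spec ID, loading it on first access.
  V get(int specId) {
    return cache.computeIfAbsent(specId, id -> {
      loads.incrementAndGet();
      return loader.apply(id);
    });
  }

  int loadCount() {
    return loads.get();
  }
}
```

With something like this, a caller would look up the factory by the spec ID of each manifest, and many manifests written with the same spec would share one factory instance.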



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

