You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/05/10 23:53:52 UTC

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1190476171


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {

Review Comment:
   Can we call helper method like entries() as you wrote below?  We can have entries() return CloseableIterable<GenericManifestEntry<? extends ContentFile<?>>>, and then do another CloseableIterable cast here from GenericManifestEntry to StructLike, which should be safe.



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.read(manifest, io, specsById).project(fileSchema).entries(),
+                  entry -> (GenericManifestEntry<DataFile>) entry);
+        } else {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.readDeleteManifest(manifest, io, specsById)
+                      .project(fileSchema)
+                      .entries(),
+                  entry -> (GenericManifestEntry<DeleteFile>) entry);
+        }
+
+        // Project non-readable fields
+        Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
+        StructProjection structProjection = StructProjection.create(readSchema, this.projection);
+        return CloseableIterable.transform(prunedRows, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+        int metricsPosition = projection.columns().indexOf(readableMetricsField);
+        // projection minus virtual column such as readableMetrics

Review Comment:
   Nit: I would remove this comment , I think the variable name undereneath is pretty clear , and there are no other virtual columns at the moment.



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.read(manifest, io, specsById).project(fileSchema).entries(),
+                  entry -> (GenericManifestEntry<DataFile>) entry);
+        } else {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.readDeleteManifest(manifest, io, specsById)
+                      .project(fileSchema)
+                      .entries(),
+                  entry -> (GenericManifestEntry<DeleteFile>) entry);
+        }
+
+        // Project non-readable fields
+        Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());

Review Comment:
   I feel code can be re-used here as well.  Can we make a method like:
   
       private StructProjection projectNonReadable(Schema fileSchema, Schema projection, GenericManifestEntry<? extends ContentFile<?>> entry) 
   
   and call it here and in the else case (line 202)



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;

Review Comment:
   No need to declare this outside if , right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org