You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/05/05 20:40:42 UTC

[GitHub] [iceberg] dramaticlly opened a new pull request, #7539: Implement ReadableMetrics for Entries table

dramaticlly opened a new pull request, #7539:
URL: https://github.com/apache/iceberg/pull/7539

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#issuecomment-1548562443

   Hi, @dramaticlly  as I didnt want to block the change, I had some follow up that I made in #7613 including what we discussed in https://github.com/apache/iceberg/pull/7539#discussion_r1192497611.
   
   I think it makes the code between BaseFilesTable and BaseEntriesTable even  more similar and with more re-use, if you wanted to take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192497611


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {
+      Schema manifestEntrySchema = ManifestEntry.wrapFileSchema(fileProjection.asStruct());
+      return StructProjection.create(manifestEntrySchema, projectedSchema);
+    }
+
+    private CloseableIterable<? extends ManifestEntry<? extends ContentFile<?>>> entries(
+        Schema newFileProjection) {
+      return ManifestFiles.open(manifest, io, specsById).project(newFileProjection).entries();
+    }
+
+    private StructLike withReadableMetrics(
+        StructProjection structProjection,
+        ManifestEntry<? extends ContentFile<?>> entry,
+        Types.NestedField readableMetricsField) {
+      int projectionColumnCount = projection.columns().size();
+      int metricsPosition = projection.columns().indexOf(readableMetricsField);
+
+      StructProjection entryStruct = structProjection.wrap((StructLike) entry);
+
+      StructType projectedMetricType =
+          projection.findField(MetricsUtil.READABLE_METRICS).type().asStructType();
+      MetricsUtil.ReadableMetricsStruct readableMetrics =
+          MetricsUtil.readableMetricsStruct(dataTableSchema, entry.file(), projectedMetricType);
+
+      return new ManifestEntryStructWithMetrics(
+          projectionColumnCount, metricsPosition, entryStruct, readableMetrics);
     }
 
     @Override
     public Iterable<FileScanTask> split(long splitSize) {
       return ImmutableList.of(this); // don't split
     }
   }
+
+  static class ManifestEntryStructWithMetrics implements StructLike {

Review Comment:
   yeah that's a good idea. I think we can extract the common into separate StructLikeWithMetrics class to be used in both baseEntries and baseFiles



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191920873


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used

Review Comment:
   Hey I'm sorry I missed this, can we remove this line "Remove virtual columns from the file projection and."  
   
   I guess this comment was rather for the other operation, but I dont think we need the comment anymore now that its on a self explanatory method below (removeReadableMetrics)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191925271


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {

Review Comment:
   Quesiton, while we make one more change, what do you think of renaming this method to structProjection()?  Realize it's from the original comment but Im not 100% sure what that was referring to.  Looks like it is just a standard projection of ManifestEntry to me?  I may be missing something though



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {

Review Comment:
   Quesiton, while we make one more change, what do you think of renaming this method to structProjection()?  Realize it's from the original comment but Im not 100% sure what that was referring to, what is non-readable?  Looks like it is just a standard projection of ManifestEntry to me?  I may be missing something though



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192587961


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = structProjection(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = structProjection(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Ensure that the underlying metrics used to create those columns are part of the file

Review Comment:
   Nit: looks like 'those columns' is unnecessarily ambiguous and we can maybe just do , to avoid repetition.  
   ```
   /**
    *  Ensure that the underlying metrics used to populate readable metrics column are part of the file projection.
    **/
    ```
    Its no big deal though if you don't  get to it, approved the pr and will merge if no further comment
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191925271


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {

Review Comment:
   Quesiton, while we make one more change, what do you think of renaming this method to structProjection()?  Realize it's from the original comment but Im not 100% sure what that was referring to.  Looks like it is just a standard projection of ManifestEntry to me?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192584099


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {

Review Comment:
   OK, yea to avoid confusion with readable metrics, it seems better to rename to this unless we can think if a more appropriate one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho merged pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #7539:
URL: https://github.com/apache/iceberg/pull/7539


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191811403


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,107 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileSchema),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());

Review Comment:
   thank you @szehon-ho . that does look much easier to follow the logic. I changed a bit since checkStyle fail on some method variables shadow class variable, but mostly followed your approach and appreciate your help!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192587961


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = structProjection(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = structProjection(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Ensure that the underlying metrics used to create those columns are part of the file

Review Comment:
   Nit: looks like 'those columns' is unnecessarily ambiguous and we can maybe just do , to avoid repetition.  
   
   /**
    * Ensure that the underlying metrics used to populate readable metrics column are part of the file projection.
    */
    
    Its no big deal though if you don't  get to it, approved the pr and will merge if no further comment
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1190476171


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {

Review Comment:
   Can we call helper method like entries() as you wrote below?  We can have entries() return CloseableIterable<GenericManifestEntry<? extends ContentFile<?>>>, and then do another CloseableIterable cast here from GenericManifestEntry to StructLike, which should be safe.



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.read(manifest, io, specsById).project(fileSchema).entries(),
+                  entry -> (GenericManifestEntry<DataFile>) entry);
+        } else {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.readDeleteManifest(manifest, io, specsById)
+                      .project(fileSchema)
+                      .entries(),
+                  entry -> (GenericManifestEntry<DeleteFile>) entry);
+        }
+
+        // Project non-readable fields
+        Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
+        StructProjection structProjection = StructProjection.create(readSchema, this.projection);
+        return CloseableIterable.transform(prunedRows, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+        int metricsPosition = projection.columns().indexOf(readableMetricsField);
+        // projection minus virtual column such as readableMetrics

Review Comment:
   Nit: I would remove this comment , I think the variable name undereneath is pretty clear , and there are no other virtual columns at the moment.



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.read(manifest, io, specsById).project(fileSchema).entries(),
+                  entry -> (GenericManifestEntry<DataFile>) entry);
+        } else {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.readDeleteManifest(manifest, io, specsById)
+                      .project(fileSchema)
+                      .entries(),
+                  entry -> (GenericManifestEntry<DeleteFile>) entry);
+        }
+
+        // Project non-readable fields
+        Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());

Review Comment:
   I feel code can be re-used here as well.  Can we make a method like:
   
       private StructProjection projectNonReadable(Schema fileSchema, Schema projection, GenericManifestEntry<? extends ContentFile<?>> entry) 
   
   and call it here and in the else case (line 202)



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;

Review Comment:
   No need to declare this outside if , right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191714292


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.read(manifest, io, specsById).project(fileSchema).entries(),
+                  entry -> (GenericManifestEntry<DataFile>) entry);
+        } else {
+          prunedRows =
+              CloseableIterable.transform(
+                  ManifestFiles.readDeleteManifest(manifest, io, specsById)
+                      .project(fileSchema)
+                      .entries(),
+                  entry -> (GenericManifestEntry<DeleteFile>) entry);
+        }
+
+        // Project non-readable fields
+        Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());

Review Comment:
   yep, definitely



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,128 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
       // Project data-file fields
       CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+      if (readableMetricsField == null) {
+        if (manifest.content() == ManifestContent.DATA) {

Review Comment:
   yeah make sense! Also realized from other PR that this can be simplified further with 
   
   `ManifestFiles.open(manifest, io, specsById)` so it's agnostic read to data and delete manifests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#issuecomment-1548254277

   Merged, thanks @dramaticlly for the change!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192615543


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = structProjection(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = structProjection(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Ensure that the underlying metrics used to create those columns are part of the file

Review Comment:
   thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192504063


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {

Review Comment:
   agreed, I think before this change the comment seem to suggest 
   - entry only field (like snapshot_id, status, sequence_number) are readable directly from ManifestEntry 
   - and wrap of files schema and data file as struct considered as non-readable 
   But after our refactoring, this method is only about wrap file schema into entry schema, and project entry schema to necessary columns. So I think `structProjection` sounds good to me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191922266


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used
+     * to create those columns are part of the file projection
+     *
+     * @return file projection with required columns to read readable metrics
+     */
+    private Schema requiredFileProjection() {
+      Schema projectionForReadableMetrics =
+          new Schema(
+              MetricsUtil.READABLE_METRIC_COLS.stream()
+                  .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
+                  .collect(Collectors.toList()));
+      return TypeUtil.join(fileProjection, projectionForReadableMetrics);
+    }
 
-      // Project non-readable fields
-      Schema readSchema = ManifestEntry.wrapFileSchema(fileSchema.asStruct());
-      StructProjection projection = StructProjection.create(readSchema, schema);
-      return CloseableIterable.transform(prunedRows, projection::wrap);
+    private Schema removeReadableMetrics(Types.NestedField readableMetricsField) {
+      Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
+      return TypeUtil.selectNot(projection, readableMetricsIds);
+    }
+
+    private StructProjection projectNonReadable(Schema projectedSchema) {
+      Schema manifestEntrySchema = ManifestEntry.wrapFileSchema(fileProjection.asStruct());
+      return StructProjection.create(manifestEntrySchema, projectedSchema);
+    }
+
+    private CloseableIterable<? extends ManifestEntry<? extends ContentFile<?>>> entries(
+        Schema newFileProjection) {
+      return ManifestFiles.open(manifest, io, specsById).project(newFileProjection).entries();
+    }
+
+    private StructLike withReadableMetrics(
+        StructProjection structProjection,
+        ManifestEntry<? extends ContentFile<?>> entry,
+        Types.NestedField readableMetricsField) {
+      int projectionColumnCount = projection.columns().size();
+      int metricsPosition = projection.columns().indexOf(readableMetricsField);
+
+      StructProjection entryStruct = structProjection.wrap((StructLike) entry);
+
+      StructType projectedMetricType =
+          projection.findField(MetricsUtil.READABLE_METRICS).type().asStructType();
+      MetricsUtil.ReadableMetricsStruct readableMetrics =
+          MetricsUtil.readableMetricsStruct(dataTableSchema, entry.file(), projectedMetricType);
+
+      return new ManifestEntryStructWithMetrics(
+          projectionColumnCount, metricsPosition, entryStruct, readableMetrics);
     }
 
     @Override
     public Iterable<FileScanTask> split(long splitSize) {
       return ImmutableList.of(this); // don't split
     }
   }
+
+  static class ManifestEntryStructWithMetrics implements StructLike {

Review Comment:
   For another pr, I think we can combine this and BaseFilesTable's similar class (StructProjection is instance of StructLike)



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = projectNonReadable(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Remove virtual columns from the file projection and ensure that the underlying metrics used

Review Comment:
   Hey I'm sorry I missed this, can we remove this line "Remove virtual columns from the file projection and."  
   
   I guess this comment was rather for the other operation, but I dont think we need the comment anymore now that its on a self explanatory method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191798950


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,107 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileSchema),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());

Review Comment:
   I had some idea to make this method even a bit more clear, again by making some private method that better describe these steps:
   
   ```
       @Override
       public CloseableIterable<StructLike> rows() {
         Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
   
         if (readableMetricsField == null) {
           CloseableIterable<StructLike> entryAsStruct =
               CloseableIterable.transform(
                   entries(fileProjection),
                   entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
   
           StructProjection structProjection = projectNonReadable(projection);
           return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
         } else {
           Schema requiredFileProjection = requiredFileProjection(fileProjection);
           Schema actualProjection = removeReadableMetrics(projection, readableMetricsField);
   
           StructProjection structProjection = projectNonReadable(actualProjection);
           return CloseableIterable.transform(
               entries(requiredFileProjection),
               entry ->
                   withReadableMetrics(structProjection, entry, readableMetricsField));
         }
       }
   ```
   
   
   Now its a bit more symmetrical.  
   
   We would need following methods:
   ```
       /**
        * Remove virtual columns from the file projection and ensure that the underlying metrics
        * used to create those columns are part of the file projection
        * @return file projection with required columns to read readable metrics
        */
       private Schema requiredFileProjection(Schema fileProjection) {
         Schema projectionForReadableMetrics =
             new Schema(
                 MetricsUtil.READABLE_METRIC_COLS.stream()
                     .map(MetricsUtil.ReadableMetricColDefinition::originalCol)
                     .collect(Collectors.toList()));
         return TypeUtil.join(fileProjection, projectionForReadableMetrics);
       }
   
       private Schema removeReadableMetrics(Schema projection, Types.NestedField readableMetricsField) {
         Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());
         return TypeUtil.selectNot(projection, readableMetricsIds);
       }
   ```
   
   I also change withReadableMetrics a bit for this to work:
   
   ```
       private StructLike withReadableMetrics(
           StructProjection structProjection,
           ManifestEntry<? extends ContentFile<?>> entry,
           Types.NestedField readableMetricsField) {
         int projectionColumnCount = projection.columns().size();
         int metricsPosition = projection.columns().indexOf(readableMetricsField);
   
         StructProjection entryStruct = structProjection.wrap((StructLike) entry);
   
         StructType projectedMetricType =
             projection.findField(MetricsUtil.READABLE_METRICS).type().asStructType();
         MetricsUtil.ReadableMetricsStruct readableMetrics =
             MetricsUtil.readableMetricsStruct(dataTableSchema, entry.file(), projectedMetricType);
         
         return new ManifestEntryStructWithMetrics(
             projectionColumnCount, metricsPosition, entryStruct, readableMetrics);
       }
   ```



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -92,26 +95,28 @@ static CloseableIterable<FileScanTask> planFiles(
   }
 
   static class ManifestReadTask extends BaseFileScanTask implements DataTask {
-    private final Schema schema;
+    private final Schema projection;
     private final Schema fileSchema;
+    private final Schema dataTableSchema;
     private final FileIO io;
     private final ManifestFile manifest;
     private final Map<Integer, PartitionSpec> specsById;
 
     ManifestReadTask(
         Table table,
         ManifestFile manifest,
-        Schema schema,
+        Schema projection,
         String schemaString,
         String specString,
         ResidualEvaluator residuals) {
       super(DataFiles.fromManifest(manifest), null, schemaString, specString, residuals);
-      this.schema = schema;
+      this.projection = projection;
       this.io = table.io();
       this.manifest = manifest;
       this.specsById = Maps.newHashMap(table.specs());
+      this.dataTableSchema = table.schema();
 
-      Type fileProjection = schema.findType("data_file");
+      Type fileProjection = projection.findType("data_file");

Review Comment:
   Can we rename to be more consistent with the schema => projection change?  
   
   fileProjection => fileProjectionType, fileSchema => fileProjection



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1191811403


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,107 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileSchema),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = projectNonReadable(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Set<Integer> readableMetricsIds = TypeUtil.getProjectedIds(readableMetricsField.type());

Review Comment:
   thank you @szehon-ho . that does look much easier to follow the logic. I changed a bit since checkStyle fail on some method variable shadow class variable but mostly followed your approach and appreciate your help!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192587961


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = structProjection(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = structProjection(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Ensure that the underlying metrics used to create those columns are part of the file

Review Comment:
   Nit: looks like 'those columns' is unnecessarily ambiguous and we can maybe just do , to avoid repetition with the return javadoc below
   ```
   /**
    *  Ensure that the underlying metrics used to populate readable metrics column are part of the file projection.
    **/
    ```
    Its no big deal though if you don't  get to it, approved the pr and will merge if no further comment
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7539: Implement ReadableMetrics for Entries table

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#discussion_r1192611572


##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -125,31 +130,120 @@ ManifestFile manifest() {
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      // Project data-file fields
-      CloseableIterable<StructLike> prunedRows;
-      if (manifest.content() == ManifestContent.DATA) {
-        prunedRows =
+      Types.NestedField readableMetricsField = projection.findField(MetricsUtil.READABLE_METRICS);
+
+      if (readableMetricsField == null) {
+        CloseableIterable<StructLike> entryAsStruct =
             CloseableIterable.transform(
-                ManifestFiles.read(manifest, io).project(fileSchema).entries(),
-                file -> (GenericManifestEntry<DataFile>) file);
+                entries(fileProjection),
+                entry -> (GenericManifestEntry<? extends ContentFile<?>>) entry);
+
+        StructProjection structProjection = structProjection(projection);
+        return CloseableIterable.transform(entryAsStruct, structProjection::wrap);
       } else {
-        prunedRows =
-            CloseableIterable.transform(
-                ManifestFiles.readDeleteManifest(manifest, io, specsById)
-                    .project(fileSchema)
-                    .entries(),
-                file -> (GenericManifestEntry<DeleteFile>) file);
+        Schema requiredFileProjection = requiredFileProjection();
+        Schema actualProjection = removeReadableMetrics(readableMetricsField);
+        StructProjection structProjection = structProjection(actualProjection);
+
+        return CloseableIterable.transform(
+            entries(requiredFileProjection),
+            entry -> withReadableMetrics(structProjection, entry, readableMetricsField));
       }
+    }
+
+    /**
+     * Ensure that the underlying metrics used to create those columns are part of the file

Review Comment:
   thank you and updated per your suggestion!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bwliu62 commented on pull request #7539: Implement ReadableMetrics for Entries table

Posted by "bwliu62 (via GitHub)" <gi...@apache.org>.
bwliu62 commented on PR #7539:
URL: https://github.com/apache/iceberg/pull/7539#issuecomment-1614239870

   Hello, I want to get upper/lower bounds from the manifest file like:
   
   `
   val icebergTalbe = catalog.loadTable(tableIdentifier)
   val filter = Expressions.greaterThan("id", 10)
   val scan = icebergTable.newScan().filter(filter)
   scan.includeColumnStats().planFiles().forEach { task =>
       val upperBounds = task.file().upperBounds()
       val lowerBounds = task.file().lowerBounds()
       log.info(s"upperBounds: $upperBounds")
       log.info(s"upperBounds: $upperBounds")
   }
   `
   The log will be byteBuffer, I see @szehon-ho  pr for a readable metrics, I am wondering do we have a java api we can use, or I have to use DDL? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org