You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/25 07:30:51 UTC

[GitHub] [iceberg] nastra opened a new pull request, #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

nastra opened a new pull request, #6047:
URL: https://github.com/apache/iceberg/pull/6047

   The motivation behind this change is that the projected schema might get quite big and contain information such as doc comments, which make it quite hard to read/consume. This change makes sure that we only include the minimal set of information from a schema
   (schemaId/fieldIds/fieldNames).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6047:
URL: https://github.com/apache/iceberg/pull/6047


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004617156


##########
core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java:
##########
@@ -71,8 +71,24 @@ public static void toJsonWithoutStartEnd(ScanReport scanReport, JsonGenerator ge
     gen.writeFieldName(FILTER);
     ExpressionParser.toJson(scanReport.filter(), gen);
 
-    gen.writeFieldName(PROJECTION);
-    SchemaParser.toJson(scanReport.projection(), gen);
+    gen.writeNumberField(SCHEMA_ID, scanReport.schemaId());
+
+    // fieldIds & fieldNames can be empty with an empty projection
+    if (!scanReport.projectedFieldIds().isEmpty()) {

Review Comment:
   Can we write an empty list if the projected IDs are empty? I think that way we know explicitly that nothing was projected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004620626


##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
             planningDuration.stop();
             ScanReport scanReport =
                 ImmutableScanReport.builder()
-                    .projection(schema())
+                    .schemaId(schema().schemaId())
+                    .projectedFieldIds(
+                        schema().columns().stream()
+                            .map(NestedField::fieldId)
+                            .collect(Collectors.toList()))
+                    .projectedFieldNames(
+                        schema().columns().stream()
+                            .map(NestedField::name)
+                            .collect(Collectors.toList()))

Review Comment:
   This should use the set of projected IDs and look up names using `Schema.findColumnName(Integer)`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#issuecomment-1294141919

   Looks good. Thanks, @nastra!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1005272262


##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
             planningDuration.stop();
             ScanReport scanReport =
                 ImmutableScanReport.builder()
-                    .projection(schema())
+                    .schemaId(schema().schemaId())
+                    .projectedFieldIds(
+                        schema().columns().stream()
+                            .map(NestedField::fieldId)
+                            .collect(Collectors.toList()))

Review Comment:
   done



##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
             planningDuration.stop();
             ScanReport scanReport =
                 ImmutableScanReport.builder()
-                    .projection(schema())
+                    .schemaId(schema().schemaId())
+                    .projectedFieldIds(
+                        schema().columns().stream()
+                            .map(NestedField::fieldId)
+                            .collect(Collectors.toList()))
+                    .projectedFieldNames(
+                        schema().columns().stream()
+                            .map(NestedField::name)
+                            .collect(Collectors.toList()))

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004646307


##########
core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java:
##########
@@ -71,8 +71,24 @@ public static void toJsonWithoutStartEnd(ScanReport scanReport, JsonGenerator ge
     gen.writeFieldName(FILTER);
     ExpressionParser.toJson(scanReport.filter(), gen);
 
-    gen.writeFieldName(PROJECTION);
-    SchemaParser.toJson(scanReport.projection(), gen);
+    gen.writeNumberField(SCHEMA_ID, scanReport.schemaId());
+
+    // fieldIds & fieldNames can be empty with an empty projection
+    if (!scanReport.projectedFieldIds().isEmpty()) {

Review Comment:
   sure, I'll adjust accordingly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004619505


##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
             planningDuration.stop();
             ScanReport scanReport =
                 ImmutableScanReport.builder()
-                    .projection(schema())
+                    .schemaId(schema().schemaId())
+                    .projectedFieldIds(
+                        schema().columns().stream()
+                            .map(NestedField::fieldId)
+                            .collect(Collectors.toList()))

Review Comment:
   This needs to also include the nested IDs by calling `TypeUtil.getProjectedIds`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org