You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/25 07:30:51 UTC
[GitHub] [iceberg] nastra opened a new pull request, #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
nastra opened a new pull request, #6047:
URL: https://github.com/apache/iceberg/pull/6047
The motivation behind this change is that the projected schema might get quite big and contain information such as doc comments, which make it quite hard to read/consume. This change makes sure that we only include the minimal set of information from a schema
(schemaId/fieldIds/fieldNames).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue merged pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6047:
URL: https://github.com/apache/iceberg/pull/6047
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004617156
##########
core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java:
##########
@@ -71,8 +71,24 @@ public static void toJsonWithoutStartEnd(ScanReport scanReport, JsonGenerator ge
gen.writeFieldName(FILTER);
ExpressionParser.toJson(scanReport.filter(), gen);
- gen.writeFieldName(PROJECTION);
- SchemaParser.toJson(scanReport.projection(), gen);
+ gen.writeNumberField(SCHEMA_ID, scanReport.schemaId());
+
+ // fieldIds & fieldNames can be empty with an empty projection
+ if (!scanReport.projectedFieldIds().isEmpty()) {
Review Comment:
Can we write an empty list if the projected IDs are empty? I think that way we know explicitly that nothing was projected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004620626
##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
planningDuration.stop();
ScanReport scanReport =
ImmutableScanReport.builder()
- .projection(schema())
+ .schemaId(schema().schemaId())
+ .projectedFieldIds(
+ schema().columns().stream()
+ .map(NestedField::fieldId)
+ .collect(Collectors.toList()))
+ .projectedFieldNames(
+ schema().columns().stream()
+ .map(NestedField::name)
+ .collect(Collectors.toList()))
Review Comment:
This should use the set of projected IDs and look up names using `Schema.findColumnName(Integer)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#issuecomment-1294141919
Looks good. Thanks, @nastra!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1005272262
##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
planningDuration.stop();
ScanReport scanReport =
ImmutableScanReport.builder()
- .projection(schema())
+ .schemaId(schema().schemaId())
+ .projectedFieldIds(
+ schema().columns().stream()
+ .map(NestedField::fieldId)
+ .collect(Collectors.toList()))
Review Comment:
done
##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
planningDuration.stop();
ScanReport scanReport =
ImmutableScanReport.builder()
- .projection(schema())
+ .schemaId(schema().schemaId())
+ .projectedFieldIds(
+ schema().columns().stream()
+ .map(NestedField::fieldId)
+ .collect(Collectors.toList()))
+ .projectedFieldNames(
+ schema().columns().stream()
+ .map(NestedField::name)
+ .collect(Collectors.toList()))
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004646307
##########
core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java:
##########
@@ -71,8 +71,24 @@ public static void toJsonWithoutStartEnd(ScanReport scanReport, JsonGenerator ge
gen.writeFieldName(FILTER);
ExpressionParser.toJson(scanReport.filter(), gen);
- gen.writeFieldName(PROJECTION);
- SchemaParser.toJson(scanReport.projection(), gen);
+ gen.writeNumberField(SCHEMA_ID, scanReport.schemaId());
+
+ // fieldIds & fieldNames can be empty with an empty projection
+ if (!scanReport.projectedFieldIds().isEmpty()) {
Review Comment:
sure, I'll adjust accordingly
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on a diff in pull request #6047: Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6047:
URL: https://github.com/apache/iceberg/pull/6047#discussion_r1004619505
##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -135,7 +137,15 @@ public CloseableIterable<FileScanTask> planFiles() {
planningDuration.stop();
ScanReport scanReport =
ImmutableScanReport.builder()
- .projection(schema())
+ .schemaId(schema().schemaId())
+ .projectedFieldIds(
+ schema().columns().stream()
+ .map(NestedField::fieldId)
+ .collect(Collectors.toList()))
Review Comment:
This needs to also include the nested IDs by calling `TypeUtil.getProjectedIds`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org