You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/05 22:39:00 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #5207: Core: Add length arg to FileIO.newInputFile

rdblue opened a new pull request, #5207:
URL: https://github.com/apache/iceberg/pull/5207

   While investigating #5206, I found that a lot of time is spent in `InputFile#getLength()` for manifest files during job planning. This isn't necessary because `ManifestFile` stored in a manifest list has the manifest file length. This PR avoids round-trip calls to get object lengths by passing the length into `FileIO.newInputFile`. This implements the improvement for S3, GCP, and Hadoop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5207: Core: Add length arg to FileIO.newInputFile

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5207:
URL: https://github.com/apache/iceberg/pull/5207


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5207: Core: Add length arg to FileIO.newInputFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5207:
URL: https://github.com/apache/iceberg/pull/5207#discussion_r914272390


##########
gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSOutputFile.java:
##########
@@ -67,6 +67,6 @@ public PositionOutputStream createOrOverwrite() {
 
   @Override
   public InputFile toInputFile() {
-    return new GCSInputFile(storage(), blobId(), gcpProperties(), metrics());
+    return new GCSInputFile(storage(), blobId(), null, gcpProperties(), metrics());

Review Comment:
   I just didn't want to change very much here. We probably could.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5207: Core: Add length arg to FileIO.newInputFile

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5207:
URL: https://github.com/apache/iceberg/pull/5207#discussion_r914268894


##########
gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSOutputFile.java:
##########
@@ -67,6 +67,6 @@ public PositionOutputStream createOrOverwrite() {
 
   @Override
   public InputFile toInputFile() {
-    return new GCSInputFile(storage(), blobId(), gcpProperties(), metrics());
+    return new GCSInputFile(storage(), blobId(), null, gcpProperties(), metrics());

Review Comment:
   Could we just use `GCSInputFile.fromLocation(...)` to avoid calling the constructor with null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5207: Core: Add length arg to FileIO.newInputFile

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5207:
URL: https://github.com/apache/iceberg/pull/5207#discussion_r914269138


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java:
##########
@@ -70,7 +70,7 @@ public PositionOutputStream createOrOverwrite() {
 
   @Override
   public InputFile toInputFile() {
-    return new S3InputFile(client(), uri(), awsProperties(), metrics());
+    return new S3InputFile(client(), uri(), null, awsProperties(), metrics());

Review Comment:
   Similar to GCS, could we just call `S3InputFile.fromLocation(...)` to avoid calling the constructor with null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org