Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/05 22:35:29 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #5206: Core: Defer reading Avro metadata until ManifestFile is read

rdblue opened a new pull request, #5206:
URL: https://github.com/apache/iceberg/pull/5206

   This improves job planning performance by moving `ManifestFiles.read` setup into the `ParallelIterator` that is used to plan tasks. `ParallelIterator` accepts an `Iterable` of `CloseableIterable`. The outer iterable is iterated over to submit tasks that run in the worker pool. In `ManifestGroup`, the `Iterable` that was returned would call `ManifestFiles.read` to prepare the inner iterable, but the `ManifestReader` needs to read Avro file metadata and will open a stream. That initial file open was running in the consuming thread as tasks were submitted, instead of in the worker pool.
   
   This updates `ManifestGroup` to use a custom `Iterable` that defers calling `ManifestFiles.read` until the inner iterable is used.
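
   A minimal sketch of the deferral idea, not the code in this PR. `LazyIterable`, `openManifest`, and `plan` are hypothetical names, and a counter stands in for the Avro metadata read. The point it illustrates is that building the outer list of tasks opens nothing; the open happens only when a task's `iterator()` is called, which in the real planner would be on a worker thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class DeferredOpenSketch {
  /** Hypothetical wrapper: builds the real iterable only when iterator() is first called. */
  static final class LazyIterable<T> implements Iterable<T> {
    private final Supplier<Iterable<T>> open;
    private Iterable<T> delegate; // stays null until first use

    LazyIterable(Supplier<Iterable<T>> open) {
      this.open = open;
    }

    @Override
    public synchronized Iterator<T> iterator() {
      if (delegate == null) {
        delegate = open.get(); // the expensive open runs here, in the calling thread
      }
      return delegate.iterator();
    }
  }

  // Counts how many times the simulated "file open" has run.
  static final AtomicInteger OPENS = new AtomicInteger();

  // Stand-in for the manifest read setup (assumption: the real call opens a stream).
  static Iterable<String> openManifest(String path) {
    OPENS.incrementAndGet();
    return List.of(path + "#entry-0", path + "#entry-1");
  }

  // Builds the outer list of tasks without opening any manifest.
  static List<Iterable<String>> plan(List<String> paths) {
    List<Iterable<String>> tasks = new ArrayList<>();
    for (String path : paths) {
      tasks.add(new LazyIterable<>(() -> openManifest(path)));
    }
    return tasks;
  }

  public static void main(String[] args) {
    List<Iterable<String>> tasks = plan(List.of("m1.avro", "m2.avro"));
    System.out.println("opens after planning: " + OPENS.get());
    for (String entry : tasks.get(0)) {
      System.out.println(entry);
    }
    System.out.println("opens after reading one task: " + OPENS.get());
  }
}
```

   In this sketch the thread that iterates a task pays for the open, so handing each `LazyIterable` to a pool moves that cost off the submitting thread.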


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5206: Core: Defer reading Avro metadata until ManifestFile is read

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5206:
URL: https://github.com/apache/iceberg/pull/5206




[GitHub] [iceberg] lirui-apache commented on pull request #5206: Core: Defer reading Avro metadata until ManifestFile is read

Posted by GitBox <gi...@apache.org>.
lirui-apache commented on PR #5206:
URL: https://github.com/apache/iceberg/pull/5206#issuecomment-1323119574

   We encountered authorization failures when reading manifest files after applying this PR, and think the two might be related. Because the worker pool is by default a global static pool, its threads may not see the UGI of the user who submitted the operation, so they hit permission-denied errors when trying to open the stream. Should we call `doAs` in the pool, or should we let different users use separate pools?
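
   To make the first option concrete, here is a rough sketch of capturing the submitter's identity and restoring it around each pooled task. It uses a plain `ThreadLocal` as a stand-in for the security context and does not use Hadoop's `UserGroupInformation` API; `CURRENT_USER`, `runAsCaller`, and `demo` are hypothetical names. It only shows the shape of the problem: a task run in a shared pool sees the pool thread's identity unless the caller's identity is carried along explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IdentityPropagationSketch {
  // Stand-in for a per-caller security context; NOT Hadoop's UserGroupInformation.
  static final ThreadLocal<String> CURRENT_USER =
      ThreadLocal.withInitial(() -> "pool-default");

  // Hypothetical helper: capture the submitter's identity now, restore it around the task.
  static <T> Callable<T> runAsCaller(Callable<T> task) {
    String caller = CURRENT_USER.get(); // read on the submitting thread
    return () -> {
      String previous = CURRENT_USER.get();
      CURRENT_USER.set(caller);
      try {
        return task.call();
      } finally {
        CURRENT_USER.set(previous); // do not leak the identity to the next task
      }
    };
  }

  // Returns {identity seen by a plain task, identity seen by a wrapped task}.
  static String[] demo() {
    ExecutorService pool = Executors.newFixedThreadPool(1); // shared pool stand-in
    try {
      CURRENT_USER.set("alice"); // the submitting thread acts as "alice"
      Callable<String> whoAmI = CURRENT_USER::get;
      String plain = pool.submit(whoAmI).get();
      String wrapped = pool.submit(runAsCaller(whoAmI)).get();
      return new String[] {plain, wrapped};
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      CURRENT_USER.remove();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    String[] seen = demo();
    System.out.println("plain task ran as: " + seen[0]);
    System.out.println("wrapped task ran as: " + seen[1]);
  }
}
```

   Separate pools per user would avoid the wrapping step at the cost of more threads; which trade-off fits Iceberg is the open question above.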

