Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/02 15:53:02 UTC

[GitHub] [iceberg] RussellSpitzer edited a comment on pull request #2780: Add partition files to SparkBatchScan description

RussellSpitzer edited a comment on pull request #2780:
URL: https://github.com/apache/iceberg/pull/2780#issuecomment-873091125


   This adds every file touched to the description, which is probably too much, since that could be thousands of files. One of the big issues here is that this description will sit on the Spark driver for a while, so it's a pretty large chunk of memory. Maybe just a summary would be sufficient? Something like "Reading Z Manifests - X files from Y partitions"?
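
To make the summary idea concrete, here is a rough sketch of a description string whose size stays constant no matter how many files are touched. `PlannedFile` and `ScanSummary` are hypothetical stand-ins invented for this illustration; they are not Iceberg or Spark classes, and the real scan would have to get the manifest and partition information from wherever planning exposes it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one planned data file; not an Iceberg class.
final class PlannedFile {
  final String path;
  final String partition;
  final String manifest;

  PlannedFile(String path, String partition, String manifest) {
    this.path = path;
    this.partition = partition;
    this.manifest = manifest;
  }
}

// Hypothetical helper: builds a fixed-size summary instead of listing every file path.
final class ScanSummary {
  private ScanSummary() {}

  static String summarize(List<PlannedFile> files) {
    Set<String> partitions = new HashSet<>();
    Set<String> manifests = new HashSet<>();
    for (PlannedFile file : files) {
      partitions.add(file.partition);
      manifests.add(file.manifest);
    }
    // Only three counts are kept, so the string held on the driver stays small.
    return String.format(
        "Reading %d Manifests - %d files from %d partitions",
        manifests.size(), files.size(), partitions.size());
  }
}
```

The string stays a few dozen bytes whether the scan touches three files or thousands, which is the point of summarizing rather than listing.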
   
   I think it's a little dangerous to run planning in description, since that may be called even when the plan isn't executed. And since this implementation caches the planning result, it may have issues if a user runs "Explain" and then adds more clauses or something. For a query that takes a long time to plan, that also means calling "explain" ends up being a rather expensive operation.
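
One possible shape for avoiding the cost, sketched under assumptions: the description only reports counts if planning has already happened for some other reason, and never triggers planning itself. `LazyScan` and its `planner` callback are hypothetical names for this illustration, not the PR's code or an Iceberg API. Note that this only addresses the expensive "explain" problem; the cached result could still go stale if the scan is changed afterwards.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical scan wrapper; not the actual SparkBatchScan implementation.
final class LazyScan {
  private final Supplier<List<String>> planner; // hypothetical planning callback
  private List<String> plannedFiles;            // null until planning actually runs

  LazyScan(Supplier<List<String>> planner) {
    this.planner = planner;
  }

  // Called on the execution path: plans once and reuses the result.
  List<String> planFiles() {
    if (plannedFiles == null) {
      plannedFiles = planner.get();
    }
    return plannedFiles;
  }

  // Called for explain output: never invokes the planner.
  String description() {
    if (plannedFiles == null) {
      return "LazyScan(files=not yet planned)";
    }
    return "LazyScan(files=" + plannedFiles.size() + ")";
  }
}
```

With this shape, an "explain" on an unexecuted query is cheap, at the price of showing less information until the scan has actually been planned.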
   
   That said, I would really like to have more information visible, so let's keep thinking on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


