Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/02 10:18:20 UTC

[GitHub] [iceberg] openinx commented on pull request #4596: Use bounded queue to avoid consuming too much memory

openinx commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1144693457

   Actually, I'd prefer to give my +1 to the bounded queue solution, because:
   
   1.  If an existing table already includes too many manifests (and some of them contain many manifest entries), the other approaches, such as rewriting metadata so that each manifest reaches an ideal size, or tuning the thread pool size, just don't work. In a real production environment we can do nothing except increase the heap size of the Spark driver or Trino coordinator. But what if we are not allowed to restart the Spark driver or Trino coordinator because they are serving other query jobs?
   
   2.  Does the blocking queue approach introduce any substantial performance bottleneck? If we think the default blocking queue size is a bit small, we can increase it to 10000. I think most cases won't be affected by the default queue size unless we have an extremely large table with very many manifest entries, and in exactly that case it is easy to hit an OOM if the queue size is unlimited. (A rough sketch of the backpressure behavior follows below.)
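   
   To make the backpressure point concrete, here is a minimal Java sketch. It is not the actual ParallelIterable change in this PR; the class name, queue capacity, and thread count below are illustrative assumptions. It only shows how a bounded BlockingQueue caps memory: producer threads block on put() once the queue is full, so at most the configured number of entries are buffered regardless of how many manifests the table has.
   
   ```java
   import java.util.List;
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   
   public class BoundedQueueSketch {
     // Hypothetical default capacity; the discussion above suggests something like 10000.
     private static final int QUEUE_SIZE = 10000;
   
     public static void main(String[] args) throws InterruptedException {
       BlockingQueue<String> queue = new ArrayBlockingQueue<>(QUEUE_SIZE);
       ExecutorService workers = Executors.newFixedThreadPool(4);
   
       // Producers: each worker reads one manifest and enqueues its entries.
       // put() blocks once QUEUE_SIZE entries are buffered, so heap usage stays
       // bounded no matter how many manifests or entries exist.
       List<String> manifests = List.of("manifest-1.avro", "manifest-2.avro");
       for (String manifest : manifests) {
         workers.submit(() -> {
           for (int i = 0; i < 1_000_000; i++) {
             try {
               queue.put(manifest + "#entry-" + i);
             } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return;
             }
           }
         });
       }
   
       // Consumer: the planning thread drains entries at its own pace.
       for (int consumed = 0; consumed < 100; consumed++) {
         String entry = queue.take();
         // ... hand the entry to scan planning here ...
       }
       workers.shutdownNow();
     }
   }
   ```
   
   With an unbounded queue (for example a ConcurrentLinkedQueue) in the same position, the producers never block, so the buffered entries can keep growing until the Spark driver or Trino coordinator runs out of heap.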
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

