Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/20 02:32:44 UTC

[GitHub] [iceberg] lirui-apache opened a new issue, #4822: ParallelIterator is using too much memory

lirui-apache opened a new issue, #4822:
URL: https://github.com/apache/iceberg/issues/4822

   ParallelIterator launches multiple threads to read manifest files and stores the entries in an unbounded queue. For large tables (about 2 million data files in our case), planning files can easily lead to full GC or OOM, especially when column stats are included.
   We verified that switching to a bounded blocking queue solves the problem.
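
   To illustrate why a bounded blocking queue helps, here is a minimal sketch. It is not the actual ParallelIterator code: the class name BoundedQueueSketch, the drain method, and the DONE sentinel are hypothetical, and plain integers stand in for manifest entries. Producers block in put() once the queue holds `capacity` items, so the number of buffered entries stays bounded no matter how far the readers run ahead of the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; not Iceberg's ParallelIterator.
final class BoundedQueueSketch {
  // Sentinel a producer enqueues once it has no more entries (real entries are >= 0).
  private static final Integer DONE = Integer.MIN_VALUE;

  // Runs `producers` threads that each emit `entriesPerProducer` integers
  // through a queue holding at most `capacity` items, and collects them all.
  static List<Integer> drain(int producers, int entriesPerProducer, int capacity)
      throws InterruptedException {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
    ExecutorService pool = Executors.newFixedThreadPool(producers);

    for (int p = 0; p < producers; p++) {
      final int base = p * entriesPerProducer;
      pool.submit(() -> {
        try {
          for (int i = 0; i < entriesPerProducer; i++) {
            queue.put(base + i); // blocks while the queue is full
          }
          queue.put(DONE);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    List<Integer> out = new ArrayList<>();
    int finished = 0;
    while (finished < producers) {
      Integer next = queue.take(); // blocks while the queue is empty
      if (next.equals(DONE)) {
        finished++;
      } else {
        out.add(next);
      }
    }
    pool.shutdown();
    return out;
  }

  public static void main(String[] args) throws InterruptedException {
    // 4 producers, 1000 entries each, at most 8 entries buffered at any time.
    List<Integer> out = drain(4, 1000, 8);
    System.out.println(out.size());
  }
}
```

   With an unbounded queue (for example a ConcurrentLinkedQueue) the producers would never block, so every entry read ahead of the consumer would stay on the heap. The trade-off of the bounded variant is that readers stall when the consumer is slow, which is the intended back-pressure here.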


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #4822: ParallelIterator is using too much memory

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #4822:
URL: https://github.com/apache/iceberg/issues/4822#issuecomment-1136512731

   This looks like the same issue as reported in https://github.com/apache/iceberg/pull/4596




[GitHub] [iceberg] github-actions[bot] closed issue #4822: ParallelIterator is using too much memory

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #4822: ParallelIterator is using too much memory
URL: https://github.com/apache/iceberg/issues/4822




[GitHub] [iceberg] github-actions[bot] commented on issue #4822: ParallelIterator is using too much memory

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #4822:
URL: https://github.com/apache/iceberg/issues/4822#issuecomment-1338428008

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.




[GitHub] [iceberg] lirui-apache commented on issue #4822: ParallelIterator is using too much memory

Posted by GitBox <gi...@apache.org>.
lirui-apache commented on issue #4822:
URL: https://github.com/apache/iceberg/issues/4822#issuecomment-1132396666

   By analyzing the heap dump, we also noticed some memory inefficiency in how column stats are stored. For example, the object overhead ratio is high because there is a separate map for every single metric.




[GitHub] [iceberg] github-actions[bot] commented on issue #4822: ParallelIterator is using too much memory

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #4822:
URL: https://github.com/apache/iceberg/issues/4822#issuecomment-1321286939

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

