Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/10 10:07:10 UTC
[jira] [Commented] (TAJO-1271) Improve memory usage of Hash-shuffle
[ https://issues.apache.org/jira/browse/TAJO-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998283#comment-14998283 ]
ASF GitHub Bot commented on TAJO-1271:
--------------------------------------
Github user jinossy commented on the pull request:
https://github.com/apache/tajo/pull/837#issuecomment-155363149
I added `HASH_SHUFFLE_BUFFER_SIZE`. If the total required buffer capacity exceeds `HASH_SHUFFLE_BUFFER_SIZE * BUFFER_THRESHOLD_FACTOR`, all partitions are flushed and their buffers are released.
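The flush rule described in this comment can be sketched in Java as follows. This is a minimal, hypothetical sketch and not the code from the PR: the class and method names are invented for illustration, the constructor arguments only stand in for `HASH_SHUFFLE_BUFFER_SIZE` and `BUFFER_THRESHOLD_FACTOR` (whose real values and types are not given here), the total buffered byte count is used as a proxy for "required buffer capacity", and on-heap `ByteArrayOutputStream`s plus a byte counter replace real off-heap buffers and partition files.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: per-partition buffers whose combined size is tracked.
 * When it exceeds bufferSize * thresholdFactor, every partition is flushed
 * and all buffers are released.
 */
class HashShuffleBuffers {
    private final long bufferSize;        // stands in for HASH_SHUFFLE_BUFFER_SIZE
    private final double thresholdFactor; // stands in for BUFFER_THRESHOLD_FACTOR
    private final Map<Integer, ByteArrayOutputStream> buffers = new HashMap<>();
    private final Map<Integer, Long> flushed = new HashMap<>(); // stand-in for partition files
    private long totalBuffered = 0;
    private int flushCount = 0;

    HashShuffleBuffers(long bufferSize, double thresholdFactor) {
        this.bufferSize = bufferSize;
        this.thresholdFactor = thresholdFactor;
    }

    /** Buffers one serialized tuple and flushes everything once the threshold is crossed. */
    void append(int partitionId, byte[] tuple) {
        buffers.computeIfAbsent(partitionId, id -> new ByteArrayOutputStream())
               .write(tuple, 0, tuple.length);
        totalBuffered += tuple.length;
        if (totalBuffered > bufferSize * thresholdFactor) {
            flushAll();
        }
    }

    /** Flushes all partitions (here: records byte counts) and releases every buffer. */
    void flushAll() {
        for (Map.Entry<Integer, ByteArrayOutputStream> e : buffers.entrySet()) {
            // A real writer would append e.getValue() to the partition's file here.
            flushed.merge(e.getKey(), (long) e.getValue().size(), Long::sum);
        }
        buffers.clear(); // drop references so the buffers can be reclaimed
        totalBuffered = 0;
        flushCount++;
    }

    long totalBuffered() { return totalBuffered; }
    int flushCount() { return flushCount; }
    long flushedBytes(int partitionId) { return flushed.getOrDefault(partitionId, 0L); }
}
```

With a size of 100 and a factor of 0.8, three 30-byte tuples spread over three partitions stay buffered until the third one pushes the total to 90 bytes, which exceeds 80 and triggers a single flush of all three partitions.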
This PR is ready for review.
Thanks.
> Improve memory usage of Hash-shuffle
> ------------------------------------
>
> Key: TAJO-1271
> URL: https://issues.apache.org/jira/browse/TAJO-1271
> Project: Tajo
> Issue Type: Improvement
> Components: Data Shuffle
> Affects Versions: 0.9.0
> Reporter: Jinho Kim
> Assignee: Jinho Kim
>
> Currently, Hash-shuffle keeps an intermediate file appender and a tuple list in memory, so the required memory grows in proportion to the input size.
> If the input size is 10TB, the hash-join key partition count will be 78125 (10TB / 128MB), and the required memory is 10GB (78125 * 128KB).
> We should improve the hash-shuffle file writer as follows:
> * Separate the buffer from the file writer
> * Keep the tuples in off-heap buffer and reuse the buffer
> * Flush the buffers if the total required buffer capacity exceeds maxBufferSize
> * Write the partition files asynchronously
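Two of the proposed changes, keeping tuples in reusable off-heap buffers that are separate from the file writer and writing partition files asynchronously, might look roughly like the Java sketch below. It assumes a hypothetical `OffHeapBufferPool` built on `ByteBuffer.allocateDirect` and a hypothetical `AsyncPartitionWriter` that drains filled buffers on a single background thread; neither is Tajo's actual API, the pool here is uncapped, and a real implementation would write to per-partition files rather than to an arbitrary channel.

```java
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical pool of reusable direct (off-heap) buffers, kept apart from any writer. */
class OffHeapBufferPool {
    private final int bufferBytes;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private int allocated = 0;

    OffHeapBufferPool(int bufferBytes) { this.bufferBytes = bufferBytes; }

    /** Returns a free buffer, allocating a new direct buffer only when none is available. */
    synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(bufferBytes);
            allocated++;
        }
        return buf;
    }

    /** Clears the buffer and returns it to the pool for reuse. */
    synchronized void release(ByteBuffer buf) {
        buf.clear();
        free.push(buf);
    }

    synchronized int allocatedCount() { return allocated; }
}

/** Hypothetical writer that drains filled buffers on a background thread. */
class AsyncPartitionWriter implements AutoCloseable {
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final OffHeapBufferPool pool;

    AsyncPartitionWriter(OffHeapBufferPool pool) { this.pool = pool; }

    /** Writes buf's contents to out asynchronously, then hands buf back to the pool. */
    Future<?> writeAsync(ByteBuffer buf, WritableByteChannel out) {
        buf.flip(); // switch the buffer from filling to draining
        return io.submit(() -> {
            try {
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
            } finally {
                pool.release(buf); // reuse the buffer even if the write failed
            }
            return null;
        });
    }

    /** Stops accepting work and waits for queued writes to finish. */
    @Override
    public void close() throws InterruptedException {
        io.shutdown();
        io.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

In this sketch a buffer returns to the pool once its write finishes, so a later `acquire()` hands back the same direct buffer instead of allocating another one, while the caller can keep filling other buffers during the write.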
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)