Posted to notifications@asterixdb.apache.org by "Chen Luo (Jira)" <ji...@apache.org> on 2020/09/28 18:33:00 UTC

[jira] [Created] (ASTERIXDB-2784) Join memory requirement for large objects

Chen Luo created ASTERIXDB-2784:
-----------------------------------

             Summary: Join memory requirement for large objects
                 Key: ASTERIXDB-2784
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: COMP - Compiler, RT - Runtime
            Reporter: Chen Luo


Currently the compiler assumes that the minimum number of join frames is 5 [1]. However, this does not guarantee that a join will always succeed when large objects are involved. The actual join memory requirement is MAX(5, #partitions * large object size). The reason is that the spill policy [2] only spills a partition if it has not been spilled before. As a result, when we are writing to an empty partition, it is possible that each of the other partitions holds one large object (which could be larger than the frame size) but no partition can be spilled. Thus, the join memory requirement becomes #partitions * large object size in this case.
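
To make the bound above concrete, here is a minimal Java sketch of the arithmetic. It is not AsterixDB code: the class and method names are hypothetical, and it assumes the large object size is counted in whole frames (rounded up), which the description implies but does not state.

```java
// Hypothetical helper, not part of AsterixDB: illustrates the
// MAX(5, #partitions * large object size) bound from the description.
final class JoinMemoryEstimate {
    // Minimum number of join frames the compiler currently assumes (from the issue text).
    static final int MIN_JOIN_FRAMES = 5;

    // Number of frames needed to hold one object.
    // Assumption: an object occupies a whole number of frames, rounded up.
    static long framesForObject(long objectSizeBytes, int frameSizeBytes) {
        if (objectSizeBytes < 0 || frameSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (objectSizeBytes + frameSizeBytes - 1) / frameSizeBytes;
    }

    // Worst case from the description: every partition holds one large
    // object and none of them can be spilled.
    static long worstCaseJoinFrames(int numPartitions, long largestObjectBytes, int frameSizeBytes) {
        if (numPartitions < 0) {
            throw new IllegalArgumentException("numPartitions must not be negative");
        }
        long framesPerObject = framesForObject(largestObjectBytes, frameSizeBytes);
        return Math.max(MIN_JOIN_FRAMES, (long) numPartitions * framesPerObject);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 20 partitions, 100 KB objects, 32 KB frames.
        long frames = worstCaseJoinFrames(20, 100 * 1024, 32 * 1024);
        System.out.println("worst-case join frames: " + frames);
    }
}
```

With these illustrative numbers, a 100 KB object needs 4 frames of 32 KB, so 20 partitions would need 80 frames, well above the assumed minimum of 5.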

[1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29

[2] https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55



--
This message was sent by Atlassian Jira
(v8.3.4#803005)