Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2017/08/11 19:26:00 UTC

[jira] [Created] (SYSTEMML-1837) Unary aggregate w/ corrections output to large physical blocks

Matthias Boehm created SYSTEMML-1837:
----------------------------------------

             Summary: Unary aggregate w/ corrections output to large physical blocks
                 Key: SYSTEMML-1837
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm


Many unary aggregate operations store corrections in additional columns or rows. For example, {{rowSums(X)}} uses a two-column output to store sums and corrections. In CP, we drop these corrections immediately after the operation, while in MR and Spark they are dropped after the final aggregation. The issue is that {{MatrixBlock::dropLastRowsOrColums}} does not actually drop the corrections but simply shifts all values into the correct starting positions. Hence, the physical output is larger than what the memory estimates represent. This leads to unnecessarily large memory consumption during subsequent operations and in the buffer pool, which can cause OOMs. This task aims to fix {{MatrixBlock::dropLastRowsOrColums}}.
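To illustrate the difference, here is a minimal sketch that assumes a dense row-major {{double[]}} layout. The class {{DropCorrectionSketch}} and both method names are hypothetical and are not the actual {{MatrixBlock}} code. The first method only shifts the kept values to the front of the existing array, so the allocation keeps its original size. The second copies the kept values into a right-sized array.

```java
public class DropCorrectionSketch {

    // Hypothetical "shift only" variant: moves the kept values of each row
    // to the front of the SAME array. The array length is unchanged, so the
    // physical footprint is still rows * cols doubles.
    public static double[] dropLastColsShiftOnly(double[] data, int rows, int cols, int k) {
        int keep = cols - k;
        for (int r = 0; r < rows; r++) {
            // System.arraycopy handles overlapping source and target ranges.
            System.arraycopy(data, r * cols, data, r * keep, keep);
        }
        return data;
    }

    // Hypothetical "compacting" variant: allocates a new array that holds
    // exactly rows * (cols - k) doubles and copies the kept values into it.
    public static double[] dropLastColsCompact(double[] data, int rows, int cols, int k) {
        int keep = cols - k;
        double[] out = new double[rows * keep];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(data, r * cols, out, r * keep, keep);
        }
        return out;
    }
}
```

For a 3x2 {{rowSums}}-style output (sum and correction per row), the shift-only variant still returns an array of length 6, whereas the compacting variant returns an array of length 3 that matches a 3x1 memory estimate.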

In a subsequent task, we could also modify all unary aggregates to never allocate the multi-column/row output when executed in CP. However, this requires custom code paths for the different backends. 


