Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/09/24 04:48:20 UTC

[jira] [Resolved] (SYSTEMML-946) OOM on spark dataframe-matrix / csv-matrix conversion

     [ https://issues.apache.org/jira/browse/SYSTEMML-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-946.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: SystemML 0.11

> OOM on spark dataframe-matrix / csv-matrix conversion
> -----------------------------------------------------
>
>                 Key: SYSTEMML-946
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-946
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.11
>
>         Attachments: mnist_lenet.dml
>
>
> The decision on dense/sparse block allocation in our dataframeToBinaryBlock and csvToBinaryBlock data converters is based purely on the sparsity. This works very well for the common case of tall & skinny matrices. However, for scenarios with dense data but a huge number of columns, a single partition will rarely contain the 1000 rows needed to fill an entire row of blocks. This leads to unnecessary allocations and dense-sparse conversions, as well as potential out-of-memory errors, because the temporary memory requirement can be up to 1000x larger than the input partition.
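
To make the 1000x figure concrete, the following is a minimal Java sketch of the worst case: a partition that contributes a single row to a very wide, dense matrix still triggers allocation of a full 1000-row block row. The class name, the column count, and the one-row partition are illustrative assumptions, not code or numbers from SystemML; only the block size of 1000 corresponds to the default blocksize discussed above.

    // Illustrative sketch only -- not code from SystemML. Shows why the
    // temporary allocation can be ~1000x the partition size when a
    // partition contributes few rows of a very wide, dense matrix.
    public class DenseAllocationBlowUp {
        public static void main(String[] args) {
            final int blocksize = 1000;   // SystemML default block size (rows per block)
            final long partRows = 1;      // rows this partition contributes (assumption)
            final long cols = 1_000_000;  // very wide, dense matrix (assumption)

            // bytes actually held by the partition (8 bytes per double value)
            long partitionBytes = partRows * cols * 8;

            // a dense row of blocks is allocated for blocksize rows x all columns,
            // even though only partRows rows ever arrive from this partition
            long allocatedBytes = (long) blocksize * cols * 8;

            System.out.printf("partition: %d MB, temporary allocation: %d MB (%.0fx)%n",
                partitionBytes >> 20, allocatedBytes >> 20,
                (double) allocatedBytes / partitionBytes);
            // prints: partition: 7 MB, temporary allocation: 7629 MB (1000x)
        }
    }
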



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)