Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2017/04/21 02:21:04 UTC

[jira] [Updated] (SYSTEMML-1548) Performance ultra-sparse matrix read

     [ https://issues.apache.org/jira/browse/SYSTEMML-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm updated SYSTEMML-1548:
-------------------------------------
    Description: 
Reading ultra-sparse matrices shows poor performance for certain data sizes and memory configurations due to garbage-collection overheads.

In detail, this task covers two scenarios that will be addressed independently:

1) Large heap: For large heaps, the problem is the temporarily deserialized sparse blocks, which are not reused due to inefficient reset, leading to lots of garbage and hence high cost for full garbage collections. This will be addressed by using our CSR sparse blocks for ultra-sparse blocks, because CSR has a smaller memory footprint and allows an efficient reset.
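
The large-heap fix can be illustrated with a minimal CSR-style sketch (class and field names here are hypothetical, not SystemML's actual SparseBlockCSR internals): because CSR keeps all non-zeros in a few contiguous arrays, a reused block can be reset by clearing row pointers and a size counter instead of reallocating per-row structures, so deserialization into a recycled block produces no garbage.

```java
import java.util.Arrays;

// Illustrative CSR-style sparse block (assumed layout, for exposition only):
// one contiguous index/value array pair plus row pointers, so reset() only
// clears logical state and block reuse allocates nothing.
public class CsrBlockSketch {
    private final int[] rowPtr;    // length numRows+1, start offsets per row
    private final int[] colIdx;    // column indices of stored non-zeros
    private final double[] vals;   // values of stored non-zeros
    private int size;              // current number of stored non-zeros

    public CsrBlockSketch(int rows, int capacity) {
        rowPtr = new int[rows + 1];
        colIdx = new int[capacity];
        vals   = new double[capacity];
    }

    // Efficient reset: no reallocation, so a deserializer can reuse the
    // same block across many reads without creating garbage.
    public void reset() {
        Arrays.fill(rowPtr, 0);
        size = 0;
    }

    public int size() {
        return size;
    }
}
```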

2) Small heap: For small heaps, it is not the temporary blocks but the memory overhead of the target sparse matrix that becomes the bottleneck. This is due to a relatively large memory overhead per sparse row, which is not amortized if a row has just one or very few non-zeros. This will be addressed via a modification of the MCSR representation for ultra-sparse matrices. Note that we cannot use CSR or COO here because we want to support efficient multi-threaded incremental construction.
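
The per-row overhead argument can be made concrete with a back-of-envelope sketch (the 16-byte object and array header sizes are assumed typical JVM values, not SystemML measurements): an MCSR row holding a single non-zero pays for a row object plus two small arrays, far more than its roughly 12 bytes of payload, while a CSR-like layout amortizes to a few bytes per row.

```java
// Rough per-row memory estimates for ultra-sparse rows, assuming 16-byte
// object headers and 16-byte array headers (illustrative JVM numbers).
public class McsrOverheadSketch {
    static final long OBJ_HEADER = 16, ARR_HEADER = 16;

    // Estimated bytes for one MCSR-style row with nnz non-zeros:
    // a row object holding an int[] of indexes and a double[] of values.
    static long mcsrRowBytes(int nnz) {
        return OBJ_HEADER + (ARR_HEADER + 4L * nnz) + (ARR_HEADER + 8L * nnz);
    }

    // Estimated bytes per row in a CSR-like layout:
    // a 4-byte row pointer plus 12 bytes (int index + double value) per non-zero.
    static long csrRowBytes(int nnz) {
        return 4 + 12L * nnz;
    }

    public static void main(String[] args) {
        // Ultra-sparse case: one non-zero per row.
        System.out.println(mcsrRowBytes(1)); // 60 bytes under these assumptions
        System.out.println(csrRowBytes(1));  // 16 bytes under these assumptions
    }
}
```

The ~4x gap per row is why a modified MCSR (e.g., inlining single-non-zero rows) pays off, while still keeping the per-row independence that CSR and COO lack for multi-threaded incremental construction.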

  was:Reading ultra-sparse matrices shows poor performance for certain data sizes and memory configurations due to garbage-collection overheads.


> Performance ultra-sparse matrix read
> ------------------------------------
>
>                 Key: SYSTEMML-1548
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1548
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>
> Reading ultra-sparse matrices shows poor performance for certain data sizes and memory configurations due to garbage-collection overheads.
> In detail, this task covers two scenarios that will be addressed independently:
> 1) Large heap: For large heaps, the problem is the temporarily deserialized sparse blocks, which are not reused due to inefficient reset, leading to lots of garbage and hence high cost for full garbage collections. This will be addressed by using our CSR sparse blocks for ultra-sparse blocks, because CSR has a smaller memory footprint and allows an efficient reset.
> 2) Small heap: For small heaps, it is not the temporary blocks but the memory overhead of the target sparse matrix that becomes the bottleneck. This is due to a relatively large memory overhead per sparse row, which is not amortized if a row has just one or very few non-zeros. This will be addressed via a modification of the MCSR representation for ultra-sparse matrices. Note that we cannot use CSR or COO here because we want to support efficient multi-threaded incremental construction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)