Posted to dev@singa.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2015/08/12 11:49:46 UTC

[jira] [Commented] (SINGA-47) Fix a bug in data layers that leads to out-of-memory when group size is too large

    [ https://issues.apache.org/jira/browse/SINGA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693236#comment-14693236 ] 

ASF subversion and git services commented on SINGA-47:
------------------------------------------------------

Commit 7a61a687c2ceb4fc7e05c2d3bbd9817e8ba59e3f in incubator-singa's branch refs/heads/master from Wei Wang
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=7a61a68 ]

SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large

The bug is fixed by closing the data source (e.g., lmdb or datashard) after reading a sample record in the Setup function.
The data source would otherwise cache records in memory, which eats up all memory when there are many data layers.


> Fix a bug in data layers that leads to out-of-memory when group size is too large 
> ----------------------------------------------------------------------------------
>
>                 Key: SINGA-47
>                 URL: https://issues.apache.org/jira/browse/SINGA-47
>             Project: Singa
>          Issue Type: Bug
>            Reporter: wangwei
>
> The Setup function of a data layer opens the database (e.g., DataShard or LMDB) and reads a sample record. The sample record is necessary for setting the data shape of upper layers. Every data layer's Setup function is called when SINGA creates the NeuralNet object. If the group size is 128 and partitioning is on dimension 0, then 128 data layers will be created. Memory would be used up if the database object has a large cache (prefetch) size.
> Although every process has the full NeuralNet object (i.e., all layers), each process runs only a subset of workers, which operate over a subset of the (data) layers. Consequently, in one process only a small number of data layers call ComputeFeature to read data records.
> To fix the bug, we simply close the database after reading one sample record in the Setup function, and re-open it in the ComputeFeature function. In this way, only a small number of database instances are open in each process.
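The fix can be sketched as below. This is a minimal, hypothetical illustration (the names DataSource, DataLayer, Open/Close, etc. are placeholders, not the actual SINGA classes): Setup opens the source, reads one sample record, and closes immediately so the prefetch cache is released; ComputeFeature lazily re-opens the source, so only the layers a process actually runs hold an open database.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a caching data source such as a DataShard or LMDB wrapper.
// open_count tracks how many sources currently hold their (prefetch) cache.
struct DataSource {
  bool open_ = false;
  static int open_count;
  void Open()  { if (!open_) { open_ = true;  ++open_count; } }
  void Close() { if (open_)  { open_ = false; --open_count; } }
  std::string ReadRecord() { return "sample"; }  // pretend record
};
int DataSource::open_count = 0;

class DataLayer {
 public:
  void Setup() {
    src_.Open();
    sample_ = src_.ReadRecord();  // needed to size upper layers
    src_.Close();                 // the fix: release the cache immediately
  }
  void ComputeFeature() {
    if (!src_.IsOpen()) src_.Open();  // re-open only when this layer runs
    src_.ReadRecord();
  }
 private:
  struct Src : DataSource { bool IsOpen() const { return open_; } } src_;
  std::string sample_;
};
```

With 128 data layers (group size 128, partitioned on dimension 0), all 128 Setup calls leave zero sources open; if a process's workers then call ComputeFeature on only two layers, just two sources are open in that process.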



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)