Posted to dev@singa.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/04/08 04:39:25 UTC

[jira] [Commented] (SINGA-130) Implement a layer subclass for data prefetching

    [ https://issues.apache.org/jira/browse/SINGA-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231526#comment-15231526 ] 

ASF subversion and git services commented on SINGA-130:
-------------------------------------------------------

Commit a0bdd0b85ddba7d670ab04c5de04a29c8366e868 in incubator-singa's branch refs/heads/master from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=a0bdd0b ]

SINGA-130 Data prefetching layer

Extended StoreInputLayer to support prefetching of data. It maintains a buffer for (key,value) pairs read from the storage
layer. In Setup(), it launches a new thread that reads data from storage into the buffer. The ComputeFeature() method waits
for that thread to finish (join) before parsing the buffered records into the data_ and aux_ fields. Finally, it launches
another thread to prefetch the next batch.
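A minimal C++ sketch of the scheme described above, assuming records are (key, value) string pairs whose value parses as a single float. FakeStore, PrefetchingInputLayer and the parsing step are hypothetical stand-ins, not SINGA's actual StoreInputLayer code. The point is the join-parse-relaunch cycle, and the fact that the buffer is only read by the calling thread after join() has returned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Record = std::pair<std::string, std::string>;  // (key, value)

// Hypothetical stand-in for the storage layer: returns records in order and
// wraps around at the end, like reading a file epoch after epoch.
class FakeStore {
 public:
  explicit FakeStore(std::vector<Record> records) : records_(std::move(records)) {
    assert(!records_.empty());
  }
  Record Read() {
    Record rec = records_[pos_];
    pos_ = (pos_ + 1) % records_.size();
    return rec;
  }

 private:
  std::vector<Record> records_;
  std::size_t pos_ = 0;
};

// Hypothetical prefetching input layer (not SINGA's StoreInputLayer API).
class PrefetchingInputLayer {
 public:
  PrefetchingInputLayer(FakeStore* store, std::size_t batchsize)
      : store_(store), batchsize_(batchsize) {}
  ~PrefetchingInputLayer() {
    if (thread_.joinable()) thread_.join();  // never leave a running thread behind
  }

  // Setup(): start reading the first batch in the background.
  void Setup() { Launch(); }

  // ComputeFeature(): wait for the reader, parse its buffer, start the next read.
  void ComputeFeature() {
    assert(thread_.joinable() && "Setup() must be called first");
    thread_.join();  // after join(), buffer_ is safe to read on this thread
    data_.clear();
    for (const Record& rec : buffer_) data_.push_back(std::stof(rec.second));
    Launch();  // prefetch the next batch while the caller uses data_
  }

  const std::vector<float>& data() const { return data_; }

 private:
  void Launch() {
    thread_ = std::thread([this] {
      buffer_.clear();
      for (std::size_t i = 0; i < batchsize_; ++i) buffer_.push_back(store_->Read());
    });
  }

  FakeStore* store_;
  std::size_t batchsize_;
  std::vector<Record> buffer_;  // holds batchsize raw records: the extra memory
  std::vector<float> data_;     // parsed features handed to the next layer
  std::thread thread_;
};
```

Setup() must run before the first ComputeFeature(), because ComputeFeature() joins the thread that Setup() (or the previous ComputeFeature()) started.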

In terms of memory consumption, this prefetching uses an extra (batchsize*recordsize) bytes for the buffer. However, we observe
no visible runtime improvement, because I/O time is very small compared to CPU time: on the order of milliseconds without
prefetching, and tens of microseconds with prefetching.
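A back-of-the-envelope model of the two claims above, assuming that reading and computation overlap perfectly when prefetching is on. The function names are hypothetical and the model ignores thread start-up and join overhead.

```cpp
#include <algorithm>
#include <cstddef>

// Extra memory used by the prefetch buffer: one additional batch of raw records.
constexpr std::size_t PrefetchBufferBytes(std::size_t batchsize, std::size_t recordsize) {
  return batchsize * recordsize;
}

// Idealised per-iteration time without prefetching: read, then compute.
constexpr double IterTimeSerial(double t_io, double t_compute) {
  return t_io + t_compute;
}

// Idealised per-iteration time with prefetching: the read of the next batch
// overlaps with computation on the current one, so only the longer one counts.
constexpr double IterTimePrefetch(double t_io, double t_compute) {
  return std::max(t_io, t_compute);
}
```

With hypothetical numbers, batchsize = 64 and recordsize = 3072 bytes give a 196,608-byte buffer, and 2 ms of I/O against 500 ms of compute per iteration shrinks from 502 ms to 500 ms, roughly 0.4%. Prefetching can hide at most the I/O time, so when I/O is that small relative to CPU time the saving is hard to see.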


> Implement a layer subclass for data prefetching
> -----------------------------------------------
>
>                 Key: SINGA-130
>                 URL: https://issues.apache.org/jira/browse/SINGA-130
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: Anh Dinh
>              Labels: data, multi-threading, prefetch
>
> Data prefetching is important for training with GPU, because I/O can become the bottleneck when computation is very fast.
> One idea is to create a general prefetch layer that embeds the application-specific data loading layers.
> {code}
> PrefetchLayer::ComputeFeature() {
>   wait until the prefetch thread finishes.
>   swap the prefetch_data_ and data_ blobs.
>   if (first time)
>      load data into data_ blobs
>   spawn a new thread to call functions from data loading layers for loading data into prefetch_data_.
> }
> {code}
>  
> If the prefetch layer has multiple loading layers and is connected to multiple destination layers, then different destination layers may want data loaded by different loading layers. This case should be handled properly.
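The PrefetchLayer pseudocode in the issue description can be made concrete as a double-buffered layer. This is one possible reading, assuming a hypothetical Loader callback in place of the embedded data loading layers and std::vector<float> in place of SINGA blobs. It loads synchronously on the first call, when nothing has been prefetched yet, and otherwise joins the prefetch thread and swaps the two buffers before starting the next load.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical loader callback: fills a blob with the next batch. It stands in
// for the application-specific data loading layers the prefetch layer embeds.
using Loader = std::function<void(std::vector<float>*)>;

class PrefetchLayer {
 public:
  explicit PrefetchLayer(Loader loader) : loader_(std::move(loader)) {}
  ~PrefetchLayer() {
    if (thread_.joinable()) thread_.join();
  }

  void ComputeFeature() {
    if (first_) {
      loader_(&data_);  // nothing prefetched yet: load the first batch directly
      first_ = false;
    } else {
      thread_.join();                    // wait until the prefetch thread finishes
      std::swap(prefetch_data_, data_);  // hand the prefetched batch over in O(1)
    }
    assert(!thread_.joinable());
    // Load the next batch into prefetch_data_ while callers read data_.
    thread_ = std::thread([this] { loader_(&prefetch_data_); });
  }

  const std::vector<float>& data() const { return data_; }

 private:
  Loader loader_;
  bool first_ = true;
  std::vector<float> data_;
  std::vector<float> prefetch_data_;
  std::thread thread_;
};
```

Swapping two std::vector objects only exchanges their internal pointers, so the hand-off cost does not depend on the batch size.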
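One way to handle the multi-loader, multi-destination case is a routing table from each destination layer to the loading layer whose output it wants. In this sketch, MultiSourcePrefetchLayer, AddLoader, Connect and the Loader callback are all hypothetical names. The background thread is left out so that only the routing is visible, and loading happens synchronously in ComputeFeature().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical loader callback: fills a blob with the next batch from one source.
using Loader = std::function<void(std::vector<float>*)>;

// Sketch of a prefetch layer with several loading layers and several
// destination layers; each destination is routed to exactly one loader.
class MultiSourcePrefetchLayer {
 public:
  void AddLoader(const std::string& name, Loader loader) {
    loaders_[name] = std::move(loader);
    blobs_.emplace(name, std::vector<float>{});  // empty blob for this loader
  }

  // Record which loader's output a destination layer wants.
  void Connect(const std::string& dst, const std::string& loader_name) {
    assert(loaders_.count(loader_name) && "unknown loader");
    routes_[dst] = loader_name;
  }

  // Load one batch from every loader (synchronously in this sketch).
  void ComputeFeature() {
    for (auto& [name, loader] : loaders_) loader(&blobs_[name]);
  }

  // Blob for a destination; throws std::out_of_range if dst was never connected.
  const std::vector<float>& data(const std::string& dst) const {
    return blobs_.at(routes_.at(dst));
  }

 private:
  std::map<std::string, Loader> loaders_;
  std::map<std::string, std::vector<float>> blobs_;  // loader name -> its batch
  std::map<std::string, std::string> routes_;        // destination -> loader name
};
```

For instance, an image loader could feed a convolution layer while a label loader feeds the loss layer, each destination reading only the blob it was connected to.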



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)