Posted to dev@singa.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/06/29 07:20:45 UTC
[jira] [Commented] (SINGA-210) Enable checkpoint and resume for v1.0

    [ https://issues.apache.org/jira/browse/SINGA-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354718#comment-15354718 ] 

ASF subversion and git services commented on SINGA-210:
-------------------------------------------------------

Commit 62c6603ff7a3fe9f9749021e84ad9ec35f3fef7d in incubator-singa's branch refs/heads/dev from WANG Ji
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=62c6603 ]

SINGA-210 Enable checkpoint and resume for v1.0

This ticket is going to add code for dumping the model parameters as
checkpoint files, which could be used for fine-tuning and deployment.

Serialize each Tensor into a TensorProto and save it in a BinFile,
stored as <prefix>.model, and generate a description of the
parameters in <prefix>.desc.
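The commit message does not spell out the on-disk layout, so the following is only a sketch of the key-value idea under stated assumptions. A hypothetical `ParamRecord` struct stands in for `TensorProto`, and a plain length-prefixed byte stream stands in for `BinFile`. Each record stores the parameter name, its shape and its float values. The sketch covers only the binary `<prefix>.model` part, not the `.desc` file.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <istream>
#include <numeric>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory use
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized TensorProto: name, shape, float data.
struct ParamRecord {
  std::string name;
  std::vector<int64_t> shape;
  std::vector<float> values;
};

namespace {

// Write a fixed-size value in native byte order (assumption: no portability).
template <typename T>
void WritePod(std::ostream& out, const T& v) {
  out.write(reinterpret_cast<const char*>(&v), sizeof(T));
}

template <typename T>
T ReadPod(std::istream& in) {
  T v{};
  if (!in.read(reinterpret_cast<char*>(&v), sizeof(T)))
    throw std::runtime_error("truncated checkpoint record");
  return v;
}

}  // namespace

// Append one record: [name_len][name][ndim][dims...][count][values...].
void WriteRecord(std::ostream& out, const ParamRecord& r) {
  const int64_t count = std::accumulate(r.shape.begin(), r.shape.end(),
                                        int64_t{1}, std::multiplies<int64_t>());
  assert(count == static_cast<int64_t>(r.values.size()));  // shape matches data
  WritePod<uint32_t>(out, static_cast<uint32_t>(r.name.size()));
  out.write(r.name.data(), static_cast<std::streamsize>(r.name.size()));
  WritePod<uint32_t>(out, static_cast<uint32_t>(r.shape.size()));
  for (int64_t d : r.shape) WritePod(out, d);
  WritePod<uint64_t>(out, static_cast<uint64_t>(r.values.size()));
  if (!r.values.empty())
    out.write(reinterpret_cast<const char*>(r.values.data()),
              static_cast<std::streamsize>(r.values.size() * sizeof(float)));
}

// Read records until the end of the stream.
std::vector<ParamRecord> ReadAllRecords(std::istream& in) {
  std::vector<ParamRecord> records;
  while (in.peek() != std::istream::traits_type::eof()) {
    ParamRecord r;
    r.name.resize(ReadPod<uint32_t>(in));
    if (!r.name.empty() &&
        !in.read(&r.name[0], static_cast<std::streamsize>(r.name.size())))
      throw std::runtime_error("truncated parameter name");
    const uint32_t ndim = ReadPod<uint32_t>(in);
    for (uint32_t i = 0; i < ndim; ++i) r.shape.push_back(ReadPod<int64_t>(in));
    r.values.resize(ReadPod<uint64_t>(in));
    if (!r.values.empty() &&
        !in.read(reinterpret_cast<char*>(r.values.data()),
                 static_cast<std::streamsize>(r.values.size() * sizeof(float))))
      throw std::runtime_error("truncated parameter values");
    records.push_back(std::move(r));
  }
  return records;
}
```

For a real file, open the stream with `std::ios::binary`. Because the integers here are written in native byte order, this toy layout is not portable across machines with different endianness, whereas the protobuf wire format is defined independently of the host byte order.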

Unit test cases pass for the kFloat, kInt and kDouble data types.
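The unit tests themselves are not included in this message. A round-trip check of the kind described might look like the sketch below, in which a hypothetical `DataType` enum mirrors the three type names, a one-byte tag records the element type, and `CheckRoundTrip` asserts that deserializing a serialized buffer returns the original values. None of these names are taken from SINGA.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tags mirroring the kFloat / kInt / kDouble data types.
enum class DataType { kFloat, kInt, kDouble };

inline DataType TypeOf(float) { return DataType::kFloat; }
inline DataType TypeOf(int) { return DataType::kInt; }
inline DataType TypeOf(double) { return DataType::kDouble; }

// Serialize as [1-byte type tag][raw element bytes].
template <typename T>
std::string Serialize(const std::vector<T>& data) {
  std::string bytes(1, static_cast<char>(TypeOf(T{})));
  if (!data.empty())
    bytes.append(reinterpret_cast<const char*>(data.data()),
                 data.size() * sizeof(T));
  return bytes;
}

// Reject buffers whose tag or size does not match T.
template <typename T>
std::vector<T> Deserialize(const std::string& bytes) {
  if (bytes.empty() || bytes[0] != static_cast<char>(TypeOf(T{})))
    throw std::runtime_error("data type mismatch");
  if ((bytes.size() - 1) % sizeof(T) != 0)
    throw std::runtime_error("buffer size is not a multiple of element size");
  std::vector<T> data((bytes.size() - 1) / sizeof(T));
  if (!data.empty())
    std::memcpy(data.data(), bytes.data() + 1, data.size() * sizeof(T));
  return data;
}

// Assert that serialize-then-deserialize is the identity for this type.
template <typename T>
void CheckRoundTrip(const std::vector<T>& original) {
  assert(Deserialize<T>(Serialize(original)) == original);
}
```

Tagging the buffer lets the reader reject a mismatched type (for example, reading a kFloat buffer as kInt throws) instead of silently reinterpreting the bytes.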


> Enable checkpoint and resume for v1.0
> -------------------------------------
>
>                 Key: SINGA-210
>                 URL: https://issues.apache.org/jira/browse/SINGA-210
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>
> This ticket is going to add code for dumping the model parameters as checkpoint files, which could be used for fine-tuning and deployment.
> The model parameters should be separated from the model definition, i.e., the net construction. After creating the neural net, users either randomly initialize the layer parameters or use the parameters from checkpoint files. In other words, we do not add a pair of serializing and parsing functions to the Layer class.
> We need to decide the format of the checkpoint file and how to write and read it:
> 1. The checkpoint file consists of the model parameters, which could be serialized as key-value pairs, where the key is the parameter name and the value is a protobuf object including the shape and values. Optionally, there could be a text file with the parameter meta info, e.g., name and shape, which lets users inspect the model parameters without parsing the binary checkpoint file.
> 2. The binary checkpoint file can be serialized using the Writer (SINGA-202) and loaded into memory using the Reader (SINGA-202).
> 3. A checkpoint utility class should be implemented for 1 and 2. Compatibility with Caffe checkpoint files may also be considered, so that models from the Caffe model zoo (http://caffe.berkeleyvision.org/model_zoo.html) can be reused.
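> {code}
> #include <string>
>
> class TensorProto;  // protobuf message holding a parameter's shape and values
>
> class Checkpoint {
>  public:
>   enum class Mode { kRead, kWrite };
>   // <prefix>.model is the binary file of parameter key-value pairs;
>   // <prefix>.meta is the text file, one line per parameter.
>   Checkpoint(const std::string& prefix, Mode mode);
>   void Read();      // read the whole .model file
>   void ReadMeta();  // read the .meta file only
>   // Return the value protobuf object of one parameter.
>   const TensorProto& Get(const std::string& key) const;
>   std::string GetMeta(const std::string& key) const;  // return type assumed
>   void Read(const std::string& key);  // read a single parameter (assumed)
>   // Write to both the .model and .meta files.
>   void Write(const std::string& key, const TensorProto& value);
> };
> {code}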
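The proposal leaves the `.meta` line format open. A minimal sketch, assuming one line of the form `<name> <d0>,<d1>,...` per parameter, shows how reading meta info and looking up one parameter's shape could work on the text file alone, without parsing the binary `.model` file. The free functions `WriteMetaLine`, `ReadMeta` and `GetMeta` and the `Shape` alias are hypothetical illustrations, not the `Checkpoint` class proposed in the ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Shape = std::vector<int64_t>;  // hypothetical alias for a parameter shape

// Write one meta line "<name> <d0>,<d1>,...". Names must not contain whitespace.
void WriteMetaLine(std::ostream& out, const std::string& name,
                   const Shape& shape) {
  assert(name.find_first_of(" \t\n") == std::string::npos);
  out << name << ' ';
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (i > 0) out << ',';
    out << shape[i];
  }
  out << '\n';
}

// Parse the whole meta text into name -> shape; the .model file is untouched.
std::map<std::string, Shape> ReadMeta(std::istream& in) {
  std::map<std::string, Shape> meta;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string name, dims;
    if (!(fields >> name)) continue;  // skip blank lines
    fields >> dims;                   // stays empty for a scalar parameter
    Shape shape;
    std::istringstream dim_stream(dims);
    std::string dim;
    while (std::getline(dim_stream, dim, ',')) shape.push_back(std::stoll(dim));
    meta[name] = shape;
  }
  return meta;
}

// Look up one parameter's shape, throwing if the name is unknown.
Shape GetMeta(const std::map<std::string, Shape>& meta, const std::string& key) {
  const auto it = meta.find(key);
  if (it == meta.end()) throw std::out_of_range("no parameter named " + key);
  return it->second;
}
```

With this format, a file holding a convolution weight of shape 32,3,3,3 and its bias reads `conv1_weight 32,3,3,3` and `conv1_bias 32`, which users can inspect in any text editor.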



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)