Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2015/07/02 09:50:05 UTC

Re: Future plan for large scale DNN.

Here's a new user interface design idea I propose. Any advice is welcome!

https://wiki.apache.org/hama/Neuron
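
As one possible shape for such a neuron-centric (Pregel-like) interface,
here is an illustrative sketch; the class names and method signatures
below are illustrative guesses for discussion, not the API on the wiki
page:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;

// Illustrative only: a "think like a neuron" API in the Pregel style.
// The runtime would call forward() during the feed-forward pass and
// backward() during backpropagation, delivering messages from the
// neighboring layers.
public abstract class Neuron {

  private double output;

  // messages = weighted activations arriving from the previous layer
  public abstract void forward(Iterable<DoubleWritable> messages)
      throws IOException;

  // messages = error terms (deltas) arriving from the next layer
  public abstract void backward(Iterable<DoubleWritable> messages)
      throws IOException;

  // Provided by the runtime in this sketch: emit a value to the
  // neurons this one is connected to in the adjacent layer.
  protected void propagate(double value) {
    // placeholder; a real runtime would enqueue the value as a message
  }

  protected void setOutput(double output) { this.output = output; }
  public double getOutput() { return output; }
}

User code for a sigmoid unit might then look like:

// Example user code written against the sketch above; incoming
// messages are assumed to be already weighted by the sender.
class SigmoidNeuron extends Neuron {
  @Override
  public void forward(Iterable<DoubleWritable> messages) {
    double sum = 0.0;
    for (DoubleWritable m : messages) sum += m.get();
    double activation = 1.0 / (1.0 + Math.exp(-sum));
    setOutput(activation);
    propagate(activation);
  }

  @Override
  public void backward(Iterable<DoubleWritable> messages) {
    double delta = 0.0;
    for (DoubleWritable m : messages) delta += m.get();
    // derivative of the sigmoid: output * (1 - output)
    propagate(delta * getOutput() * (1.0 - getOutput()));
  }
}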

On Mon, Jun 29, 2015 at 4:38 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Hey all,
>
>  As you know, the latest Apache Hama provides distributed training of
> an Artificial Neural Network using its BSP computing engine. In
> general, the training data is stored in HDFS and distributed across
> multiple machines. In Hama, two kinds of components are involved in
> the training procedure: the master task and the groom tasks. The
> master task is in charge of merging the model update information and
> sending it to all the groom tasks. The groom tasks are in charge of
> computing the weight updates from the training data.
>
>  The training procedure is iterative, and each iteration consists of
> two phases: update weights and merge updates. In the update-weights
> phase, each groom task first updates its local model according to the
> message received from the master task, then computes the weight
> updates locally on its assigned data partition (mini-batch SGD), and
> finally sends the updated weights to the master task. In the
> merge-updates phase, the master task updates the model according to
> the messages received from the groom tasks and then distributes the
> updated model to all groom tasks. The two phases alternate until the
> termination condition is met (e.g., a specified number of iterations
> is reached).
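
A minimal sketch of that loop as a Hama BSP task is below. The
WeightMessage class and the helper methods are placeholders made up for
illustration; only the BSPPeer calls (send, sync, getCurrentMessage,
getPeerIndex, getPeerName, getAllPeerNames) follow the actual Hama API.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

// Placeholder message type; a real one would carry the weight matrices.
class WeightMessage implements Writable {
  public void write(DataOutput out) throws IOException { }
  public void readFields(DataInput in) throws IOException { }
}

public class AnnTrainingSketch
    extends BSP<LongWritable, Text, NullWritable, NullWritable, WeightMessage> {

  private static final int MAX_ITERATIONS = 100;

  @Override
  public void bsp(
      BSPPeer<LongWritable, Text, NullWritable, NullWritable, WeightMessage> peer)
      throws IOException, SyncException, InterruptedException {
    boolean isMaster = peer.getPeerIndex() == 0;   // peer 0 plays the master role
    String master = peer.getPeerName(0);

    for (int i = 0; i < MAX_ITERATIONS; i++) {
      if (!isMaster) {
        // Update-weights phase (groom): refresh the local model from the
        // master's last broadcast, then compute a mini-batch SGD update.
        WeightMessage fromMaster = peer.getCurrentMessage();
        if (fromMaster != null) {
          applyModel(fromMaster);
        }
        peer.send(master, computeLocalUpdate());
      }
      peer.sync();  // barrier between the two phases

      if (isMaster) {
        // Merge-updates phase (master): merge groom updates, broadcast result.
        WeightMessage merged = mergeUpdates(peer);
        for (String groom : peer.getAllPeerNames()) {
          if (!groom.equals(peer.getPeerName())) {
            peer.send(groom, merged);
          }
        }
      }
      peer.sync();  // grooms pick up the refreshed model next iteration
    }
  }

  private void applyModel(WeightMessage m) { /* overwrite local weights */ }

  private WeightMessage computeLocalUpdate() { return new WeightMessage(); }

  private WeightMessage mergeUpdates(
      BSPPeer<LongWritable, Text, NullWritable, NullWritable, WeightMessage> peer)
      throws IOException {
    WeightMessage m;
    while ((m = peer.getCurrentMessage()) != null) {
      // accumulate/average the groom updates here
    }
    return new WeightMessage();
  }
}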
>
>  The model is designed in a hierarchical way. The base class is more
> abstract than the derived classes, so the structure of the ANN model
> can be freely set by the user as long as it is a layered model.
> Therefore, the Perceptron, Auto-encoder, and Linear and Logistic
> regressors can all be represented uniformly as an ANN.
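
To illustrate the idea (with made-up class names, not the actual
classes in Hama's ML module), the hierarchy could be organized roughly
like this:

// Made-up names; the point is only the shape of the hierarchy.
public abstract class LayeredModel {
  protected int[] layerSizes;   // the layer structure is user-defined
  public abstract double[] output(double[] features);                   // forward pass
  public abstract void trainOnline(double[] features, double[] labels); // one SGD step
}

// Generic feed-forward ANN: the concrete workhorse.
class NeuralNetwork extends LayeredModel {
  public double[] output(double[] features) { /* forward pass */ return features; }
  public void trainOnline(double[] features, double[] labels) { /* SGD step */ }
}

// The "classic" models are then just configurations of the generic ANN:
//   Perceptron          -> input / hidden / output layers, sigmoid units
//   Auto-encoder        -> output layer mirrors the input layer
//   Linear regression   -> no hidden layer, identity output, squared loss
//   Logistic regression -> no hidden layer, sigmoid output, cross-entropy loss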
>
>  However, as described above, only data parallelism is currently
> used. Each node holds a full copy of the model; in each iteration the
> computation is carried out on every node, a final aggregation is done
> on one node, and the updated model is then synchronized back to every
> node. Performance aside, this means all of the parameters must fit
> into the memory of a single machine.
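
To make that constraint concrete, here is a back-of-the-envelope
estimate; the layer sizes are invented purely for illustration:

// Rough memory estimate for one full model copy held by every task.
public class ModelMemoryEstimate {
  public static void main(String[] args) {
    long unitsPerLayer = 20_000;                   // invented layer width
    long hiddenLayers = 3;                         // fully connected layers
    long weights = unitsPerLayer * unitsPerLayer * hiddenLayers;  // ~1.2e9
    long bytes = weights * 8;                      // double precision
    System.out.printf("~%.1f GB just for the weights, per task%n", bytes / 1e9);
  }
}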
>
>  Here is a tentative near-future plan I propose for applications that
> need a large model with huge memory consumption, moderate
> computational power per mini-batch, and lots of training data. The
> main idea is to use a Parameter Server to parallelize model creation
> and distribute training across machines. The Apache Hama framework
> assigns each split of the training data stored in HDFS to a BSP task.
> Each BSP task then assigns each of its N threads a small portion of
> work, much smaller than 1/Nth of the size of a mini-batch, and hands
> out new portions whenever a thread becomes free, so faster threads do
> more work than slower ones. Each thread asynchronously asks the
> Parameter Server, which stores the parameters across distributed
> machines, for an updated copy of the model, computes the gradients on
> its assigned data, and sends the updated gradients back to the
> Parameter Server. This architecture is inspired by Google's
> DistBelief (Jeff Dean et al., 2012). Finally, I have no concrete idea
> about the programming interface yet, but I'll try to provide a
> neuron-centric programming model in the style of Google's Pregel if
> possible.
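
A rough sketch of the per-task worker loop under that plan is below.
The ParameterServerClient interface and its pull/push methods are
hypothetical, since no such component exists in Hama yet; the thread
pool simply mirrors the "small portions of work handed to free
threads" idea.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical client API for the proposed Parameter Server.
interface ParameterServerClient {
  double[] pullLatestModel();              // fetch the current parameters
  void pushGradients(double[] gradients);  // send a local gradient update
}

public class AsyncWorkerSketch {

  // Each BSP task runs N threads. Each thread repeatedly grabs a small
  // chunk of the task's HDFS split (much smaller than 1/Nth of a
  // mini-batch), so faster threads naturally end up doing more chunks.
  public static void train(ParameterServerClient ps,
                           BlockingQueue<List<double[]>> workQueue,
                           int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      pool.submit(() -> {
        List<double[]> chunk;
        while ((chunk = workQueue.poll()) != null) {
          double[] model = ps.pullLatestModel();          // asynchronous pull
          double[] grads = computeGradients(model, chunk);
          ps.pushGradients(grads);                        // asynchronous push
        }
      });
    }
    pool.shutdown();
  }

  private static double[] computeGradients(double[] model, List<double[]> examples) {
    // placeholder: backpropagation over the examples would go here
    return new double[model.length];
  }
}

How the server applies concurrent pushes (locking, lock-free updates,
or bounded staleness) is left open in this sketch.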
>
> --
> Best Regards, Edward J. Yoon



-- 
Best Regards, Edward J. Yoon

RE: Future plan for large scale DNN.

Posted by "Edward J. Yoon" <ed...@samsung.com>.
I fixed the typo. :-) Thank you very much!

See also https://twitter.com/eddieyoon/status/616515242701357056

> Are you planning other modes of training (e.g., backpropagation/online/etc.)?

I haven't thought about it yet.

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Julio Pires [mailto:juliocspires@gmail.com]
Sent: Friday, July 03, 2015 10:10 AM
To: dev@hama.apache.org
Subject: Re: Future plan for large scale DNN.


Hi Edward, very nice!

1) Are you planning other modes of training (e.g., backpropagation/online/etc.)?
2) I suggest changing setMomemtum to setMome*n*tum.

Thanks!


Re: Future plan for large scale DNN.

Posted by Júlio Pires <ju...@gmail.com>.
Hi Edward, very nice!

1) Are you planning other modes of training (e.g., backpropagation/online/etc.)?
2) I suggest changing setMomemtum to setMome*n*tum.

Thanks!

