Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2015/06/29 09:38:40 UTC

Future plan for large scale DNN.

Hey all,

 As you know, the latest Apache Hama provides distributed training of
an Artificial Neural Network using its BSP computing engine. In
general, the training data is stored in HDFS and distributed across
multiple machines. In Hama, two kinds of components are involved in
the training procedure: the master task and the groom tasks. The
master task is in charge of merging the model updates and sending the
merged model to all the groom tasks. The groom tasks are in charge of
computing the weight updates from the training data.

 The training procedure is iterative, and each iteration consists of
two phases: update weights and merge updates. In the update-weights
phase, each groom task first updates its local model according to the
message received from the master task, then computes the weight
updates locally on its assigned data partition (mini-batch SGD), and
finally sends the updated weights to the master task. In the
merge-updates phase, the master task updates the model according to
the messages received from the groom tasks and then distributes the
updated model to all groom tasks. The two phases repeat alternately
until the termination condition is met (e.g., a specified number of
iterations is reached).
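
 To make the two phases concrete, here is a rough, framework-free Java
sketch of one iteration. The names groomStep and masterMerge, the toy
gradient, and the simple averaging are purely illustrative; this is not
Hama's actual API, and real code would backpropagate through the ANN
and exchange messages via BSP supersteps.

import java.util.Arrays;
import java.util.List;

public class TwoPhaseSketch {

  // Update-weights phase (groom side): refresh the local model from the
  // master's message, then compute weight updates on the local mini-batch
  // (toy gradient, assuming weight and feature vectors have equal length).
  static double[] groomStep(double[] receivedModel, double[][] miniBatch) {
    double[] localModel = receivedModel.clone();
    double[] updates = new double[localModel.length];
    for (double[] example : miniBatch) {
      for (int i = 0; i < updates.length; i++) {
        // placeholder "gradient"; a real groom task would backpropagate here
        updates[i] += -0.01 * (localModel[i] - example[i]);
      }
    }
    return updates; // sent to the master task
  }

  // Merge-update phase (master side): fold the grooms' updates into the
  // global model, which is then broadcast back to all groom tasks.
  static double[] masterMerge(double[] model, List<double[]> groomUpdates) {
    double[] merged = model.clone();
    for (double[] update : groomUpdates) {
      for (int i = 0; i < merged.length; i++) {
        merged[i] += update[i] / groomUpdates.size();
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    double[] model = {0.5, -0.2, 0.1};
    double[][] miniBatch = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
    for (int iteration = 0; iteration < 3; iteration++) { // until termination
      double[] updates = groomStep(model, miniBatch);
      model = masterMerge(model, List.of(updates));
    }
    System.out.println(Arrays.toString(model));
  }
}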

 The model is designed in a hierarchical way. The base class is more
abstract than the derived classes, so the structure of the ANN model
can be freely set by the user, as long as it is a layered model.
Therefore, the Perceptron, Auto-encoder, and Linear and Logistic
regressors can all be uniformly represented as an ANN.
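
 For illustration only, the hierarchy looks roughly like this (the
class names below are a sketch, not Hama's actual classes): the
abstract base only fixes "a stack of layers", and a concrete model
such as a perceptron just picks a particular layer structure, while a
linear or logistic regressor would simply have no hidden layer.

import java.util.ArrayList;
import java.util.List;

// Abstract base: any layered model is a sequence of layer sizes plus a
// forward pass; derived classes only choose the concrete structure.
abstract class LayeredNeuralNetwork {
  protected final List<Integer> layerSizes = new ArrayList<>();

  public void addLayer(int numberOfNeurons) {
    layerSizes.add(numberOfNeurons);
  }

  public abstract double[] output(double[] features);
}

// A perceptron with one hidden layer is just one particular layered model.
class Perceptron extends LayeredNeuralNetwork {
  Perceptron(int inputs, int hidden, int outputs) {
    addLayer(inputs);
    addLayer(hidden);
    addLayer(outputs);
  }

  @Override
  public double[] output(double[] features) {
    // placeholder: a real implementation would apply weights and a
    // squashing function layer by layer
    return new double[layerSizes.get(layerSizes.size() - 1)];
  }
}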

 However, as described above, only data parallelism is currently used.
Each node holds a copy of the model. In each iteration, the
computation is performed on every node and a final aggregation is
performed on one node; the updated model is then synchronized back to
each node. So, performance aside, the parameters must fit into the
memory of a single machine.

 Here is a tentative near-future plan I propose for applications that
need a large model with huge memory consumption, moderate
computational power per mini-batch, and lots of training data. The
main idea is to use a Parameter Server to parallelize model creation
and distribute training across machines. The Apache Hama framework
assigns each split of the training data stored in HDFS to a BSP task.
Then, the BSP task assigns each of its N threads a small portion of
work, much smaller than 1/Nth of the total size of a mini-batch, and
assigns new portions whenever threads become free. With this approach,
faster threads do more work than slower threads. Each thread
asynchronously asks the Parameter Server, which stores the parameters
across distributed machines, for an updated copy of the model,
computes the gradients on its assigned data, and sends the gradients
back to the Parameter Server. This architecture is inspired by
Google's DistBelief (Dean et al., 2012). Finally, I have no concrete
idea regarding the programming interface at the moment, but I'll try
to provide a neuron-centric programming model, similar to Google's
Pregel, if possible.
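
 Roughly, each BSP task would then run something like the following
sketch. ParameterServer, fetchModel, pushGradients, and the shared
work queue are hypothetical names; the real design still needs RPC,
parameter partitioning across server machines, and fault tolerance.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncWorkerSketch {

  // Hypothetical client view of the distributed parameter server.
  interface ParameterServer {
    double[] fetchModel();              // pull the latest parameters
    void pushGradients(double[] grads); // push locally computed gradients
  }

  // Each BSP task runs N threads; each thread repeatedly takes a small
  // portion of the mini-batch from a shared queue, so faster threads
  // naturally end up doing more work than slower ones.
  static void runTask(ParameterServer ps, BlockingQueue<double[][]> portions, int nThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (int t = 0; t < nThreads; t++) {
      pool.submit(() -> {
        double[][] portion;
        while ((portion = portions.poll()) != null) {
          double[] model = ps.fetchModel();                 // asynchronous pull
          double[] grads = computeGradient(model, portion);
          ps.pushGradients(grads);                          // asynchronous push
        }
      });
    }
    pool.shutdown();
  }

  // Placeholder gradient computation; a real worker would backpropagate
  // through the ANN on its portion of the mini-batch.
  static double[] computeGradient(double[] model, double[][] examples) {
    double[] grads = new double[model.length];
    for (int i = 0; i < grads.length; i++) {
      grads[i] = -0.01 * model[i] * examples.length;
    }
    return grads;
  }
}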

-- 
Best Regards, Edward J. Yoon

RE: Future plan for large scale DNN.

Posted by "Edward J. Yoon" <ed...@samsung.com>.
I fixed the typo. :-) Thank you very much!

See also https://twitter.com/eddieyoon/status/616515242701357056

> Are you planning other modes of training? (e.g., backpropagation/online/etc.)

I haven't thought about it yet.

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Julio Pires [mailto:juliocspires@gmail.com]
Sent: Friday, July 03, 2015 10:10 AM
To: dev@hama.apache.org
Subject: Re: Future plan for large scale DNN.


Hi Edward, very nice!

1) Are you planning other modes of training? (e.g., backpropagation/online/etc.)
2) I suggest changing setMomemtum to setMome*n*tum.

Thanks!


2015-07-02 4:50 GMT-03:00 Edward J. Yoon <ed...@apache.org>:

> Here's a new user interface design idea I propose. Any advice is welcome!
>
> https://wiki.apache.org/hama/Neuron
>



Re: Future plan for large scale DNN.

Posted by Júlio Pires <ju...@gmail.com>.
Hi Edward, very nice!

1) Are you planning other modes of training? (e.g., backpropagation/online/etc.)
2) I suggest changing setMomemtum to setMome*n*tum.

Thanks!


2015-07-02 4:50 GMT-03:00 Edward J. Yoon <ed...@apache.org>:

> Here's a new user interface design idea I propose. Any advice is welcome!
>
> https://wiki.apache.org/hama/Neuron
>

Re: Future plan for large scale DNN.

Posted by "Edward J. Yoon" <ed...@apache.org>.
Here's a new user interface design idea I propose. Any advice is welcome!

https://wiki.apache.org/hama/Neuron
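
As a rough illustration of the direction only (this is not the actual
proposal on the wiki page above), a neuron-centric interface in the
spirit of Pregel's vertex-centric compute() could look something like
the sketch below; Neuron, forward, and backward are assumed names.

import java.util.List;

// Framework-side sketch: each neuron only sees its incoming messages,
// analogous to a Pregel vertex seeing messages from its neighbors.
abstract class Neuron {
  private double output;

  // Called with weighted inputs from the previous layer.
  public abstract void forward(List<Double> weightedInputs);

  // Called with error terms propagated back from the next layer.
  public abstract void backward(List<Double> deltas);

  protected void setOutput(double value) { this.output = value; }
  protected double getOutput() { return output; }
}

// Example user code: a sigmoid neuron expressed against the sketch above.
class SigmoidNeuron extends Neuron {
  @Override
  public void forward(List<Double> weightedInputs) {
    double sum = weightedInputs.stream().mapToDouble(Double::doubleValue).sum();
    setOutput(1.0 / (1.0 + Math.exp(-sum)));
  }

  @Override
  public void backward(List<Double> deltas) {
    double sum = deltas.stream().mapToDouble(Double::doubleValue).sum();
    double gradient = sum * getOutput() * (1.0 - getOutput());
    // a real framework would use 'gradient' to update the incoming weights
  }
}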




-- 
Best Regards, Edward J. Yoon