Posted to dev@horn.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2016/06/01 10:53:59 UTC

[jira] [Updated] (HORN-27) Effective Parallel Training of Large Deep DropConnect Neural Networks

     [ https://issues.apache.org/jira/browse/HORN-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HORN-27:
-------------------------------
    Description: 
As you might already know, training large-scale deep ANN architectures, such as Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), is challenging: not only does the training of large models have to be parallelized, but the process is also quite prone to overfitting due to the large size of the network, even with large data sets. There are popular techniques for regularizing artificial neural networks, called DropOut [1] and DropConnect [2], which randomly drop hidden units and their connections during training.
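
Just for illustration (this is not Horn code, and the layer sizes and drop probability are arbitrary assumptions), a minimal NumPy sketch of the difference between the two regularizers on a single fully-connected layer: DropOut zeroes hidden units, while DropConnect zeroes individual weights.

{code:python}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # input activations (size assumed)
W = rng.standard_normal((32, 64))     # weights of one hidden layer (size assumed)
p = 0.5                               # drop probability (assumed)

# DropOut [1]: randomly zero hidden *units* (entries of the activation vector).
h = np.maximum(W @ x, 0.0)
unit_mask = rng.random(h.shape) > p
h_dropout = h * unit_mask / (1.0 - p)          # inverted scaling at train time

# DropConnect [2]: randomly zero individual *connections* (entries of W).
weight_mask = rng.random(W.shape) > p
h_dropconnect = np.maximum((W * weight_mask) @ x, 0.0) / (1.0 - p)
{code}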

This is exactly why we are doing this project. Of course, it is still a rough idea at the moment: I'm thinking about an ensemble concept of DropOut and DropConnect that allows distributed parallel training with small communication requirements. The core idea is to create many model replicas on different subsets of the data, and to partition each network model randomly across multiple processors, thus dropping connections and achieving locality of computation at the same time.
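
As a rough sketch of that partitioning idea (sizes and worker count are my own assumptions, and this is plain NumPy rather than anything in Horn): neurons are assigned to workers at random, each worker keeps only the weights between its own neurons, and every connection that crosses a partition is dropped, so the remaining computation is entirely local.

{code:python}
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, n_workers = 64, 32, 4     # assumed sizes, for illustration only
W = rng.standard_normal((n_out, n_in))

# Randomly assign input and output neurons to workers.
in_owner = rng.integers(0, n_workers, size=n_in)
out_owner = rng.integers(0, n_workers, size=n_out)

# Each worker keeps only the weights between its own neurons; connections
# that cross workers are dropped, so no weight has to travel over the network.
sub_models = [W * np.outer(out_owner == w, in_owner == w)
              for w in range(n_workers)]

# The union of the sub-models is a DropConnect-style sparsification of W.
kept = sum(int((m != 0).sum()) for m in sub_models)
print(f"kept {kept} of {W.size} connections ({kept / W.size:.0%})")
{code}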

There have already been attempts to parallelize SGD-based training of large-scale deep learning models on distributed systems. The basic concept is that each worker trains a copy of the model and the results are combined synchronously, or updates are pushed through a centralized parameter server asynchronously. For large models, this generally uses layer-wise model parallelism based on matrix operations. However, it leads to a large communication overhead between host and device, or between hosts and devices (like the image below).

!https://4.bp.blogspot.com/-S6-akP8wGOE/V0eU9DrzESI/AAAAAAAAF-o/qAKZ08VgJDo9ZPJFHt1SXnfZ26yueBY2gCLcB/s640/modelparallel.png!
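
For comparison, here is a toy single-process simulation of the two existing schemes just described (the names sync_average and ParameterServer are mine, not an existing API): synchronous replicas average their gradients every step, while asynchronous workers push updates to a central server whenever they finish.

{code:python}
import numpy as np

def sync_average(grads):
    """Synchronous scheme: all replica gradients are combined each step."""
    return np.mean(grads, axis=0)

class ParameterServer:
    """Asynchronous scheme: each worker pushes its update when it is ready."""
    def __init__(self, params, lr=0.01):
        self.params = params
        self.lr = lr

    def push(self, grad):
        self.params -= self.lr * grad   # no waiting for the other workers
        return self.params.copy()       # worker pulls the fresh parameters

rng = np.random.default_rng(2)

# Synchronous variant: average the 4 replica gradients, then take one step.
sync_params = np.zeros(8)
sync_params -= 0.01 * sync_average(rng.standard_normal((4, 8)))

# Asynchronous variant: workers push one at a time, in arrival order.
ps = ParameterServer(np.zeros(8))
for grad in rng.standard_normal((4, 8)):
    ps.push(grad)
{code}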

In contrast, my basic approach is as follows: we assign the training data and a model copy to a number of worker groups. Then, each group divides the large model irregularly into a few disconnected sub-models of the parent model, so that the workers run independently of each other.
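
A sketch of how I currently picture that layout (group count, worker count, and shard sizes are all placeholder assumptions): each group receives one shard of the data plus a full model copy, and then carves its copy into disconnected sub-models for its own workers, in the same spirit as the partitioning sketch above.

{code:python}
import numpy as np

rng = np.random.default_rng(3)
n_groups, workers_per_group = 3, 4        # assumed cluster layout
data = rng.standard_normal((6000, 64))    # toy training set
W = rng.standard_normal((32, 64))         # toy parent model (one layer)

# Each group gets one shard of the training data and a full copy of the model.
shards = np.array_split(data, n_groups)

groups = []
for shard in shards:
    model_copy = W.copy()
    # Carve the copy irregularly: every neuron belongs to exactly one worker,
    # and weights between neurons of different workers are dropped.
    in_owner = rng.integers(0, workers_per_group, size=W.shape[1])
    out_owner = rng.integers(0, workers_per_group, size=W.shape[0])
    sub_models = [model_copy * np.outer(out_owner == w, in_owner == w)
                  for w in range(workers_per_group)]
    groups.append((shard, sub_models))    # each worker trains its piece alone
{code}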


  was:
As you might already know, training large-scale deep ANN architectures, such as Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), is challenging: not only does the training of large models have to be parallelized, but the process is also quite prone to overfitting due to the large size of the network, even with large data sets. There are popular techniques for regularizing artificial neural networks, called DropOut [1] and DropConnect [2], which randomly drop hidden units and their connections during training.

This is just my rough idea at the moment: I'm thinking about an ensemble concept of DropOut and DropConnect that allows distributed parallel training with small communication requirements. The core idea is to create many model replicas on different subsets of the data, and to partition each network model randomly across multiple processors, thus dropping connections and achieving locality of computation at the same time.

There have already been attempts to parallelize SGD-based training of large-scale deep learning models on distributed systems. The basic concept is that each worker trains a copy of the model and the results are combined synchronously, or updates are pushed through a centralized parameter server asynchronously. For large models, this generally uses layer-wise model parallelism based on matrix operations. However, it leads to a large communication overhead between host and device, or between hosts and devices.

In contrast, my basic approach is as follows: we assign the training data and a model copy to a number of worker groups. Then, each group divides the large model irregularly into a few disconnected sub-models of the parent model, so that the workers run independently of each other.



> Effective Parallel Training of Large Deep DropConnect Neural Networks
> ---------------------------------------------------------------------
>
>                 Key: HORN-27
>                 URL: https://issues.apache.org/jira/browse/HORN-27
>             Project: Apache Horn
>          Issue Type: Bug
>            Reporter: Edward J. Yoon
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)