Posted to commits@singa.apache.org by wa...@apache.org on 2015/08/15 17:49:53 UTC

svn commit: r1696057 - in /incubator/singa/site/trunk: ./ content/ content/markdown/docs/ content/resources/images/arch/ content/resources/images/distributed/

Author: wangsh
Date: Sat Aug 15 15:49:53 2015
New Revision: 1696057

URL: http://svn.apache.org/r1696057
Log:
update distributed training guide

Added:
    incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
    incubator/singa/site/trunk/content/markdown/docs/frameworks.md
    incubator/singa/site/trunk/content/resources/images/distributed/
    incubator/singa/site/trunk/content/resources/images/distributed/frameworks.png   (with props)
    incubator/singa/site/trunk/content/resources/images/distributed/logical.png   (with props)
Removed:
    incubator/singa/site/trunk/content/markdown/docs/data.md
    incubator/singa/site/trunk/content/resources/images/arch/
Modified:
    incubator/singa/site/trunk/content/markdown/docs/architecture.md
    incubator/singa/site/trunk/content/site.xml
    incubator/singa/site/trunk/pom.xml

Modified: incubator/singa/site/trunk/content/markdown/docs/architecture.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/architecture.md?rev=1696057&r1=1696056&r2=1696057&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/architecture.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/architecture.md Sat Aug 15 15:49:53 2015
@@ -2,155 +2,53 @@
 
 ___
 
-We design the architecture of Singa with the consideration to make it general
-to unify existing architectures. Then users can easily test the performance of
-different architectures and select the best for their models.
-
 ### Logical Architecture
 
-<img src="../images/arch/logical.png" style="width: 550px"/>
+<img src="../images/distributed/logical.png" style="width: 550px"/>
 <p><strong> Fig.1 - Logical system architecture</strong></p>
 
-The logical system architecture is shown in Fig.1 with four worker groups and
-two server groups. Each worker group runs against a partition of the
-training dataset (called data parallelism) to compute the updates (e.g., the
-gradients) of parameters of one model replica.  Worker groups run
-asynchronously, while workers within one group run synchronously with each
-worker computing updates for a partition of the model parameters (called
-model parallelism). Each server group maintains one replica of the model
-parameters. It receives requests (e.g., Get/Put/Update) from workers and
-handles these requests. Server groups synchronize with neighboring server groups
-periodically or according to some rules. One worker (or server) group consists
-of multiple (user defined number) threads. These threads may resident in one
-process or span across multiple processes.
-
-Each worker (or server) group has a ParamShard object, which contains a full
-set of Param objects for a model replica. Since the workers (or servers) may
-span across multiple processes, this ParamShard may also be partitioned across
-multiple processes.  The ParamShards can share the same memory space for
-parameter values if they resident in the same process like
-[Caffe](http://caffe.berkeleyvision.org/)'s parallel implementation. Sharing
-memory could save memory space, but it could also change the training logic and
-thus affects the convergence rate.
-
-Each worker (thread) has a PMWorker (abbr for parameter management on workers)
-object, e.g., pmw1, pmw2, which calls the Get/Put/Update functions to get
-parameters, put parameters and update parameters. These functions may send
-requests to servers that use PMServer (abbr for parameter management on
-servers), e.g., pms1, pms2, to handle these requests.
-
-### Physical Architecture
-
-In this section, we describe how to configure Singa to generalize the logical
-architecture to the physical architectures of existing systems, e.g., Caffe and
-Baidu's DeepImage, Google Brain and Microsoft Adam. The architecture
-configuration includes:
-
-  * Number of worker threads per worker group and per process, which decides
-   the partitioning of worker side ParamShard
-
-  * Number of server threads per server group and per process, which decides
-   the partitioning of server side ParamShard
-
-  * Separation of servers and workers in different processes
-
-  * Number of worker groups per server group
-
-  * Topology of server groups, e.g., ring, tree, fully connected, etc.
-
-We put automatic optimization of the configuration as a future feature.
-
-#### No Partition of ParamShard
-
-<img src="../images/arch/arch1.png" style="width: 550px"/>
-<p><strong> Fig.2 - Physical system architecture without 
-partitioning ParamShard</strong></p>
-
-Fig.2 shows the architecture by configuring three threads per worker group, two
-threads per server group, two worker groups and one server group per process.
-Worker threads and server threads run as sub-threads. The main thread runs a
-loop as a stub to forward messages. For instance, the Get/Put/Update requests
-from workers are forwarded to the local servers, while Sync requests from the
-local servers are forwarded to remote servers. In this architecture, every
-group is fully contained in one process, hence the ParamShard objects is not
-partitioned.  If the ParamShards within one process share the same memory space
-for parameter values, the training procedure then follows
-[Hogwild](http://i.stanford.edu/hazy/hazy/victor/Hogwild/).  If we only
-launch process 1, Singa then runs in standalone mode. If we launch multiple
-processes, we can connect the server groups to form different topologies,
-e.g., ring or tree. Synchronization is conducted via inter-process
-communication between neighboring server groups. In
-[Caffe's](http://caffe.berkeleyvision.org/) parallel training architecture,
-processes are arranged into a ring. Caffe luanches one thread per model
-replica, hence it only supports data parallelism. Singa can also support
-model parallelism by partition the model replica among multiple worker
-threads within one group.
-
-#### Partition Server Side ParamShard
-
-<img src="../images/arch/arch2.png" style="width: 550px"/>
-<p><strong> Fig.3 - Physical system architecture, partitioning
-server ParamShard</strong></p>
-
-Fig.3 shows another physical architecture by configuring one worker group per
-process, and two processes per server group. Because the server group spans two
-processes, the ParamShard of the server group is partitioned across the two
-processes. We only show one server groups in the figure. The vertical lines
-represent inter-process communication to synchronize server groups if there
-are multiple server groups. In process 1, if the update for a parameter that
-residents in process 2, then the PMWorker's update request would be sent to
-process 2 via inter-process communication. If the parameter is maintained by
-process 1, then the update request is sent to pms2 directly via intra-process
-communication. The processes for other requests are the same. Baidu's
-[DeepImage](http://arxiv.org/abs/1501.02876) system uses one server group and
-its ParamShard is partitioned across all processes. Consequently, the stub of each process is
-connected with all other processes' stubs for updating parameters. Like Caffe,
-it launches only one thread per worker group, thus only support data
-parallelism.
-
-#### Partition Worker Side ParamShard
-
-<img src="../images/arch/arch3.png" style="width: 550px"/>
-<p><strong> Fig.4 - Physical system architecture, partitioning
-worker ParamShard</strong></p>
-
-The main difference of the architectures in Fig.4 and Fig.3 is that the worker
-group is partitioned over two processes in Fig.4. Consequently, the ParamShard
-is partitioned across process 1 and process 2. There are two kinds of
-parameters to consider:
-
-  * unique parameter: this kind of parameter exists in only one of the
-  partitioned ParamShards (due to model partition), and is updated by one
-  worker in its host process.  Hence the update is conducted similar to that in
-  Fig.2.
-
-  * replicated parameter: this kind of parameter is replicated in all
-  ParamShards (due to data partition), and its update is the aggregation of
-  updates from all workers.  The aggregation is processed as follows. First the
-  first process is selected as the leader process. (Update/Get/Put) requests of
-  PMWorkers on other processes are aggregated locally and then forwarded to the
-  leader process which handles it in the main thread. *The main thread of the
-  leader process creates a PMWorker over the ParamShard it owns. It handles
-  the request by calling the correspondingly functions, e.g., Update/Get/Put().
-  The responses from the servers are sent to the first worker of all sub ParamShards.*
-
-#### Workers and Servers in Different Processes
-
-<img src="../images/arch/arch4.png" style="width: 550px"/>
-<p><strong> Fig.5 - Physical system architecture, separating workers
-and servers</strong></p>
-
-Fig.5 shows an architecture similar to that in Fig.4 except that the servers
-and workers are separated into different processes. Consequently all requests
-are sent via inter-process communication and handled by remote servers. More
-details on this architecture are explained in
-[Worker-Server Architecture](parameter-management.html). This is
-the architecture used by Google's Brain and Microsoft's Adam system. It is also
-called the Downpour architecture.
-
-### Conclusion
-
-We have shown that Singa's architecture is general enough
-to unify the architectures of existing distributed training systems.
-Since the rest architectures can be derived similarly as the above four by
-setting different architecture configurations, we do not numerate them here.
+SINGA has a flexible architecture that supports different distributed
+[training frameworks](frameworks.html), both synchronous and asynchronous.
+The logical system architecture is shown in Fig.1.
+It consists of multiple server groups and worker groups:
+
+* **Server group**:
+  A server group maintains a complete replica of the model parameters
+  and handles get/update requests from worker groups.
+  Neighboring server groups synchronize their parameters periodically.
+  Typically, a server group consists of several servers,
+  each managing a partition of the model parameters.
+* **Worker group**:
+  Each worker group communicates with exactly one server group.
+  A worker group trains a complete model replica
+  against a partition of the training dataset
+  and is responsible for computing parameter gradients.
+  Worker groups run asynchronously and communicate with their
+  corresponding server groups asynchronously as well.
+  Within each worker group, however, the workers compute
+  parameter updates for the model replica synchronously.
+
+There are different strategies to distribute the training workload among
+the workers within a group (a configuration sketch follows this list):
+
+  * **Model parallelism**. Each worker computes a subset of the parameters
+  against all data partitioned to the group.
+  * **Data parallelism**. Each worker computes all parameters
+  against a subset of the data.
+  * **Hybrid parallelism**. SINGA also supports a combination of the two.
+
+
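+For concreteness, the cluster topology that realizes this logical
+architecture is configured through the `cluster` field described in
+[frameworks](frameworks.html). The following is only an illustrative
+sketch (the numbers are examples, not tied to Fig.1): it launches two
+worker groups of three workers each and one server group of two servers.
+
+    cluster {
+        nworker_groups: 2        # two model replicas trained asynchronously
+        nserver_groups: 1        # one server group maintains the parameters
+        nworkers_per_group: 3    # three workers cooperate on one replica
+        nservers_per_group: 2    # each server manages a parameter partition
+    }
+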
+### Implementation
+In SINGA, servers and workers are execution units running in separate threads.
+They communicate through [messages](communication.html).
+Every process runs the main thread as a stub that aggregates local messages
+and forwards them to corresponding (remote) receivers.
+
+Each server group and worker group has a *ParamShard*
+object representing a complete model replica. If workers and servers
+reside in the same process, their *ParamShard* (partitions) can
+be configured to share the same memory space. In this case, the
+messages transferred between different execution units contain only
+pointers to the data, which reduces the communication cost.
+In the inter-process case, by contrast,
+the messages have to include the parameter values.
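+
+For example (an illustrative sketch, not a recommended setting), keeping
+`server_worker_separate` at its default `false` in the cluster
+configuration (see [frameworks](frameworks.html)) places workers and
+servers in the same process, so their *ParamShard* partitions can be
+configured to share memory:
+
+    cluster {
+        nworker_groups: 1
+        nserver_groups: 1
+        nworkers_per_group: 2
+        nservers_per_group: 2
+        # default: workers and servers stay in the same process,
+        # so messages between them can carry pointers instead of values
+        server_worker_separate: false
+    }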

Added: incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1696057&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md Sat Aug 15 15:49:53 2015
@@ -0,0 +1,14 @@
+## Distributed Training
+
+___
+
+SINGA is designed for distributed training of large deep learning models with
+huge amounts of training data.
+
+Here we introduce distributed SINGA in the following aspects:
+
+* [System Architecture](architecture.html)
+
+* [Training Frameworks](frameworks.html)
+
+* [System Communication](communication.html)

Added: incubator/singa/site/trunk/content/markdown/docs/frameworks.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/frameworks.md?rev=1696057&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/frameworks.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/frameworks.md Sat Aug 15 15:49:53 2015
@@ -0,0 +1,122 @@
+## Distributed Training Frameworks
+
+___
+
+### Cluster Topology Configuration
+
+Here we describe how to configure SINGA's cluster topology to support
+different distributed training frameworks.
+The cluster topology is configured in the `cluster` field in `JobProto`.
+The `cluster` is of type `ClusterProto`:
+
+    message ClusterProto {
+      optional int32 nworker_groups = 1;
+      optional int32 nserver_groups = 2;
+      optional int32 nworkers_per_group = 3 [default = 1];
+      optional int32 nservers_per_group = 4 [default = 1];
+      optional int32 nworkers_per_procs = 5 [default = 1];
+      optional int32 nservers_per_procs = 6 [default = 1];
+    
+      // servers and workers in different processes?
+      optional bool server_worker_separate = 20 [default = false];
+      
+      ......
+    }
+
+
+The most commonly used fields are the following
+(see the example after this list):
+
+  * `nworkers_per_group` and `nworkers_per_procs`:
+  decide the partitioning of the worker-side ParamShard.
+  * `nservers_per_group` and `nservers_per_procs`:
+  decide the partitioning of the server-side ParamShard.
+  * `server_worker_separate`:
+  whether to separate servers and workers into different processes.
+
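+To make the roles of these fields concrete, here is an illustrative
+configuration (a sketch only, not one of the committed examples): it launches
+one worker group with six workers and one server group with two servers.
+With three workers per process, the six workers would span two processes,
+so the worker-side ParamShard is partitioned across them.
+
+    cluster {
+        nworker_groups: 1
+        nserver_groups: 1
+        nworkers_per_group: 6
+        nservers_per_group: 2
+        nworkers_per_procs: 3          # six workers over two processes
+        server_worker_separate: true   # servers run in their own processes
+    }
+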
+### Different Training Frameworks
+
+In SINGA, worker groups run asynchronously and
+workers within one group run synchronously.
+Users can leverage this general design to run
+both **synchronous** and **asynchronous** training frameworks.
+Here we illustrate how to configure
+popular distributed training frameworks in SINGA.
+
+<img src="../images/distributed/frameworks.png" style="width: 800px"/>
+<p><strong> Fig.1 - Training frameworks in SINGA</strong></p>
+
+#### Sandblaster
+
+This is a **synchronous** framework used by Google Brain.
+Fig.1(a) shows the Sandblaster framework implemented in SINGA.
+Its configuration is as follows:
+    
+    cluster {
+        nworker_groups: 1
+        nserver_groups: 1
+        nworkers_per_group: 3
+        nservers_per_group: 2
+        server_worker_separate: true
+    }
+
+A single server group is launched to handle all requests from workers.
+A worker computes on its partition of the model,
+and only communicates with servers handling related parameters.
+
+    
+#### AllReduce
+
+This is a **synchronous** framework used by Baidu's DeepImage.
+Fig.1(b) shows the AllReduce framework implemented in SINGA.
+Its configuration is as follows:
+    
+    cluster {
+        nworker_groups: 1
+        nserver_groups: 1
+        nworkers_per_group: 3
+        nservers_per_group: 3
+        server_worker_separate: false
+    }
+
+We bind each worker with a server on the same node, so that each
+node is responsible for maintaining a partition of parameters and
+collecting updates from all other nodes.
+
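+Spelled out explicitly (a sketch that is equivalent to the defaults used
+above), this binding corresponds to hosting exactly one worker and one
+server per process:
+
+    cluster {
+        nworker_groups: 1
+        nserver_groups: 1
+        nworkers_per_group: 3
+        nservers_per_group: 3
+        nworkers_per_procs: 1    # one worker per process (the default)
+        nservers_per_procs: 1    # one server per process (the default)
+        server_worker_separate: false
+    }
+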
+#### Downpour
+
+This is an **asynchronous** framework used by Google Brain.
+Fig.1(c) shows the Downpour framework implemented in SINGA.
+Its configuration is as follows:
+    
+    cluster {
+        nworker_groups: 2
+        nserver_groups: 1
+        nworkers_per_group: 2
+        nservers_per_group: 2
+        server_worker_separate: true
+    }
+
+As in the synchronous Sandblaster framework, all workers send
+requests to a single global server group. We divide the workers into several
+worker groups, each running independently and working on the parameters
+from its latest *update* response.
+
+#### Distributed Hogwild
+
+This is an **asynchronous** framework used by Caffe.
+Fig.1(d) shows the Distributed Hogwild framework implemented in SINGA.
+Its configuration is as follows:
+    
+    cluster {
+        nworker_groups: 3
+        nserver_groups: 3
+        nworkers_per_group: 1
+        nservers_per_group: 1
+        server_worker_separate: false
+    }
+
+Each node contains a complete server group and a complete worker group.
+Parameter updates are done locally, so that the communication cost
+of each training step is minimized.
+However, each server group must periodically synchronize with its
+neighboring server groups to improve the training convergence.

Added: incubator/singa/site/trunk/content/resources/images/distributed/frameworks.png
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/resources/images/distributed/frameworks.png?rev=1696057&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/singa/site/trunk/content/resources/images/distributed/frameworks.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: incubator/singa/site/trunk/content/resources/images/distributed/frameworks.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/singa/site/trunk/content/resources/images/distributed/logical.png
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/resources/images/distributed/logical.png?rev=1696057&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/singa/site/trunk/content/resources/images/distributed/logical.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: incubator/singa/site/trunk/content/resources/images/distributed/logical.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/singa/site/trunk/content/site.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1696057&r1=1696056&r2=1696057&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Sat Aug 15 15:49:53 2015
@@ -61,10 +61,13 @@
         <item name="Layer" href="docs/layer.html"/>
         <item name="Param" href="docs/param.html"/>
       </item>
-      <item name = "Data Preparation" href = "docs/data.html"/>
-      <item name = "Checkpoint" href = "docs/checkpoint.html"/>
-      <item name="System Architecture" href="docs/architecture.html"/>
-      <item name="Communication" href="docs/communication.html"/>
+      <item name="Distributed Training" href="docs/distributed-training.html">
+        <item name="System Architecture" href="docs/architecture.html"/>
+        <item name="Frameworks" href="docs/frameworks.html"/>
+        <item name="Communication" href="docs/communication.html"/>
+      </item>
+      <item name="Data Preparation" href="docs/data.html"/>
+      <item name="Checkpoint" href="docs/checkpoint.html"/>
       <item name="Examples" href="docs/examples.html"/>
       <item name="Debug" href="docs/debug.html"/>
     </menu>

Modified: incubator/singa/site/trunk/pom.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/pom.xml?rev=1696057&r1=1696056&r2=1696057&view=diff
==============================================================================
--- incubator/singa/site/trunk/pom.xml (original)
+++ incubator/singa/site/trunk/pom.xml Sat Aug 15 15:49:53 2015
@@ -75,21 +75,13 @@
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-project-info-reports-plugin</artifactId>
         <version>2.7</version>
+        <configuration>
+          <dependencyLocationsEnabled>false</dependencyLocationsEnabled>
+        </configuration>
         <reportSets>
           <reportSet>
             <reports/>
           </reportSet>
-          <reportSet>
-            <id>aggregate</id>
-            <inherited>false</inherited>
-            <reports>
-              <!-- report>index</report -->
-              <!-- report>mailing-list</report -->
-              <!-- report>scm</report -->
-              <!-- report>issue-tracking</report -->
-              <!-- report>project-team</report -->
-            </reports>
-          </reportSet>
         </reportSets>
       </plugin>
     </plugins>