Posted to dev@systemml.apache.org by Sourav Mazumder <so...@gmail.com> on 2016/02/25 16:33:09 UTC

Using RDMA for Deep Learning

Hi,

I was checking out CaffeOnSpark for Deep Learning, which has just been open
sourced by Yahoo:
http://yahoohadoop.tumblr.com/post/139916563586/caffeonspark-open-sourced-for-distributed-deep?soc_src=mail&soc_trk=ma

I am wondering whether SystemML can also leverage the RDMA-based model
synchronization approach that CaffeOnSpark uses.

If not, can that be considered in the future roadmap?

Regards,
Sourav

Re: Using RDMA for Deep Learning

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Sourav,

Please see my comments below:

>> 1. Maybe instead of trying to re-create the MPI-based communication layer
necessary for distributed learning of Deep Networks, SystemML could also
plan to support CaffeOnSpark itself as another runtime platform.

We are mixing up runtime platforms and libraries here. Examples of runtime
platforms are Hadoop, Spark, and single-node (and maybe, in the future, GPU),
each of which can support a large subset of SystemML's physical operators
(for example: matrix-matrix multiplication, matrix-matrix addition, and so
on). In essence, a core requirement for adding a new runtime platform to
SystemML is that existing DML algorithms should be able to run on the new
runtime. Caffe (and hence, by extension, CaffeOnSpark) does not meet this
requirement.

I would call Caffe a library (or maybe a system) for deep learning. There
are two possibilities for a library:
a. Can the given library be used "along with" SystemML? If CaffeOnSpark does
expose an RDD interface, then yes. Examples of libraries that can be used
along with SystemML are Spark SQL, ML Pipelines, etc.
b. Can the given library be used "in" SystemML? An example of this is
Commons Math (please see
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibCommonsMath.java
). In that case, we ask ourselves the following questions:
- Are we talking about CaffeOnSpark or Caffe?
- How do we expose these primitives at the DML level?
- Do these primitives even need an optimizer? If not, does SystemML
provide value in wrapping these libraries?

I would advocate the first option: let CaffeOnSpark be an external
library. If they expose an RDD interface, let us come up with a use case
where we do preprocessing using Spark SQL, then Deep Network training using
CaffeOnSpark, and potentially some machine learning after that using
SystemML (via MLContext), and put that use case into our docs.

>> 2. Can someone today still write DML for a Neural Network using the
existing facilities of SystemML?

Yes, you can write a neural network in DML. (Some operations still need to
be added, for example: pooling.)
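To make this concrete, here is a purely illustrative NumPy sketch of the kind of linear-algebra script such a DML program would express (the layer sizes and activation choices are hypothetical, not from this thread; in SystemML itself this would be written in DML):

```python
import numpy as np

# Hypothetical two-layer network forward pass, as plain linear algebra.
rng = np.random.default_rng(42)
X = rng.standard_normal((5, 8))      # 5 examples, 8 features
W1 = rng.standard_normal((8, 16))    # hidden-layer weights
W2 = rng.standard_normal((16, 3))    # output-layer weights

H = np.maximum(X @ W1, 0.0)          # ReLU hidden activations
scores = H @ W2
# numerically stable softmax over the 3 output classes
probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = probs / probs.sum(axis=1, keepdims=True)
```

Every statement here maps onto matrix operations that DML already supports.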

>> If yes, what would be the drawback of that (because, as I understand it,
the current implementation will not have the advanced features mentioned in
epics 540 and 445)?

The drawback is simple: doing a LOT of simple core operations (like the
ones described below) on a GPU is much faster than doing them on a CPU.
Again, the emphasis is on "LOT" :)

At the lowest level, the three core operations* needed to implement a neural
network are "matrix multiplication", "certain element-wise operations", and
"primitives required for SGD". The primitives required for SGD are "left
indexing (W[i,j] = ....)" and, yes, lots and lots of low-cost threads. The
reason GPUs are so attractive for Deep Learning is that they do precisely
these operations "efficiently". This is why most deep learning software
relies on GPUs for efficiency (and hence is single-node, without any
optimizer), while focusing on ease of use (i.e., network expressibility and
auto-differentiation). The exceptions* are the systems that use some variant
of a "parameter server" (for example: https://github.com/dmlc/mxnet and, of
course, CaffeOnSpark).

* => Another distributed deep learning system is http://deeplearning4j.org/.
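The three core operations named above can be sketched in NumPy as stand-ins (the shapes and the tanh nonlinearity are illustrative; the real kernels would run inside SystemML or on a GPU):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))

# 1. matrix multiplication
Y = X @ W
# 2. an element-wise operation (e.g. a nonlinearity)
A = np.tanh(Y)
# 3. left indexing, as in a DML update of the form W[i, j] = ...
W_old = W.copy()
W[0:2, 0:2] = W[0:2, 0:2] - 0.01 * A[0:2, 0:2]
```

An SGD loop is essentially these three operations repeated a very large number of times, which is why throughput on them dominates training cost.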

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Sourav Mazumder <so...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	02/26/2016 07:51 AM
Subject:	Re: Using RDMA for Deep Learning




Re: Using RDMA for Deep Learning

Posted by Sourav Mazumder <so...@gmail.com>.
Hi Niketan,

Thanks for the detailed information.

Here are a few follow-on comments/questions:

1. Maybe, instead of trying to re-create the MPI-based communication layer
necessary for distributed learning of Deep Networks, SystemML could also
plan to support CaffeOnSpark itself as another runtime platform, at least
to start with. Later on, SystemML could have its own, more sophisticated
implementation.
Also, sooner or later the CaffeOnSpark components may become part of the
Spark distro (core or extension).

2. Can someone today still write DML for a Neural Network using the
existing facilities of SystemML? If yes, what would be the drawback of
that (because, as I understand it, the current implementation will not have
the advanced features mentioned in epics 540 and 445)?

Regards,
Sourav

On Thu, Feb 25, 2016 at 10:24 AM, Niketan Pansare <np...@us.ibm.com>
wrote:


Re: Using RDMA for Deep Learning

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Sourav,

RDMA-based model synchronization can be considered in our future roadmap.
Here are two epics that this task depends on:
https://issues.apache.org/jira/browse/SYSTEMML-540
https://issues.apache.org/jira/browse/SYSTEMML-445

It is also important to note that "model parallelism" gets us into an
accuracy vs. performance tradeoff discussion. The SystemML
optimizer/rewrites, in their current state, make decisions to improve
performance and do not change the semantics of the input DML script. So, an
imprecise language-level primitive (for example, "minimize(layer)") might be
required to support "model parallelism".

Another point: my guess is that the Yahoo ML team has added a parallel
MPI-based communication layer (for example, using MVAPICH2 to enable CUDA
5.0's GPUDirect RDMA) rather than relying on Spark's communication layer. I
understand this is necessary for distributed learning of Deep Networks (to
avoid paying the cost of communicating to/from the JVM). However, it is a
non-trivial addition to any system, and it needs detailed discussion (for
example: fault tolerance, permissions, homogeneous clusters, etc.) :)
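For intuition, the "model synchronization" step that such a communication layer accelerates can be sketched as synchronous data-parallel SGD with gradient averaging. This is an illustrative NumPy toy (a least-squares model, four simulated workers), not SystemML or CaffeOnSpark code; RDMA/MPI would speed up exactly the gradient-exchange step marked below:

```python
import numpy as np

rng = np.random.default_rng(1)
true_W = rng.standard_normal((4, 3))     # ground-truth model to recover

# four workers, each holding a private data shard
shards = []
for _ in range(4):
    X = rng.standard_normal((32, 4))
    shards.append((X, X @ true_W))

W = np.zeros((4, 3))                     # model replicated on every worker
lr = 0.1

def shard_gradient(W, X, y):
    """Least-squares gradient computed locally on one worker's shard."""
    return X.T @ (X @ W - y) / len(X)

for _ in range(300):
    grads = [shard_gradient(W, X, y) for X, y in shards]
    # synchronization step: exchange gradients, average, apply everywhere.
    # In a real cluster this all-reduce is the communication bottleneck.
    W -= lr * np.mean(grads, axis=0)
```

The per-iteration gradient exchange is why low-latency transports (RDMA, MPI) matter so much more here than for typical Spark jobs.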

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Sourav Mazumder <so...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	02/25/2016 07:33 AM
Subject:	Using RDMA for Deep Learning


