Posted to dev@singa.apache.org by Wang Wei <wa...@comp.nus.edu.sg> on 2015/04/21 06:05:19 UTC

Communication between GPUs

As planned in the previous discussion, we are stabilizing the APIs of each
module. One problem I have encountered concerns the communication APIs
needed to support GPUs.

We can use libraries like cudamat (https://code.google.com/p/cudamat/)
for linear algebra computation, so the computation APIs would be almost the
same as those for the CPU. But I know little about communication between
GPU and CPU, or between GPUs, so I am asking for your suggestions.

Wangyuan, Wuwei and Haibo: since you are working on deep learning using
GPUs, it would be appreciated if you could give some feedback.

As far as I know, messages are traditionally transferred from GPU memory
to CPU memory, sent over TCP/IP to other nodes, and then copied from CPU
memory back to GPU memory. We can easily support such communication using
the current CPU APIs, but the transfer between GPU and CPU brings extra
cost.
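For concreteness, here is a minimal sketch of that host-staged path,
assuming ZeroMQ's C API and the CUDA runtime (the helper names
send_tensor and recv_tensor are hypothetical, not existing SINGA code):

    #include <cuda_runtime.h>
    #include <zmq.h>
    #include <vector>

    // Sender: stage the device buffer in host memory, then ship it over
    // TCP/IP with ZeroMQ. The GPU->CPU copy is the extra cost mentioned
    // above; GPUDirect would remove it.
    void send_tensor(void* zmq_sock, const float* d_buf, size_t count) {
      std::vector<float> h_buf(count);
      cudaMemcpy(h_buf.data(), d_buf, count * sizeof(float),
                 cudaMemcpyDeviceToHost);
      zmq_send(zmq_sock, h_buf.data(), count * sizeof(float), 0);
    }

    // Receiver mirrors it: network -> host buffer -> GPU memory.
    void recv_tensor(void* zmq_sock, float* d_buf, size_t count) {
      std::vector<float> h_buf(count);
      zmq_recv(zmq_sock, h_buf.data(), count * sizeof(float), 0);
      cudaMemcpy(d_buf, h_buf.data(), count * sizeof(float),
                 cudaMemcpyHostToDevice);
    }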
NVIDIA provides a technique called GPUDirect, which enables direct
message passing from GPU memory to the network card (e.g., InfiniBand).
Some MPI variants now use this technique. But since we have switched from
MPI to ZeroMQ, we need to make sure that ZeroMQ supports GPUDirect and
InfiniBand. Do you have any findings on this? Or how do you implement
message transfer in your implementation?
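To illustrate what those MPI variants offer: with a CUDA-aware MPI build
(e.g., OpenMPI compiled with CUDA support), a device pointer can be passed
directly to MPI calls, and with GPUDirect RDMA the NIC can read GPU memory
without a host copy. A minimal sketch:

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const int n = 1 << 20;
      float* d_buf;                       // lives in GPU memory
      cudaMalloc(&d_buf, n * sizeof(float));

      // A CUDA-aware MPI accepts the device pointer directly; no
      // explicit cudaMemcpy to a host buffer is needed.
      if (rank == 0)
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

      cudaFree(d_buf);
      MPI_Finalize();
      return 0;
    }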

Thanks.

regards,
Wei

Re: Re: Communication between GPUs

Posted by 陈海波 <hz...@corp.netease.com>.
       I agree with you. If we design a general messaging API, SINGA may need to change a lot, especially the programming framework. But
I think it is worth doing, because we would no longer need to care about the implementation of the underlying message protocol,
and it ensures that SINGA stays scalable and extensible in the future.
        We will also learn more about these topics. If we have good ideas, we will discuss them with you.
   Keep in touch, thanks~



Re: Communication between GPUs

Posted by Wang Wei <wa...@comp.nus.edu.sg>.
Thanks a lot!



On Tue, Apr 21, 2015 at 5:51 PM, 陈海波 <hz...@corp.netease.com> wrote:

> Hi~,
>     In our previous work on deep learning with GPUs, we focused on
> parallel training of DNNs (without convolution layers) for our speech
> recognition system. It is not easy to adopt a model-parallel strategy to
> speed up training, and given the cost of transferring a big model between
> nodes, we decided to use a single node with multiple GPUs for training.
> We use CUDA APIs for transferring messages between GPUs (both with and
> without GPUDirect). In our plan, the problem of multi-node communication
> does not arise.
> Some discussion points:
> 1) We think cudamat is a good choice for linear algebra computation, but
> we see that you use the mshadow library to develop SINGA.
>    As far as we know, mshadow provides a GPU matrix/tensor template
> library and also supports some simple interfaces for multi-GPU, so we
> think we can keep using mshadow for linear algebra computation on both
> GPU and CPU.
>
Yes. We will continue using Mshadow.
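As a rough illustration of what this enables, a minimal mshadow sketch
(the exact API may differ between mshadow versions; treat the calls below
as indicative rather than exact):

    #include "mshadow/tensor.h"
    using namespace mshadow;
    using namespace mshadow::expr;

    int main() {
      InitTensorEngine<gpu>();
      // Allocate matrices in GPU memory via the same templated API
      // that also works with <cpu>.
      Tensor<gpu, 2, float> A = NewTensor<gpu>(Shape2(64, 128), 1.0f);
      Tensor<gpu, 2, float> B = NewTensor<gpu>(Shape2(128, 32), 2.0f);
      Tensor<gpu, 2, float> C = NewTensor<gpu>(Shape2(64, 32), 0.0f);
      C = dot(A, B);  // expression template; evaluated on the GPU
      FreeSpace(&A); FreeSpace(&B); FreeSpace(&C);
      ShutdownTensorEngine<gpu>();
      return 0;
    }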

> 2) We consulted NVIDIA officials, and they answered that they are not
> sure whether ZeroMQ supports GPUDirect and InfiniBand; they suggested
> that we adopt OpenMPI.
>
ZeroMQ should support InfiniBand (http://zeromq.org/area:results), but it
may not support GPUDirect. It seems Caffe (
https://github.com/BVLC/caffe/blob/parallel/src/caffe/parallel.cpp) is
implementing distributed training using GPU+InfiniBand, but GPUDirect is
not used. I will learn more about GPUDirect and discuss with you.
Another solution I am exploring is to provide a general messaging API
(like https://github.com/dmlc/rabit) with different implementations
(ZeroMQ or MPI).
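To make the idea concrete, here is a minimal sketch of such an
abstraction, assuming ZeroMQ's C API; the names Channel and ZmqChannel
are hypothetical, not existing SINGA code:

    #include <cstddef>
    #include <zmq.h>

    // Transport-neutral interface the training code would program against.
    class Channel {
     public:
      virtual ~Channel() {}
      virtual void Send(const void* buf, size_t len) = 0;
      virtual size_t Recv(void* buf, size_t maxlen) = 0;
    };

    // ZeroMQ backend: a PAIR socket connected to one peer.
    class ZmqChannel : public Channel {
     public:
      ZmqChannel(const char* endpoint, bool bind) {
        ctx_ = zmq_ctx_new();
        sock_ = zmq_socket(ctx_, ZMQ_PAIR);
        if (bind) zmq_bind(sock_, endpoint);
        else      zmq_connect(sock_, endpoint);
      }
      ~ZmqChannel() { zmq_close(sock_); zmq_ctx_destroy(ctx_); }
      void Send(const void* buf, size_t len) override {
        zmq_send(sock_, buf, len, 0);
      }
      size_t Recv(void* buf, size_t maxlen) override {
        int n = zmq_recv(sock_, buf, maxlen, 0);
        return n < 0 ? 0 : static_cast<size_t>(n);
      }
     private:
      void* ctx_;
      void* sock_;
    };
    // An MpiChannel wrapping MPI_Send/MPI_Recv could implement the same
    // interface, so the transport can be swapped without touching callers.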

>    And I think we can discuss more.
>
> thanks~
>

Re: Communication between GPUs

Posted by 陈海波 <hz...@corp.netease.com>.
Hi~,
    In our previous work on deep learning with GPUs, we focused on parallel training of DNNs (without convolution layers) for our speech recognition system.
It is not easy to adopt a model-parallel strategy to speed up training, and given the cost of transferring a big model between
nodes, we decided to use a single node with multiple GPUs for training. We use CUDA APIs for transferring messages between GPUs
(both with and without GPUDirect). In our plan, the problem of multi-node communication does not arise.
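For reference, a minimal sketch of what single-node GPU-to-GPU transfer
looks like with the CUDA runtime (illustrative only, not our exact code):

    #include <cuda_runtime.h>

    int main() {
      const size_t n = 1 << 20;
      float *d0 = nullptr, *d1 = nullptr;
      cudaSetDevice(0);
      cudaMalloc(&d0, n * sizeof(float));
      cudaSetDevice(1);
      cudaMalloc(&d1, n * sizeof(float));

      // If the hardware supports it, enable direct peer access so the
      // copy goes over PCIe without staging through host memory.
      int can = 0;
      cudaDeviceCanAccessPeer(&can, 1, 0);
      if (can) cudaDeviceEnablePeerAccess(0, 0);  // device 1 -> device 0

      // Works either way: peer-to-peer if enabled, otherwise the driver
      // stages the copy through host memory transparently.
      cudaMemcpyPeer(d1, 1, d0, 0, n * sizeof(float));

      cudaFree(d1);
      cudaSetDevice(0);
      cudaFree(d0);
      return 0;
    }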
Some discussion points:
1) We think cudamat is a good choice for linear algebra computation, but we see that you use the mshadow library to develop SINGA.
   As far as we know, mshadow provides a GPU matrix/tensor template library and also supports some simple interfaces for multi-GPU, so we think
we can keep using mshadow for linear algebra computation on both GPU and CPU.
2) We consulted NVIDIA officials, and they answered that they are not sure whether ZeroMQ supports GPUDirect and InfiniBand;
   they suggested that we adopt OpenMPI.

   And I think we can discuss more.

thanks~
