Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2017/08/07 18:15:41 UTC

[DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

hi all,

A group of companies have created a project called the GPU Open
Analytics Initiative (GOAI), with the purpose of creating open source
software and specifications for analytics on GPUs.

So far, they have focused on building a "GPU Data Frame", which is
effectively putting Arrow data on the GPU:

https://github.com/gpuopenanalytics/libgdf/wiki/Technical-Overview
http://gpuopenanalytics.com/

Shared memory IPC and analytics on Arrow data beyond the CPU are
definitely in scope for the Arrow project, so we should look for ways
to collaborate and help each other. I am sure this will not be the
last time that someone needs to use Arrow memory with GPUs, so it
would be useful for the community to develop memory management and
utility code to assist with using Arrow in a mixed-device setting.

I am not sure how to best proceed but wanted to make everyone aware of
GOAI and look for opportunities to grow the Arrow community.

Thanks,
Wes

Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

Posted by Wes McKinney <we...@gmail.com>.
I created a number of JIRAs that would make sense to implement
initially and attached them to
https://issues.apache.org/jira/browse/ARROW-1055. The first goal will
be to simplify transmitting a sequence of Arrow record batches that
live on the GPU with zero-copy IPC from one process to another.

Once we have simplified tools for GPU IPC and memory management, then
adding this to Plasma may not be too difficult, but it will require
some additions to the client-server protocol. For example: a user
requests memory to be allocated on a particular GPU; Plasma calls
cudaMalloc and then exports the memory for use in other processes.
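A rough sketch of what such a protocol addition might carry follows. This is written in Python rather than Plasma's C++, and every type, field, and function name here is invented for illustration; none of it is part of Plasma's actual wire protocol. The only grounded detail is the CUDA IPC mechanism itself (cudaIpcGetMemHandle produces a handle blob that another process opens with cudaIpcOpenMemHandle to map the same device allocation).

```python
# Hypothetical sketch of a "create on device" request/reply in a
# Plasma-like client/server protocol. These names do not exist in
# Plasma; they only illustrate the extra fields the protocol would need
# to support GPU allocations alongside host shared memory.
from dataclasses import dataclass

@dataclass
class CreateRequest:
    object_id: bytes
    data_size: int
    device_num: int  # e.g. 0 = host shared memory, 1 = GPU 0, 2 = GPU 1

@dataclass
class CreateReply:
    object_id: bytes
    device_num: int
    # For device 0 the server would return an mmap location; for a GPU
    # allocation it would instead return a serialized CUDA IPC handle
    # (the blob produced by cudaIpcGetMemHandle), which the client opens
    # with cudaIpcOpenMemHandle to map the same device memory.
    ipc_handle: bytes

def handle_create(req: CreateRequest) -> CreateReply:
    if req.device_num == 0:
        handle = b"mmap-offset-placeholder"
    else:
        # The real server would call cudaMalloc + cudaIpcGetMemHandle here.
        handle = b"\x00" * 64  # stand-in for a cudaIpcMemHandle_t
    return CreateReply(req.object_id, req.device_num, handle)
```

The key design point is that the reply becomes device-polymorphic: the client inspects device_num to decide whether the handle is an mmap reference or a CUDA IPC handle to open.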


Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

Posted by Wes McKinney <we...@gmail.com>.
To motivate the use case, the folks in GOAI are building applications
with multiple components which interact with the GPU.

For example: MapD (GPU database) allocates GPU memory, hands off to
Python. Python can then decref and cudaFree on the device. Perhaps
Python then uses cudaMalloc and wishes to transfer ownership of this
new memory back to MapD.

I might consider implementing a more generic device numbering scheme
(so CPU shared memory is always device 0). This will enable
applications to "plug in" other kinds of devices, which can be
discovered by other users of the Plasma client. Therefore, the Plasma
server might have no direct knowledge of GPUs beyond how to copy
memory to them, how to allocate and free memory, and how to enable
clients to obtain a pointer to the virtual address spaces of
interest.
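A device numbering scheme along these lines could be sketched as below. The class and method names are made up for illustration; nothing like this exists in Plasma today. The point it demonstrates is the one above: device 0 is reserved for CPU shared memory, and other device kinds register under fixed numbers so clients can discover them without the server carrying device-specific logic.

```python
# Illustrative device-numbering scheme: device 0 is always CPU shared
# memory, and other device kinds (CUDA GPUs, etc.) register themselves
# under distinct numbers so Plasma clients can discover them. The
# server itself needs no GPU knowledge beyond alloc/free/copy hooks.
class DeviceRegistry:
    def __init__(self):
        # Device 0 is permanently reserved for host (CPU) shared memory.
        self._devices = {0: "cpu-shared-memory"}

    def register(self, device_num: int, kind: str) -> None:
        if device_num == 0:
            raise ValueError("device 0 is reserved for CPU shared memory")
        if device_num in self._devices:
            raise ValueError(f"device {device_num} already registered")
        self._devices[device_num] = kind

    def kind(self, device_num: int) -> str:
        return self._devices[device_num]

registry = DeviceRegistry()
registry.register(1, "cuda-gpu-0")
registry.register(2, "cuda-gpu-1")
```

Keeping device 0 fixed means existing host-only clients keep working unchanged, since they implicitly operate on device 0.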


Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

Posted by Robert Nishihara <ro...@gmail.com>.
That makes a lot of sense. In some contexts it could make sense to run
multiple Plasma stores per machine (possibly for different devices or
different NUMA zones). Though that could make it slightly harder to take
advantage of faster GPU-to-GPU communication.


Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

Posted by Philipp Moritz <pc...@gmail.com>.
One observation here is that, as far as I know, shared memory is not
typically used between multiple GPUs, and on a single GPU there is already a
unified shared address space that each CUDA thread can access.

One reasonable extension of the APIs and facilities, given these limitations,
would be the following:

1.) Extend plasma::Create to take an optional flag (CPU/HOST/SHARED, GPU0,
GPU1, etc.) which allocates the object on the desired device (host shared
memory, GPU 0, GPU 1, etc.).

2.) Extend plasma::Get to take the same flag; it will transparently copy
the data to the desired device as necessary and return a pointer that is
valid on the specified device.

3.) Extend the status and notification APIs, as well as the object lifetime
tracking, to account for these changes.

I wonder if people would find that useful; let me know your thoughts!
Ideally we would also have some integration into, say, TensorFlow or other
deep learning frameworks that can make use of these capabilities (the way
we typically use GPUs in Ray at the moment is mostly through TensorFlow, by
feeding data through placeholders, which has some performance bottlenecks,
but so far we have mostly managed to work around them).
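The first two proposed extensions can be illustrated with a toy in-process mock. The real plasma::Create and plasma::Get are C++ APIs with no such device flag; the flag values, class, and method names below are invented purely to show the intended behavior, in particular that Get copies an object to the requested device when it lives elsewhere.

```python
# Toy mock of the proposed extension: create() and get() take a device
# flag, and get() transparently "copies" the object when it lives on a
# different device than requested. Flag values and names are invented;
# in a real implementation the copy would be a cudaMemcpy between host
# and device memory.
CPU, GPU0, GPU1 = 0, 1, 2  # hypothetical device flags

class MockPlasmaStore:
    def __init__(self):
        self._objects = {}  # object_id -> (device, payload bytes)

    def create(self, object_id, data, device=CPU):
        # A real store would cudaMalloc for GPU devices; we just record
        # which device the object nominally lives on.
        self._objects[object_id] = (device, bytes(data))

    def get(self, object_id, device=CPU):
        src_device, data = self._objects[object_id]
        if src_device != device:
            # Stand-in for a host<->device cudaMemcpy.
            data = bytes(data)
        return device, data

store = MockPlasmaStore()
store.create(b"obj1", b"\x01\x02\x03", device=GPU0)
dev, payload = store.get(b"obj1", device=CPU)  # copied GPU0 -> host
```

The third extension (status/notification and lifetime tracking) would then need to report which device each copy lives on, so the store can free GPU copies independently of the host copy.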




Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

Posted by Wes McKinney <we...@gmail.com>.
One idea is whether the Plasma object store could be extended to
support devices other than POSIX shared memory, like GPU device memory
(or multiple GPUs on a single host).

Philipp or Robert or any of the people who know the Plasma code best,
any idea how this might be approached? It would have to be developed
as an optional extension so that users without e.g. a CUDA
installation don't have to bother with nvcc (which is proprietary) or
the CUDA runtime libraries.

- Wes
