Posted to dev@arrow.apache.org by Alex Baden <al...@omnisci.com> on 2020/05/24 22:10:15 UTC

Re: Dictionary Memo serialization for CUDA IPC

Hi Wes,

Thanks for the reply. I scanned through JIRA and it didn't look like
this had been filed or anyone was working on it, so I filed
https://issues.apache.org/jira/browse/ARROW-8927. I have a branch and
things are looking pretty good; I was able to duplicate
TestCudaArrowIpc_BasicWriteRead, but with a record batch that includes
dictionaries.


Alex

On Thu, Apr 16, 2020 at 12:51 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi Alex,
>
> I haven't looked at the details of your code, but having APIs that
> "collapse" the process of writing a single record batch along with its
> dictionaries as a sequence of end-to-end IPC messages (and then having
> a function to reverse that process to reconstruct the record batch)
> and making that work for writing to GPU memory (using the new device
> API) seems reasonable to me. There's a bit of refactoring that would
> need to take place to be able to reuse certain code paths relating to
> dictionary batch handling. Note also that we're due to implement delta
> dictionaries and dictionary replacements so we might want to take all
> of these needs into account to reduce the amount of code churn that
> takes place.
>
> - Wes
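
The "collapsed" sequence described above follows the Arrow IPC stream
format: a schema message first, then one dictionary batch per
dictionary-encoded field, then the record batch itself. A stdlib-only
sketch of that ordering (the enum and function names are illustrative,
not Arrow APIs):

```cpp
#include <vector>

// Hypothetical tags modeling the Arrow IPC stream message layout.
// These are illustrative names, not Arrow APIs.
enum class MessageType { Schema, DictionaryBatch, RecordBatch };

// Build the end-to-end message sequence for a single record batch
// that references `num_dictionaries` dictionary-encoded fields.
std::vector<MessageType> CollapsedStream(int num_dictionaries) {
  std::vector<MessageType> msgs;
  msgs.push_back(MessageType::Schema);             // schema always comes first
  for (int i = 0; i < num_dictionaries; ++i) {
    msgs.push_back(MessageType::DictionaryBatch);  // one per dictionary
  }
  msgs.push_back(MessageType::RecordBatch);        // the data itself
  return msgs;
}
```

Reconstruction is the reverse walk: read the schema, accumulate the
dictionary batches into a DictionaryMemo, then read the record batch
against that memo.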
>
> On Thu, Apr 16, 2020 at 1:44 PM Alex Baden <al...@omnisci.com> wrote:
> >
> > Hi all,
> >
> > OmniSci (formerly MapD) has been a long-time user of Arrow for IPC
> > serialization and memory sharing of query results, primarily through our
> > Python connector. We recently upgraded from Arrow 0.13 to Arrow 0.16.
> > This required us to change our Arrow conversion routines to handle the
> > new DictionaryMemo for serializing dictionaries. For CPU, this was
> > fairly easy, as I was able to just write the record batch stream using
> > `arrow::ipc::WriteRecordBatchStream` (and read it using
> > `RecordBatchStreamReader` on the client). For GPU/CUDA, however, I did
> > not see a way to serialize the dictionaries alongside the CUDA data and
> > wrap both in a single "object" (the semantics of which probably need
> > to be broken down, which I will do in a second). So, we came up with
> > our own approach: https://github.com/omnisci/omniscidb/blob/4ab6622bd0ee15e478bff4263f083ab761fc965c/QueryEngine/ArrowResultSetConverter.cpp#L219
> >
> > Essentially, I assemble a RecordBatch with the dictionaries I want to
> > serialize and call WriteRecordBatchStream to serialize into a CPU IPC
> > stream, which I copy to CPU shared memory. I then serialize the GPU
> > record batch using SerializeRecordBatch into a CudaBuffer. The
> > CudaBuffer is exported for IPC sharing, and I send both memory handles
> > (CPU and GPU) over to the client. The client then has to read the
> > RecordBatch containing the dictionaries and place the dictionaries
> > into a DictionaryMemo, which is used to read the record batches from
> > GPU. The process of building the DictionaryMemo on the client is here:
> > https://github.com/omnisci/omniscidb/blob/master/Tests/ArrowIpcIntegrationTest.cpp#L380
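
The two handles in the scheme above could be wrapped in a single
returned object. A stdlib-only sketch of that shape (the struct and
field names are hypothetical; a real version would carry a POSIX shm
key and an exported CUDA IPC handle rather than strings):

```cpp
#include <cstdint>
#include <string>

// Hypothetical container for the two-part IPC payload described above:
// a CPU shared-memory handle locating the serialized dictionary record
// batch stream, and a GPU IPC handle for the CUDA-serialized record
// batch. Field names are illustrative only.
struct CudaIpcResult {
  std::string cpu_shm_handle;   // locates the dictionary stream in CPU shm
  int64_t cpu_stream_size = 0;  // bytes of the serialized dictionary stream
  std::string gpu_ipc_handle;   // exported CudaBuffer IPC handle
  int64_t gpu_buffer_size = 0;  // bytes of the device-side record batch

  bool has_dictionaries() const { return cpu_stream_size > 0; }
};
```

The client would open cpu_shm_handle first, replay the dictionary
stream into a DictionaryMemo, then map gpu_ipc_handle and read the
device-side record batch against that memo.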
> >
> > This seems to work ok, at least for C++, but I am interested in making
> > it more compact and possibly contributing some or all of it to mainline
> > Arrow. Therefore, I have two questions:
> > 1) Does this look like a reasonable way to handle a serialized
> > RecordBatch in CUDA (that is, separating out the dictionaries and
> > returning either two objects, or a single object holding both handles)?
> > 2) Is this something that the Arrow community would be interested in
> > seeing contributed, in whatever form we agree upon for (1)?
> >
> > Thanks,
> > Alex