Posted to dev@arrow.apache.org by Dimitri Vorona <al...@googlemail.com> on 2018/02/05 16:37:57 UTC
Delta dictionaries: implementation
Hi,
ARROW-1727 added format support for delta dictionaries. It makes it possible
to interleave record batches that contain dictionary-encoded fields with
delta dictionary batches that add new dictionary entries.
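To make the interleaving concrete, here is a minimal sketch in plain Python
of how a reader might consume such a stream. It does not use any Arrow
library; the names DictionaryBatch, RecordBatch, is_delta and decode_stream
are hypothetical stand-ins chosen for illustration, and the assumption is
that a delta batch appends its values to the existing dictionary while a
non-delta batch replaces it.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DictionaryBatch:
    dict_id: int
    values: List[str]
    is_delta: bool = False  # assumption: True means "append to the existing dictionary"


@dataclass
class RecordBatch:
    dict_id: int
    indices: List[int]  # dictionary-encoded column: positions into the dictionary


Message = Union[DictionaryBatch, RecordBatch]


def decode_stream(messages: List[Message]) -> List[List[str]]:
    """Decode each record batch against the dictionary known at that point."""
    dictionaries: Dict[int, List[str]] = {}
    decoded: List[List[str]] = []
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            if msg.is_delta:
                # Delta: keep existing entries and append the new ones.
                dictionaries[msg.dict_id] = dictionaries.get(msg.dict_id, []) + msg.values
            else:
                # Non-delta: (re)define the dictionary.
                dictionaries[msg.dict_id] = list(msg.values)
        else:
            values = dictionaries[msg.dict_id]
            decoded.append([values[i] for i in msg.indices])
    return decoded


# Record batches interleaved with an initial dictionary and one delta.
stream: List[Message] = [
    DictionaryBatch(0, ["a", "b"]),
    RecordBatch(0, [0, 1, 0]),
    DictionaryBatch(0, ["c"], is_delta=True),
    RecordBatch(0, [2, 0]),
]
```

Because existing indices never change, batches decoded before the delta
arrived stay valid, and only the later batch needs the new entry.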
As far as I can see, there is no implementation of this feature in C++
yet. Is anyone working on it right now? Are there any ideas about what the
API should look like?
Cheers,
Dimitri.
Re: Delta dictionaries: implementation
Posted by Brian Hulette <br...@ccri.com>.
Glad to see someone is interested in dictionary deltas!
The JavaScript implementation does handle deltas, but we only have an
Arrow reader implementation at the moment, which can handle deltas
pretty trivially (here's the relevant line in the JS IPC reader:
https://github.com/apache/arrow/blob/master/js/src/ipc/reader/vector.ts#L56).
I haven't put any thought into what the writer API for deltas should
look like - Paul Taylor has been working on a JS writer so he may have
some thoughts, but I'm not sure.
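As a starting point for discussion only, the writer side could be sketched
as below in plain Python. This is not an existing Arrow API in any
language; DeltaDictionaryEncoder and its encode method are hypothetical
names, and messages are modelled as simple (kind, payload) tuples. The idea
is that the writer tracks which values it has already sent and emits a
dictionary message (initial or delta) only when a batch introduces new
values.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, list]  # (kind, payload); a stand-in for real IPC messages


class DeltaDictionaryEncoder:
    """Hypothetical writer-side helper: emits deltas for unseen values."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}  # value -> dictionary position
        self._sent_any = False            # has an initial dictionary been emitted?

    def encode(self, values: List[str]) -> List[Message]:
        new_values: List[str] = []
        indices: List[int] = []
        for v in values:
            if v not in self._index:
                # Unseen value: it gets the next free position.
                self._index[v] = len(self._index)
                new_values.append(v)
            indices.append(self._index[v])

        messages: List[Message] = []
        if new_values:
            # The first dictionary message is a full one, later ones are deltas.
            kind = "delta" if self._sent_any else "dictionary"
            messages.append((kind, new_values))
            self._sent_any = True
        # The dictionary message always precedes the batch that needs it.
        messages.append(("record_batch", indices))
        return messages
```

With this shape the caller never has to know the full set of distinct
values up front; a batch that introduces nothing new produces only a
record batch message.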
If you're only interested in deltas so that you don't have to collect
every distinct value before you can start sending data, you could also
consider using the file format
(https://github.com/apache/arrow/blob/master/format/IPC.md#file-format).
When using the file format, it's perfectly fine to just write your
dictionary batches at the end of the file, after the record
batches, since it's intended for random access. So if it's ok for your
reader to not have knowledge of the dictionary values until it's
received all the data, that may work for you.
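A rough sketch of that alternative, again in plain Python with no Arrow
calls: encode_with_trailing_dictionary is a hypothetical helper that
encodes batches as they arrive and only produces the complete dictionary
once all batches have been seen, which is the order the writer would then
emit them in.

```python
from typing import Dict, List, Tuple


def encode_with_trailing_dictionary(
    batches: List[List[str]],
) -> Tuple[List[List[int]], List[str]]:
    """Encode batches first; the full dictionary is only known at the end."""
    index: Dict[str, int] = {}  # value -> dictionary position, in first-seen order
    encoded: List[List[int]] = []
    for batch in batches:
        # setdefault assigns the next free position to values not seen before.
        encoded.append([index.setdefault(v, len(index)) for v in batch])
    # dicts preserve insertion order, so this lists values by position.
    dictionary = list(index)
    return encoded, dictionary


def decode(encoded: List[List[int]], dictionary: List[str]) -> List[List[str]]:
    """The reader can only do this once the trailing dictionary has arrived."""
    return [[dictionary[i] for i in batch] for batch in encoded]
```

The trade-off compared with deltas is on the reader side: nothing can be
decoded until the dictionary at the end has been read.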
Brian
On 02/05/2018 04:10 PM, Wes McKinney wrote:
> hi Dimitri,
>
> No one is working on it yet in C++, nor have we worked on any API
> design sketches. I think there may be some work in JavaScript.
>
> Please feel free to open some JIRAs and propose APIs / behavior or
> work on an implementation.
>
> Thanks,
> Wes
Re: Delta dictionaries: implementation
Posted by Wes McKinney <we...@gmail.com>.
hi Dimitri,
No one is working on it yet in C++, nor have we worked on any API
design sketches. I think there may be some work in JavaScript.
Please feel free to open some JIRAs and propose APIs / behavior or
work on an implementation.
Thanks,
Wes