Posted to dev@arrow.apache.org by Dimitri Vorona <al...@googlemail.com> on 2018/02/05 16:37:57 UTC

Delta dictionaries: implementation

Hi,

ARROW-1727 added format support for delta dictionaries. It makes it possible
to interleave record batches that contain a dictionary-encoded field with
delta dictionary batches that add new dictionary entries.
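To make the interleaving concrete, here is a minimal model of the reader side, written in Python only for brevity; none of it is Arrow API. `DictionaryBatch`, `RecordBatch` and `decode_stream` are hypothetical names, and `is_delta` is assumed to play the role of the `isDelta` flag on the format's `DictionaryBatch` message. A delta appends entries to the dictionary with the same id, so indices in earlier record batches stay valid. The sketch simply treats a non-delta batch as defining the dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DictionaryBatch:
    dict_id: int
    values: List[str]
    is_delta: bool = False  # hypothetical stand-in for the format's isDelta flag


@dataclass
class RecordBatch:
    dict_id: int
    indices: List[int]  # a dictionary-encoded column: indices into the dictionary


Message = Union[DictionaryBatch, RecordBatch]


def decode_stream(messages: List[Message]) -> List[List[str]]:
    """Decode each record batch against the dictionary as it stands at that point."""
    dictionaries: Dict[int, List[str]] = {}
    decoded: List[List[str]] = []
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            if msg.is_delta:
                # Delta: append new entries; indices already sent stay valid.
                dictionaries.setdefault(msg.dict_id, []).extend(msg.values)
            else:
                # Non-delta (assumption of this sketch): defines the dictionary.
                dictionaries[msg.dict_id] = list(msg.values)
        else:
            dictionary = dictionaries[msg.dict_id]
            decoded.append([dictionary[i] for i in msg.indices])
    return decoded


stream = [
    DictionaryBatch(0, ["a", "b"]),
    RecordBatch(0, [0, 1, 0]),
    DictionaryBatch(0, ["c"], is_delta=True),  # adds index 2
    RecordBatch(0, [2, 0]),
]
print(decode_stream(stream))  # [['a', 'b', 'a'], ['c', 'a']]
```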

As far as I can see, there is no implementation of this feature in C++
yet. Is anyone working on it right now? Are there any ideas about what the
API should look like?
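One possible shape for a writer API, again as a Python model rather than C++. `DeltaDictionaryWriter` and `write_batch` are hypothetical names, not part of any Arrow library. The writer memoizes every value it has already sent and, before each record batch, emits a dictionary batch holding only the values that are new: the first one is a regular dictionary batch, later ones are deltas.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass
class DictionaryBatch:
    dict_id: int
    values: List[str]
    is_delta: bool = False  # hypothetical stand-in for the format's isDelta flag


@dataclass
class RecordBatch:
    dict_id: int
    indices: List[int]


class DeltaDictionaryWriter:
    """Hypothetical writer for one string column that emits deltas on demand."""

    def __init__(self, dict_id: int = 0) -> None:
        self.dict_id = dict_id
        self.memo: Dict[str, int] = {}  # value -> index, for everything already sent
        self.sent_initial = False
        self.messages: List[Union[DictionaryBatch, RecordBatch]] = []

    def write_batch(self, values: Iterable[str]) -> None:
        new_values: List[str] = []
        indices: List[int] = []
        for v in values:
            if v not in self.memo:
                self.memo[v] = len(self.memo)  # next free index
                new_values.append(v)
            indices.append(self.memo[v])
        # Send only unseen values; the first dictionary batch is not a delta.
        if new_values or not self.sent_initial:
            self.messages.append(
                DictionaryBatch(self.dict_id, new_values, is_delta=self.sent_initial)
            )
            self.sent_initial = True
        self.messages.append(RecordBatch(self.dict_id, indices))


w = DeltaDictionaryWriter()
w.write_batch(["a", "b", "a"])  # regular dictionary ["a", "b"], then indices [0, 1, 0]
w.write_batch(["c", "a"])       # delta ["c"], then indices [2, 0]
w.write_batch(["b"])            # nothing new: only indices [1]
```

The caller never has to know the full set of distinct values up front; the cost is that dictionary order is order of first appearance, not sorted order.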

Cheers,
Dimitri.

Re: Delta dictionaries: implementation

Posted by Brian Hulette <br...@ccri.com>.
Glad to see someone is interested in dictionary deltas!

The JavaScript implementation does handle deltas, but we only have an
Arrow reader implementation at the moment, which can handle deltas
pretty trivially (here's the relevant line in the JS IPC reader:
https://github.com/apache/arrow/blob/master/js/src/ipc/reader/vector.ts#L56).
I haven't put any thought into what the writer API for deltas should
look like. Paul Taylor has been working on a JS writer, so he may have
some thoughts, but I'm not sure.

If you're only interested in deltas so that you don't have to collect
every distinct value before you can start sending data, you could also
consider using the file format
(https://github.com/apache/arrow/blob/master/format/IPC.md#file-format).
With the file format it's perfectly fine to write your dictionary
batches at the end of the file, after the record batches, because the
file is intended for random access: its footer records where every
dictionary and record batch is located. So if it's acceptable for your
reader not to know the dictionary values until it has received all the
data, that may work for you.
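The file-format alternative can be modeled the same way. This is a simplified Python sketch with hypothetical names (`SketchFile`, `write_file`, `read_file`); a real Arrow file holds Flatbuffer-encoded messages and a footer of byte offsets, which the sketch reduces to list positions. The writer emits record batches of indices as soon as it has them and writes the complete dictionary only at the end; the reader consults the footer first, so it can load the dictionary before decoding any batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchFile:
    """Hypothetical stand-in for an IPC file: blocks plus a footer index."""
    blocks: List[object] = field(default_factory=list)
    footer: Dict[str, List[int]] = field(default_factory=dict)


def write_file(batches: List[List[str]]) -> SketchFile:
    f = SketchFile()
    memo: Dict[str, int] = {}
    record_blocks: List[int] = []
    for values in batches:
        # Record batches go out immediately; a new value gets the next free index.
        indices = [memo.setdefault(v, len(memo)) for v in values]
        record_blocks.append(len(f.blocks))
        f.blocks.append(indices)
    # The complete dictionary is written once, after all record batches.
    # Dicts keep insertion order, so list position equals the assigned index.
    f.footer = {"dictionaries": [len(f.blocks)], "record_batches": record_blocks}
    f.blocks.append(list(memo))
    return f


def read_file(f: SketchFile) -> List[List[str]]:
    # Random access: find the dictionary through the footer, then decode.
    dictionary = f.blocks[f.footer["dictionaries"][0]]
    return [[dictionary[i] for i in f.blocks[pos]] for pos in f.footer["record_batches"]]


f = write_file([["a", "b", "a"], ["c", "a"]])
print(read_file(f))  # [['a', 'b', 'a'], ['c', 'a']]
```

Nothing can be decoded until the footer, and with it the dictionary, has been written, which is exactly the trade-off described above.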

Brian


On 02/05/2018 04:10 PM, Wes McKinney wrote:
> hi Dimitri,
>
> No one is working on it yet in C++, nor have we worked on any API
> design sketches. I think there may be some work in JavaScript.
>
> Please feel free to open some JIRAs and propose APIs / behavior or
> work on an implementation.
>
> Thanks,
> Wes


Re: Delta dictionaries: implementation

Posted by Wes McKinney <we...@gmail.com>.
hi Dimitri,

No one is working on it yet in C++, nor have we worked on any API
design sketches. I think there may be some work in JavaScript.

Please feel free to open some JIRAs and propose APIs / behavior or
work on an implementation.

Thanks,
Wes
