Posted to dev@arrow.apache.org by Ben Kietzman <be...@rstudio.com> on 2018/12/19 23:51:08 UTC
Dictionary with repeated values?
Is it legal to create a DictionaryType whose dictionary has repeated
values?
Re: Dictionary with repeated values?
Posted by Wes McKinney <we...@gmail.com>.
The way that dictionary encoding is implemented in C++ (with
DictionaryType, DictionaryArray) is a construct particular to the
library.
At the protocol level, dictionary encoding is a property of a field at
some level of a schema tree [1].
The dictionary itself is a record batch with a single field/column [2].
So, based on the protocol, there is no requirement for uniqueness in the
dictionary. I would say it would be preferable for implementations to
avoid constructing dictionaries with duplicates, though.
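
To make the point above concrete, here is a minimal sketch in plain Python. It is not Arrow's implementation and uses no Arrow API; the helper names decode and dedupe are hypothetical. It assumes a dictionary-encoded column is just a list of integer indices (None for null) into a list of values. Decoding still works when the dictionary has duplicates, but two different indices can then denote the same value, which is one plausible reason to avoid them.

```python
# Hypothetical illustration only: plain lists stand in for the
# index array and the dictionary; none of this is Arrow's API.

def decode(indices, dictionary):
    """Map each index to its dictionary value; None marks a null slot."""
    return [None if i is None else dictionary[i] for i in indices]


def dedupe(indices, dictionary):
    """Return (new_indices, unique_dictionary) encoding the same values."""
    position = {}   # value -> index in the deduplicated dictionary
    unique = []     # deduplicated dictionary, first occurrence order
    remap = []      # old index -> new index
    for value in dictionary:
        if value not in position:
            position[value] = len(unique)
            unique.append(value)
        remap.append(position[value])
    new_indices = [None if i is None else remap[i] for i in indices]
    return new_indices, unique


dictionary = ["a", "b", "a"]        # "a" appears twice
indices = [0, 2, 1, None, 2]

# Decoding is well defined even with the duplicate entry.
decoded = decode(indices, dictionary)

# Indices 0 and 2 differ but both mean "a", so comparing indices is
# not the same as comparing values until the dictionary is deduplicated.
new_indices, unique = dedupe(indices, dictionary)
```

After dedupe, the rewritten indices decode to the same logical values as before, and equal values share a single index.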
- Wes
[1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L226
[2]: https://github.com/apache/arrow/blob/master/format/Message.fbs#L71
On Wed, Dec 19, 2018 at 5:51 PM Ben Kietzman <be...@rstudio.com> wrote:
>
> Is it legal to create a DictionaryType whose dictionary has repeated
> values?