Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/11 16:06:00 UTC
[jira] [Commented] (ARROW-3997) [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)
[ https://issues.apache.org/jira/browse/ARROW-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717441#comment-16717441 ]
Wes McKinney commented on ARROW-3997:
-------------------------------------
See comment at
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L232
The intent was to support any integer index type, with the constraint that index values be non-negative. I would be in favor of constraining indices to signed integer types (from 8 to 64 bits) until there is demand or a use case for unsigned integers.
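
To make the proposed constraint concrete, here is a minimal sketch in plain Python. It does not use the Arrow library, and the names `SIGNED_INDEX_BITS`, `smallest_signed_index_width` and `decode` are hypothetical, chosen only for illustration. It assumes indices are stored in a signed integer type of 8, 16, 32 or 64 bits, that every non-null index is non-negative, and that `None` stands in for a null slot.

```python
# Illustrative only: hypothetical helpers, not part of any Arrow API.
SIGNED_INDEX_BITS = (8, 16, 32, 64)


def smallest_signed_index_width(dictionary_size):
    """Return the narrowest signed width (in bits) that can index the dictionary."""
    if dictionary_size < 0:
        raise ValueError("dictionary size must be non-negative")
    max_index = max(dictionary_size - 1, 0)
    for bits in SIGNED_INDEX_BITS:
        # A signed integer of `bits` bits holds values up to 2**(bits - 1) - 1.
        if max_index <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("dictionary too large for a 64-bit signed index")


def decode(indices, dictionary, bits=32):
    """Decode dictionary-encoded indices; None marks a null slot (an assumption)."""
    if bits not in SIGNED_INDEX_BITS:
        raise ValueError("index width must be 8, 16, 32 or 64 bits")
    upper = 2 ** (bits - 1) - 1
    decoded = []
    for index in indices:
        if index is None:
            decoded.append(None)
            continue
        # Signed storage allows negative values, so reject them explicitly.
        if index < 0 or index > upper:
            raise ValueError(f"index {index} is outside [0, {upper}]")
        if index >= len(dictionary):
            raise ValueError(f"index {index} exceeds dictionary size")
        decoded.append(dictionary[index])
    return decoded
```

With signed storage, a dictionary of 128 entries still fits 8-bit indices (largest index 127), while 129 entries need 16 bits. That halved range is the trade-off of excluding unsigned types.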
> [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)
> -----------------------------------------------------------------------
>
> Key: ARROW-3997
> URL: https://issues.apache.org/jira/browse/ARROW-3997
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Documentation, Format
> Affects Versions: 0.11.1
> Reporter: Antoine Pitrou
> Priority: Major
>
> The Arrow spec states that a dictionary-encoded array uses int32 indices. Signed or unsigned? The spec doesn't say.
> Also, the C++ implementation supports all kinds of integers as indices (8- to 64-bit, signed and unsigned). I wonder if we should at least mandate a specific signedness.