Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/11 16:06:00 UTC

[jira] [Commented] (ARROW-3997) [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)

    [ https://issues.apache.org/jira/browse/ARROW-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717441#comment-16717441 ] 

Wes McKinney commented on ARROW-3997:
-------------------------------------

See comment at

https://github.com/apache/arrow/blob/master/format/Schema.fbs#L232

The intent was to support any integer index type, with the constraint that index values be non-negative. I would be in favor of constraining indices to signed integer types (from 8 to 64 bits) until there is demand or a use case for unsigned integers.
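The proposed rule can be sketched in plain C++ with no Arrow dependency. It accepts only signed index types from 8 to 64 bits and requires every index to be non-negative and smaller than the dictionary length. The enum `IndexType` and the functions `IsAllowedIndexType` and `ValidateIndices` are hypothetical names chosen for illustration. They are not part of the Arrow C++ API.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical enumeration of integer index types (not Arrow's own type enum).
enum class IndexType { kInt8, kInt16, kInt32, kInt64, kUInt8, kUInt16, kUInt32, kUInt64 };

// Under the proposed rule, only signed widths from 8 to 64 bits are allowed.
bool IsAllowedIndexType(IndexType t) {
  switch (t) {
    case IndexType::kInt8:
    case IndexType::kInt16:
    case IndexType::kInt32:
    case IndexType::kInt64:
      return true;
    default:
      return false;
  }
}

// Every index must be non-negative and point inside the dictionary.
template <typename IndexT>
bool ValidateIndices(const std::vector<IndexT>& indices, std::int64_t dictionary_length) {
  static_assert(std::is_integral<IndexT>::value && std::is_signed<IndexT>::value,
                "proposed rule: dictionary indices use signed integer types");
  for (IndexT i : indices) {
    if (i < 0 || static_cast<std::int64_t>(i) >= dictionary_length) {
      return false;
    }
  }
  return true;
}
```

A signed index type lets a reader reject a negative value with a simple comparison. With an unsigned type, the same bit pattern would instead show up as a very large, out-of-range index.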

> [C++] [Doc] Clarify dictionary encoding integer signedness (and width?)
> -----------------------------------------------------------------------
>
>                 Key: ARROW-3997
>                 URL: https://issues.apache.org/jira/browse/ARROW-3997
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Documentation, Format
>    Affects Versions: 0.11.1
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> The Arrow spec states that a dictionary-encoded array uses int32 indices. Signed or unsigned? The spec doesn't say.
> Also, the C++ implementation supports all kinds of integers as indices (8- to 64-bit, signed and unsigned). I wonder if we should at least mandate a specific signedness.
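The next sketch shows why the unspecified signedness matters. It assumes a writer that emits unsigned 32-bit indices and a reader that decodes the same 4 bytes as `int32`. Any index of 2^31 or above then reads back as a negative number. The helper `ReadAsSignedInt32` is hypothetical. It assumes two's-complement representation, which is universal in practice and guaranteed since C++20.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the raw bytes of an unsigned 32-bit index
// as a signed one, as a reader assuming int32 indices would decode them.
std::int32_t ReadAsSignedInt32(std::uint32_t raw_index) {
  std::int32_t out;
  std::memcpy(&out, &raw_index, sizeof(out));  // bit-for-bit copy, no conversion
  return out;
}
```

Small indices decode identically either way. The two readings diverge only at the top half of the 32-bit range, where an unsigned index becomes a negative, invalid signed one.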



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)