Posted to jira@arrow.apache.org by "Justin Talbot (Jira)" <ji...@apache.org> on 2021/03/26 18:36:00 UTC

[jira] [Created] (ARROW-12101) [Format] Consider adding int0 and other small integer types for more efficient Dictionary encoding

Justin Talbot created ARROW-12101:
-------------------------------------

             Summary: [Format] Consider adding int0 and other small integer types for more efficient Dictionary encoding
                 Key: ARROW-12101
                 URL: https://issues.apache.org/jira/browse/ARROW-12101
             Project: Apache Arrow
          Issue Type: Wish
          Components: Format
            Reporter: Justin Talbot


I often need to store single-valued columns, where every row holds the same value. I believe the current Arrow format has no efficient way to represent these. One possible improvement would be to introduce an int0 type (where all values are 0) that, like null, has no buffer allocated. It could then be used as the index type for a Dictionary with a single value.

For low-cardinality columns, I also often find myself wishing for int1, int2, and int4 types to use as dictionary indices.
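
As a rough illustration of the potential savings, here is a minimal Python sketch of the arithmetic. The function names and the list of widths are hypothetical, and the sketch assumes indices would be densely bit-packed, which the proposal above does not specify. It is only back-of-the-envelope math, not a description of how Arrow would implement such types.

```python
# Hypothetical sketch: index widths and buffer sizes for dictionary indices.
# The widths 0, 1, 2 and 4 are the proposed types; 8 and up exist in Arrow today.
CANDIDATE_WIDTHS = [0, 1, 2, 4, 8, 16, 32, 64]


def min_index_bits(cardinality: int) -> int:
    """Fewest bits able to distinguish `cardinality` dictionary entries."""
    if cardinality < 1:
        raise ValueError("cardinality must be at least 1")
    # Indices run from 0 to cardinality - 1; a single entry needs 0 bits.
    return (cardinality - 1).bit_length()


def proposed_index_width(cardinality: int) -> int:
    """Smallest candidate width (in bits) that can hold every index."""
    needed = min_index_bits(cardinality)
    return next(w for w in CANDIDATE_WIDTHS if w >= needed)


def index_buffer_bytes(num_rows: int, bits: int) -> int:
    """Bytes for `num_rows` indices, assuming dense bit-packing."""
    return (num_rows * bits + 7) // 8


if __name__ == "__main__":
    rows = 1_000_000
    for cardinality in (1, 2, 4, 16):
        width = proposed_index_width(cardinality)
        print(
            f"cardinality={cardinality}: int{width} needs "
            f"{index_buffer_bytes(rows, width)} bytes, "
            f"int8 needs {index_buffer_bytes(rows, 8)} bytes"
        )
```

Under these assumptions, a single-valued column would need no index buffer at all, and a column with up to four distinct values would need a quarter of the space of int8 indices.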

--
This message was sent by Atlassian Jira
(v8.3.4#803005)