Posted to jira@arrow.apache.org by "Justin Talbot (Jira)" <ji...@apache.org> on 2021/03/26 18:36:00 UTC
[jira] [Created] (ARROW-12101) [Format] Consider adding int0 and other small integer types for more efficient Dictionary encoding
Justin Talbot created ARROW-12101:
-------------------------------------
Summary: [Format] Consider adding int0 and other small integer types for more efficient Dictionary encoding
Key: ARROW-12101
URL: https://issues.apache.org/jira/browse/ARROW-12101
Project: Apache Arrow
Issue Type: Wish
Components: Format
Reporter: Justin Talbot
I often need to store single-valued columns, i.e. columns in which every row holds the same value. As far as I can tell, the current Arrow format has no efficient way to represent these: even the narrowest index type, int8, still costs one byte per row. One possible improvement would be to introduce an int0 type (where every value is 0) that, like the null type, has no data buffer allocated. It could then serve as the index type of a Dictionary-encoded column whose dictionary holds a single value.
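Below is a minimal Python sketch of the idea. It contrasts the proposed int0 index with today's int8 indices. `Int0Indices`, `int8_indices`, and `decode` are hypothetical names used only for illustration. They are not pyarrow APIs, and the byte counts ignore Arrow's buffer alignment padding.

```python
from array import array
from typing import List, Sequence


class Int0Indices:
    """Hypothetical index array for the proposed int0 type (not part of Arrow).

    Every index is 0, so only the length is recorded and no data buffer is
    allocated, similar to how Arrow's null type carries no data buffer.
    """

    def __init__(self, length: int) -> None:
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return 0

    def buffer_nbytes(self) -> int:
        return 0  # nothing to allocate


def int8_indices(length: int) -> array:
    """Today's narrowest option: a signed 8-bit index per row, all zero."""
    return array("b", [0] * length)


def decode(indices: Sequence[int], dictionary: List[str]) -> List[str]:
    """Materialize a dictionary-encoded column by looking up every index."""
    return [dictionary[indices[i]] for i in range(len(indices))]


if __name__ == "__main__":
    dictionary = ["us-east-1"]  # illustrative: the single value shared by every row
    n = 1_000_000
    old = int8_indices(n)
    new = Int0Indices(n)
    print(len(old) * old.itemsize, new.buffer_nbytes())  # 1000000 0
    assert decode(new, dictionary)[:3] == ["us-east-1"] * 3
```

Both index representations decode to the same column. The int0 version simply never needs to read or allocate an index buffer.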
For low-cardinality columns, I also often wish for int1, int2, and int4 types (1-, 2-, and 4-bit integers) to use as dictionary indices.
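A dictionary with at most 2^k entries needs only k bits per index. The rough Python sketch below bit-packs such indices. The function names are hypothetical. The layout is also an assumption: indices are treated as unsigned and packed least-significant-bit first, borrowing the bit order of Arrow's validity bitmaps, because the issue does not define a layout.

```python
from typing import List


def pack_indices(indices: List[int], bit_width: int) -> bytes:
    """Pack dictionary indices into bit_width bits each (hypothetical layout).

    Indices are treated as unsigned and packed least-significant-bit first
    within each byte, mirroring the bit order of Arrow validity bitmaps.
    """
    if bit_width not in (1, 2, 4):
        raise ValueError("bit_width must be 1, 2, or 4")
    per_byte = 8 // bit_width
    mask = (1 << bit_width) - 1
    out = bytearray((len(indices) + per_byte - 1) // per_byte)
    for i, value in enumerate(indices):
        if not 0 <= value <= mask:
            raise ValueError(f"index {value} does not fit in {bit_width} bits")
        out[i // per_byte] |= value << ((i % per_byte) * bit_width)
    return bytes(out)


def unpack_indices(buf: bytes, length: int, bit_width: int) -> List[int]:
    """Inverse of pack_indices: recover `length` indices from the buffer."""
    per_byte = 8 // bit_width
    mask = (1 << bit_width) - 1
    return [(buf[i // per_byte] >> ((i % per_byte) * bit_width)) & mask
            for i in range(length)]


if __name__ == "__main__":
    indices = [0, 1, 2, 3, 3, 2, 1, 0, 1]  # 4 distinct values -> 2 bits suffice
    packed = pack_indices(indices, 2)
    print(len(packed), packed.hex())  # 3 e41b01
    assert unpack_indices(packed, len(indices), 2) == indices
```

Using 2-bit indices, a million-row column with four distinct values would need 250,000 bytes of index data. An int8 index buffer for the same column takes 1,000,000 bytes.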
--
This message was sent by Atlassian Jira
(v8.3.4#803005)