Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/15 13:03:50 UTC

[GitHub] [arrow] bkietz commented on pull request #8462: ARROW-10145: [C++][Dataset] Assert integer overflow in partitioning falls back to string

bkietz commented on pull request #8462:
URL: https://github.com/apache/arrow/pull/8462#issuecomment-709310353


   I don't think it's worthwhile to fall back to `int64`. The only use cases I can think of for long integers in a partition column are
   - A low-cardinality* set of externally derived identifiers which happen to be numeric. Arithmetic operations on such identifiers would not be meaningful, so there is no benefit to making that column integral
   - Year stored as nanoseconds since the epoch (or similar int-stored-as-multiple, int-stored-with-offset situation). In this situation I'd advise the user to derive a new column which stores the year more meaningfully
   
   * If the column has high cardinality then it's not really a candidate for partitioning in the first place.
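
A minimal sketch of the inference behavior under discussion, assuming partition values arrive as path-segment strings: a column is typed `int32` only when every value is a plain decimal integer within the 32-bit range, and anything else (including overflow) falls back to `string` rather than `int64`. The function name `infer_partition_type` and the returned type labels are hypothetical and are not Arrow's actual API.

```python
import re

# Bounds of a signed 32-bit integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Plain decimal integers only: optional minus sign followed by ASCII digits.
_INT_PATTERN = re.compile(r"-?[0-9]+")


def infer_partition_type(values):
    """Hypothetical helper: return "int32" or "string" for a partition column.

    Any value that is not a plain integer, or that overflows int32,
    makes the whole column fall back to "string" (never "int64").
    """
    for value in values:
        if _INT_PATTERN.fullmatch(value) is None:
            return "string"
        if not INT32_MIN <= int(value) <= INT32_MAX:
            return "string"
    return "int32"


# Small years fit in int32; nanoseconds since the epoch do not.
year_type = infer_partition_type(["2019", "2020"])
nanos_type = infer_partition_type(["1602766800000000000"])
```

Under this sketch, a column of long numeric identifiers stays a string column, which matches the point above that arithmetic on such identifiers is not meaningful.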

