Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/06/30 22:53:29 UTC

[GitHub] [arrow-rs] wjones127 commented on issue #4472: Add an ExtensionType to DataType enum

wjones127 commented on issue #4472:
URL: https://github.com/apache/arrow-rs/issues/4472#issuecomment-1615258370

   > I might be missing something here, but why would it be lost, schema metadata should roundtrip over C data interface?
   
   This works well for a `RecordBatch`, but not for an individual array transported independently of any batch. Arrays themselves have no way to be tagged as extension arrays, because the extension metadata is stored on a field and an array doesn't carry one; an array is only an extension array in the context of a batch.
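
To make the gap concrete, here is a minimal std-only sketch. `DataType`, `Field`, `Array`, and `RecordBatch` below are simplified, hypothetical stand-ins rather than the real arrow-rs types, and `example.uuid` is a made-up extension name. The only thing taken from the Arrow format itself is the `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

// Metadata key defined by the Arrow columnar format for extension types.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

// Simplified stand-in for a physical data type (assumption, not arrow-rs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    FixedSizeBinary(i32),
}

// A field carries a name, a storage type, and key/value metadata.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

// An array only knows its storage type; it has no metadata slot.
#[derive(Debug, Clone)]
struct Array {
    data_type: DataType,
    len: usize,
}

// A batch pairs each column with the field that describes it.
struct RecordBatch {
    fields: Vec<Field>,
    columns: Vec<Array>,
}

// The extension name can only be recovered from a field.
fn extension_name(field: &Field) -> Option<&str> {
    field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
}

// Builds a one-column batch whose field is tagged with a hypothetical extension.
fn uuid_batch() -> RecordBatch {
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), "example.uuid".to_string());
    let field = Field {
        name: "id".to_string(),
        data_type: DataType::FixedSizeBinary(16),
        metadata,
    };
    let column = Array {
        data_type: DataType::FixedSizeBinary(16),
        len: 3,
    };
    RecordBatch {
        fields: vec![field],
        columns: vec![column],
    }
}
```

Cloning `uuid_batch().columns[0]` and handing it off on its own yields a plain `FixedSizeBinary(16)` array; there is nothing on it that `extension_name` could inspect.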
   
   > I feel quite strongly that only codepaths explicitly concerned with extension types should need concern themselves with them, for example the take or arithmetic kernel should not need to know about extension types.
   
   I definitely agree, and don't want to make these operations more complex than they ought to be. 
   
   If we can think of another place to put this information, I'm open to that.
   
   (A bit of a tangent, but...) In my ideal world, there would be a logical type enum and a physical type enum. Physical types would be the current `DataType`. Then logical types would be things like `String` (just one, regardless of offset size and encoding) and then a generic `ExtensionType` variant. Sort of like what Sasha was talking about a long time ago: https://lists.apache.org/thread/357z4587dczho4x1257ttf0b4o9302co
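
A rough sketch of what that split might look like, purely as an illustration. `PhysicalType`, `LogicalType`, and `logical_type` are hypothetical names, not anything that exists in arrow-rs, and the mapping only covers the offset-size case for strings.

```rust
// Stand-in for today's `DataType`: one variant per physical layout.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalType {
    Int32,
    Utf8,
    LargeUtf8,
    FixedSizeBinary(i32),
}

// Hypothetical logical layer: one `String`, plus a generic extension variant.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    Int32,
    String,
    FixedSizeBinary(i32),
    Extension { name: String, storage: PhysicalType },
}

// Maps a physical type (and an optional extension name) to a logical type.
fn logical_type(physical: &PhysicalType, extension_name: Option<&str>) -> LogicalType {
    // An extension name wins over the storage type's own logical meaning.
    if let Some(name) = extension_name {
        return LogicalType::Extension {
            name: name.to_string(),
            storage: physical.clone(),
        };
    }
    match physical {
        PhysicalType::Int32 => LogicalType::Int32,
        // Both offset sizes collapse into a single logical string type.
        PhysicalType::Utf8 | PhysicalType::LargeUtf8 => LogicalType::String,
        PhysicalType::FixedSizeBinary(width) => LogicalType::FixedSizeBinary(*width),
    }
}
```

With this shape, kernels like take or arithmetic would keep matching on `PhysicalType` only, and just the code that cares about extensions would look at `LogicalType`.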

