Posted to issues@arrow.apache.org by "randolf-scholz (via GitHub)" <gi...@apache.org> on 2023/04/18 15:17:39 UTC

[GitHub] [arrow] randolf-scholz opened a new issue, #35209: Add a type alias for `pa.dictionary(pa.int32(), pa.string())`

randolf-scholz opened a new issue, #35209:
URL: https://github.com/apache/arrow/issues/35209

   ### Describe the enhancement requested
   
   Currently, there is no type alias for the `dictionary` type.
   
   https://github.com/apache/arrow/blob/1deb740e02fa928e60ce611790d9dff2d1a6077e/python/pyarrow/types.pxi#L4739-L4793
   
   Given that, currently, there seem to be optimized kernels only for `dictionary[int32, string]`, I'd suggest adding this as a type alias. A string alias would be especially useful when saving table schemas as config files.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #35209: Add a type alias for `pa.dictionary(pa.int32(), pa.string())`

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35209:
URL: https://github.com/apache/arrow/issues/35209#issuecomment-1516146714

   > (*) I noticed horrible performance when trying to load CSV data using `column_types={"col": pa.dictionary(pa.int64(), pa.string())}`, `column_types={"col": pa.dictionary(pa.int16(), pa.string())}` or `column_types={"col": pa.dictionary(pa.uint32(), pa.string())}`. Only `int32` seems to behave as expected performance-wise. If this is a bug, I can open another issue.
   
   This seems like something to fix. Yes, if you could open a separate issue for this, that would be great.
   
    

