Posted to jira@arrow.apache.org by "Eduardo Ponce (Jira)" <ji...@apache.org> on 2021/04/22 18:25:00 UTC

[jira] [Comment Edited] (ARROW-11673) [C++] Casting dictionary type to use different index type

    [ https://issues.apache.org/jira/browse/ARROW-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329344#comment-17329344 ] 

Eduardo Ponce edited comment on ARROW-11673 at 4/22/21, 6:24 PM:
-----------------------------------------------------------------

I would be glad to help with this issue.

A question that naturally follows is: What is the expected behavior when casting from a larger to a smaller type and the index overflows?

Possible solution: I think the cast should raise an error stating that the current data does not allow such a cast.

If dictionary types keep track of their largest index value, there is no need to iterate through the indices when casting.
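
To make the idea concrete, here is a minimal sketch of the overflow check in plain Python. Everything in it is an assumption for illustration: plain lists stand in for Arrow index buffers, and the helper names (signed_int_range, check_index_cast, cached_max) are hypothetical and not part of the Arrow API. It only shows how a tracked maximum index could replace a scan of the indices.

```python
# Sketch only: plain Python lists stand in for Arrow index buffers, and the
# helper names below are hypothetical, not part of the Arrow API.

def signed_int_range(bit_width):
    """Return (min, max) representable by a signed integer of bit_width bits."""
    return -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1


def check_index_cast(indices, target_bit_width, cached_max=None):
    """Raise OverflowError if any index does not fit the target index type.

    If cached_max is given (the largest index, tracked ahead of time), the
    check needs no scan; otherwise the indices are scanned once.
    """
    _, hi = signed_int_range(target_bit_width)
    largest = cached_max if cached_max is not None else max(indices, default=-1)
    if largest > hi:
        raise OverflowError(
            f"index {largest} does not fit in int{target_bit_width} (max {hi})"
        )
    return list(indices)


small = check_index_cast([0, 1, 0], 8)  # all indices fit in int8
try:
    check_index_cast([0, 200, 3], 8)  # 200 > 127, so the cast is rejected
    overflowed = False
except OverflowError:
    overflowed = True
```

Raising an error rather than wrapping the value silently matches the proposal above; null indices are not handled in this sketch.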


was (Author: edponce):
I would be glad to help with this issue.

A question that naturally follows is: What is the expected behavior when casting from a larger to a smaller type and the index overflows?

Possible solution: I think the cast should raise an error stating that the current data does not allow such a cast.

If dictionary types keep track of their largest index value, there is no need to iterate through the dataset when casting.

> [C++] Casting dictionary type to use different index type
> ---------------------------------------------------------
>
>                 Key: ARROW-11673
>                 URL: https://issues.apache.org/jira/browse/ARROW-11673
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> Casting from one dictionary type to another in order to change the index type is currently not implemented.
> Small example:
> {code}
> In [2]: arr = pa.array(['a', 'b', 'a']).dictionary_encode()
> In [3]: arr.type
> Out[3]: DictionaryType(dictionary<values=string, indices=int32, ordered=0>)
> In [5]: arr.cast(pa.dictionary(pa.int8(), pa.string()))
> ...
> ArrowNotImplementedError: Unsupported cast from dictionary<values=string, indices=int32, ordered=0> to dictionary<values=string, indices=int8, ordered=0> (no available cast function for target type)
> ../src/arrow/compute/cast.cc:112  GetCastFunctionInternal(cast_options->to_type, args[0].type().get())
> {code}
> From https://stackoverflow.com/questions/66223730/how-to-change-column-datatype-with-pyarrow


