Posted to jira@arrow.apache.org by "Christian Lorentzen (Jira)" <ji...@apache.org> on 2022/12/04 19:43:00 UTC

[jira] [Commented] (ARROW-14314) [C++] Sorting dictionary array not implemented

    [ https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643067#comment-17643067 ] 

Christian Lorentzen commented on ARROW-14314:
---------------------------------------------

Hi,
I recently stumbled over this missing feature.
It is not 100% clear to me whether the intent is to sort by the values or by the indices. For example:

values/dict = ["c", "a", "b"]
indices = [2, 0, 2, 1, 0]
decoded = ["b", "c", "b", "a", "c"]

Will this be sorted in ascending order as
A) ordered by dictionary
indices = [0, 0, 1, 2, 2]
decoded = ["c", "c", "a", "b", "b"]
B) ordered by decoded values
indices = [1, 2, 2, 0, 0]
decoded = ["a", "b", "b", "c", "c"]

My hope is for A), so that the order of the dictionary in the DictionaryArray carries meaning.
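
To make the two options concrete, here is a minimal sketch in plain Python. It does not use any Arrow API and says nothing about how the kernel is or will be implemented; the helper names (decode, sort_by_dictionary_order, sort_by_decoded_values) are made up for this illustration.

```python
# Plain-Python illustration only: no Arrow API is involved, and all helper
# names below are hypothetical.
dictionary = ["c", "a", "b"]
indices = [2, 0, 2, 1, 0]


def decode(dictionary, indices):
    """Look up each index in the dictionary to get the decoded values."""
    return [dictionary[i] for i in indices]


def sort_by_dictionary_order(indices):
    """Option A: compare the integer indices, so the position of a value
    in the dictionary defines its rank."""
    return sorted(indices)


def sort_by_decoded_values(dictionary, indices):
    """Option B: compare the decoded values; the position of a value in
    the dictionary is irrelevant."""
    return sorted(indices, key=lambda i: dictionary[i])


option_a = sort_by_dictionary_order(indices)
option_b = sort_by_decoded_values(dictionary, indices)

print("A)", option_a, decode(dictionary, option_a))
print("B)", option_b, decode(dictionary, option_b))
```

With the example data above, option A yields the indices [0, 0, 1, 2, 2] and option B yields [1, 2, 2, 0, 0], matching the two outcomes listed.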

> [C++] Sorting dictionary array not implemented
> ----------------------------------------------
>
>                 Key: ARROW-14314
>                 URL: https://issues.apache.org/jira/browse/ARROW-14314
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Ariana Villegas
>            Priority: Major
>              Labels: kernel, pull-request-available
>             Fix For: 11.0.0
>
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type column:
> {code}
> mtcars %>% 
>   mutate(cyl = as.factor(cyl)) %>% 
>   Table$create() %>% 
>   arrange(cyl) %>% 
>   collect()
> Error: Type error: Sorting not supported for type dictionary<values=string, indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427  VisitTypeInline(type, this)
> ../src/arrow/compute/kernels/vector_sort.cc:148  GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206  sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259  CallFunction("sort_indices", {datum}, &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53  SortIndices(table, options_, ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292  impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297  iterator_.Next()
> ../src/arrow/record_batch.cc:318  ReadNext(&batch)
> ../src/arrow/record_batch.cc:329  ReadAll(&batches)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)