Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/17 12:35:04 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2557: [Question] why need this sort in Sort.rs

Ted-Jiang opened a new issue, #2557:
URL: https://github.com/apache/arrow-datafusion/issues/2557

   @yjshen, sorry to bother you. Why is `positions.sort_unstable()` needed in the code below?
   https://github.com/apache/arrow-datafusion/blob/807b7a5f7eb858e9f7162e1f00ffeeedd0bf2050/datafusion/core/src/physical_plan/sorts/sort.rs#L460-L469
   
   I think we should keep the order in which indices are added to `positions`, so that the order from `indices` is preserved and the result is ordered correctly.
   https://github.com/apache/arrow-datafusion/blob/807b7a5f7eb858e9f7162e1f00ffeeedd0bf2050/datafusion/core/src/physical_plan/sorts/sort.rs#L433-L435


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2557: [Question] why need this sort in Sort.rs

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2557:
URL: https://github.com/apache/arrow-datafusion/issues/2557#issuecomment-1131588201

   I mean in this situation:
   ![image](https://user-images.githubusercontent.com/37145547/169286166-ce1d1933-806b-4c1c-96f3-442f916fc86e.png)
   Here `positions.sort_unstable()` will produce the wrong order.




[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2557: [Question] why need this sort in Sort.rs

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2557:
URL: https://github.com/apache/arrow-datafusion/issues/2557#issuecomment-1131589203

   @yjshen But I think you pre-sort all batches during `insert`, so this situation does not exist.




[GitHub] [arrow-datafusion] yjshen commented on issue #2557: [Question] why need this sort in Sort.rs

Posted by GitBox <gi...@apache.org>.
yjshen commented on issue #2557:
URL: https://github.com/apache/arrow-datafusion/issues/2557#issuecomment-1131011698

   `group_indices` takes adjacent records from the same batch as input and does best-effort grouping into slices instead of individual positions, for better `extend` performance (extending a range of records at a time rather than one record at a time).
   
   `sort_unstable` is useful because the positions are generated by `lexsort`, which is itself unstable: records with the same sort key may appear in arbitrary order after `lexsort`. Since `extend` takes a start position and a length as input, a sort is needed so that records with the same sort key appear sequentially.
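
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code in `sort.rs`: `RowSlice` and `group_positions` are hypothetical names, and the sketch assumes all positions come from a single batch.

```rust
/// A contiguous run of rows inside one batch.
/// Hypothetical stand-in for the slice type used in sort.rs.
#[derive(Debug, PartialEq)]
struct RowSlice {
    start: u32,
    len: usize,
}

/// Collapse row positions from a single batch into contiguous slices.
/// Sorting first lets neighbouring positions (e.g. 4, 3, 5) merge into one run.
fn group_positions(positions: &mut Vec<u32>) -> Vec<RowSlice> {
    positions.sort_unstable();
    let mut slices: Vec<RowSlice> = Vec::new();
    for &pos in positions.iter() {
        match slices.last_mut() {
            // Extend the current run when `pos` directly follows it.
            Some(last) if last.start + last.len as u32 == pos => last.len += 1,
            // Otherwise start a new run at `pos`.
            _ => slices.push(RowSlice { start: pos, len: 1 }),
        }
    }
    slices
}

fn main() {
    // Positions as they might arrive for rows that share a sort key.
    let mut positions = vec![4, 3, 5, 9];
    let slices = group_positions(&mut positions);
    println!("{:?}", slices);
}
```

With the input `4, 3, 5, 9`, sorting gives `3, 4, 5, 9`, which collapses into two slices (rows 3 to 5, then row 9). Running the same loop without the sort would produce four single-row slices, because no position directly follows the one before it.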

