Posted to github@arrow.apache.org by "askoa (via GitHub)" <gi...@apache.org> on 2023/02/12 14:00:54 UTC

[GitHub] [arrow-rs] askoa opened a new issue, #3701: `take_run` improvements

askoa opened a new issue, #3701:
URL: https://github.com/apache/arrow-rs/issues/3701

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The current implementation of `take_run` only handles `PrimitiveArray`. It is also slow because it compares the taken values. Extending the current approach to String and Binary values would make it much slower.
   
   **Describe the solution you'd like**
   Instead of run-encoding the taken values, we can run-encode the taken physical indices. This will be significantly faster for String and Binary values because we avoid comparing values. The drawback of this approach is that in certain scenarios the output might not be efficiently run encoded. For example, given a `RunArray { run_ends=[2,4,6,8], values=[1,2,1,2] }` and take indices `[2,3,6,7]`, the output will be `RunArray { run_ends=[2,4], values=[2,2] }` rather than `RunArray { run_ends=[4], values=[2] }`, because indices 2 and 3 fall in physical run 1 and indices 6 and 7 in physical run 3, which are distinct runs even though both hold the value 2.
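To make the proposal concrete, here is a minimal sketch of run-encoding the taken physical indices. It uses plain slices and vectors rather than the arrow-rs `RunArray` type. The function names (`physical_indices`, `encode_physical_runs`, `take_run_by_physical_index`) are hypothetical and not part of any arrow-rs API. It assumes `run_ends` is strictly increasing and that every take index is in bounds, and it ignores nulls.

```rust
/// Map each logical index to the physical index of the run that contains it.
/// Assumes `run_ends` is strictly increasing and every index is below the last run end.
fn physical_indices(run_ends: &[usize], logical: &[usize]) -> Vec<usize> {
    logical
        .iter()
        // Binary search: the first run whose end is greater than `i` contains `i`.
        .map(|&i| run_ends.partition_point(|&end| end <= i))
        .collect()
}

/// Run-encode a sequence of physical indices.
/// Consecutive equal indices collapse into one run; no values are compared.
fn encode_physical_runs(physical: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let mut new_run_ends: Vec<usize> = Vec::new();
    let mut run_physical: Vec<usize> = Vec::new();
    for (pos, &p) in physical.iter().enumerate() {
        if run_physical.last() == Some(&p) {
            // Same physical run as the previous element: extend the current run.
            *new_run_ends.last_mut().unwrap() = pos + 1;
        } else {
            // Different physical run: start a new output run.
            new_run_ends.push(pos + 1);
            run_physical.push(p);
        }
    }
    (new_run_ends, run_physical)
}

/// Hypothetical take over a run-encoded array, given as (run_ends, values).
/// Values are only cloned once per output run, never compared.
fn take_run_by_physical_index<T: Clone>(
    run_ends: &[usize],
    values: &[T],
    logical: &[usize],
) -> (Vec<usize>, Vec<T>) {
    let physical = physical_indices(run_ends, logical);
    let (new_run_ends, run_physical) = encode_physical_runs(&physical);
    let new_values = run_physical.iter().map(|&p| values[p].clone()).collect();
    (new_run_ends, new_values)
}
```

Calling `take_run_by_physical_index(&[2, 4, 6, 8], &[1, 2, 1, 2], &[2, 3, 6, 7])` on the example above yields run ends `[2, 4]` and values `[2, 2]`: the two output runs are not merged because they come from different physical runs.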
   
   **Describe alternatives you've considered**
   We continue with the current approach of comparing values, which, in certain scenarios, will produce a more efficiently run-encoded array at the cost of performance.
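For comparison, the value-comparing alternative can be sketched as follows. This is an illustration on plain slices, not the actual arrow-rs implementation. The name `take_run_by_value` is hypothetical. It assumes `run_ends` is strictly increasing and that every take index is in bounds, and it ignores nulls.

```rust
/// Hypothetical take over a run-encoded array, given as (run_ends, values),
/// that merges adjacent output runs whenever their values compare equal.
fn take_run_by_value<T: Clone + PartialEq>(
    run_ends: &[usize],
    values: &[T],
    logical: &[usize],
) -> (Vec<usize>, Vec<T>) {
    let mut new_run_ends: Vec<usize> = Vec::new();
    let mut new_values: Vec<T> = Vec::new();
    for (pos, &i) in logical.iter().enumerate() {
        // Locate the physical run containing logical index `i`.
        let p = run_ends.partition_point(|&end| end <= i);
        let v = &values[p];
        if new_values.last() == Some(v) {
            // Equal to the previous value: extend the current run.
            // This comparison is the cost that grows for String and Binary values.
            *new_run_ends.last_mut().unwrap() = pos + 1;
        } else {
            new_run_ends.push(pos + 1);
            new_values.push(v.clone());
        }
    }
    (new_run_ends, new_values)
}
```

With run ends `[2, 4, 6, 8]`, values `[1, 2, 1, 2]` and take indices `[2, 3, 6, 7]`, this yields run ends `[4]` and values `[2]`, the more compact encoding, at the price of one value comparison per taken element.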
   
   **Additional context**
   https://github.com/apache/arrow-rs/pull/3622#discussion_r1089826535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #3701: `take_run` improvements

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3701:
URL: https://github.com/apache/arrow-rs/issues/3701#issuecomment-1443405170

   `label_issue.py` automatically added labels {'arrow'} from #3705




[GitHub] [arrow-rs] tustvold closed issue #3701: `take_run` improvements

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3701: `take_run` improvements
URL: https://github.com/apache/arrow-rs/issues/3701

