Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/21 04:48:16 UTC

[GitHub] [arrow] tyrelr edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

tyrelr edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938


   I ran out of time today (and probably the next few days), but just as an update... I hit a speedbump looking at removing the .value(...) function.
   
   On a first pass at dropping the PrimitiveArray.value() function, I hit a few use cases which are not trivially handled by a typed slice in a performant way.
   1) filter kernel does a batched indexing-like operation based on bits being set in a u64.
   This can probably be re-arranged to minimize/eliminate bounds checks.
   2) sort & take kernels appear to cherry-pick indexes based on another index array
   These are tricky.  We may be able to minimize the need for bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?) but each option adds overhead at a different spot.
   3) csv writing & display try to iterate N columns in lock-step
   This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column build itself an iterator and map itself to String...  The naive & slow approach of editing the current macro to use values()[$i] causes a compilation error when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
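
To make the "checking max index" idea from point 2 concrete, here is a minimal sketch on plain slices. The function name take_checked_once and the i32/u32 element types are assumptions for illustration and are not part of the Arrow API. It pays for one extra pass over the indices so that the copy loop itself can skip per-element bounds checks, which is exactly the kind of overhead shift mentioned above.

```rust
/// Hypothetical helper (not an Arrow function): gathers `values[i]` for
/// every `i` in `indices`, or returns `None` if any index is out of range.
fn take_checked_once(values: &[i32], indices: &[u32]) -> Option<Vec<i32>> {
    // One up-front pass finds the largest index. An empty index list has
    // no maximum and is trivially valid.
    if let Some(&max) = indices.iter().max() {
        if max as usize >= values.len() {
            return None;
        }
    }
    // SAFETY: every index is <= max, and max < values.len() was verified
    // above, so each unchecked read stays inside `values`.
    Some(
        indices
            .iter()
            .map(|&i| unsafe { *values.get_unchecked(i as usize) })
            .collect(),
    )
}
```

Whether the extra pass is cheaper than the checks it removes would need to be measured; this only shows the shape of the trade-off.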
   
   [Edit: to be clear, I have not actually attempted any of these edits yet, so I do not have real performance numbers, just a gut feeling]
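
For point 3, a rough sketch of the "each column builds its own String iterator" idea. Everything here is hypothetical: the Column enum stands in for typed arrays, and boxed iterators (Box<dyn Iterator<Item = String>>) are used in place of the Vec<&dyn Iter<Item=String>> mentioned above. The lengths are compared once ahead of time, after which the columns are advanced in lock-step without indexing.

```rust
/// Hypothetical stand-in for a typed column; not an Arrow type.
enum Column {
    Int(Vec<i32>),
    Bool(Vec<bool>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Bool(v) => v.len(),
        }
    }

    /// Each column maps its own values to `String`, hiding its element type.
    fn strings(&self) -> Box<dyn Iterator<Item = String> + '_> {
        match self {
            Column::Int(v) => Box::new(v.iter().map(|x| x.to_string())),
            Column::Bool(v) => Box::new(v.iter().map(|x| x.to_string())),
        }
    }
}

/// Formats rows as comma-separated lines, or returns `None` when the
/// columns do not all have the same length.
fn rows(columns: &[Column]) -> Option<Vec<String>> {
    // Ahead-of-time check: all columns must be equally long.
    let n = columns.first().map_or(0, Column::len);
    if columns.iter().any(|c| c.len() != n) {
        return None;
    }
    let mut iters: Vec<_> = columns.iter().map(|c| c.strings()).collect();
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        // Advance every column by one element to build the next row.
        let fields: Vec<String> = iters
            .iter_mut()
            .map(|it| it.next().expect("lengths were checked above"))
            .collect();
        out.push(fields.join(","));
    }
    Some(out)
}
```

The dynamic dispatch per field is not free, so this is likely only reasonable where the String formatting already dominates the cost, as in CSV writing and display.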

