Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/26 08:46:54 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #35301: [Python] Allow passing in a mask when creating a scanner

jorisvandenbossche commented on issue #35301:
URL: https://github.com/apache/arrow/issues/35301#issuecomment-1523022843

   Given that you want a positional delete, wouldn't this be a "take" operation rather than a "filter"? I know the two are essentially the same (under the hood, filtering an array also does a "take" of the required values), but conceptually they might differ for a Dataset. A filter can be defined with an expression, whereas a "take" always uses actual materialized values. We already have a `Dataset.take()` method that does that.
   
   So even if you start with a boolean filter, you should already be able to use `take()` by converting the boolean mask to indices with `pyarrow.compute.indices_nonzero`.
   
   I am not fully sure how `Scanner::TakeRows` works, given that positional indices depend on the order in which the data is scanned. I assume it follows the order of the actual vector of fragments.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
