Posted to issues@arrow.apache.org by "kylebarron (via GitHub)" <gi...@apache.org> on 2023/03/03 16:56:01 UTC

[GitHub] [arrow] kylebarron opened a new issue, #34433: [Python]: Possible to evaluate `pyarrow.compute.Expression` without filter?

kylebarron opened a new issue, #34433:
URL: https://github.com/apache/arrow/issues/34433

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I'm unsure whether this should be categorized as "usage" or "enhancement". 
   
   I've read through the [high-level compute functions doc](https://arrow.apache.org/docs/python/compute.html), as well as the [compute API doc page](https://arrow.apache.org/docs/python/api/compute.html), and also looked through `compute.py` and `_compute.pyx`.
   
   Is there a pyarrow compute API for evaluating an expression against a table or array, receiving _indices_ as output? All the docs seem to use `table.filter`, whereas I'm looking for something like [`numpy.where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html), where I can use `where` and then `table.take` in two different steps instead of one. Presumably `filter` is already doing a "where" then "take" under the hood?
   
   ```py
   import pyarrow.compute as pc
   import pyarrow as pa
   
   table = pa.Table.from_arrays([pa.array([1, 2, 3, 4])], names=["a"])
   expr = pc.field('a') == 2
   
   # does this exist?
   table.evaluate(expr)
   # Expected output BooleanArray:
   # pa.array([False, True, False, False])
   ```
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34433: [Python]: Possible to evaluate `pyarrow.compute.Expression` without filter?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34433:
URL: https://github.com/apache/arrow/issues/34433#issuecomment-1454310666

   +1 for exposing something limited to scalar (and in the future scalar-window) expressions.  I've recently come to the conclusion (I think it was @ianmcook that explained this to me) that there is really no such thing as an "aggregate expression" and only "aggregate function" makes sense.  The same probably holds true for vector functions.




[GitHub] [arrow] jorisvandenbossche commented on issue #34433: [Python]: Possible to evaluate `pyarrow.compute.Expression` without filter?

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34433:
URL: https://github.com/apache/arrow/issues/34433#issuecomment-1454042059

   The kernel that can already do this, when combined with the comparison kernel, is `indices_nonzero` (in numpy this is called `np.nonzero`; `np.where` is a bit different, I think):
   
   ```
   >>> pc.indices_nonzero(pc.equal(table['a'], 2))
   <pyarrow.lib.UInt64Array object at 0x7f53a9490e20>
   [
     1
   ]
   ```
   
   This can then be used with `take`.
   
   Now, this works on actual data, not with an expression. 
   
   I have recently been wondering whether we want something like an `expr.evaluate(..)` function, where you pass a table-like object and the expression gets evaluated against that data. 
   But I am not sure we already have the tools for this in C++: those expressions can mix scalar (element-wise), aggregation, and vector functions, so this is not necessarily a simple mapping to an ExecNode. If we limit it to just scalar expressions, we have `ExecuteScalarExpression` in C++ that could be exposed. In the above example, that would work for the filter expression, but not if the expression already includes `indices_nonzero`.




[GitHub] [arrow] mariosasko commented on issue #34433: [Python]: Possible to evaluate `pyarrow.compute.Expression` without filter?

Posted by "mariosasko (via GitHub)" <gi...@apache.org>.
mariosasko commented on issue #34433:
URL: https://github.com/apache/arrow/issues/34433#issuecomment-1453960121

   We could also use this in HF Datasets to support filtering with Expressions in `Dataset.filter`.

