Posted to jira@arrow.apache.org by "Kirill Lykov (Jira)" <ji...@apache.org> on 2020/10/06 14:46:00 UTC

[jira] [Updated] (ARROW-10197) [Gandiva][python] Execute expression on filtered data

     [ https://issues.apache.org/jira/browse/ARROW-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Lykov updated ARROW-10197:
---------------------------------
    Description: 
It looks like there is no way to execute an expression on filtered data in Python.
Specifically, I cannot pass a `SelectionVector` to the projector's `evaluate` method:

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                              pa.array([5., 45., 36., 73.,
                                        83., 23., 76.])],
                             ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

# Condition: (a < 50 and a > b) or (b < 11)
cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b],
                               pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

filter = gandiva.make_filter(table.schema, condition)
# filterResult has type SelectionVector
filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(filterResult)

# Expression to project: c = a + b
sum = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum, field_result)
projector = gandiva.make_projector(
    table.schema, [expr], pa.default_memory_pool())

# Problem: I don't know how to use filterResult with the projector.
# This is the call I would like to make, but `evaluate` does not accept it.
r, = projector.evaluate(table.to_batches()[0], filterResult)
```

In C++, it is possible to pass a SelectionVector as the second argument to `Projector::Evaluate`: [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]

Meanwhile, this does not appear to be possible through `gandiva.pyx`: [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]

  was:
Looks like there is no way to execute an expression on filtered data in python. 
Basically, I cannot pass `SelectionVector` to projector's `evaluate` method

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                                  pa.array([5., 45., 36., 73.,
                                            83., 23., 76.])],
                                 ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b],
                                   pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

filter = gandiva.make_filter(table.schema, condition)
# filterResult has type SelectionVector
filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(result)

sum = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum, field_result)
projector = gandiva.make_projector(
        table.schema, [expr], pa.default_memory_pool())

### Here there is a problem that I don't know how to use filterResult with projector
r, = projector.evaluate(table.to_batches()[0], result)
```

In C++, I see that it is possible to pass SelectionVector as second argument to projector::Evaluate: [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]
 
Meanwhile, it looks like it is impossible in `gandiva.pyx`: [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]


> [Gandiva][python] Execute expression on filtered data
> -----------------------------------------------------
>
>                 Key: ARROW-10197
>                 URL: https://issues.apache.org/jira/browse/ARROW-10197
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Gandiva, Python
>            Reporter: Kirill Lykov
>            Priority: Trivial
>
> It looks like there is no way to execute an expression on filtered data in Python.
> Specifically, I cannot pass a `SelectionVector` to the projector's `evaluate` method:
> ```python
> import pyarrow as pa
> import pyarrow.gandiva as gandiva
>
> table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
>                               pa.array([5., 45., 36., 73.,
>                                         83., 23., 76.])],
>                              ['a', 'b'])
>
> builder = gandiva.TreeExprBuilder()
> node_a = builder.make_field(table.schema.field("a"))
> node_b = builder.make_field(table.schema.field("b"))
> fifty = builder.make_literal(50.0, pa.float64())
> eleven = builder.make_literal(11.0, pa.float64())
>
> # Condition: (a < 50 and a > b) or (b < 11)
> cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
> cond_2 = builder.make_function("greater_than", [node_a, node_b],
>                                pa.bool_())
> cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
> cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
> condition = builder.make_condition(cond)
>
> filter = gandiva.make_filter(table.schema, condition)
> # filterResult has type SelectionVector
> filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
> print(filterResult)
>
> # Expression to project: c = a + b
> sum = builder.make_function("add", [node_a, node_b], pa.float64())
> field_result = pa.field("c", pa.float64())
> expr = builder.make_expression(sum, field_result)
> projector = gandiva.make_projector(
>     table.schema, [expr], pa.default_memory_pool())
>
> # Problem: I don't know how to use filterResult with the projector.
> # This is the call I would like to make, but `evaluate` does not accept it.
> r, = projector.evaluate(table.to_batches()[0], filterResult)
> ```
> In C++, it is possible to pass a SelectionVector as the second argument to `Projector::Evaluate`: [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]
>
> Meanwhile, this does not appear to be possible through `gandiva.pyx`: [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)