Posted to dev@arrow.apache.org by Kirill Lykov <ly...@gmail.com> on 2020/10/06 09:26:06 UTC
Execute expression on filtered data [python][gandiva]
Hi,
I'm trying to write Python code that executes an expression on
filtered data. I create a filter and then a projector for an expression,
but I don't see how to combine the two in Python:
```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays(
    [pa.array([1., 31., 46., 3., 57., 44., 22.]),
     pa.array([5., 45., 36., 73., 83., 23., 76.])],
    ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

# Condition: (a < 50 and a > b) or b < 11
cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

filter = gandiva.make_filter(table.schema, condition)
# filterResult has type SelectionVector
filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(filterResult)

sum = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum, field_result)
projector = gandiva.make_projector(
    table.schema, [expr], pa.default_memory_pool())

# Problem: I don't know how to use filterResult with the projector.
# This is the call I would like to make, but it does not appear to be supported:
r, = projector.evaluate(table.to_batches()[0], filterResult)
```
In C++, it is possible to pass a SelectionVector as the second
argument to `Projector::Evaluate`:
https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270
However, this does not appear to be possible through `gandiva.pyx`:
https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154
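For reference, the intended result of combining the filter with the projector can be modelled in plain Python. This is only a sketch of the semantics, not Gandiva code: ordinary lists stand in for Arrow arrays, a list of row indices stands in for the SelectionVector, and the helper names `evaluate_filter` and `evaluate_projection` are hypothetical.

```python
# Plain-Python model of "filter, then project only the selected rows".
# Lists stand in for Arrow arrays; the index list stands in for a
# Gandiva SelectionVector. Helper names are hypothetical.

a = [1., 31., 46., 3., 57., 44., 22.]
b = [5., 45., 36., 73., 83., 23., 76.]


def evaluate_filter(col_a, col_b):
    """Return the indices of rows where (a < 50 and a > b) or b < 11."""
    return [
        i
        for i, (x, y) in enumerate(zip(col_a, col_b))
        if (x < 50.0 and x > y) or y < 11.0
    ]


def evaluate_projection(col_a, col_b, selection):
    """Compute c = a + b, but only for the rows listed in `selection`."""
    return [col_a[i] + col_b[i] for i in selection]


selection = evaluate_filter(a, b)
c = evaluate_projection(a, b, selection)

print(selection)  # [0, 2, 5]
print(c)          # [6.0, 82.0, 67.0]
```

The projection touches only the three selected rows, which is the behaviour I would expect from passing the SelectionVector to the projector.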
--
Best regards,
Kirill Lykov
Re: Execute expression on filtered data [python][gandiva]
Posted by Kirill Lykov <ly...@gmail.com>.
I've fixed this and opened a PR: https://github.com/apache/arrow/pull/8461
May I ask someone for a review? I suggest Philip Moritz, who contributed the
original Cython integration layer, as a good candidate.
Since I cannot assign reviewers myself, I thought it would be a good idea to
ask on the mailing list.
On Tue, Oct 6, 2020 at 4:37 PM Kirill Lykov <ly...@gmail.com> wrote:
> I've created: https://issues.apache.org/jira/browse/ARROW-10197
> I put priority "Trivial" -- not sure if it is correct.
--
Best regards,
Kirill Lykov
Re: Execute expression on filtered data [python][gandiva]
Posted by Kirill Lykov <ly...@gmail.com>.
I've created https://issues.apache.org/jira/browse/ARROW-10197
I set the priority to "Trivial", but I'm not sure that is correct.
On Tue, Oct 6, 2020 at 3:41 PM Wes McKinney <we...@gmail.com> wrote:
> This looks like something to improve in the Python bindings. Would you
> like to open a JIRA issue about it?
--
Best regards,
Kirill Lykov
Re: Execute expression on filtered data [python][gandiva]
Posted by Wes McKinney <we...@gmail.com>.
This looks like something to improve in the Python bindings. Would you
like to open a JIRA issue about it?