Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-13013) [C++][Compute][Python] Move (majority of) kernel unit tests to python

     [ https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-13013:
-----------------------------------

    Assignee:     (was: Weston Pace)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13013
>                 URL: https://issues.apache.org/jira/browse/ARROW-13013
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ben Kietzman
>            Priority: Major
>
> mailing list discussion: [https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E]
> Writing unit tests for compute functions in C++ is laborious, entails a lot of boilerplate, and slows iteration, since adding new tests requires recompilation. The majority of these test cases need not be written in C++ at all and could instead be made part of the pyarrow test suite.
> To keep the kernels' C++ implementations easily debuggable from unit tests, we'll have to expose a C++ function named {{AssertCallFunction}} or so. {{AssertCallFunction}} will invoke the named compute::Function and compare actual results to expected without crossing the C++/Python boundary, allowing a developer to step through all relevant code with a single breakpoint in GDB. Construction of scalars/arrays/function options and any other inputs to the function is amply supported by {{pyarrow}}, and will happen outside the scope of {{AssertCallFunction}}.
> {{AssertCallFunction}} should not try to derive additional assertions from its arguments. For example, {{CheckScalar("add", {left, right}, expected)}} will first assert that {{left + right == expected}}, then that {{left.slice(1) + right.slice(1) == expected.slice(1)}}, to ensure that offsets are handled correctly. This has value, but it can be easily expressed in Python, and configuring such behavior would overcomplicate the interface of {{AssertCallFunction}}.
> Unit tests for kernels would then be written in {{arrow/python/pyarrow/test/kernels/test_*.py}}. The C++ unit test for [addition with implicit casts|https://github.com/apache/arrow/blob/b38ab81cb96e393a026d05a22e5a2f62ff6c23d7/cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc#L897-L918] could then be rewritten as
> {code:python}
> def test_addition_implicit_casts():
>     AssertCallFunction("add", [[ 0,    1,   2,    None],
>                                [ 0.25, 1.5, 2.75, None]],
>                        expected=[0.25, 2.5, 4.75, None])
>     # ...
> {code}
> NB: Some unit tests will probably still reside in C++, since we'll need to test things we don't wish to expose in a user-facing API, such as "whether a boolean kernel avoids clobbering bits when outputting into a slice". These should be far more manageable, since they won't need to assert correct logic across all possible input types.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)