Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/06/08 19:21:00 UTC
[jira] [Updated] (ARROW-13013) [C++][Compute][Python] Move (majority of) kernel unit tests to python

     [ https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kietzman updated ARROW-13013:
---------------------------------
    Description: 
mailing list discussion: [https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E]

Writing unit tests for compute functions in C++ is laborious, entails a lot of boilerplate, and slows iteration, since adding new tests requires recompilation. The majority of these test cases need not be written in C++ at all and could instead become part of the pyarrow test suite.

In order to make the kernels' C++ implementations easily debuggable from unit tests, we'll have to expose a C++ function named {{AssertCallFunction}} (or similar). {{AssertCallFunction}} will invoke the named {{compute::Function}} and compare the actual result to the expected one without crossing the C++/Python boundary, allowing a developer to step through all relevant code with a single breakpoint in GDB. Construction of scalars, arrays, function options and any other inputs to the function is amply supported by {{pyarrow}} and will happen outside the scope of {{AssertCallFunction}}.
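To make the intended semantics concrete, here is a minimal pure-Python stand-in, assuming pyarrow's existing {{pyarrow.compute.call_function}} entry point. The name {{assert_call_function}} is hypothetical, and unlike the proposed C++ {{AssertCallFunction}} this version does cross the C++/Python boundary, so it only illustrates the calling convention (inputs built with pyarrow, one call, one comparison), not the single-breakpoint debugging benefit.

{code:python}
import pyarrow as pa
import pyarrow.compute as pc


def assert_call_function(name, args, expected, options=None):
    # Hypothetical stand-in for the proposed C++ AssertCallFunction.
    # It invokes the named compute function via pyarrow, so (unlike the
    # proposal) the call crosses the Python/C++ boundary.
    actual = pc.call_function(name, args, options)
    # equals() compares type, values and null positions.
    if not actual.equals(expected):
        raise AssertionError(f"{name}: expected {expected}, got {actual}")
    return actual


# Inputs and the expected output are built with ordinary pyarrow
# constructors, outside the helper.
left = pa.array([1, 2, None], type=pa.int32())
right = pa.array([10, 20, 30], type=pa.int32())
assert_call_function("add", [left, right],
                     pa.array([11, 22, None], type=pa.int32()))
{code}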

{{AssertCallFunction}} should not try to derive additional assertions from its arguments. For example, the C++ test helper {{CheckScalar("add", {left, right}, expected)}} first asserts that {{left + right == expected}}, then that {{left.slice(1) + right.slice(1) == expected.slice(1)}}, to ensure that offsets are handled correctly. This has value, but it can easily be expressed in Python, and making such behavior configurable would overcomplicate the interface of {{AssertCallFunction}}.
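As a sketch of how the offset check could live in Python instead, the hypothetical helper {{check_with_sliced_inputs}} below runs a function once on the full inputs and once on inputs sliced at a nonzero offset, comparing each result with the correspondingly sliced expected value. It assumes every argument is an array (scalars have no {{slice}} method) and uses pyarrow's {{call_function}} as a stand-in for the proposed helper.

{code:python}
import pyarrow as pa
import pyarrow.compute as pc


def check_with_sliced_inputs(name, args, expected, offset=1):
    # Hypothetical helper: the extra offset check that CheckScalar performs
    # in C++, written in Python around a plain function call.
    # All args are assumed to be arrays.
    actual = pc.call_function(name, args)
    assert actual.equals(expected), f"{name}: got {actual}"

    # Re-run on zero-copy slices that start at a nonzero offset; a kernel
    # that ignored the offset would read the wrong values or validity bits.
    sliced_args = [arg.slice(offset) for arg in args]
    sliced_actual = pc.call_function(name, sliced_args)
    assert sliced_actual.equals(expected.slice(offset)), (
        f"{name} on inputs sliced at {offset}: got {sliced_actual}"
    )


check_with_sliced_inputs(
    "add",
    [pa.array([0, 1, 2, None]), pa.array([0.25, 1.5, 2.75, None])],
    pa.array([0.25, 2.5, 4.75, None]),
)
{code}

Keeping the slicing in a separate wrapper leaves the core assertion helper with a minimal signature, which is the trade-off argued for here.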

Unit tests for kernels would then be written in {{arrow/python/pyarrow/test/kernels/test_*.py}}. The C++ unit test for [addition with implicit casts|https://github.com/apache/arrow/blob/b38ab81cb96e393a026d05a22e5a2f62ff6c23d7/cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc#L897-L918] could then be rewritten as
{code:python}
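def test_addition_implicit_casts():
    # AssertCallFunction is the proposed C++-backed helper (not yet available).
    # The integer left-hand input is implicitly cast to float64, so the sums
    # are 0 + 0.25, 1 + 1.5, 2 + 2.75 and null + null.
    AssertCallFunction("add", [[ 0,    1,   2,    None],
                               [ 0.25, 1.5, 2.75, None]],
                       expected=[0.25, 2.5, 4.75, None])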

    # ...
{code}
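Until such a helper exists, the same case can be approximated with plain pyarrow. The version below is a sketch, not the proposed {{AssertCallFunction}}-based test: it calls {{pyarrow.compute.call_function}} directly and, to illustrate how cheaply type coverage grows in Python, repeats the check for several integer types on the left-hand side (the choice of types is an assumption, not the set covered by the C++ test).

{code:python}
import pyarrow as pa
import pyarrow.compute as pc


def test_addition_implicit_casts():
    # Expected sums: 0 + 0.25, 1 + 1.5, 2 + 2.75, null + null.
    expected = pa.array([0.25, 2.5, 4.75, None], type=pa.float64())
    right = pa.array([0.25, 1.5, 2.75, None], type=pa.float64())
    # Each integer left-hand type should be implicitly cast to float64.
    for int_type in [pa.int8(), pa.int16(), pa.int32(), pa.int64()]:
        left = pa.array([0, 1, 2, None], type=int_type)
        actual = pc.call_function("add", [left, right])
        assert actual.equals(expected), f"{int_type}: got {actual}"


if __name__ == "__main__":
    test_addition_implicit_casts()
{code}

Each additional input type costs one list entry rather than another C++ test case and a rebuild.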
NB: Some unit tests will probably still reside in C++, since we'll need to test things we don't wish to expose in a user-facing API, such as "whether a boolean kernel avoids clobbering bits when outputting into a slice". These should be far more manageable, since they won't need to assert correct logic across all possible input types.

  was:
mailing list discussion: https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E

Writing unit tests for compute functions in C++ is laborious, entails a lot of boilerplate, and slows iteration, since adding new tests requires recompilation. The majority of these test cases need not be written in C++ at all and could instead become part of the pyarrow test suite.

In order to make the kernels' C++ implementations easily debuggable from unit tests, we'll have to expose a C++ function named {{AssertCallFunction}} (or similar). {{AssertCallFunction}} will invoke the named {{compute::Function}} and compare the actual result to the expected one without crossing the C++/Python boundary, allowing a developer to step through all relevant code with a single breakpoint in GDB. Construction of scalars, arrays, function options and any other inputs to the function is amply supported by {{pyarrow}} and will happen outside the scope of {{AssertCallFunction}}.

{{AssertCallFunction}} should not try to derive additional assertions from its arguments. For example, the C++ test helper {{CheckScalar("add", {left, right}, expected)}} first asserts that {{left + right == expected}}, then that {{left.slice(1) + right.slice(1) == expected.slice(1)}}, to ensure that offsets are handled correctly. This has value, but it can easily be expressed in Python, and making such behavior configurable would overcomplicate the interface of {{AssertCallFunction}}.

NB: Some unit tests will probably still reside in C++, since we'll need to test things we don't wish to expose in a user-facing API, such as "whether a boolean kernel avoids clobbering bits when outputting into a slice". These should be far more manageable, since they won't need to assert correct logic across all possible input types.


> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13013
>                 URL: https://issues.apache.org/jira/browse/ARROW-13013
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ben Kietzman
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)