Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/09/12 13:34:00 UTC

[jira] [Commented] (ARROW-17683) [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError

    [ https://issues.apache.org/jira/browse/ARROW-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603090#comment-17603090 ] 

Joris Van den Bossche commented on ARROW-17683:
-----------------------------------------------

Do you know if this is a recurring failure? The test seems to fail on something that already failed in the past as well: {{pa.array(np.array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'))}} also fails with the released version of pyarrow. Since this is a hypothesis test, it might be a rare error caused by hypothesis's random parametrization.
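
For context, here is a minimal standard-library sketch of why that input trips a decoder. It assumes (based on the error message, not on a reading of the pyarrow source) that the conversion strictly decodes the NumPy buffer as UTF-32-LE, which is how a {{'<U2'}} array stores its characters. The helper name {{strict_decode_error}} is made up for illustration.

```python
# Two characters, the second being a lone surrogate, as in the falsifying example.
text = "0\ud800"

# A NumPy '<U2' element stores each character as 4 little-endian bytes (UTF-32-LE).
# "surrogatepass" lets Python emit the lone surrogate instead of refusing to encode it.
raw = text.encode("utf-32-le", errors="surrogatepass")


def strict_decode_error(buf):
    """Return the UnicodeDecodeError raised by a strict UTF-32-LE decode, or None."""
    try:
        buf.decode("utf-32-le")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = strict_decode_error(raw)
# The failure starts at byte offset 4, i.e. the second character (the surrogate).
print(err)
```

The lone surrogate U+D800 is not a valid Unicode scalar value, so any strict decoder has to reject it. That would explain why the failure depends only on hypothesis happening to generate such a string.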

(That said, the project doesn't seem to be actively maintained anymore, so we should maybe consider removing or disabling it in our nightly integration builds.)

> [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17683
>                 URL: https://issues.apache.org/jira/browse/ARROW-17683
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration, Python
>            Reporter: Raúl Cumplido
>            Priority: Major
>              Labels: Nightly
>
> The nightly tests against kartothek are currently failing due to the following error:
> {code:java}
>  ______________________ test_eval_operators[<-1-expected3] ______________________
>
> op = '<', value = 1, expected = {'a', 'b', 'c'}
>
>     @pytest.mark.parametrize(
> >       "op, value, expected",
>         [
>             ("==", 1, {"b", "c", "e"}),
>             ("<=", 1, {"a", "b", "c", "e"}),
>             (">=", 1, {"b", "c", "e", "f"}),
>             ("<", 1, {"a", "b", "c"}),
>             (">", 1, {"f"}),
>             ("in", [0, 2], {"a", "b", "c", "f"}),
>         ],
>     )
>
> tests/core/test_index.py:621: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> tests/core/test_index.py:638: in test_eval_operators
>     index_data[2]: ["f"],
> kartothek/core/index.py:614: in __init__
>     normalize_dtype=normalize_dtype,
> kartothek/core/index.py:78: in __init__
>     table = _index_dct_to_table(index_dct, column, None)
> kartothek/core/index.py:949: in _index_dct_to_table
>     labeled_array = pa.array(keys, type=dtype)
> pyarrow/array.pxi:313: in pyarrow.lib.array
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >   ???
> E   UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 4-7: code point in surrogate code point range(0xd800, 0xe000)
> E   Falsifying example: test_eval_operators(
> E       index_data=array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'),
> E       op='<',
> E       value=1,
> E       expected={'a', 'b', 'c'},
> E   )
>
> pyarrow/array.pxi:83: UnicodeDecodeError
> {code}
> An example of build failure:
> [https://github.com/ursacomputing/crossbow/runs/8296508320?check_suite_focus=true]
>  


