Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/09/12 13:35:00 UTC

[jira] [Comment Edited] (ARROW-17683) [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError

    [ https://issues.apache.org/jira/browse/ARROW-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603090#comment-17603090 ] 

Joris Van den Bossche edited comment on ARROW-17683 at 9/12/22 1:34 PM:
------------------------------------------------------------------------

Do you know if this is a recurring failure? The test seems to fail on an input that already failed in the past ({{pa.array(np.array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'))}} also fails with the released version of pyarrow). Since it is a hypothesis test, this might be a rare error caused by hypothesis's random parametrization.
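
For reference, the decode error itself can be reproduced with only the Python standard library. This is a minimal sketch and not the pyarrow code path: it assumes the failure comes from strictly decoding the UTF-32 little-endian bytes that a NumPy {{<U2}} element holds, and {{try_strict_decode}} is a hypothetical helper name used only for illustration.

```python
# NumPy's '<U2' dtype stores each character as a little-endian UTF-32 code unit,
# so the element '0\ud800' occupies 8 bytes. Here those bytes are built by hand;
# 'surrogatepass' is needed because a lone surrogate cannot be encoded strictly.
raw = "0\ud800".encode("utf-32-le", errors="surrogatepass")


def try_strict_decode(data: bytes):
    """Return the decoded string, or the UnicodeDecodeError raised by strict decoding.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    try:
        return data.decode("utf-32-le")
    except UnicodeDecodeError as exc:
        return exc


result = try_strict_decode(raw)
# The lone surrogate U+D800 sits in bytes 4..7 of the buffer.
print(type(result).__name__, result)
```

The reported byte range (position 4-7) matches the traceback in the issue description, which suggests the hypothesis-generated lone surrogate is what trips the conversion.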

https://crossbow.voltrondata.com/ says the test has been failing for 4 days, although there is only 1 failure link.

(That said, the project doesn't seem to be actively maintained anymore, so we should maybe consider removing or disabling it in our nightly integration builds.)



> [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17683
>                 URL: https://issues.apache.org/jira/browse/ARROW-17683
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration, Python
>            Reporter: Raúl Cumplido
>            Priority: Major
>              Labels: Nightly
>
> The nightly tests against kartothek are currently failing due to the following error:
> {code:java}
>  ______________________ test_eval_operators[<-1-expected3] ______________________
>
> op = '<', value = 1, expected = {'a', 'b', 'c'}
>
>     @pytest.mark.parametrize(
> >       "op, value, expected",
>         [
>             ("==", 1, {"b", "c", "e"}),
>             ("<=", 1, {"a", "b", "c", "e"}),
>             (">=", 1, {"b", "c", "e", "f"}),
>             ("<", 1, {"a", "b", "c"}),
>             (">", 1, {"f"}),
>             ("in", [0, 2], {"a", "b", "c", "f"}),
>         ],
>     )
>
> tests/core/test_index.py:621: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> tests/core/test_index.py:638: in test_eval_operators
>     index_data[2]: ["f"],
> kartothek/core/index.py:614: in __init__
>     normalize_dtype=normalize_dtype,
> kartothek/core/index.py:78: in __init__
>     table = _index_dct_to_table(index_dct, column, None)
> kartothek/core/index.py:949: in _index_dct_to_table
>     labeled_array = pa.array(keys, type=dtype)
> pyarrow/array.pxi:313: in pyarrow.lib.array
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >   ???
> E   UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 4-7: code point in surrogate code point range(0xd800, 0xe000)
> E   Falsifying example: test_eval_operators(
> E       index_data=array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'),
> E       op='<',
> E       value=1,
> E       expected={'a', 'b', 'c'},
> E   )
>
> pyarrow/array.pxi:83: UnicodeDecodeError
> {code}
> An example of build failure:
> [https://github.com/ursacomputing/crossbow/runs/8296508320?check_suite_focus=true]
>  


