Posted to jira@arrow.apache.org by "Raúl Cumplido (Jira)" <ji...@apache.org> on 2022/09/12 14:56:00 UTC

[jira] [Updated] (ARROW-17683) [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError

     [ https://issues.apache.org/jira/browse/ARROW-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raúl Cumplido updated ARROW-17683:
----------------------------------
    Attachment: image-2022-09-12-16-55-12-149.png

> [CI][Python] Nightly test-conda-python-3.7-kartothek-latest fails due to UnicodeDecodeError
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17683
>                 URL: https://issues.apache.org/jira/browse/ARROW-17683
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration, Python
>            Reporter: Raúl Cumplido
>            Priority: Major
>              Labels: Nightly
>         Attachments: image-2022-09-12-16-55-12-149.png
>
>
> The nightly tests against kartothek are currently failing due to the following error:
> {code:java}
> ______________________ test_eval_operators[<-1-expected3] ______________________
>
> op = '<', value = 1, expected = {'a', 'b', 'c'}
>
>     @pytest.mark.parametrize(
> >       "op, value, expected",
>         [
>             ("==", 1, {"b", "c", "e"}),
>             ("<=", 1, {"a", "b", "c", "e"}),
>             (">=", 1, {"b", "c", "e", "f"}),
>             ("<", 1, {"a", "b", "c"}),
>             (">", 1, {"f"}),
>             ("in", [0, 2], {"a", "b", "c", "f"}),
>         ],
>     )
>
> tests/core/test_index.py:621:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> tests/core/test_index.py:638: in test_eval_operators
>     index_data[2]: ["f"],
> kartothek/core/index.py:614: in __init__
>     normalize_dtype=normalize_dtype,
> kartothek/core/index.py:78: in __init__
>     table = _index_dct_to_table(index_dct, column, None)
> kartothek/core/index.py:949: in _index_dct_to_table
>     labeled_array = pa.array(keys, type=dtype)
> pyarrow/array.pxi:313: in pyarrow.lib.array
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >   ???
> E   UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 4-7: code point in surrogate code point range(0xd800, 0xe000)
> E   Falsifying example: test_eval_operators(
> E       index_data=array(['', '0', '0\ud800', '1', '2', '3', '4', '5', '6', '7'], dtype='<U2'),
> E       op='<',
> E       value=1,
> E       expected={'a', 'b', 'c'},
> E   )
>
> pyarrow/array.pxi:83: UnicodeDecodeError
> {code}
> An example of a failing build:
> [https://github.com/ursacomputing/crossbow/runs/8296508320?check_suite_focus=true]
>  
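
The error message in the traceback above can be reproduced with the Python standard library alone. The sketch below is not pyarrow's or kartothek's code path. It assumes that the NumPy '<U2' array in the falsifying example holds each character as four little-endian bytes (UTF-32-LE), and that those bytes are then decoded strictly. The helper name strict_decode is hypothetical and exists only for illustration.

```python
# The falsifying example contains '0\ud800': the character '0' followed by a
# lone surrogate code point, which is not valid in well-formed Unicode text.
text = "0\ud800"

# Assumption: this mimics the in-memory layout of a NumPy '<U2' element.
# 'surrogatepass' is needed to produce the raw bytes at all, because a strict
# encoder would reject the lone surrogate.
raw = text.encode("utf-32-le", errors="surrogatepass")


def strict_decode(data: bytes) -> str:
    """Decode UTF-32-LE bytes with the default (strict) error handler."""
    return data.decode("utf-32-le")


try:
    strict_decode(raw)
    error = None
except UnicodeDecodeError as exc:
    # The surrogate occupies the second 4-byte unit, so the error starts at byte 4.
    error = exc

print(error)
```

If this assumption holds, the failure is triggered by the generated input ('0\ud800') rather than by the comparison operator under test, which would explain why the report shows it as a falsifying example.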


