Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2021/05/15 06:31:00 UTC
[jira] [Resolved] (ARROW-12774) [C++][Compute] replace_substring_regex() creates invalid arrays => crash
[ https://issues.apache.org/jira/browse/ARROW-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yibo Cai resolved ARROW-12774.
------------------------------
Fix Version/s: 5.0.0
Resolution: Fixed
Issue resolved by pull request 10320
[https://github.com/apache/arrow/pull/10320]
> [C++][Compute] replace_substring_regex() creates invalid arrays => crash
> ------------------------------------------------------------------------
>
> Key: ARROW-12774
> URL: https://issues.apache.org/jira/browse/ARROW-12774
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 4.0.0
> Reporter: Adam Hooper
> Assignee: Niranda Perera
> Priority: Major
> Labels: pull-request-available
> Fix For: 5.0.0, 4.0.1
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Minimal reproduction:
> {code:python}
> import pyarrow as pa
> import pyarrow.compute  # ensures pa.compute is loaded
> arr = pa.array(['A'] * 16)
> arr2 = pa.compute.replace_substring_regex(arr, pattern="X", replacement="Y")
> arr2.validate(full=True)
> {code}
> Expected results: a valid array
> Actual results: {{pyarrow.lib.ArrowInvalid: Offset invariant failure: non-monotonic offset at slot 64: 0 < 63}}
> So if you run {{arr.diff(arr2)}}, you'll get something like:
> {code:java}
> terminate called after throwing an instance of 'std::length_error'
> what(): basic_string::_S_create
> Aborted (core dumped)
> {code}
> This seems to happen if and only if the input array length is a multiple of 16. That leads to an ugly workaround:
> {code:python}
> def replace_substring_regex_workaround_12774(
>     array: pa.Array,
>     *,
>     pattern: str,
>     replacement: str
> ) -> pa.Array:
>     if len(array) > 0 and len(array) % 16 == 0:
>         chunked_array = pa.chunked_array([array.slice(0, 1), array.slice(1)], type=array.type)
>         return pa.compute.replace_substring_regex(
>             chunked_array,
>             pattern=pattern,
>             replacement=replacement
>         ).combine_chunks()
>     else:
>         return pa.compute.replace_substring_regex(
>             array,
>             pattern=pattern,
>             replacement=replacement
>         )
> {code}
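
For readers unfamiliar with the "Offset invariant failure" quoted above: an Arrow string array stores its values as one concatenated data buffer plus an offsets buffer, and the offsets must never decrease. The pure-Python sketch below only illustrates that invariant. The helper names build_offsets and first_non_monotonic_slot are hypothetical, the corrupted buffer is made up for illustration, and nothing here reproduces the actual kernel bug or Arrow's exact slot numbering.

```python
from typing import List, Optional, Sequence


def build_offsets(values: Sequence[str]) -> List[int]:
    """Build offsets for a variable-length string column.

    Entry i is where value i starts in the concatenated UTF-8 data; the
    final entry is the total byte length, so there are len(values) + 1
    entries in total.
    """
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value.encode("utf-8")))
    return offsets


def first_non_monotonic_slot(offsets: Sequence[int]) -> Optional[int]:
    """Return the first index whose offset is smaller than the one before
    it, or None if the offsets never decrease."""
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            return i
    return None


# Offsets for sixteen one-byte strings: 0, 1, 2, ..., 16.
good = build_offsets(["A"] * 16)

# Hypothetical corruption (an assumption, not the real failure mode):
# the last offset is left at 0 instead of the total length.
bad = good[:-1] + [0]

print(first_non_monotonic_slot(good))
print(first_non_monotonic_slot(bad))
```

A reader that trusts decreasing offsets would compute a negative length for the affected value, which is one plausible way an invalid array could lead to a crash like the std::length_error shown in the report.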
--
This message was sent by Atlassian Jira
(v8.3.4#803005)