Posted to issues@arrow.apache.org by "Maarten Breddels (Jira)" <ji...@apache.org> on 2020/09/14 07:47:00 UTC
[jira] [Created] (ARROW-9991) [C++] split kernels for strings/binary
Maarten Breddels created ARROW-9991:
---------------------------------------
Summary: [C++] split kernels for strings/binary
Key: ARROW-9991
URL: https://issues.apache.org/jira/browse/ARROW-9991
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Maarten Breddels
Assignee: Maarten Breddels
Similar to Python str.split and bytes.split, we'd like to have a way to convert str into list[str] (and similarly for bytes).
When the separator is given, the algorithm is the same for both types. Python, however, overloads split: when given no separator, it splits on runs of whitespace (Unicode whitespace for str, ASCII whitespace for bytes).
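For reference, the Python behaviour described above can be checked with a short snippet. This is only an illustration of str.split and bytes.split in CPython, not of any Arrow API; the variable names are made up for the example.

```python
# Explicit separator: str and bytes follow the same algorithm.
with_sep_str = "a,b,,c".split(",")
with_sep_bytes = b"a,b,,c".split(b",")
limited = "a,b,c".split(",", maxsplit=1)  # at most one split

# No separator: runs of whitespace act as one separator, and
# leading/trailing whitespace produces no empty strings.
no_sep = "  a \t b\n".split()

# str considers Unicode whitespace, bytes only ASCII whitespace.
text = "a\u2003b"  # U+2003 EM SPACE between the letters
unicode_split = text.split()
ascii_split = text.encode("utf-8").split()  # EM SPACE bytes are not ASCII whitespace
```

Note that with an explicit separator, adjacent separators yield empty strings, whereas the no-separator form never does.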
I'd rather not have too many overloaded kernels, e.g.
# binary_split (takes string/binary separator, and maxsplit arg, no special utf8 version needed)
# utf8_split_whitespace (similar to Python's version given no separator)
asi
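To make the intended semantics concrete, here is a rough pure-Python sketch of the first two kernels operating on a list of values. Only the kernel names come from the proposal above; the signatures, the use of None for nulls, and the list-in/list-out shape are assumptions for illustration and say nothing about how the C++ kernels would actually be implemented.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T", str, bytes)


def binary_split(values: Sequence[Optional[T]], separator: T,
                 maxsplit: int = -1) -> List[Optional[List[T]]]:
    """Hypothetical model: split each element on an explicit separator.

    Works the same way for str and bytes, so no utf8-specific variant
    is needed. Nulls (None) are passed through unchanged (an assumption).
    """
    return [None if v is None else v.split(separator, maxsplit)
            for v in values]


def utf8_split_whitespace(
        values: Sequence[Optional[str]]) -> List[Optional[List[str]]]:
    """Hypothetical model: split each string on runs of Unicode whitespace,
    like Python's str.split() with no separator."""
    return [None if v is None else v.split() for v in values]


# Example usage
by_comma = binary_split(["a,b,c", None], ",", maxsplit=1)
by_dash = binary_split([b"a-b"], b"-")
by_space = utf8_split_whitespace(["  x\u2003y ", ""])
```

Keeping the separator-based and whitespace-based behaviour in separate functions avoids a single kernel whose meaning changes depending on whether a separator is passed.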
--
This message was sent by Atlassian Jira
(v8.3.4#803005)