Posted to jira@arrow.apache.org by "Maarten Breddels (Jira)" <ji...@apache.org> on 2020/09/14 07:47:00 UTC

[jira] [Created] (ARROW-9991) [C++] split kernels for strings/binary

Maarten Breddels created ARROW-9991:
---------------------------------------

             Summary: [C++] split kernels for strings/binary
                 Key: ARROW-9991
                 URL: https://issues.apache.org/jira/browse/ARROW-9991
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Maarten Breddels
            Assignee: Maarten Breddels


Similar to Python str.split and bytes.split, we'd like to have a way to convert str into list[str] (and similarly for bytes).

When a separator is given, the algorithm is the same for both types. Python, however, overloads split: when no separator is given, it splits on runs of whitespace (Unicode whitespace for str, ASCII whitespace for bytes).
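
For reference, a small Python snippet illustrating the two behaviours described above. It only uses the built-in str.split and bytes.split; the sample values are made up for illustration.

```python
# With an explicit separator, str and bytes behave the same way.
str_parts = "a,b,c".split(",", 1)        # maxsplit=1: at most one split
bytes_parts = b"a,b,c".split(b",", 1)

# Without a separator, split() is overloaded to split on whitespace runs.
# U+2003 (EM SPACE) is Unicode whitespace but not ASCII whitespace.
text = "a\u2003b  c"
str_ws = text.split()                    # splits on Unicode whitespace
bytes_ws = text.encode("utf-8").split()  # splits on ASCII whitespace only

print(str_parts, bytes_parts)
print(str_ws, bytes_ws)
```

Note that the bytes version keeps the UTF-8 encoded EM SPACE inside the first token, which is why a whitespace split needs a separate utf8-aware variant.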

I'd rather not overload the kernels too much, and instead have e.g.:
 # binary_split (takes string/binary separator, and maxsplit arg, no special utf8 version needed)
 # utf8_split_whitespace (similar to Python's version given no separator)
asi
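
As a rough illustration of the intended semantics, here is a pure-Python reference model of the two kernels named above, operating on plain lists with None as null. This is only a sketch: the signatures, the maxsplit default of -1, and the null handling are assumptions, not the actual Arrow C++ API.

```python
def binary_split(values, separator, maxsplit=-1):
    # Hypothetical model: split each value on an explicit separator.
    # Works for both str and bytes, so no utf8-specific version is needed.
    return [None if v is None else v.split(separator, maxsplit) for v in values]


def utf8_split_whitespace(values, maxsplit=-1):
    # Hypothetical model: split each str on runs of Unicode whitespace,
    # like Python's str.split() when no separator is given.
    return [None if v is None else v.split(None, maxsplit) for v in values]


print(binary_split([b"a,b,c", None], b",", maxsplit=1))
print(utf8_split_whitespace(["a\u2003b  c", None]))
```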



--
This message was sent by Atlassian Jira
(v8.3.4#803005)