Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/18 13:13:00 UTC
[jira] [Created] (ARROW-10641) [C++] A "replace" or "map" kernel to
replace values in array based on mapping
Joris Van den Bossche created ARROW-10641:
---------------------------------------------
Summary: [C++] A "replace" or "map" kernel to replace values in array based on mapping
Key: ARROW-10641
URL: https://issues.apache.org/jira/browse/ARROW-10641
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Joris Van den Bossche
A "replace" or "map" kernel to replace values in an array based on a mapping. This would be similar to the pandas {{Series.replace}} (or {{Series.map}}) method. A small illustration of what is meant:
{code}
In [41]: s = pd.Series(["Yes", "Y", "No", "N"])

In [42]: s
Out[42]:
0    Yes
1      Y
2     No
3      N
dtype: object

In [43]: s.replace({"Y": "Yes", "N": "No"})
Out[43]:
0    Yes
1    Yes
2     No
3     No
dtype: object
{code}
Note: in pandas, the difference between "replace" and "map" is that replace only replaces a value if it is present in the mapping and leaves all other values unchanged, while map replaces every value in the input array with the corresponding value in the mapping and returns null if the value is not present in the mapping.
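To make the two behaviours concrete, here is a minimal pure-Python sketch of the semantics described above. The helper names {{replace_values}} and {{map_values}} are hypothetical and only illustrate the intended results; they are not an existing or proposed Arrow API, and plain lists with {{None}} stand in for arrays with nulls.

```python
def replace_values(values, mapping):
    """'replace' semantics: values found in the mapping are swapped,
    everything else is passed through unchanged."""
    return [mapping.get(v, v) for v in values]


def map_values(values, mapping):
    """'map' semantics: every value is looked up in the mapping,
    and values without an entry become null (None here)."""
    return [mapping.get(v) for v in values]


values = ["Yes", "Y", "No", "N"]
mapping = {"Y": "Yes", "N": "No"}

replaced = replace_values(values, mapping)  # ["Yes", "Yes", "No", "No"]
mapped = map_values(values, mapping)        # [None, "Yes", None, "No"]
```

With the same input and mapping, "replace" keeps the unmapped "Yes" and "No" entries, whereas "map" turns them into nulls.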
Note: this is different from ARROW-10306, which is about string replacement _within_ array elements (replacing a substring in each string element of the array), while this issue is about replacing full elements of the array.
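As a rough illustration of that distinction, the following plain-Python sketch (not Arrow code; the variable names are made up for this example) applies the same "Y" -> "Yes" rule once as a substring replacement and once as a whole-element replacement:

```python
values = ["Yes", "Y", "No", "N"]

# Substring replacement within each element (the kind of operation
# ARROW-10306 is about): every occurrence of "Y" inside a string is
# rewritten, so "Yes" is affected as well.
substring_replaced = [v.replace("Y", "Yes") for v in values]
# ["Yeses", "Yes", "No", "N"]

# Whole-element replacement (this issue): only elements that are
# exactly equal to "Y" are rewritten.
element_replaced = ["Yes" if v == "Y" else v for v in values]
# ["Yes", "Yes", "No", "N"]
```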
cc [~maartenbreddels]