Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/03/26 16:34:00 UTC

[jira] [Commented] (ARROW-11513) [R] Bindings for sub/gsub

    [ https://issues.apache.org/jira/browse/ARROW-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309542#comment-17309542 ] 

Neal Richardson commented on ARROW-11513:
-----------------------------------------

Looking at the [options struct|https://github.com/apache/arrow/pull/8468/files#diff-6bc7ecec6a4f7bcefc2511cde3bd809340ad0d94bb8f7cc5f4994063c798f2faR72-R83] and the [re2 syntax|https://github.com/google/re2/wiki/Syntax], here are some notes on how to map them to R concepts:

* gsub/str_replace_all corresponds to max_replacements = -1 (the default, meaning replace every match); sub/str_replace corresponds to max_replacements = 1
* fixed = FALSE (the default) maps to the "replace_substring_regex" function; fixed = TRUE maps to "replace_substring"
* if ignore.case = TRUE and fixed = FALSE, we can prefix the pattern with the re2 case-insensitivity flag: {{paste0("(?i)", pattern)}} (the re2 group form {{paste0("(?i:", pattern, ")")}} is equivalent); see also the [stringi docs|https://stringi.gagolewski.com/rapi/stri_opts_regex.html]. It is unclear that we have a case-insensitive, non-regex option
* stringr handles case insensitivity differently, via a stringi options struct (e.g. {{regex(pattern, ignore_case = TRUE)}}), so we may need to handle that separately (or defer it)
* useBytes: unclear whether this is an option or whether it is relevant (per the docs for {{sub}}, "The main effect of ‘useBytes = TRUE’ is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales")
* perl: unclear whether this is an option or whether it is relevant
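As a rough illustration of the notes above, here is a minimal sketch in R of the argument mapping. {{sub_to_arrow_call()}} is a hypothetical helper, not part of the arrow package: it only builds the compute function name and an options list (assuming the option fields are named pattern, replacement and max_replacements, as in the linked options struct) and does not call into Arrow. It leaves useBytes and perl out, since their relevance is still open.

```r
# Hypothetical helper (not part of the arrow package): translate base R
# sub()/gsub() arguments into an Arrow compute function name plus the fields
# of its options struct (pattern, replacement, max_replacements).
sub_to_arrow_call <- function(pattern, replacement, ignore.case = FALSE,
                              fixed = FALSE, all = FALSE) {
  if (fixed) {
    # Literal matching. Assumption: no case-insensitive literal variant
    # exists, so this sketch errors rather than guessing.
    if (ignore.case) {
      stop("ignore.case = TRUE with fixed = TRUE is not supported", call. = FALSE)
    }
    fun <- "replace_substring"
  } else {
    fun <- "replace_substring_regex"
    if (ignore.case) {
      # re2 inline flag: (?i) makes the rest of the pattern case-insensitive.
      pattern <- paste0("(?i)", pattern)
    }
  }
  list(
    fun = fun,
    options = list(
      pattern = pattern,
      replacement = replacement,
      # gsub() replaces every match (-1, unbounded); sub() only the first (1).
      max_replacements = if (all) -1L else 1L
    )
  )
}

# Mapping for sub("b", "X", x, ignore.case = TRUE)
str(sub_to_arrow_call("b", "X", ignore.case = TRUE))
# Mapping for gsub("a.c", "X", x, fixed = TRUE)
str(sub_to_arrow_call("a.c", "X", fixed = TRUE, all = TRUE))
```

Erroring on ignore.case = TRUE with fixed = TRUE is just one possible choice for the open question above; deferring it or emulating it some other way would also be reasonable.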


> [R] Bindings for sub/gsub
> -------------------------
>
>                 Key: ARROW-11513
>                 URL: https://issues.apache.org/jira/browse/ARROW-11513
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 4.0.0
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)