Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/06/24 14:44:00 UTC

[jira] [Comment Edited] (ARROW-13157) [C++] Implement ignore_case option for find_substring

    [ https://issues.apache.org/jira/browse/ARROW-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368896#comment-17368896 ] 

David Li edited comment on ARROW-13157 at 6/24/21, 2:43 PM:
------------------------------------------------------------

IIRC, this should be doable, but it takes some work: we would use RE2 for the case-insensitive search, but RE2 doesn't return the match position unless the pattern has a capture group. Adding one means we can no longer use the 'literal' option and have to escape all regex metacharacters in the search string ourselves. RE2 does have a QuoteMeta function that handles the escaping, though, so it shouldn't be that bad. I'll probably pick this up soon unless someone else wants to do it.
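
A minimal sketch of the escape-then-search idea, not the Arrow implementation. To keep it compilable with only the standard library, it uses std::regex as a stand-in for RE2. EscapeRegex and FindSubstringIgnoreCase are hypothetical names introduced here for illustration; EscapeRegex plays the role that QuoteMeta would play with RE2.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper (stand-in for RE2's QuoteMeta): backslash-escape the
// ECMAScript regex metacharacters so the pattern is matched literally.
std::string EscapeRegex(const std::string& literal) {
  static const std::string kMeta = R"re(\^$.|?*+()[]{})re";
  std::string out;
  out.reserve(literal.size() * 2);
  for (char c : literal) {
    if (kMeta.find(c) != std::string::npos) out.push_back('\\');
    out.push_back(c);
  }
  return out;
}

// Hypothetical helper: byte offset of the first case-insensitive occurrence
// of `pattern` in `haystack`, or -1 if there is no match.
int64_t FindSubstringIgnoreCase(const std::string& haystack,
                                const std::string& pattern) {
  // The pattern is escaped first, so it is searched as a literal string;
  // std::regex::icase makes the comparison case-insensitive.
  const std::regex re(EscapeRegex(pattern),
                      std::regex::ECMAScript | std::regex::icase);
  std::smatch match;
  if (!std::regex_search(haystack, match, re)) return -1;
  return static_cast<int64_t>(match.position(0));
}
```

With the strings from the report, FindSubstringIgnoreCase("Foo and Bar", "b") gives 8 and FindSubstringIgnoreCase("baz and qux and quux", "b") gives 0. The offsets here are in bytes, and case folding in this sketch should only be relied on for ASCII input; how a real kernel reports positions for non-ASCII data is a separate design question.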



> [C++] Implement ignore_case option for find_substring 
> ------------------------------------------------------
>
>                 Key: ARROW-13157
>                 URL: https://issues.apache.org/jira/browse/ARROW-13157
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Nic Crane
>            Priority: Major
>
> The find_substring compute function takes its options from the MatchSubstringOptions class. However, when I try to set ignore_case to TRUE, I get the following error:
> {code:java}
>  Error: NotImplemented: find_substring with ignore_case
> {code}
> R code to replicate the error is below, though it depends on a currently unmerged branch:
> {code:java}
> df <- tibble(x = c("Foo and Bar", "baz and qux and quux"))
> df %>%
>       Table$create() %>%
>       mutate(x = arrow_find_substring(x, options = list(pattern = "b", ignore_case = TRUE))) %>%
>       collect()
> {code}


