Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/06/24 14:44:00 UTC
[jira] [Comment Edited] (ARROW-13157) [C++] Implement ignore_case option for find_substring
[ https://issues.apache.org/jira/browse/ARROW-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368896#comment-17368896 ]
David Li edited comment on ARROW-13157 at 6/24/21, 2:43 PM:
------------------------------------------------------------
IIRC, this should be doable, but it takes some work: we would use RE2 to do the case-insensitive search, but RE2 doesn't return the match position unless you have a capture group. With a capture group, though, you can't use the 'literal' option anymore and have to escape all regex metacharacters in the search string. RE2 has a QuoteMeta function that does this for you, so it shouldn't be that bad. I'll probably pick this up soon unless someone else wants to do it.
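For illustration only, the approach described in the comment (escape the literal, wrap it in a capture group, search case-insensitively, read the position from the group) might look roughly like the sketch below. It uses std::regex from the C++ standard library as a stand-in for RE2 so that it builds without extra dependencies. The helper names EscapeRegexLiteral and FindSubstringIgnoreCase are hypothetical, EscapeRegexLiteral only plays the role that QuoteMeta would play with RE2, and none of this is Arrow's actual implementation.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape every character that is special in
// ECMAScript regex syntax, so the search string is matched literally.
// (With RE2, QuoteMeta would be used for this step instead.)
std::string EscapeRegexLiteral(const std::string& literal) {
  static const std::string kSpecial = "\\^$.|?*+()[]{}";
  std::string out;
  out.reserve(literal.size() * 2);
  for (char c : literal) {
    if (kSpecial.find(c) != std::string::npos) out.push_back('\\');
    out.push_back(c);
  }
  return out;
}

// Hypothetical helper: byte offset of the first case-insensitive occurrence
// of `pattern` in `haystack`, or -1 if there is none.
int64_t FindSubstringIgnoreCase(const std::string& haystack,
                                const std::string& pattern) {
  // Wrap the escaped literal in a capture group and read the match
  // position from group 1, mirroring the capture-group idea above.
  const std::regex re("(" + EscapeRegexLiteral(pattern) + ")",
                      std::regex::ECMAScript | std::regex::icase);
  std::smatch match;
  if (!std::regex_search(haystack, match, re)) return -1;
  return static_cast<int64_t>(match.position(1));
}
```

With the strings from the issue, FindSubstringIgnoreCase("Foo and Bar", "b") returns 8 and FindSubstringIgnoreCase("baz and qux and quux", "b") returns 0. A real kernel would presumably compile the pattern once and reuse it across all values rather than rebuilding it per call as this sketch does.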
> [C++] Implement ignore_case option for find_substring
> ------------------------------------------------------
>
> Key: ARROW-13157
> URL: https://issues.apache.org/jira/browse/ARROW-13157
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Nic Crane
> Priority: Major
>
> The find_substring compute function uses the MatchSubstringOptions options class. However, when I try to set ignore_case to TRUE, I get the following error:
> {code:java}
> Error: NotImplemented: find_substring with ignore_case
> {code}
> R code to replicate the error is below, though it depends on a currently unmerged branch:
> {code:java}
> library(arrow)
> library(dplyr)
>
> df <- tibble(x = c("Foo and Bar", "baz and qux and quux"))
> df %>%
>   Table$create() %>%
>   mutate(x = arrow_find_substring(x, options = list(pattern = "b", ignore_case = TRUE))) %>%
>   collect()
> {code}