Posted to issues@arrow.apache.org by "dhicks (via GitHub)" <gi...@apache.org> on 2024/02/23 22:25:42 UTC

[I] perl operators in regular expressions [arrow]

dhicks opened a new issue, #40220:
URL: https://github.com/apache/arrow/issues/40220

   ### Describe the enhancement requested
   
   R 4.3, arrow 14.0.0.2 (most recent macOS binary; apologies in advance if this is already supported when building from source)
   
   `arrow` can't handle Perl-style operators, such as negative lookaheads, in regular expressions, at least not via `dplyr` and `stringr`:
   
   ```r
   library(arrow)
   library(dplyr)
   library(stringr)
   
   ar = data.frame(text = c('Lorem ipsum dolor sit amet', 
                            'Lorem dolor ipsum sit amet')) |> 
       as_arrow_table()
   
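   ## Runs without error. Note that `[^(ipsum)]` is a character class (any one
   ## character other than `(`, `i`, `p`, `s`, `u`, `m`, `)`), not a negated
   ## word, so on this data it only returns the second row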
   ar |> 
       filter(str_detect(text, 'Lorem [^(ipsum)]')) |> 
       collect()
   
   ## Should only return the second row
   ## Error in `compute.arrow_dplyr_query()`:
   ## ! Invalid: Invalid regular expression: invalid perl operator: (?!
   ar |> 
       filter(str_detect(text, regex('Lorem(?! ipsum)'))) |> 
       collect()
   ```
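   For comparison, here is a minimal base-R sketch, independent of arrow (the `text` vector just mirrors the table above), that evaluates both patterns with R's PCRE engine via `grepl(perl = TRUE)`, which does support lookaheads. It only illustrates the intended semantics of each pattern, not arrow's behavior.
   
   ```r
   text <- c("Lorem ipsum dolor sit amet",
             "Lorem dolor ipsum sit amet")
   
   # "Lorem " followed by one character that is NOT one of ( i p s u m )
   class_match <- grepl("Lorem [^(ipsum)]", text, perl = TRUE)
   
   # "Lorem" not immediately followed by " ipsum" (negative lookahead;
   # accepted by PCRE, rejected by RE2)
   lookahead_match <- grepl("Lorem(?! ipsum)", text, perl = TRUE)
   
   print(class_match)      # FALSE TRUE
   print(lookahead_match)  # FALSE TRUE
   ```
   
   The two patterns happen to agree on these two rows but not in general: for example, `"Lorem sit amet"` matches the lookahead but not the character class, because `s` is one of the excluded characters.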
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [R] perl operators in regular expressions [arrow]

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on issue #40220:
URL: https://github.com/apache/arrow/issues/40220#issuecomment-1962436165

   Without looking at the code, so this is not a definitive answer, but I am pretty sure that `re2`, the C++ regex library used in Acero, doesn't support lookahead, so this is probably not something that can be added.
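   Because the lookahead in this issue only guards a fixed literal, one possible workaround (a sketch, not something confirmed by the arrow maintainers) is to spell out its complement by hand: after "Lorem", the text must either end or differ from " ipsum" at some position. The rewritten pattern below uses only alternation, character classes and `$`, all of which RE2 supports. The check uses base R's PCRE engine purely to confirm that the rewrite agrees with the lookahead version; the extra test strings are made up for illustration.
   
   ```r
   text <- c("Lorem ipsum dolor sit amet",
             "Lorem dolor ipsum sit amet",
             "Lorem",
             "Lorem ips",
             "Lorem ipsum Lorem dolor")
   
   # Reference result: PCRE supports the negative lookahead directly
   with_lookahead <- grepl("Lorem(?! ipsum)", text, perl = TRUE)
   
   # Lookahead-free equivalent: after "Lorem", either the string ends or it
   # diverges from " ipsum" at some position (end of string, or a mismatching
   # character). No (?...) constructs, so it stays within RE2's syntax.
   no_lookahead <- paste0(
     "Lorem(",
     "$|[^ ]",
     "| $| [^i]",
     "| i$| i[^p]",
     "| ip$| ip[^s]",
     "| ips$| ips[^u]",
     "| ipsu$| ipsu[^m]",
     ")"
   )
   without_lookahead <- grepl(no_lookahead, text, perl = TRUE)
   
   # Both formulations should agree on every test string
   stopifnot(identical(with_lookahead, without_lookahead))
   print(without_lookahead)  # FALSE TRUE TRUE TRUE TRUE
   ```
   
   Assuming arrow's RE2-backed `str_detect()` binding accepts this pattern unchanged, it could be used as `filter(str_detect(text, no_lookahead))`; that has not been tested against arrow here. The alternation grows with the length of the excluded literal, so the approach is only practical for short ones.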
