Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2023/08/17 16:20:00 UTC
[jira] [Updated] (IMPALA-12374) Explore optimizing re2 usage for leading / trailing ".*" when generating LIKE regex
[ https://issues.apache.org/jira/browse/IMPALA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell updated IMPALA-12374:
-----------------------------------
Summary: Explore optimizing re2 usage for leading / trailing ".*" when generating LIKE regex (was: Explore optimizing re2 usage for leading / trailing ".*")
> Explore optimizing re2 usage for leading / trailing ".*" when generating LIKE regex
> -----------------------------------------------------------------------------------
>
> Key: IMPALA-12374
> URL: https://issues.apache.org/jira/browse/IMPALA-12374
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Priority: Major
>
> Abseil has some recommendations about efficiently using re2 here: [https://abseil.io/fast/21]
> One recommendation it has is to avoid leading / trailing .* for FullMatch():
> {noformat}
> Using RE2::FullMatch() with leading or trailing .* is an antipattern. Instead, change it to RE2::PartialMatch() and remove the .*. RE2::PartialMatch() performs an unanchored search, so it is also necessary to anchor the regular expression (i.e. with ^ or $) to indicate that it must match at the start or end of the string.{noformat}
> For our slow-path LIKE evaluation, we convert the LIKE pattern to a regular expression and evaluate it with FullMatch(). For patterns like '%a%b%', the generated regular expression has leading/trailing .* and is still evaluated with FullMatch(). We could try detecting these cases and switching to PartialMatch(): drop the leading/trailing .* and anchor with ^ or $ on any side that has no wildcard. See the link for more details about how this works.
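A minimal sketch of the proposed rewrite, not Impala's actual code. All function names here are hypothetical, and the translation assumes '%' maps to ".*", '_' maps to ".", and no LIKE escape character is in use. To stay self-contained it uses std::regex in place of re2: std::regex_match stands in for RE2::FullMatch() and std::regex_search for RE2::PartialMatch(). The performance characteristics of re2 are not reproduced, only the equivalence of the two pattern forms.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: appends the regex translation of like[begin, end) to out.
// Assumes '%' -> ".*", '_' -> ".", and no LIKE escape character.
void AppendTranslated(const std::string& like, std::size_t begin,
                      std::size_t end, std::string* out) {
  static const std::string kMeta = ".^$|()[]{}*+?\\";
  for (std::size_t i = begin; i < end; ++i) {
    const char c = like[i];
    if (c == '%') {
      out->append(".*");
    } else if (c == '_') {
      out->push_back('.');
    } else {
      // Escape regex metacharacters so they match literally.
      if (kMeta.find(c) != std::string::npos) out->push_back('\\');
      out->push_back(c);
    }
  }
}

// Baseline form: a regex meant for a full (implicitly anchored) match.
std::string LikeToFullRegex(const std::string& like) {
  std::string out;
  AppendTranslated(like, 0, like.size(), &out);
  return out;
}

// Proposed form: a regex meant for an unanchored search. Leading/trailing '%'
// are dropped; a side without a wildcard gets an explicit ^ or $ anchor.
std::string LikeToSearchRegex(const std::string& like) {
  const bool lead = !like.empty() && like.front() == '%';
  const bool trail = !like.empty() && like.back() == '%';
  std::size_t begin = 0;
  std::size_t end = like.size();
  while (begin < end && like[begin] == '%') ++begin;
  while (end > begin && like[end - 1] == '%') --end;
  std::string out;
  if (!lead) out.push_back('^');
  AppendTranslated(like, begin, end, &out);
  if (!trail) out.push_back('$');
  return out;
}

// Full match against the baseline regex (stand-in for RE2::FullMatch()).
bool LikeFullMatch(const std::string& like, const std::string& s) {
  return std::regex_match(s, std::regex(LikeToFullRegex(like)));
}

// Unanchored search against the rewritten regex (stand-in for
// RE2::PartialMatch()).
bool LikeSearchMatch(const std::string& like, const std::string& s) {
  return std::regex_search(s, std::regex(LikeToSearchRegex(like)));
}
```

Under these assumptions, '%a%b%' becomes "a.*b" with no anchors, 'a%b%' becomes "^a.*b", and '%a_b' becomes "a.b$". LikeFullMatch() and LikeSearchMatch() should agree on every input, which makes them a convenient pair for checking the rewrite before measuring any speedup with re2 itself.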
--
This message was sent by Atlassian Jira
(v8.20.10#820010)