You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Omid Shahidi (Code Review)" <ge...@cloudera.org> on 2022/06/08 22:21:43 UTC

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Omid Shahidi has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18602


Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................

IMPALA-9615: Make re2's max_mem option configurable via an Impala
startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amout of memory as default for re2.
By using a greate value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
4 files changed, 90 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10741/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 22:55:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#6).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 122 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG@15
PS6, Line 15: The re2_max_mem_usage flag will
            : accept an unsigned int64
The re2_mem_limit will accept a memory specification string


http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG@21
PS6, Line 21: re2_max_mem_usage
re2_mem_limit


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/common/init.cc@346
PS6, Line 346: re2_mem_limit_
nit: re2_mem_limit


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc@53
PS6, Line 53: uint64_t StringFunctions::re2_mem_limit = 8 << 20;
Can this initialization moved to string-functions.h, along with the declaration?


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h@159
PS6, Line 159: re2_mem_limit_
nit: re2_mem_limit


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h@207
PS6, Line 207: re2_mem_limit
nit: re2_mem_limit_

Impala C++ Code follows a modified version of the Google C++ Style Guide at: https://google.github.io/styleguide/cppguide.html
See https://google.github.io/styleguide/cppguide.html#Variable_Names to contrast between Common Variable names vs Class Data Members.


http://gerrit.cloudera.org:8080/#/c/18602/6/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/6/tests/custom_cluster/test_re2_max_mem.py@22
PS6, Line 22: DEFAULT_RE2_MAX_MEM = 8 << 20
Unused variable, please remove.



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 23:28:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#8).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_mem_limit flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A temp.log
A tests/custom_cluster/test_re2_max_mem.py
7 files changed, 245 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 8
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 9:

(7 comments)

Overall looks good to me. I just have some minor comments and suggestions for follow on tasks.

http://gerrit.cloudera.org:8080/#/c/18602/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18602/9//COMMIT_MSG@9
PS9, Line 9: Some regex patterns require more memory to be compiled and pattern matched using
Would be good to wrap the commit message lines within 72 chars as mentioned here: https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala


http://gerrit.cloudera.org:8080/#/c/18602/9//COMMIT_MSG@19
PS9, Line 19: The testcase uses a long regex pattern to use up all the memory in the
Maybe, a good idea to include testing details in a 'Testing' section. Here is a recent commit which is a good example to follow: https://github.com/apache/impala/commit/d12d4c9b077392f58701af2e343dd927fe516846


http://gerrit.cloudera.org:8080/#/c/18602/9/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

http://gerrit.cloudera.org:8080/#/c/18602/9/be/src/exprs/like-predicate.cc@104
PS9, Line 104:       StringFunctions::SetRE2MemOpt(&opts);
Would be interesting to run some tests to measure the perf impact of this patch.


http://gerrit.cloudera.org:8080/#/c/18602/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/9/be/src/exprs/string-functions-ir.cc@46
PS9, Line 46: DECLARE_string(re2_mem_limit);
Is this being used somewhere?


http://gerrit.cloudera.org:8080/#/c/18602/9/be/src/exprs/string-functions-ir.cc@934
PS9, Line 934: void StringFunctions::SetRE2MemLimit(int64_t re2_mem_limit) {
Unfortunately, this is another example of untracked memory usage at runtime. We could have multiple instances of re2 (at least one per fragment) and each could use the configured amount of memory. Unfortunately, the planner doesn't know of any such memory requirement from the LIKE and other predicates using StringFunctions and so doesn't quite predict the real memory requirement. Would be good to include the memory requirement for re2 in the planner estimates. This probably should be a follow on JIRA.


http://gerrit.cloudera.org:8080/#/c/18602/9/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/9/tests/custom_cluster/test_re2_max_mem.py@30
PS9, Line 30:         """"test to see given an amount of memory (in Bytes) does the re2 print an error for
nit:  "for for DFA" -> "for DFA"


http://gerrit.cloudera.org:8080/#/c/18602/9/tests/custom_cluster/test_re2_max_mem.py@43
PS9, Line 43:         if expect_fail:
If 'expect_fail' is set then we seem to be ignoring 'dfa_out_of_mem' flag. This is probably okay, but maybe something to add in the test comments.



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Jun 2022 22:55:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Abhishek Rawat, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#10).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched
using different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
	size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

Testing:
 - Use a long regex pattern to use up all the memory in the
   case of allocating less or the same amount of memory as default for re2.
   By using a greater value for re2_mem_limit flag, the regexp can be
   consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:36:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#9).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_mem_limit flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 115 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#2).

Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................

IMPALA-9615: Make re2's max_mem option configurable via an Impala
startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amout of memory as default for re2.
By using a greate value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
4 files changed, 90 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc@941
PS4, Line 941:   int64_t re2_max_mem_usage =
             :       ParseUtil::ParseMemSpec(FLAGS_re2_mem_limit, &is_percent, 0);
You can validate this flag once in impala::InitCommonRuntime (be/src/common/init.cc).
It is probably a good idea too to keep the parsed value somewhere as static variable to avoid parsing it multiple time here.


http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py@44
PS4, Line 44: /
> flake8: E501 line too long (110 > 90 characters)
Can short the link like this:
https://github.com/google/re2/blob/3e9622e/re2/re2.h#L619-L648



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sun, 12 Jun 2022 23:37:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 12: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Jun 2022 02:18:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10737/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 22:41:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18602/5/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/5/tests/custom_cluster/test_re2_max_mem.py@42
PS5, Line 42:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/5/tests/custom_cluster/test_re2_max_mem.py@42
PS5, Line 42:         # values the regex pattern will fail as invalid pattern although it can be valid 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/5/tests/custom_cluster/test_re2_max_mem.py@44
PS5, Line 44: /
flake8: E501 line too long (110 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 22:13:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 9: Code-Review+1

(4 comments)

Looks good to me. Nice work!

http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/common/global-flags.cc@230
PS7, Line 230: 
             : static const string re2_mem_limit_help_msg =
             :     "Maximum bytes of memory to be used by re2's regex engine "
             :     "to hold the compiled form of the regexp. For more memory-consuming patterns, "
             :     "this can be set to be a higher number."
> Sorry that I miss this before, but can you insert MEM_UNITS_HELP_MSG from c
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/exprs/string-functions-ir.cc@943
PS7, Line 943:   opts->set_max_mem(StringFunctions::re2_mem_limit_);
             : }
             : 
             : void StringFunctions::RegexpPrepare(
             :     FunctionContext* context, FunctionContext::FunctionStateScope scope) {
             :   if (scope != FunctionContext::THREAD_LOCAL) return;
             :   if (!context->IsArgConstant(1)) return;
             :   DCHECK_EQ(context->GetArgType(1)->type, FunctionContext::TYPE_STRING);
             :   Stri
> This comment can be removed?
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/temp.log
File temp.log:

http://gerrit.cloudera.org:8080/#/c/18602/7/temp.log@1
PS7, Line 1: 
> Is this file accidentally committed?
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py@22
PS7, Line 22: 
> Need 2 blank line between"from tests..." and "class TestRE2MaxMem..."
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 18:25:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8275/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:38:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#5).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 122 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 7:

(5 comments)

> (7 comments)

Fixed code style issues

http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@46
PS2, Line 46: (re2_mem_limit);
> I suggest to rename the flag to "re2_mem_limit", so it align with other mem
Thank you for the suggestion! I agree it will be nicer, I will make these changes


http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@938
PS2, Line 938: 
             : // Set the maximum memory used by re2's regex engine for a compiled regex expression's
             : // storage. By default, it uses 8 MiB. This can be used to avoid DFA state cache flush
             : // resulting in slower execution
             : void StringFunctions::SetRE2MemOpt(re2::RE2::Options* opts) {
             :  
This function can possibly be inline. However, for good practice the inline function should be implemented in the header which means that the flag also needs to be defined in the header, but the latter is a bad practice. What do you think we should do, keep it as is or make it inline?


http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc@943
PS4, Line 943: // bool is_percent = false; // not used
             :   // int64_t re2_max_mem_usage =
             :   //     ParseUtil::ParseMemSpec(FLAGS_re2_mem_limit, &is_percent, 0);
             :   // // FIXME: The CLEAN_EXIT_WITH_ERROR throws an error, I am not sure why
             :   // if (re2_max_mem_usage <= 0) {
             :   //  
I'm not sure why this throws an error for me. This compiles in other files


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc@53
PS6, Line 53: uint64_t StringFunctions::re2_mem_limit_ = 8 << 20
> Can this initialization moved to string-functions.h, along with the declara
having an initialized static member variable in header file of c++ is not permitted. I thought this is pretty neat but I would love to hear your opinions on alternatives. One way is to just initialize it when doing the parsing of the flag. It can also be a non-static variable but I think it makes sense for it to be static


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@31
PS2, Line 31: 
> Maybe change this with a boolean arg "expect_fail"? Thus, DEFAULT_RE2_MAX_M
that is true and seems more readable to me. But also there's the case that someone later wouldn't know why the test would fail for small memories with a query failure ("invalid regex pattern"). Do you think the comment is descriptive enough for that?



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 00:17:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10790/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 18:30:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jun 2022 20:47:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched
using different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
	size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

Testing:
 - Use a long regex pattern to use up all the memory in the
   case of allocating less or the same amount of memory as default for re2.
   By using a greater value for re2_mem_limit flag, the regexp can be
   consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Reviewed-on: http://gerrit.cloudera.org:8080/18602
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 8:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/10789/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 8
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 18:29:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10763/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 22:48:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py@42
PS4, Line 42:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py@42
PS4, Line 42:         # values the regex pattern will fail as invalid pattern although it can be valid 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/4/tests/custom_cluster/test_re2_max_mem.py@44
PS4, Line 44: /
flake8: E501 line too long (110 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jun 2022 23:55:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Abhishek Rawat, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#11).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched
using different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
	size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

Testing:
 - Use a long regex pattern to use up all the memory in the
   case of allocating less or the same amount of memory as default for re2.
   By using a greater value for re2_mem_limit flag, the regexp can be
   consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py@45
PS10, Line 45:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py@45
PS10, Line 45:             # DFA out of memory issue at that will be brought up when 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:21:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/10777/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 00:36:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10888/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:46:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18602/1/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/1/tests/custom_cluster/test_re2_max_mem.py@33
PS1, Line 33: "
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18602/1/tests/custom_cluster/test_re2_max_mem.py@41
PS1, Line 41: e
flake8: E501 line too long (92 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 22:22:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10748/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 00:17:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#4).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
4 files changed, 98 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18602/3/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/3/be/src/exprs/string-functions-ir.cc@944
PS3, Line 944:   //       Substitute("Invalid mem limit for re2's regex engine: $0", FLAGS_re2_mem_limit));
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/18602/3/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/3/tests/custom_cluster/test_re2_max_mem.py@42
PS3, Line 42: i
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18602/3/tests/custom_cluster/test_re2_max_mem.py@44
PS3, Line 44: /
flake8: E501 line too long (110 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jun 2022 20:28:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: Make re2's max mem option configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: Make re2's max_mem option configurable via an Impala startup flag
......................................................................


Patch Set 2:

(9 comments)

Thank you for your contribution! I'd like to suggest few improvement.

http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-9615: Make re2's max_mem option configurable via an Impala
           : startup flag
Please shorten the commit title to fit within one line.


http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@21
PS2, Line 21: amout
nit: amount


http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@22
PS2, Line 22: greate
nit: greater


http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@46
PS2, Line 46: re2_max_mem_usage
I suggest to rename the flag to "re2_mem_limit", so it align with other mem limit related flag such as mem_limit, admission_control_service_queue_mem_limit, etc.

It is also nice to be able to express this flag value as string pattern like "8MB". You can use ParseUtil::ParseMemSpec() to parse the string pattern to int64_t. See be/src/scheduling/admission-control-service.cc for an example, the admission_control_service_queue_mem_limit flag.


http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@940
PS2, Line 940: _re2_max_mem_usage
nit: Impala code style does not prefix local variable with underscore. So this should be "re2_max_mem_usage".


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@31
PS2, Line 31: max_mem_in
Maybe change this with a boolean arg "expect_fail"? Thus, DEFAULT_RE2_MAX_MEM can be removed.


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@32
PS2, Line 32: ih
nit: in


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@42
PS2, Line 42: wil
nit: will


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@44
PS2, Line 44: See: https://github.com/google/re2/blob/main/re2/re2.h#L619-L648
nit: use permalink to refer to github
https://github.com/google/re2/blob/3e9622e0cd94ebba6e04d5c50b7af32029e330d8/re2/re2.h#L619-L648



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jun 2022 00:01:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#3).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_max_mem_usage flag will
accept an unsigned int64 which will set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_max_mem_usage flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
4 files changed, 250 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10887/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:42:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:38:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Riza Suminto, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#7).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns require more memory to be compiled and pattern matched using
different string functions and like predicate available.
For more memory consuming patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory: size xxxxx, bytemap range xx, list count xxxxx".

To avoid such errors in Impalad's ERROR log, a global flag can
be added to impala cluster startup. The re2_mem_limit flag will
accept a memory specification string to set the re2 max_mem parameter for
memory used to store regexps in Bytes.

The testcase uses a long regex pattern to use up all the memory in the
case of allocating less or the same amount of memory as default for re2.
By using a greater value for re2_mem_limit flag, the regexp can be
consumed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A temp.log
A tests/custom_cluster/test_re2_max_mem.py
7 files changed, 249 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py@22
PS7, Line 22: class TestRE2MaxMem(CustomClusterTestSuite):
flake8: E302 expected 2 blank lines, found 1



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 00:17:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10762/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 22:33:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9615: re2's max mem opt configurable via an Impala startup flag

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
......................................................................


Patch Set 7:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag
           : 
> Please shorten the commit title to fit within one line.
Done


http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@21
PS2, Line 21: _limi
> nit: amount
Done


http://gerrit.cloudera.org:8080/#/c/18602/2//COMMIT_MSG@22
PS2, Line 22: th no 
> nit: greater
Done


http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG@15
PS6, Line 15: The re2_mem_limit flag will
            : accept a memory specific
> The re2_mem_limit will accept a memory specification string
Done


http://gerrit.cloudera.org:8080/#/c/18602/6//COMMIT_MSG@21
PS6, Line 21: re2_mem_limit fla
> re2_mem_limit
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/common/global-flags.cc@230
PS7, Line 230: 
             : DEFINE_string(re2_mem_limit, "8MB",
             :     "Maximum bytes of memory to be used by re2's regex engine "
             :     "to hold the compiled form of the regexp. For more memory-consuming patterns, "
             :     "this can be set to be a higher number. Default to 8MB.");
Sorry that I miss this before, but can you insert MEM_UNITS_HELP_MSG from constant-strings.h into this flag documentation please.


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/common/init.cc@346
PS6, Line 346: re2_mem_limit 
> nit: re2_mem_limit
Done


http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@46
PS2, Line 46: (re2_mem_limit);
> Thank you for the suggestion! I agree it will be nicer, I will make these c
Done


http://gerrit.cloudera.org:8080/#/c/18602/2/be/src/exprs/string-functions-ir.cc@940
PS2, Line 940: . By default, it u
> nit: Impala code style does not prefix local variable with underscore. So t
Done


http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/4/be/src/exprs/string-functions-ir.cc@941
PS4, Line 941: // resulting in slower execution
             : void StringFunctions::SetRE2MemOpt(re2::RE2::Options* opts) {
> You can validate this flag once in impala::InitCommonRuntime (be/src/common
Done


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions-ir.cc@53
PS6, Line 53: uint64_t StringFunctions::re2_mem_limit_ = 8 << 20
> having an initialized static member variable in header file of c++ is not p
Thanks for the explanation. Keeping it like this should be sufficient then.


http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18602/7/be/src/exprs/string-functions-ir.cc@943
PS7, Line 943:   // bool is_percent = false; // not used
             :   // int64_t re2_max_mem_usage =
             :   //     ParseUtil::ParseMemSpec(FLAGS_re2_mem_limit, &is_percent, 0);
             :   // // FIXME: The CLEAN_EXIT_WITH_ERROR throws an error, I am not sure why
             :   // if (re2_max_mem_usage <= 0) {
             :   //   CLEAN_EXIT_WITH_ERROR(
             :   //       Substitute("Invalid mem limit for re2's regex engine: $0",
             :   //       FLAGS_re2_mem_limit));
             :   // }
This comment can be removed?


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h@159
PS6, Line 159: re2_mem_limit)
> nit: re2_mem_limit
Done


http://gerrit.cloudera.org:8080/#/c/18602/6/be/src/exprs/string-functions.h@207
PS6, Line 207: re2_mem_limit
> nit: re2_mem_limit_
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/temp.log
File temp.log:

http://gerrit.cloudera.org:8080/#/c/18602/7/temp.log@1
PS7, Line 1: rootLoggerLevel = INFO
Is this file accidentally committed?


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@31
PS2, Line 31: 
> that is true and seems more readable to me. But also there's the case that 
Done


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@32
PS2, Line 32: 
> nit: in
Done


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@42
PS2, Line 42: 
> nit: will
Done


http://gerrit.cloudera.org:8080/#/c/18602/2/tests/custom_cluster/test_re2_max_mem.py@44
PS2, Line 44:   else:
> nit: use permalink to refer to github
Done


http://gerrit.cloudera.org:8080/#/c/18602/6/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/6/tests/custom_cluster/test_re2_max_mem.py@22
PS6, Line 22: class TestRE2MaxMem(CustomClu
> Unused variable, please remove.
Done


http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/7/tests/custom_cluster/test_re2_max_mem.py@22
PS7, Line 22: class TestRE2MaxMem(CustomClusterTestSuite):
> flake8: E302 expected 2 blank lines, found 1
Need 2 blank line between"from tests..." and "class TestRE2MaxMem..."



-- 
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <os...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jun 2022 00:50:15 +0000
Gerrit-HasComments: Yes