You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2019/03/16 06:14:12 UTC

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Hello Lars Volker,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12772

to review the following change.


Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................

IMPALA-5393. Use THREAD_LOCAL state for regexp

This changes the built-in regexp-related UDFs to use THREAD_LOCAL
re2::RE instances intead of FRAGMENT_LOCAL.

Although re2::RE is thread-safe, it achieves that thread safety through
a certain amount of locking. Using thread-local regexps improves
performance substantially.

I ran a simple test query:

select sum(l_linenumber) from item_20x where length(regexp_extract(l_shipinstruct, '.*', 0)) > 0

on a table with three underlying parquet files (thus getting 3 scanner
threads). Prior to this change, the query took ~60 seconds and burned
2m16sec CPU time. With this change, it took ~19sec and 43s CPU time. For
a query with more scanner threads, the improvement should be even more
dramatic.

The only potential downside of this change is slightly increased memory
consumption by having one RE instance per thread, but the REs themselves
should be small relative to all of the other per-scanner-thread memory.

Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
---
M be/src/exprs/string-functions-ir.cc
1 file changed, 6 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12772/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3919/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 18 Mar 2019 17:03:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 2: Code-Review+2

Forwarding +2


-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 18 Mar 2019 23:55:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2451/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Sat, 16 Mar 2019 06:58:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12772/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12772/1//COMMIT_MSG@10
PS1, Line 10: intead
nit: typo



-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Mar 2019 01:35:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 18 Mar 2019 21:12:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12772

to look at the new patch set (#2).

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................

IMPALA-5393. Use THREAD_LOCAL state for regexp

This changes the built-in regexp-related UDFs to use THREAD_LOCAL
re2::RE instances instead of FRAGMENT_LOCAL.

Although re2::RE is thread-safe, it achieves that thread safety through
a certain amount of locking. Using thread-local regexps improves
performance substantially.

I ran a simple test query:

select sum(l_linenumber) from item_20x where length(regexp_extract(l_shipinstruct, '.*', 0)) > 0

on a table with three underlying parquet files (thus getting 3 scanner
threads). Prior to this change, the query took ~60 seconds and burned
2m16sec CPU time. With this change, it took ~19sec and 43s CPU time. For
a query with more scanner threads, the improvement should be even more
dramatic.

The only potential downside of this change is slightly increased memory
consumption by having one RE instance per thread, but the REs themselves
should be small relative to all of the other per-scanner-thread memory.

Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
---
M be/src/exprs/string-functions-ir.cc
1 file changed, 6 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12772/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12772/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12772/1//COMMIT_MSG@10
PS1, Line 10: instea
> nit: typo
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 18 Mar 2019 17:03:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5393. Use THREAD LOCAL state for regexp

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12772 )

Change subject: IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................

IMPALA-5393. Use THREAD_LOCAL state for regexp

This changes the built-in regexp-related UDFs to use THREAD_LOCAL
re2::RE instances instead of FRAGMENT_LOCAL.

Although re2::RE is thread-safe, it achieves that thread safety through
a certain amount of locking. Using thread-local regexps improves
performance substantially.

I ran a simple test query:

select sum(l_linenumber) from item_20x where length(regexp_extract(l_shipinstruct, '.*', 0)) > 0

on a table with three underlying parquet files (thus getting 3 scanner
threads). Prior to this change, the query took ~60 seconds and burned
2m16sec CPU time. With this change, it took ~19sec and 43s CPU time. For
a query with more scanner threads, the improvement should be even more
dramatic.

The only potential downside of this change is slightly increased memory
consumption by having one RE instance per thread, but the REs themselves
should be small relative to all of the other per-scanner-thread memory.

Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Reviewed-on: http://gerrit.cloudera.org:8080/12772
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M be/src/exprs/string-functions-ir.cc
1 file changed, 6 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/12772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibc331151a302e755701cb08adb3e6f289d54c3a6
Gerrit-Change-Number: 12772
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>