Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/06/05 19:17:55 UTC
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Tim Armstrong has uploaded a new change for review.
http://gerrit.cloudera.org:8080/7081
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
IMPALA-5347: reduce codegen overhead of timestamp trunc()
Trunc has many implementations that are switched between based on a
string argument. Before this patch all implementations were compiled for
every call to trunc(), which added a lot of unnecessary codegen time.
This patch avoids the problem by moving the implementation out of the
cross-compiled code.
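The shape of the problem described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in udf-builtins.cc: SimpleTimestamp and TruncSketch are hypothetical names, and the unit strings are only an illustrative subset (matched case-sensitively here). The point is that a single function carries one branch per unit, so if the whole function is cross-compiled, every branch is compiled for each call even though a given query only uses one of them.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a timestamp (not Impala's real type).
struct SimpleTimestamp {
  int year, month, day, hour, minute, second;
  bool valid;
};

// One entry point, many unit-specific branches selected by a string argument.
// Every branch is part of this function's body, so compiling the function
// means compiling all of them, regardless of which unit a caller passes.
SimpleTimestamp TruncSketch(const SimpleTimestamp& ts, const std::string& unit) {
  SimpleTimestamp out = ts;
  if (unit == "yy" || unit == "year") {
    out.month = 1; out.day = 1;
    out.hour = 0; out.minute = 0; out.second = 0;
  } else if (unit == "mm" || unit == "month") {
    out.day = 1;
    out.hour = 0; out.minute = 0; out.second = 0;
  } else if (unit == "dd") {
    out.hour = 0; out.minute = 0; out.second = 0;
  } else if (unit == "hh") {
    out.minute = 0; out.second = 0;
  } else if (unit == "mi") {
    out.second = 0;
  } else {
    // Unrecognised unit: flag the result instead of guessing.
    out.valid = false;
  }
  return out;
}
```

Keeping a function of this shape in the ordinary compiled binary, rather than in the cross-compiled module, means its branches are compiled once at build time instead of at query time.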
Testing:
Ran expr-test.
I ran the repro query from IMPALA-5347 and verified that codegen time
was significantly reduced from ~1.4s to ~0.35s.
Perf:
I ran the following targeted benchmark:
set num_nodes=1;
set num_scanner_threads=1;
select count(*) from lineitem where trunc(l_shipdate, 'yy') >=
'1998-01-01'
The end-to-end query latency was reduced to 0.52s from 0.72s on
average. The time spent in the scanner increased slightly from
around 390ms to around 410ms. This seems like a good tradeoff.
Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 275 insertions(+), 227 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/1
--
To view, visit http://gerrit.cloudera.org:8080/7081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7081
to look at the new patch set (#2).
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
IMPALA-5347: reduce codegen overhead of timestamp trunc()
Trunc has many implementations that are switched between based on a
string argument. Before this patch all implementations were compiled for
every call to trunc(), which added a lot of unnecessary codegen time.
This patch avoids the problem by moving the implementation out of the
cross-compiled code.
Testing:
Ran expr-test.
I ran the repro query from IMPALA-5347 and verified that codegen time
was significantly reduced from ~1.4s to ~0.35s.
Perf:
I ran the following targeted benchmark:
set num_nodes=1;
set num_scanner_threads=1;
select count(*) from lineitem where trunc(l_shipdate, 'yy') >=
'1998-01-01'
The end-to-end query latency was reduced to 0.52s from 0.72s on
average. The time spent in the scanner increased slightly from
around 390ms to around 410ms. This seems like a good tradeoff.
Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 277 insertions(+), 229 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 3: Code-Review+2
Carry +2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 2: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7081/2/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:
Line 16: // under the License.
mention somewhere that these functions should specifically not get cross-compiled (otherwise the next person might decide there's something to be gained from ...).
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 3: Verified+1
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 3:
Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/680/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 1: Code-Review+1
(2 comments)
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:
PS1, Line 242:
May make sense to codegen and constant propagate in this case.
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.h
File be/src/exprs/udf-builtins.h:
PS1, Line 67: //
nit: ///
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 2: Code-Review+1
Carry +1
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 1:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:
PS1, Line 242:
> May make sense to codegen and constant propagate in this case.
Yeah, I agree it would be nice. I don't think we have the infrastructure now to do this in a generic way, though, given that the dispatch logic to map a string to an implementation is non-trivial. I didn't want to get sidetracked implementing a special-case optimisation here.
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.h
File be/src/exprs/udf-builtins.h:
PS1, Line 67: //
> nit: ///
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Michael Ho,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7081
to look at the new patch set (#3).
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
IMPALA-5347: reduce codegen overhead of timestamp trunc()
Trunc has many implementations that are switched between based on a
string argument. Before this patch all implementations were compiled for
every call to trunc(), which added a lot of unnecessary codegen time.
This patch avoids the problem by moving the implementation out of the
cross-compiled code.
Testing:
Ran expr-test.
I ran the repro query from IMPALA-5347 and verified that codegen time
was significantly reduced from ~1.4s to ~0.35s.
Perf:
I ran the following targeted benchmark:
set num_nodes=1;
set num_scanner_threads=1;
select count(*) from lineitem where trunc(l_shipdate, 'yy') >=
'1998-01-01'
The end-to-end query latency was reduced to 0.52s from 0.72s on
average. The time spent in the scanner increased slightly from
around 390ms to around 410ms. This seems like a good tradeoff.
Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 280 insertions(+), 229 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 2:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:
PS1, Line 242:
> Yeah I agree it would be nice, I don't think we have the infrastructure now
I concur. Maybe a TODO?
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 2:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7081/2/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:
Line 16: // under the License.
> mention somewhere that these functions should specifically not get cross-co
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
IMPALA-5347: reduce codegen overhead of timestamp trunc()
Trunc has many implementations that are switched between based on a
string argument. Before this patch all implementations were compiled for
every call to trunc(), which added a lot of unnecessary codegen time.
This patch avoids the problem by moving the implementation out of the
cross-compiled code.
Testing:
Ran expr-test.
I ran the repro query from IMPALA-5347 and verified that codegen time
was significantly reduced from ~1.4s to ~0.35s.
Perf:
I ran the following targeted benchmark:
set num_nodes=1;
set num_scanner_threads=1;
select count(*) from lineitem where trunc(l_shipdate, 'yy') >=
'1998-01-01'
The end-to-end query latency was reduced to 0.52s from 0.72s on
average. The time spent in the scanner increased slightly from
around 390ms to around 410ms. This seems like a good tradeoff.
Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Reviewed-on: http://gerrit.cloudera.org:8080/7081
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 280 insertions(+), 229 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Tim Armstrong: Looks good to me, approved
--
Gerrit-MessageType: merged
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:
Line 171: // TODO: it would be nice to resolve the branch before codegen so we can optimise
I put a TODO here
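The idea behind that TODO (resolving the branch up front when the unit is a constant) can be sketched in plain C++ as below. This is only an illustration of the general technique, assuming a constant unit string; it is not how Impala does or would implement it, and the thread notes there is no generic infrastructure for this yet. All names (SimpleTimestamp, TruncFn, ResolveTruncFn, CountRowsAtOrAfterYear) are hypothetical, and the unit strings are an illustrative subset.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a timestamp (not Impala's real type).
struct SimpleTimestamp {
  int year, month, day, hour, minute, second;
};

using TruncFn = SimpleTimestamp (*)(const SimpleTimestamp&);

SimpleTimestamp TruncToYear(const SimpleTimestamp& ts) {
  return {ts.year, 1, 1, 0, 0, 0};
}
SimpleTimestamp TruncToMonth(const SimpleTimestamp& ts) {
  return {ts.year, ts.month, 1, 0, 0, 0};
}
SimpleTimestamp TruncToDay(const SimpleTimestamp& ts) {
  return {ts.year, ts.month, ts.day, 0, 0, 0};
}

// Maps a unit string to a single implementation; nullptr if unrecognised.
TruncFn ResolveTruncFn(const std::string& unit) {
  if (unit == "yy" || unit == "year") return &TruncToYear;
  if (unit == "mm" || unit == "month") return &TruncToMonth;
  if (unit == "dd") return &TruncToDay;
  return nullptr;
}

// Resolves the unit once, then applies only the chosen implementation per row.
// Returns the number of rows whose truncated year is >= min_year, or -1 if
// the unit string is not recognised.
int CountRowsAtOrAfterYear(const std::vector<SimpleTimestamp>& rows,
                           const std::string& unit, int min_year) {
  TruncFn fn = ResolveTruncFn(unit);  // String comparison happens once here.
  if (fn == nullptr) return -1;
  int count = 0;
  for (const SimpleTimestamp& ts : rows) {
    if (fn(ts).year >= min_year) ++count;
  }
  return count;
}
```

With the unit resolved ahead of the row loop, only one implementation is reachable from the per-row path, which is what would make it a candidate for codegen and constant propagation as suggested in the review.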
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes