Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/06/05 19:17:55 UTC

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7081

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................

IMPALA-5347: reduce codegen overhead of timestamp trunc()

trunc() has many implementations, and the one to run is selected by a
string argument. Before this patch, all implementations were compiled
for every call to trunc(), which added a lot of unnecessary codegen
time.

This patch avoids the problem by moving the implementation out of the
cross-compiled code.

Testing:
Ran expr-test.

I ran the repro query from IMPALA-5347 and verified that codegen time
dropped from ~1.4s to ~0.35s.

Perf:
I ran the following targeted benchmark:
  set num_nodes=1;
  set num_scanner_threads=1;
  select count(*) from lineitem where trunc(l_shipdate, 'yy') >=
  '1998-01-01'

The end-to-end query latency was reduced from 0.72s to 0.52s on
average. The time spent in the scanner increased slightly, from
around 390ms to around 410ms. This seems like a good tradeoff.

Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 275 insertions(+), 227 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7081

to look at the new patch set (#2).

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................

IMPALA-5347: reduce codegen overhead of timestamp trunc()

Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 277 insertions(+), 229 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/2
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 3: Code-Review+2

Carry +2

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7081/2/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:

Line 16: // under the License.
mention somewhere that these functions should specifically not get cross-compiled (otherwise the next person might decide there's something to be gained from ...).


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 3: Verified+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/680/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 1: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:

PS1, Line 242: 
May make sense to codegen and constant propagate in this case.


http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.h
File be/src/exprs/udf-builtins.h:

PS1, Line 67: //
nit: ///


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 2: Code-Review+1

Carry +1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:

PS1, Line 242: 
> May make sense to codegen and constant propagate in this case.
Yeah, I agree it would be nice. I don't think we have the infrastructure now to do this in a generic way though, given that the dispatch logic to map a string to an implementation is non-trivial. I didn't want to get sidetracked implementing a special-case optimisation here.
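
For readers following along, a minimal sketch of the kind of dispatch being discussed: the string argument is mapped to an enum once, and a plain switch picks the implementation. This is not the code in this change. All names (TruncUnit, ParseTruncUnit, TruncDate, Date) are hypothetical, and the set of unit strings is an assumed subset; only 'yy' appears in the benchmark query.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical names throughout; a sketch, not Impala's implementation.
enum class TruncUnit { kYear, kMonth, kDay, kInvalid };

// Simplified stand-in for a timestamp value.
struct Date {
  int year;
  int month;
  int day;
};

// Maps the unit string to an enum value. Unknown strings map to kInvalid.
// The accepted strings here are an assumed, illustrative subset.
TruncUnit ParseTruncUnit(const std::string& unit) {
  static const std::unordered_map<std::string, TruncUnit> kUnits = {
      {"yy", TruncUnit::kYear},
      {"mm", TruncUnit::kMonth},
      {"dd", TruncUnit::kDay},
  };
  auto it = kUnits.find(unit);
  return it == kUnits.end() ? TruncUnit::kInvalid : it->second;
}

// Selects one implementation based on the already-resolved unit.
// If the unit were known before codegen, only the matching case would
// need to be compiled; that is the optimisation left for future work.
Date TruncDate(const Date& d, TruncUnit unit) {
  switch (unit) {
    case TruncUnit::kYear:
      return {d.year, 1, 1};
    case TruncUnit::kMonth:
      return {d.year, d.month, 1};
    case TruncUnit::kDay:
      return d;
    default:
      return {0, 0, 0};  // Sentinel for an invalid unit in this sketch.
  }
}
```

With this shape, TruncDate(d, ParseTruncUnit("yy")) yields the first day of d's year. The real dispatch covers many more unit strings, which is part of why a generic constant-propagation approach is non-trivial.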


http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.h
File be/src/exprs/udf-builtins.h:

PS1, Line 67: //
> nit: ///
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Michael Ho,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7081

to look at the new patch set (#3).

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................

IMPALA-5347: reduce codegen overhead of timestamp trunc()

Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 280 insertions(+), 229 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/7081/3
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:

PS1, Line 242: 
> Yeah I agree it would be nice, I don't think we have the infrastructure now
I concur. Maybe a TODO?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7081/2/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:

Line 16: // under the License.
> mention somewhere that these functions should specifically not get cross-co
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


IMPALA-5347: reduce codegen overhead of timestamp trunc()

Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Reviewed-on: http://gerrit.cloudera.org:8080/7081
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/udf-builtins-ir.cc
A be/src/exprs/udf-builtins.cc
M be/src/exprs/udf-builtins.h
4 files changed, 280 insertions(+), 229 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 

Gerrit-MessageType: merged
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5347: reduce codegen overhead of timestamp trunc()

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: reduce codegen overhead of timestamp trunc()
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7081/1/be/src/exprs/udf-builtins.cc
File be/src/exprs/udf-builtins.cc:

Line 171:   // TODO: it would be nice to resolve the branch before codegen so we can optimise
I put a TODO here


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I222258f51b2093a38929df847fdb5d25bb9aafc3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes