You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@impala.apache.org by "Zach Amsden (Code Review)" <ge...@cloudera.org> on 2017/02/24 01:01:32 UTC

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6132

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.
Still working on a test for multiply; this requires an actual scale
overflow so generating the numbers will take a bit more work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
6 files changed, 104 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 635:             result_scale);
> this doesn't look correct when t1.precision - t1.scale + t2.scale + result_
Are you sure you didn't miss the max(6, t1.scale + t2.precision + 1) above?  Otherwise looks correct to me.


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 346:         DCHECK(r != 0);
> Probably clearer if the last part was:
Have fixed this up


Line 350:     DCHECK(sizeof(RESULT_T) > 8 || abs(r) <= DecimalUtil::MAX_UNSCALED_DECIMAL8);
> but we may have promoted from 4 to 8. i think what you're trying to show is
That is what I was trying to say.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Hello Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6132

to look at the new patch set (#14).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
12 files changed, 857 insertions(+), 422 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 635:             result_scale);
> Are you sure you didn't miss the max(6, t1.scale + t2.precision + 1) above?
When the desired result precision is > 38, the java code reduces both precision and scale (see createAdjustedDecimalType()).  This code only reduces precision.

A simple example that would show the difference is:

DECIMAL(38,38) / DECIMAL(38,38).

The java code will give DECIMAL(38,6).  This code will give:

result_scale = min(38, max(6, 38 + 38 + 1)) = 38
result_prec = min(38, 38 - 38 + 38 + 38) = 38
i.e. DECIMAL(38, 38).


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 350:     DCHECK(sizeof(RESULT_T) > 8 || abs(r) <= DecimalUtil::MAX_UNSCALED_DECIMAL8);
> That is what I was trying to say.
How about using the DCHECK I wrote out then, to be clear about this?


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
9 files changed, 515 insertions(+), 381 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 707:           ColumnType t3 = GetResultType(t1, t2, DIVIDE, false);
This could be improved by also testing round = true


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
9 files changed, 813 insertions(+), 399 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1339:   { "1.23 * cast(1 as decimal(20,3))", {{ false, 123000, 23, 5 }}},
> Without the result type change, I have been unable to find a meaningful way
okay, that makes sense.


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS8, Line 311: sp
> single precision (not entirely accurate, but...)
i don't think the name will be obvious, but i also don't really have another suggestion other than x16 or x128, and neither is great, so okay to leave it alone.


Line 346:         DCHECK(r != 0);
> It's actually impossible to get a zero value here (value == non-zero value)
But we may have promoted to a higher type in this case. The inputs might be 4-byte (i.e. 9 or less precision), while the result is 8-bytes (18 or less precision). So, the premise that we didn't need to promote is false.  Or am I misunderstanding?


Line 350:     DCHECK(sizeof(RESULT_T) > 8 || abs(r) <= DecimalUtil::MAX_UNSCALED_DECIMAL8);
> Either we have to promote to a RESULT_T that can hold the result, or there 
but we may have promoted from 4 to 8. i think what you're trying to show is that RESULT_T is chosen correctly (which also means that the static_cast on the next line is a no-op). so, should it be:

DCHECK(abs(r) <= MAX_UNSCALED_DECIMAL16 &&
 (sizeof(RESULT_T) > 8 || abs(r) <= MAX_UNSCALED_DECIMAL8) &&
 (sizeof(RESULT_T) > 4 || abs(r) <= MAX_UNSCALED_DECIMAL4));

?


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 52:                               std::is_same<CVR_REMOVED, __int128>{}, int>::type = 0>
> Sure, but then all the following template conditions will become grotesque:
okay, let's leave it alone.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 13: Code-Review+2

Rebased, fixed merge conflicts in expr-test.cc

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 682: ROUND
> Yeah, I guess that makes sense.  Is there another pass that removes unused 
yes, we run the llvm optimizer after that, which will do inlining and dead code removal. but that's also needed with this ROUND shortcircuit.


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 235:   if (UNLIKELY(delta_scale != 0)) {
> Multiply appears to be buggy.  If we ever need to chop the scale, then resu
yea, this is IMPALA-4939. let's not try to tackle it in this commit. i think IMPALA-4939 means that this path is actually dead code, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 15: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/15/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1406:      { false, 666667, 8, 6}}},
I didn't resolve the conflict correctly :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 327:       if (abs(2 * remainder) >= abs(y)) {
> This is where I'm worried about overflow.  Patient and grueling testing has
should these be:

2 * abs(remainder) >= abs(y)

with whatever casting is needed to make the abs() result unsigned? then we know that doesn't overflow, right?


http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/util/bit-util-test.cc
File be/src/util/bit-util-test.cc:

Line 60:   EXPECT_EQ(BitUtil::IncrementAwayFromZero<int128_t>(-200), -201);
> I had a test for that but it seems to have run away.
if it doesn't work or is undefined, can we add an assert to IncrementAwayFromZero<> that we don't call it?
i.e. instead of that last case returning sizeof * char, shouldn't we just assert that callers don't try to do that?


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 729: \
ROUND && ?


http://gerrit.cloudera.org:8080/#/c/6132/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS5, Line 194: namespace detail {
Why new namespace ? To avoid name collision ?


PS5, Line 203: DCHECK(multiplier > 1);
DCHECK(multiplier > 0 && multiplier % 2 == 0);


PS5, Line 205: RESULT_T result = value / multiplier;
Please consider hoisting this out of the loop so we don't need the else statement.


PS5, Line 208: a multiple of two.
Should there be a DCHECK for this ? See comment above.


PS5, Line 249:   // Check overflow again after rounding since +/-1 could cause decimal overflow
             :   if (result_precision == ColumnType::MAX_PRECISION) {
             :     *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
Should this be inside the if-stmt above ? I don't see any rounding outside of that.


PS5, Line 285:   // Check overflow again after rounding since +/-1 could cause decimal overflow
             :   if (result_precision == ColumnType::MAX_PRECISION) {
             :     *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
Should this be inside the if-stmt above ? I don't see any rounding outside of that.


PS5, Line 314:  if (round) {
Should we skip this if *overflow is true ?


PS5, Line 316:  free
             :       // free
duplicated "free"


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.
Still working on a test for multiply; this requires an actual scale
overflow so generating the numbers will take a bit more work.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
6 files changed, 104 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 235:   if (UNLIKELY(delta_scale != 0)) {
> yea, this is IMPALA-4939. let's not try to tackle it in this commit. i thin
Not totally dead code.  If you tack enough leading zeros on these, you can get legitimate results.  I'll add a test for that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(12 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/6132/2//COMMIT_MSG
Commit Message:

Line 44: | 46178777464.523809523847285 | 61076151920731.010714285714202    |
> why are these results not the same except for the least significant digit? 
This is the sum over many divisions, so you are seeing the compound error of truncation.


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 682: ROUND
> I don't see why we need this. it happens that our Add() et al result types 
Multiply gets defined using this macro, and multiply needs round.  Otherwise, we don't need an extra call to GetConstFnAttr to fetch an unused detail.


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 124:       int result_scale, bool* overflow) const {
> where are these variants used?
Testing code, called from be/src/runtime/decimal-test.cc, which forges its own "column types" without having any actual columns or FunctionContext from which to extract the scale and precision.

I'd be extremely happy to kill these and move them into the actual test instead of littering all over this API, which because of the unused round parameter, already is getting a bit ugly and overloaded.


PS2, Line 155: /* round */ true
> why?
So the decimal test always rounds :)  More precision as a result.


PS2, Line 169: /* round */ true
> same
Same reason.


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 153:   DCHECK_EQ(round, false);
> the DCHECK_EQ(result_scale, std::max(this_scale, other_scale)); means that 
I'm not opposed to removing these.  I inserted them mostly to make sure I got the proper overloads for the test cases.


Line 176:   DCHECK_EQ(round, false);
> same
Done


PS2, Line 200: shift_width
> ShiftWidth
This is now feeling very much like something that belongs in be/src/utils/somewhere


Line 294:       result = scaled_down;
> how about factoring this out into subroutine rather than duplicating?
Done


PS2, Line 328: 127
> should this be 255? (or rather, shift_width()?)
No, r is defined to be int128_t.

However, the round can overflow - we need a bounds check here.


PS2, Line 340: 127
> shift_width()?
Will abstract this out to a rounding helper.  I wrote this before I needed the helper.


Line 353:   DCHECK_EQ(round, false);
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 14: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/325/

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 14: Code-Review+2

Some test cases needed updating for divide and average rounding.

Zach, please double check you agree with the changes to decimal-exprs.test.

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/10/be/src/runtime/types.h
File be/src/runtime/types.h:

Line 147:       int min_scale = std::min(scale, +MIN_ADJUSTED_SCALE);
> I decided to copy this from the Java code in the frontend as it looked usef
Yeah, I think defining them in .cc.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(12 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1339:   { "1.23 * cast(1 as decimal(20,3))", {{ false, 123000, 23, 5 }}},
> i thought you said there were some multiplies you could do to exercise the 
Without the result type change, I have been unable to find a meaningful way to test this.


Line 1408:   // N.B. - Google and python both insist that 999999.998 / 999 is 1001.000999
> doesn't that just mean they use a scale at 6 (and round)?
Probably.  Our current behavior is more precise.  I like it.


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 635:             result_scale);
> this doesn't look correct when t1.precision - t1.scale + t2.scale + result_
I'll double check - this was supposed to be an exact copy of the logic from the Java code


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 114:       result += (BitUtil::Sign(v) ^ 1) + 1;
> why is this not just:
That is a better suggestion.


Line 212:       result += (BitUtil::Sign(value) ^ 1) + 1;
> same
Done


PS8, Line 311: sp
> what does "sp" stand for?
single precision (not entirely accurate, but...)

Let me know if you want me to change this


Line 346:         DCHECK(r != 0);
> I don't understand this comment or this dcheck.  Is "precision" referring t
It's actually impossible to get a zero value here (value == non-zero value).  If we have a remainder (impossible for x == 0), and we didn't need to promote to a higher type, there is no way to have result (r) != 0 because we should have then promoted to a higher type.

Since the only problematic case is actually getting a zero result, and we can't get that, we don't need extra bit manipulations for the special case of zero.


Line 347:         r = BitUtil::IncrementAwayFromZero(r);
> given that this turned out to not be generally usable and you have the Sign
I don't mind doing that.


Line 350:     DCHECK(sizeof(RESULT_T) > 8 || abs(r) <= DecimalUtil::MAX_UNSCALED_DECIMAL8);
> what does this DCHECK mean? why is 8 significant?
Either we have to promote to a RESULT_T that can hold the result, or there is no overflow.  Some of the unit tests violated this condition and required changes (int64_t -> int128_t for divide)


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 52:                               std::is_same<CVR_REMOVED, __int128>{}, int>::type = 0>
> can CVR_REMOVED be defined local to UnsignedWidth(), or does that not work 
Sure, but then all the following template conditions will become grotesque:

  typename std::enable_if(std::is_integral<typename std::decay<T>::type>{} ||
  std::is_same<typename std::decay<T>::type{}, unsigned __int128> || ...

Using a template type parameter is better as not only is it more concise, it eliminates many possibilities for errors in the following by giving everything following the same syntax.

I don't use enable_if lightly, but in this case it is actually prudent to prevent misuse of the definition on unknown types.


PS8, Line 69: positive
> non-negative
Done


Line 72:   constexpr static inline T IncrementAwayFromZero(T value) {
> but is this worth having anymore given you need to use Sign() in most place
Probably not.  The discovery of the signed zero underflow bias has probably killed the utility of this utility.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has uploaded a new patch set (#13).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
11 files changed, 849 insertions(+), 414 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
9 files changed, 813 insertions(+), 399 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#3).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.
Still working on a test for multiply; this requires an actual scale
overflow so generating the numbers will take a bit more work.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
10 files changed, 432 insertions(+), 399 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 4:

Code is ready for review, I don't anticipate much in the way of change there.  There are some pre-existing overflows (and a new one) that are not addressed in this change because I don't want to perturb the tests in this patch set.

Will update to a new patch set to thoroughly illustrate the changes required to tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 682: ROUND
> GetConstFnAttr() is just as free as a constant 'false'.  The call goes away
Yeah, I guess that makes sense.  Is there another pass that removes unused constants after substitution?


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 124:       int result_scale, bool* overflow) const {
> If they are only used by tests, I agree killing them would be nice.
Attempting to do so


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 153:   DCHECK_EQ(round, false);
> but what i'm saying is we should actually pass the correct value of 'round'
Done


Line 235:   if (UNLIKELY(delta_scale != 0)) {
Multiply appears to be buggy.  If we ever need to chop the scale, then results that shouldn't actually overflow after reduction can end up overflowing anyway.

For example, this query overflows despite the fact that the result obviously does not:

 select typeof(t.v), t.v from (select cast(0.001 as decimal(9,4)) * cast(.1 as decimal(38,38))as v) t;


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 346:         DCHECK(r != 0);
> Okay, I think I see why this dcheck is true, but I think this is more about
Probably clearer if the last part was:
(i.e. the precision couldn't have exceeded 38 in this case).


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 346:         DCHECK(r != 0);
> But we may have promoted to a higher type in this case. The inputs might be
Okay, I think I see why this dcheck is true, but I think this is more about scale than precision, no? Is this accurate:

No bias at zero. The result scale was chosen such that the smallest non-zero 'x' divided by the largest 'y' will always produce a non-zero unscaled result, and in this case we know the result scale was not adjusted (due to the precision exceeding 38).


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
9 files changed, 828 insertions(+), 412 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 15:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/326/

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS4, Line 122: /* round */ false
> this should pass 'round'
Done


http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 194: namespace detail {
> add a function comment.
Done


PS4, Line 198: be
> garbled
Done


Line 205:     // here that it is a power of two.
> I think you mean "even" (or "power of ten"), not "power of two".
Done


Line 283:   *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
> shouldn't this have the:
Done


PS4, Line 311: 2 * remainder
> do we know this can't overflow? can you comment on that (and maybe add DCHE
Yes, because the maximum value is at most the 128 bit int scaled up by 10**38 (in reality our scales can't get that high).

I don't know of a way to do bitwise asserts with an int256_t, I'll see if it works.  DCHECK_EQ is almost certain to throw up all over itself.


Line 327:       if (abs(2 * remainder) >= abs(y)) {
This is where I'm worried about overflow.  Patient and grueling testing has not been able to produce one yet.  We should be able to compute a value that overflows into the sign bit when doubled, but to do so requires an actual value that should round away from zero.  The compiler seems to be using an unsigned test here between the abs() comparisons, so despite the undefined behavior, it appears to still get the right answer.

Still investigating this a bit more thoroughly.


http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/util/bit-util-test.cc
File be/src/util/bit-util-test.cc:

Line 37:   EXPECT_EQ(BitUtil::UnsignedWidth<char>(), 7);
> this is implementation dependent. maybe say "signed char" to remove that de
Done


Line 42:   EXPECT_GT(BitUtil::UnsignedWidth<int256_t>(), 256);
Oh, this was the test I mentioned.  BitUtil::IncrementAwayFromZero is undefined for non compiler implemented types.


Line 60:   EXPECT_EQ(BitUtil::IncrementAwayFromZero<int128_t>(-200), -201);
> what happens with the 256 bit type we use in decimal code? does this work?
I had a test for that but it seems to have run away.

It gives larger than 256 bit values because the implementation is non-compact (and also not two's complement, so increment away from zero won't have any possibility to work anyway).


http://gerrit.cloudera.org:8080/#/c/6132/4/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

Line 266: //        }
> please remove this from this diff. it should go in IMPALA-4940 change. (als
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 7:

Okay, this is good for review.  Sorry for so many test cases - but they ended up finding a few good bugs.

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/15/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1406:      { false, 666667, 8, 6}}},
> I didn't resolve the conflict correctly :)
looks good


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 4:

(9 comments)

Just some minor comments about the code.  Once you address these and fill out the test cases for the various code paths, I can take a final look.

http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS4, Line 122: /* round */ false
this should pass 'round'


http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 194: namespace detail {
add a function comment.


PS4, Line 198: be
garbled


Line 205:     // here that it is a power of two.
I think you mean "even" (or "power of ten"), not "power of two".

how about adding a dcheck to document that it's even.


Line 283:   *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
shouldn't this have the:

if (result_precision == MAX_PRECISION)

guard around it too?


PS4, Line 311: 2 * remainder
do we know this can't overflow? can you comment on that (and maybe add DCHECK to justify it)?


http://gerrit.cloudera.org:8080/#/c/6132/4/be/src/util/bit-util-test.cc
File be/src/util/bit-util-test.cc:

Line 37:   EXPECT_EQ(BitUtil::UnsignedWidth<char>(), 7);
this is implementation dependent. maybe say "signed char" to remove that dependency.


Line 60:   EXPECT_EQ(BitUtil::IncrementAwayFromZero<int128_t>(-200), -201);
what happens with the 256 bit type we use in decimal code? does this work?


http://gerrit.cloudera.org:8080/#/c/6132/4/fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
File fe/src/main/java/org/apache/impala/analysis/TypesUtil.java:

Line 266: //        }
please remove this from this diff. it should go in IMPALA-4940 change. (also the if-stmt is what is in createAdjustedDecimalType() which is called on line 275).


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

Patch Set 3:

Note to reviewers - this diff is only partially ready for review. I wanted to send it anyway to gather feedback on whether the API cleanup in decimal-value-inline.h looks good. I think it vastly de-clutters the API to move all of the test related stuff into the actual test.

The test also gets a lot of improvement as a result of the flexibility of specifying any combination of precision / scale without defining a new ColumnType, and the reviewer and future test writers benefit from not having to refer back hundreds of lines to figure out what types t1, t2, t3 actually represent.

What is broken right now is the multiply code - the sketch of the non-overflowing implementation is only partial and not fully correct. This means there are test failures as well.

Plan for this diff is to undo the multiply change, and likely the multiply front-end change as well, to isolate the effect of any typos that may have been introduced in the unit test conversion. Likely the best thing to do is isolate the multiply change out into a separate diff.

--
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#12).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
11 files changed, 849 insertions(+), 414 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 15: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 13: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/322/

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/10/be/src/runtime/types.h
File be/src/runtime/types.h:

Line 147:       int min_scale = std::min(scale, +MIN_ADJUSTED_SCALE);
I decided to copy this from the Java code in the frontend as it looked useful.  In the spirit of letting no good deed go unpunished, the compiler has rewarded me with the little gift of converting MIN_ADJUSTED_SCALE into an lvalue, as std::min takes its values by reference.  What is our standard approach to tackling this?  Provide definitions in the .cc file?  (Fortunately in this cases, there is one).


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 13:

Oh nice, I thought I would have to rebase and fix the conflicts.  Thanks!

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 14:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/325/

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6132/2//COMMIT_MSG
Commit Message:

Line 44: | 46178777464.523809523847285 | 61076151920731.010714285714202    |
why are these results not the same except for the least significant digit? what's the typeof l_quantity, l_tax, and l_quantity / l_tax?


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 3:

The decimal-value.h cleanup looks good to me.

Can you split the fix for multiply overflow into a separate change though?

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 682: ROUND
> Multiply gets defined using this macro, and multiply needs round. Otherwis
GetConstFnAttr() is just as free as a constant 'false'. The call goes away and is replaced by a constant when generating the code.

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 124: int result_scale, bool* overflow) const {
> Testing code, called from be/src/runtime/decimal-test.cc, which forges its
If they are only used by tests, I agree killing them would be nice.

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 153: DCHECK_EQ(round, false);
> I'm not opposed to removing these. I inserted them mostly to make sure I g
but what i'm saying is we should actually pass the correct value of 'round', so the dcheck isn't valid. It happens that given our current result type choices, rounding for these requires no changes.

--
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 12: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#11).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
11 files changed, 849 insertions(+), 414 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Hello Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6132

to look at the new patch set (#15).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
12 files changed, 861 insertions(+), 426 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 682: ROUND
I don't see why we need this. it happens that our Add() et al result types don't need rounding, but that's because of how we choose their result type.  i.e. these operations already handle rounding enabled and disabled correctly (since the answer is the same, and they have the dchceks in place already for not losing scale). 

But that's a detail that the decimal-value code cares about, not this code.  also see my comments there.


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 124:       int result_scale, bool* overflow) const {
where are these variants used?


PS2, Line 155: /* round */ true
why?


PS2, Line 169: /* round */ true
same


http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 153:   DCHECK_EQ(round, false);
the DCHECK_EQ(result_scale, std::max(this_scale, other_scale)); means that we never lose scale and so rounding is a no-op, so why not allow the caller to pass the actual value of 'round'?  One day, we may change the result type of Add(), and in some cases lose scale, in which case round will no longer be a no-op and at that point the dcheck at line 152 would no longer hold telling us that this code needs to be updated.


Line 176:   DCHECK_EQ(round, false);
same


Line 198: // Helper because int128_t is not part of std::numeric_limits
explain what this does


PS2, Line 200: shift_width
ShiftWidth
though the name doesn't really tell you what it does. maybe the subroutine should be something like:

T IncrementAwayZero(T value) {
   int shift_by = ... stuff here...
   return value + (1 | value >> shift_by);
}

and then you can use that all over.


Line 294:       result = scaled_down;
how about factoring this out into subroutine rather than duplicating?


PS2, Line 328: 127
should this be 255? (or rather, shift_width()?)


PS2, Line 340: 127
shift_width()?


Line 353:   DCHECK_EQ(round, false);
same


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#10).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
10 files changed, 842 insertions(+), 414 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Reviewed-on: http://gerrit.cloudera.org:8080/6132
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
12 files changed, 861 insertions(+), 426 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#4).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
10 files changed, 398 insertions(+), 381 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6132/6/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 113:       result = BitUtil::IncrementAwayFromZero(result);
Same bug strikes here.  Will add a test for:

cast(cast(-1 * pow(1, -38) as decimal(38,38)) as bigint)

which returns, you guessed: 1


Line 210:       result = BitUtil::IncrementAwayFromZero(result);
Ditto, although we can't really test this code yet.  Will mark with a TODO


Line 345:         r = BitUtil::IncrementAwayFromZero(r);
Unaffected.


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................

IMPALA-4813: Round on divide and multiply

Address rounding on divide and multiply when results are truncated.

Testing: Manually ran some divides that should overflow, then added
the results to the test.  Made the decimal-test use rounding behavior
by default, and now the error margin of the test has decreased.

Initial perf results:

Multiply is totall uninteresting so far, all implementations
return the same values in the same time:

+-------------------------+-----------------------------------+
| sum(l_quantity * l_tax) | sum(l_extendedprice * l_discount) |
+-------------------------+-----------------------------------+
| 61202493.3700           | 114698450836.4234                 |
+-------------------------+-----------------------------------+
Fetched 1 row(s) in 1.13s

Divide shows no regression from prior with DECIMAL_V2 off:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809516381723 | 61076151920731.010714279183910    |
+-----------------------------+-----------------------------------+
before:  Fetched 1 row(s) in 13.08s
after:   Fetched 1 row(s) in 13.06s

And with DECIMAL_V2 on:

+-----------------------------+-----------------------------------+
| sum(l_quantity / l_tax)     | sum(l_extendedprice / l_discount) |
+-----------------------------+-----------------------------------+
| 46178777464.523809523847285 | 61076151920731.010714285714202    |
+-----------------------------+-----------------------------------+
Fetched 1 row(s) in 16.06s

So the performance regression is not as bad as expected.  Still,
divide performance could use some work.

Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/string-parser.h
9 files changed, 805 insertions(+), 399 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6132/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 13:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/322/

-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 8:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1339:   { "1.23 * cast(1 as decimal(20,3))", {{ false, 123000, 23, 5 }}},
i thought you said there were some multiplies you could do to exercise the multiply rounding path?


Line 1408:   // N.B. - Google and python both insist that 999999.998 / 999 is 1001.000999
doesn't that just mean they use a scale at 6 (and round)?


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

Line 635:             result_scale);
this doesn't look correct when t1.precision - t1.scale + t2.scale + result_scale > 38. In that case, we reduce the scale (but always preserve at least 6).


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 114:       result += (BitUtil::Sign(v) ^ 1) + 1;
why is this not just:

result += BitUtil::Sign(v);

?


Line 212:       result += (BitUtil::Sign(value) ^ 1) + 1;
same


PS8, Line 311: sp
what does "sp" stand for?


Line 346:         DCHECK(r != 0);
I don't understand this comment or this dcheck.  Is "precision" referring to decimal precision? And precision of what? and what do you mean by "always have a value"?  non-negative value?  and why is this case fundamentally different than the sizeof(T) == 16 case?

it doesn't seem true that rounding is needed if r != 0, but if r == 0, then rounding is never needed...


Line 347:         r = BitUtil::IncrementAwayFromZero(r);
given that this turned out to not be generally usable and you have the Sign() abstraction, how about getting rid of this and just using:

r += Sign(r); 

here?


Line 350:     DCHECK(sizeof(RESULT_T) > 8 || abs(r) <= DecimalUtil::MAX_UNSCALED_DECIMAL8);
what does this DCHECK mean? why is 8 significant?


http://gerrit.cloudera.org:8080/#/c/6132/8/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 52:                               std::is_same<CVR_REMOVED, __int128>{}, int>::type = 0>
can CVR_REMOVED be defined local to UnsignedWidth(), or does that not work for some reason?


PS8, Line 69: positive
non-negative


Line 72:   constexpr static inline T IncrementAwayFromZero(T value) {
but is this worth having anymore given you need to use Sign() in most places, and that using this is the wrong thing to do when there are fractional parts?


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4813: Round on divide and multiply

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-4813: Round on divide and multiply
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6132/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 729: \
> ROUND && ?
I thought I already got rid of this, it's not needed.


http://gerrit.cloudera.org:8080/#/c/6132/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS5, Line 194: namespace detail {
> Why new namespace ? To avoid name collision ?
Yeah, even though this is "inline.h", this is still a private implementation detail.


PS5, Line 203: DCHECK(multiplier > 1);
> DCHECK(multiplier > 0 && multiplier % 2 == 0);
We still want multiplier > 1, since delta_scale > 0


PS5, Line 205: RESULT_T result = value / multiplier;
> Please consider hoisting this out of the loop so we don't need the else sta
Done


PS5, Line 208: a multiple of two.
> Should there be a DCHECK for this ? See comment above.
Done


PS5, Line 249:   // Check overflow again after rounding since +/-1 could cause decimal overflow
             :   if (result_precision == ColumnType::MAX_PRECISION) {
             :     *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
> Should this be inside the if-stmt above ? I don't see any rounding outside 
Done


PS5, Line 285:   // Check overflow again after rounding since +/-1 could cause decimal overflow
             :   if (result_precision == ColumnType::MAX_PRECISION) {
             :     *overflow |= abs(result) > DecimalUtil::MAX_UNSCALED_DECIMAL16;
> Should this be inside the if-stmt above ? I don't see any rounding outside 
Done


PS5, Line 314:  if (round) {
> Should we skip this if *overflow is true ?
probably not worth it.  If things can inline, we may start computing x % y before overflow's value is known, so we may get better code.


PS5, Line 316:  free
             :       // free
> duplicated "free"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6bfcbe37555b74598d409c6f84f06b0ae5c4312
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes