You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@impala.apache.org by "Zach Amsden (Code Review)" <ge...@cloudera.org> on 2017/02/09 03:24:44 UTC

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5951

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................

IMPALA-2020: Add rounding when casting from decimal to int

Preview diff, not working yet, mostly to see if I can successfully
push diffs against my ASF Impala fork, and also to get early
feedback on the UDF change.  Update - pushing from fork didn't
work.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
4 files changed, 37 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: Still in progrress.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
7 files changed, 149 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, &overflow, true);
> Will Impala 2.9 and Impala 2.8 give different answers when decmial_v2=false
This functionality (constructing Literals given a ColumnType) is only used for testing the Literal class as far as I can tell.  See comment at be/src/exprs/literal.h:77


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS5, Line 54: bool* overflow,
            :       bool round
> Regarding default arguments, let's avoid adding them unless they really mak
agree


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#10).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 344 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/22/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS22, Line 54: UNLIKELY(std::isnan(d)) || UNLIKELY(std::fabs(d) >= max_value))
> nit: UNLIKELY(std::isnan(d) || std::fabs(d) >= max_value))
clang at least generates better code with the double UNLIKELY form.  It seems all UNLIKELY conditions are combined into the same forward jump, where as the combined UNLIKELY form has a forward branch to decide the inner condition which gets taken on the likely path:

https://godbolt.org/g/fnGuSP

I'll try out a couple other compilers and see what results I get.  This is a most interesting experiment I've never actually done before.

Update: GCC gets the optimal form both ways.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS5, Line 303: return to_type(dv.whole_part(scale)); 
> Not really - I thought the saner behavior for ToInt was rounding and wrote 
I don't feel too strongly about it, but think the symmetry in the interfaces is nice. 

If you leave the non-rounding case here, then line 303 should be formatted this way per our style:

} else {
   return ...
}

We only omit line breaks and braces if the entire if-stmt fits on one line (and maybe doesn't have an else clause?).


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, &overflow, true);
> In what sense?
Will Impala 2.9 and Impala 2.8 give different answers when decmial_v2=false?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS5, Line 54: bool* overflow,
            :       bool round
> Done.  Means I can't give a default value to round unless I also give a def
Regarding default arguments, let's avoid adding them unless they really make things simpler. Often, they just lead to coding errors since new users of the interface don't bother to think about the right values to pass.

Regarding default value for overflow, I don't think that makes sense since as you concluded, all callers should care about overflow. If they don't, that probably indicates a bug.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 18:

> Uploaded patch set 20.

 > Uploaded patch set 21.

rebased and fixed merge conflict

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/22/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS22, Line 54: UNLIKELY(std::isnan(d)) || UNLIKELY(std::fabs(d) >= max_value))
> clang at least generates better code with the double UNLIKELY form.  It see
I am fine to keep the code as-is. Based on the offline discussion, your assumption is that forward jump tends to be predicted as not taken.

Also, we use clang to compile fucntions to IR and not machine code. It may be more useful to find out how the IR generated is different with these UNLIKELY combinations and the implication to other usages of UNLIKELY or LIKELY in the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 9:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS9, Line 303: round
> what i meant earlier was we should either pass 'round' here and get rid of 
Otherwise this is a behavior change.  dv.ToInt detects overflow.  to_type(dv.whole_part(scale)) does not.


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS9, Line 1442:  { false, 9999999900, 10, 2 }}},
> you can delete this line since the results match.
Done


Line 1460:      { true, 0, 4, 0 }}},
> let's include tests at the boundary of the 'int' ranges to exercise the new
will do


Line 1503:      { true, 0, 3, 0 }}},
> what happens on the "exact" boundary, i.e. +/- 0.5? why not test that case 
due to scaling, we get coverage anyway as 0.45 ends in 5.  I can add it, but I don't think it gives any better coverage.


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS9, Line 232: Optionally rounds to the nearest integer,
             :   /// defined as half round away from zero.
> this contradicts the first sentence which says rounding always happens. but
has to wait for V1 to be deleted


PS9, Line 234: RESULT_T
> this should be explained in the comment as well.
Done


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 43: precise
> what is the consequence of that? do we get the wrong answer with the curren
Well, there aren't enough bits in the floating point multiplier to fully represent the larger scale multipliers.  We have 10^15 being around a 53-bit number, which is what we have for our mantissa and we get another 10 or so bits of representation for free because of the trailing zeros in the representation of 10 = 5 * 2, so after scales of somewhere less than 25, DecimalUtil::GetScaleMultiplier<double> will only be an approximation of the actual number.  If it rounds to nearest, then values below (10**scale) may overflow when multiplied by this number.

Note both V1 and V2 code have this problem - the overflow check in V1 logic may miss a wraparound to a very low number.  The V2 code does not have this problem, but instead may have the problem of false overflow because of an inaccurate double representation of large scales.

I'd rather fix this issue in a separate change.


PS9, Line 110: auto
> we generally use auto only when it makes the code more readable (because, s
Done


PS9, Line 111: auto
> same
Done


PS9, Line 119: DCHECK
> DCHECK_EQ (it'll print both sides on failure).
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 119: DCHECK
> Done
Couldn't do this.  DCHECK_EQ doesn't know how to print int128_t and larger values.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/4/be/src/udf/udf.h
File be/src/udf/udf.h:

Line 437:     // XXX Null floats only compare equal if value is also equal?
I'm filing a JIRA for this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 22: Code-Review+2

Michael, any more comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1
> How so? We've already added the new V2 result type formula for %, and you g
Old comment, I mistook 38, 38 for 38, 0


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 113:   if (divisor == 1) {
> Things like dcheck, the divide, modulus and rounding are completely unneces
I guess most of those would be constant folded away by the compiler once we propagate 'scale' as constant. divisor would be known at that point and it would just constant fold remainder to 0.
I guess it may not be able to remove abs(remainder) but other than that it's mostly the same as the code above.


PS10, Line 117: const T remainder = v % divisor;
> No, but I was worried that moving the computation further away from the div
May be it's more important to look at the cross-compiled IR file output as that's gonna be the version we'll use.


PS10, Line 127: typename RESULT_T::underlying_type_t
> I can only get rid of this one, not the other, so it seems more confusing j
It's applicable to the next line too, right ? The following looks easier to parse IMHO but I guess some would say it's personal preference:

*overflow = result > std::numeric_limits<result_type>::max() ||
      result < std::numeric_limits<result_type>::min();


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Note there is no free lunch:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:49 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb66700000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.76s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:52 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b0300000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.73s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:59 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.85s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:17:02 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c600000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.96s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:52 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=d84e0f69718bb16d:f069dae600000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:02 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=5849852a17314252:73e2433f00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:04 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=b842bc8eaaf9e85d:70616a6b00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/15
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 49: double
> shouldn't this be T?
No, T may not have enough precision to hold the value of 10^scale - look at the old code, which also does the multiplication in double.


PS12, Line 52: max_value < 0
> why is the max_value < 0 check needed?
If we cast to a decimal type whose underlying width is not large enough to hold the precision, this will become true.

Make it a DCHECK?  This should be impossible, guaranteed by the type system.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS9, Line 303: round
> Otherwise this is a behavior change.  dv.ToInt detects overflow.  to_type(d
Yeah, I got that.  What I'm saying is why not remove the 'round' paramter since we can't use it anyway?  I originally suggested adding that parameter assuming it would eliminate the need for the else-clause, but since that's not the case, why keep the  dead (and untested) code?


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1503:      { true, 0, 3, 0 }}},
> due to scaling, we get coverage anyway as 0.45 ends in 5.  I can add it, bu
okay.  i missed that the scale=1 case covers that.


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 43: precise
> Well, there aren't enough bits in the floating point multiplier to fully re
sure, fixing in a separate change sounds fine.  If there's no jira for that, can you open one and note the number in the todo?


PS9, Line 119: DCHECK
> Couldn't do this.  DCHECK_EQ doesn't know how to print int128_t and larger 
ah, okay.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 113:   if (divisor == 1) {
what's this for? is it an optimization for scale==0 case?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: Still in progrress.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 219 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Reviewed-on: http://gerrit.cloudera.org:8080/5951
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 472 insertions(+), 110 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, &overflow, true);
> Shouldn't this still be dependent on the behavior we want to test ? We test
I really don't want to complicate tests any more than necessary so since this is just for tests, I just gave this the sane expected behavior.  If this happens to break some tests, we'll find out and fix them, but this should not change Impala behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 16:

(1 comment)

After fighting the compiler for over an hour to figure out why our operator<< was not being called, I found this little gem in glog/logging.h and since abandoned getting DCHECK_EQ / GT to work.

/ Build the error message string.
template<class t1, class t2>
std::string* MakeCheckOpString(const t1& v1, const t2& v2, const char* names) {
  // It means that we cannot use stl_logging if compiler doesn't
  // support using expression for operator.
  // TODO(hamaji): Figure out a way to fix.
#if 1
  using ::operator<<;
#endif
  std::strstream ss;
  ss << names << " (" << v1 << " vs. " << v2 << ")";
  return new std::string(ss.str(), ss.pcount());
}

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
> please pare down the commit msg, in particular get rid of the verbatim shel
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> std::abs() is not defined for int128_t
maybe add a comment for future readers of this code:
// std::abs() is not defined for int128_t.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 22: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/22/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS22, Line 54: UNLIKELY(std::isnan(d)) || UNLIKELY(std::fabs(d) >= max_value))
nit: UNLIKELY(std::isnan(d) || std::fabs(d) >= max_value))


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/16//COMMIT_MSG
Commit Message:

Line 64: Query: select sum(cast(l_extendedprice as bigint)) from
please pare down the commit msg, in particular get rid of the verbatim shell output (a table with the numbers is sufficient to get the point across).


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 3:

> I'm not opposed to cleaning up the AnyVal stuff like that, but
 > given that udf.h stuff dictates UDF compatibility, it's not
 > completely trivial. It doesn't look like it would break binary
 > compatibility though. But, in case something goes wrong, how about
 > we do that as a separate change so it could be backed out without
 > affecting the decimal work? It doesn't look like the decimal stuff
 > will depend on it, right?

No, but it gets a lot cleaner to test the limits by giving the generic form an underlying type.  I deliberately did not change FloatVal, since the equality operator is currently kind of broken.  Everything else should be binary compatible.

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#19).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 469 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/19
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1496:       10000000000ll * 10000000000ll * 10000000000ll, 38, 38}}}
Missing comma - got hit by a bad merge.  expr-test was the only thing rebuilt so there wasn't much in the way of errors to spot.


http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> maybe add a comment for future readers of this code:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#4).

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................

IMPALA-2020: Add rounding when casting from decimal to int

First implementation of rounding mode on CAST from decimal
to int.

Testing: In progress.  Expect a full test suite for both modes and
all edge cases to be covered.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
4 files changed, 57 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/3/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 84:   auto divisor = DecimalUtil::GetScaleMultiplier<T>(scale);
doesn't round, of course.


Line 88:     return RESULT_T::null();
how about sticking with the convention used in the rest of this file where the caller decides what to do with overflow by returning via *overflow? The reason is because in the not to distant future, we'll want to raise an error on overflow rather than silently convert to null.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#20).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 470 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/20
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 11:

I have not updated the Python test yet, as that would require a rebase.  Python test and benchmark will come in next patch set, this is just to give a clean slate for review.

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#11).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 374 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 11:

(5 comments)

> I have not updated the Python test yet, as that would require a
 > rebase.  Python test and benchmark will come in next patch set,
 > this is just to give a clean slate for review.

Thanks.  Just some minor comments on the code. 

I think the benchmark is higher priority than the python test since we already have pretty good functional test coverage and so less likely to be surprised, whereas there could be a perf regression.

We could even do the python test as a follow on commit if that is a fair amount of work (perhaps even after taking care of divide rounding).

http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 1526: power(10, 3) - power(10, -1)
at some point in the future, we may implement a decimal variant of POWER(), in which case these test cases would no longer exercise the CAST(double as decimal) path.  So, how about either adding a check:

// The CAST FLOAT -> DECIMAL test cases rely on power() returning
// a double. If that changes, those test cases need to be updated.
TestStringValue("typeOf(power(10,1))", "DOUBLE");

Or, we could add the explicit cast to double/float now.


Line 1601:     {{ false, -999, 3, 0 }}},
there all do DOUBLE -> DECIMAL.  How about adding some FLOAT -> DECIMAL cases as well?


http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS11, Line 232: Optionally
delete "Optionally" since it's not optional.


PS11, Line 233: otherwise the result is truncated.
delete since this doesn't do truncation


http://gerrit.cloudera.org:8080/#/c/5951/11/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS11, Line 45: mantissaa
mantissa


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/3/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS3, Line 297: round ? dv.ToIntVal<to_type>(scale) : to_type(dv.whole_part(scale));
perhaps pass 'round' to ToInt() and move the "else" block in there since it's basically the truncation case. that way we're more symmetric.

and then the to_type can stay here, and instead ToInt() can return the primitive (like whole_part() does), which would keep all the conversions between DecimalValue and udf::*IntVal types within this code, which I think makes more sense since this code is the UDF builtin code, whereas DecimalValue is the decimal slot representation code internal to impala.


http://gerrit.cloudera.org:8080/#/c/5951/3/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 88:     return RESULT_T::null();
> That's a good idea.  Should I still be returning raw UDF types (in which ca
I would probably leave the UDF type (e.g. BigIntVal) out of here and just return the raw primitive, unless using the UDF type simplifies things for some reason.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5951/12//COMMIT_MSG
Commit Message:

PS12, Line 15: I tried to update the expr benchmark 
let's just do a time a simple query with the casts you've modified over a large enough table to check for regressions, before and after this change.


http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 49: double
shouldn't this be T?


PS12, Line 52: max_value < 0
why is the max_value < 0 check needed?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5951/10//COMMIT_MSG
Commit Message:

PS10, Line 10: Still writing tests,
             : but this is ready for human eyes to look at.
Not needed in the commit message anymore, right ?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, true, &overflow);
> This functionality (constructing Literals given a ColumnType) is only used 
Shouldn't this still be dependent on the behavior we want to test ? We test both the v1 vs v2 behavior in the BE test. May consider taking an extra argument (bool round) ?


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS10, Line 237: inline typename RESULT_T::underlying_type_t ToInt(int scale, bool round, bool* overflow) const;
long line


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // TODO: this could be made more precise by doing the computation in
             :     // 64 bit with 128 bit immediate result instead of doing an additional
             :     // multiply in 53-bit double precision floating point
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
             :     // Truncate and just take the integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   }
Appears to me that we should be able to share the code for the most part:

d *= DecimalUtil::GetScaleMultiplier<double>(scale);
if (round) d = std::round(d);
const T max_value = DecimalUtil::GetScaleMultiplier<T>(precison);
if (abs(d) >= max_value) {
   ...
}

return DecimalValue(static_cast<T>(d));


PS10, Line 108: typename RESULT_T::underlying_type_t
May help to add a comment that underlying_type_t is defined only for the *IntType in udf.h


Line 113:   if (divisor == 1) {
> what's this for? is it an optimization for scale==0 case?
If it's for optimization, constant propagation in codegen would have reaped most of the benefit as this is inlined into cross-compiled code.


PS10, Line 117: const T remainder = v % divisor;
Do we not need to compute this if round is false ?


PS10, Line 119: DCHECK(
DCHECK_EQ


PS10, Line 127: typename RESULT_T::underlying_type_t
May be consider doing a typedef typename RESULT_T::underlying_type_t result_type; inside here for ease of reading.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 2:

I'm not opposed to cleaning up the AnyVal stuff like that, but given that udf.h stuff dictates UDF compatibility, it's not completely trivial. It doesn't look like it would break binary compatibility though. But, in case something goes wrong, how about we do that as a separate change so it could be backed out without affecting the decimal work? It doesn't look like the decimal stuff will depend on it, right?

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#13).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 21: Code-Review+2

(2 comments)

Michael, can you take a final pass as well?

http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1621:      { false, -1000, 5, 0 }}},
could you add a comment saying how many fives that is. I counted 38, so why does this overflow?


http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
why not std:abs() here?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................

IMPALA-2020: Add rounding when casting from decimal to int

Preview diff, not working yet, mostly to see if I can successfully
push diffs against my ASF Impala fork, and also to get early
feedback on the UDF change.  Update - pushing from fork didn't
work.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
4 files changed, 41 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

Patch Set 9:

(10 comments)

How about augmenting test_decimal_casts.py to also test these cases? (You'll probably want Michael's change).

As we discussed, on a release build, let's do a quick microbenchmark to verify perf with these three points:

- Before this change.
- After this change with v2=false
- After this change

And we want the first two to match (no regression with v2=false).

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS9, Line 303: round
what i meant earlier was we should either pass 'round' here and get rid of the "else" clause, *or* leave the else clause here but don't pass round. Why do we have both?

I think you concluded that the else-clause was better because of the change in overflow behavior, so that's fine. but then we don't need 'round' (we never pass false).

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS9, Line 1442: { false, 9999999900, 10, 2 }}},
you can delete this line since the results match.

Line 1460: { true, 0, 4, 0 }}},
let's include tests at the boundary of the 'int' ranges to exercise the new overflow check you added.

let's also test the other int types (bigint, tinyint, smallint)

Line 1503: { true, 0, 3, 0 }}},
what happens on the "exact" boundary, i.e. +/- 0.5? why not test that case as well?

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS9, Line 232: Optionally rounds to the nearest integer,
: /// defined as half round away from zero.
this contradicts the first sentence which says rounding always happens. but as i mentioned in decimal-operators-ir.cc, let's either keep the else-clause there or use the 'round' bool here, but not both.

PS9, Line 234: RESULT_T
this should be explained in the comment as well.

http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 43: precise
what is the consequence of that? do we get the wrong answer with the current implementation in some cases?

PS9, Line 110: auto
we generally use auto only when it makes the code more readable (because, say, the type is really long like for an iterator). Here, the type can just be T, right?

PS9, Line 111: auto
same

PS9, Line 119: DCHECK
DCHECK_EQ (it'll print both sides on failure).

--
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 12:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG
Commit Message:

Line 122
> Sure, we can try to get this back in follow on commits.
Will check


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

PS15, Line 218:   BENCHMARK("strncmp1", "'abcdefghi
> Can you please add a sample output like other benchmark functions ?
I can't - with these changes, this is in a slightly better state, but the benchmark still doesn't run, it segfaults almost immediately due to a NULL descriptor table.  After spending the better part of a day attempting to fix it, I've had to shelve this for now.  Considering that there are serious issues running this through codegen anyway, I'm not sure what to do with these changes.


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

PS15, Line 169: -.1
> -0.1.
Done


PS15, Line 224: Decimal4Value::FromDouble(t4, 0.499999999, true, &overflow);
> Isn't a duplicate of the test above ?
This was supposed to be negative :)


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS15, Line 52: truncating digits that
             :   /// cannot be represented or rounding if desired
> rounding to the closest integer if 'round' is true. Truncate the decimal pl
Done


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // Some quick research shows us that 10**38 in binary has exactly 53
             :     // significant nonzero bits.  Coincidentally, double precision floating
             :     // point gives us 53 bits for the mantissa.  TODO: validate that the
             :     // values produced by the template are both correct, constant and that
             :     // the multiply does not lose precision.
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     if (max_value < 0 || UNLIKELY(std::isnan(d)) || UNLIKELY(std::abs(d) >= max_value)) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
             :    
> Are there particular cases you have in mind which may break if we use V2's 
None I can come up with right now, but this was before I noticed the bit pattern for 10^37 fits exactly in a a double.  Now I'm reasonably sure it is safe.


http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 52: max_value < 0
> Yes, if it's not possible (i.e. only happens due to a software bug somewher
Done


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS15, Line 132: 
> nit: long line
fixed


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: In progress.  Rouding to int works as expected, rounding from
float seems to work as well.
Expect a full test suite for both modes and all edge cases to be covered.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
7 files changed, 149 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#21).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 469 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/21
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 127: typename RESULT_T::underlying_type_t
> It's applicable to the next line too, right ? The following looks easier to
I agree that does look nicer, but I can't do it without polluting the client namespace with whatever typedef I use, unless I put it inside another namespace, like detail, which doesn't really solve the problem either.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/17/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS17, Line 1395: N.B. - with full decimal V2 this should evaluate to 1
How so? We've already added the new V2 result type formula for %, and you get decimal(38, 38) in this case which is consistent with other engines.  what type would be a better choice?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 23:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/299/

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 23: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: Still in progrress.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 228 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 10:

(1 comment)

I am trying to get the DCHECK_EQ to work with int128_t, for which we do provide a definition, but gcc keeps complaining about an ambiguous overload.

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // TODO: this could be made more precise by doing the computation in
             :     // 64 bit with 128 bit immediate result instead of doing an additional
             :     // multiply in 53-bit double precision floating point
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
             :     // Truncate and just take the integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   }
> If it won't break DECIMAL_V1, may as well merge the two paths and add the t
It looks like it will work and it doesn't change the speed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 3:

> > I'm not opposed to cleaning up the AnyVal stuff like that, but
 > > given that udf.h stuff dictates UDF compatibility, it's not
 > > completely trivial. It doesn't look like it would break binary
 > > compatibility though. But, in case something goes wrong, how
 > about
 > > we do that as a separate change so it could be backed out without
 > > affecting the decimal work? It doesn't look like the decimal
 > stuff
 > > will depend on it, right?
 > 
 > No, but it gets a lot cleaner to test the limits by giving the
 > generic form an underlying type.  I deliberately did not change
 > FloatVal, since the equality operator is currently kind of broken. 
 > Everything else should be binary compatible.

But you can get that by looking at *Val::val, no?

Another thing to watch out for is that since users write and compile their own UDFs, we've had trouble in the past with accidently increasing the C++ version required to build these things. I don't remember the specific problems (maybe Michael or Tim do) and also don't know whether that's a real issue with this change.

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 18:

(4 comments)

Does this change solve the DCHECK you were hitting running expr-test?

http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1395:      { true, 0, 38, 38 }}}, // This is overflowing, but wasn't noticed by the test before
let's just say IMPALA-4964 rather than refer to what the old code did, and why not combine both lines (we have the same bug in both v1 and v2)?


PS18, Line 1397: related to this
related to what?


PS18, Line 1398: IMAPALA
IMPALA. also combine to one line


Line 1635:         TestIsNotNull(c.expr, type);
mentioned in the other change -- 
thanks for finding that. we should probably add 
EXPECT_EQ(result, StringParser::PARSE_SUCCESS);
to the end of TestDecimalValue(). What's happening is the decimal string parser is failing (when parsing "NULL"), but when it does that it happens to return a DecimalValue with value()=0.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#3).

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................

IMPALA-2020: Add rounding when casting from decimal to int

Preview diff, not working yet, mostly to see if I can successfully
push diffs against my ASF Impala fork, and also to get early
feedback on the UDF change.  Update - pushing from fork didn't
work.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
4 files changed, 50 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:31:32 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:06:05 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:04:45 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:41:07 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=2645812a55b081c7:8b8eecc800000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 20:59:38 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=72472ad54b4d2324:4e5354c500000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:02:48 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=5f48db565c4426e1:9a9b255000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:01:44 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=a148fc7f88508bf3:7d0381a00000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:03:44 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=724bf16838f51065:78675af700000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:03:53 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=d746528416873278:a94e61ef00000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 485 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/16
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#12).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 485 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS5, Line 303: return to_type(dv.whole_part(scale)); 
> any reason not to have ToInt<>() handle this case too by passing in 'round'
Not really - I thought the saner behavior for ToInt was rounding and wrote this cast first, which didn't require a parameter, whereas FromDouble does as it is used in places where no context is available.  I can add this if you think it is useful.


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, &overflow, true);
> where is this used? will it break compatibility?
In what sense?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS5, Line 54: bool* overflow,
            :       bool round
> here and below, let's keep input parameters first, then output params.
Done.  Means I can't give a default value to round unless I also give a default value to overflow, but lets do that anyway.

Is it okay if I change the ABI here to be

From Double (const ColumnType& t, double d, bool round = true, bool& overflow = of_ignored)

and then have a static bool DecimalValue::of_ignored which will get inserted when the caller doesn't bother to check overflow.  (Actually, Q: can reference arguments even have default values?)  The reason being to prevent a nullptr deref crash.  Worth having a debate about references vs. pointers here.  Passing a pointer allows us to omit the overflow check from ever happening (ptr == nullptr should be known at compile time) but we currently don't do that or null checks in any of the places where we pass back this style of overflow.  I think we probably will want the overflow check in all runtime cases so maybe this doesn't matter.


Line 232:   inline typename RESULT_T::underlying_type_t ToInt(int scale, bool& overflow) const;
> add a function comment.
Done


PS5, Line 232:  bool& overflow
> use 'bool* overflow'. we generally use pointers for output params to make i
You have answered my last question.  Please ignore


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 50:     if (abs(d) >= max_value) {
> is this correct? 'd' is already scaled here.  please be sure to test the up
Yeah that is bogus.  What we need is the maximum integer value that represents a base 10 number of length precision, so the -scale should go.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5951

to look at the new patch set (#22).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 472 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/22
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS5, Line 303: return to_type(dv.whole_part(scale)); 
any reason not to have ToInt<>() handle this case too by passing in 'round' (similar to FromDouble())?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 185:         value_.decimal4_val = Decimal4Value::FromDouble(type, v, &overflow, true);
where is this used? will it break compatibility?


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS5, Line 54: bool* overflow,
            :       bool round
here and below, let's keep input parameters first, then output params.


PS5, Line 232:  bool& overflow
use 'bool* overflow'. we generally use pointers for output params to make it clearer at the callsite.


Line 232:   inline typename RESULT_T::underlying_type_t ToInt(int scale, bool& overflow) const;
add a function comment.


http://gerrit.cloudera.org:8080/#/c/5951/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 50:     if (abs(d) >= max_value) {
is this correct? 'd' is already scaled here.  please be sure to test the upper bounds.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding when casting from decimal to int

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding when casting from decimal to int
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/3/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 84:   auto divisor = DecimalUtil::GetScaleMultiplier<T>(scale);
> doesn't round, of course.
Done.  Turns out you can do this without branching.  I think right now I have a branch but I can get rid of it.


Line 88:     return RESULT_T::null();
> how about sticking with the convention used in the rest of this file where 
That's a good idea.  Should I still be returning raw UDF types (in which case I need to return the null version of the type).


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/15//COMMIT_MSG
Commit Message:

Line 122: do to get this back.
Sure, we can try to get this back in follow on commits.

But the other thing I wanted to check was whether perf for v2=0 regresses at all with this change.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1621:   { "cast(555555555555555555555555555555555555555 as decimal(38,0))",
> could you add a comment saying how many fives that is. I counted 38, so why
oops, I counted wrong -- i think there are 39 5's here?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/benchmarks/expr-benchmark.cc
File be/src/benchmarks/expr-benchmark.cc:

PS15, Line 218: Benchmark* BenchmarkDecimalCast() {
Can you please add a sample output like other benchmark functions ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-test.cc
File be/src/runtime/decimal-test.cc:

PS15, Line 169: -.1
-0.1.

Also may help to add cases which cause overflow due to rounding.

Decimal16Value::FromDouble(t3, 0.99999, true, &overflow);
Decimal16Value::FromDouble(t3, 0.99999, false, &overflow);


PS15, Line 224: Decimal4Value::FromDouble(t4, 0.499999999, true, &overflow);
Isn't a duplicate of the test above ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS15, Line 52: truncating digits that
             :   /// cannot be represented or rounding if desired
rounding to the closest integer if 'round' is true. Truncate the decimal places otherwise.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // Some quick research shows us that 10**38 in binary has exactly 53
             :     // significant nonzero bits.  Coincidentally, double precision floating
             :     // point gives us 53 bits for the mantissa.  TODO: validate that the
             :     // values produced by the template are both correct, constant and that
             :     // the multiply does not lose precision.
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     DCHECK(max_value > 0);  // no DCHECK_GT because of int128_t
             :     if (UNLIKELY(std::isnan(d)) || UNLIKELY(std::abs(d) >= max_value)) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
> Sort of.  I didn't want to change the original at all in this change though
Are there particular cases you have in mind which may break if we use V2's logic for both version ?


http://gerrit.cloudera.org:8080/#/c/5951/15/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS15, Line 132: ||
nit: long line


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS12, Line 52: max_value < 0
> If we cast to a decimal type whose underlying width is not large enough to 
Yes, if it's not possible (i.e. only happens due to a software bug somewhere else), let's make it a DCHECK.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#17).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/17
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 9:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5951/10//COMMIT_MSG
Commit Message:

PS10, Line 10: Still writing tests,
             : but this is ready for human eyes to look at.
> Not needed in the commit message anymore, right ?
will cull


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS9, Line 303: round
> Yeah, I got that.  What I'm saying is why not remove the 'round' paramter s
Killing it with fire


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1503:      { true, 0, 3, 0 }}},
> okay.  i missed that the scale=1 case covers that.
It doesn't hurt to do anyways, and does apply some pressure to other corners.  Would probably not hurt to add + 0.55 tests as well.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

PS10, Line 237: /// Returns an approximate double for this decimal.
> long line
Will fix


http://gerrit.cloudera.org:8080/#/c/5951/9/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS9, Line 43: precise
> sure, fixing in a separate change sounds fine.  If there's no jira for that
Freakily enough, 100000000000000000000000000000000000000 has exactly 53 significant digits in binary.  So it should be represented exactly as a double precision floating point value.  This means conversions from float to decimal(38, 38) should be well behaved.


http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // TODO: this could be made more precise by doing the computation in
             :     // 64 bit with 128 bit immediate result instead of doing an additional
             :     // multiply in 53-bit double precision floating point
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
             :     // Truncate and just take the integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   }
> Appears to me that we should be able to share the code for the most part:
Sort of.  I didn't want to change the original at all in this change though.  We can do this if we can prove the results will be identical.


PS10, Line 108: typename RESULT_T::underlying_type_t
> May help to add a comment that underlying_type_t is defined only for the *I
Done


Line 113:   if (divisor == 1) {
> If it's for optimization, constant propagation in codegen would have reaped
Things like dcheck, the divide, modulus and rounding are completely unnecessary in this case.


PS10, Line 117: const T remainder = v % divisor;
> Do we not need to compute this if round is false ?
No, but I was worried that moving the computation further away from the divide would cause the compiler to do the divide twice.  We'll try it and see.  I haven't even gotten to looking at the release binary yet.


PS10, Line 119: DCHECK(
> DCHECK_EQ
Can't do it or GCC goes completely insane as it can't find ostream operators for int128_t.


PS10, Line 127: typename RESULT_T::underlying_type_t
> May be consider doing a typedef typename RESULT_T::underlying_type_t result
I can only get rid of this one, not the other, so it seems more confusing just to do that for only one use.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 23: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#18).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0        |  1        |
+----------------------+

+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5      | 0.6                              |
+---------------------------------------------+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

New version, decimal_v2=1;
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/18
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/10/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS10, Line 38:   if (round) {
             :     // Decimal V2 behavior
             : 
             :     // Multiply the double by the scale.
             :     //
             :     // TODO: this could be made more precise by doing the computation in
             :     // 64 bit with 128 bit immediate result instead of doing an additional
             :     // multiply in 53-bit double precision floating point
             : 
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             :     d = std::round(d);
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Return the rounded integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   } else {
             :     // TODO: IMPALA-4924: remove DECIMAL V1 code
             : 
             :     // Check overflow.
             :     const T max_value = DecimalUtil::GetScaleMultiplier<T>(precision - scale);
             :     if (abs(d) >= max_value) {
             :       *overflow = true;
             :       return DecimalValue();
             :     }
             : 
             :     // Multiply the double by the scale.
             :     d *= DecimalUtil::GetScaleMultiplier<double>(scale);
             : 
             :     // Truncate and just take the integer part.
             :     return DecimalValue(static_cast<T>(d));
             :   }
> None I can come up with right now, but this was before I noticed the bit pa
If it won't break DECIMAL_V1, may as well merge the two paths and add the test cases such as 10^37 to verify things are still working as expected.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 18:

(4 comments)

Nope, doesn't solve the DCHECK issue.  I added a DCHECK that the static cast back to the target type from double equals the actual value of the target type, and it failed.  Issue is real, see the updated comment in be/src/runtime/decimal-value.inline.h

Floating point is not so fun, and the conversion really does lose precision.  Issue is also pre-existing so I'm not trying to fix it now.

http://gerrit.cloudera.org:8080/#/c/5951/18/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1395:      { true, 0, 38, 38 }}}, // This is overflowing, but wasn't noticed by the test before
> let's just say IMPALA-4964 rather than refer to what the old code did, and 
Done


PS18, Line 1397: related to this
> related to what?
Done


PS18, Line 1398: IMAPALA
> IMPALA. also combine to one line
Done


Line 1635:         TestIsNotNull(c.expr, type);
> mentioned in the other change -- 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  Still writing tests,
but this is ready for human eyes to look at.

Testing: Still in progrress.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
8 files changed, 222 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.

Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> why not std:abs() here?
std::abs() is not defined for int128_t


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes