You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2017/02/16 17:10:08 UTC

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6038

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The major differences are:
1. output type's scale has a minimum of 6
2. precision will be augmented if the scale is adjusted.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 216 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The major differences are:
1. output type's scale has a minimum of 6
2. precision will be augmented if the scale is adjusted.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 229 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The differences with DECIMAL_V1 are:

1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision
which is always set to 38. We use the smallest precision which
can store the output. A key insight is that the output of AVG()
is no wider than the inputs. Precision only needs to be adjusted
when the scale is augmented. Using a smaller precision avoids
potential loss of precision in subsequent decimal operations
(e.g. division) if AVG() is a subexpression. Please note that
the output type is different from SUM()/COUNT() as the latter
can have a much larger scale.

2. Due to a minimum of 6 decimal places for the output,
AVG() for decimal values whose whole number part exceeds 32
decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will
always overflow as the scale is augmented to 6. Certain
decimal types which work with AVG() in DECIMAL_V1 no longer
work in DECIMAL_V2.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Reviewed-on: http://gerrit.cloudera.org:8080/6038
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 235 insertions(+), 76 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved
  Dan Hecht: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

PS2, Line 197: 0
> Yes, I tried verifying with sum()/num_rows(). That's why I used tpch_parque
I meant I verified with sum(col)/count(*).


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6038/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 447:   // The scale of the accumulated sum is set to the scale of the output type.
> I think this comment would be better in the frontend code since that's wher
Done


http://gerrit.cloudera.org:8080/#/c/6038/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 336:       // AVG() always get at least MIN_ADJUSTED_SCALE decimal places since it performs
> gets
Done


Line 338:       int resultScale = Math.max(ScalarType.MIN_ADJUSTED_SCALE, digitsAfter);
> Is the resulting type the same as if we did (SUM(col) / COUNT(col))? In any
Done


http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

PS2, Line 197: 0
> do these results make sense? seems surprising that this decimal place is al
Yes, I tried verifying with sum()/num_rows(). That's why I used tpch_parquet.lineitem to help verify the result.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG
Commit Message:

PS4, Line 25: types
values

i.e. we can compute AVG() over DECIMAL(38,0) as long as the absolute values are less than 10^32. I think this is more common than actually having values larger than 10^32 when the column was the result of some decimal arithmetic that produced a type that's "too precise".

http://gerrit.cloudera.org:8080/#/c/6038/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

PS4, Line 338: Our implementation is similar to MS SQL which takes the max of the input's scale
: // and MIN_ADJUSTED_SCALE.
How about explaining why we deviate from SQL Server and from SUM()/COUNT():

Scale is set the same as MS SQL Server, which takes the max of the input scale and MIN_ADJUST_SCALE. For precision, MS SQL always sets it to 38. Instead, trim it down to the size that's needed given that the absolute value of the result must be less than the absolute value of the largest input. Using a smaller precision allows for better DECIMAL types to be chosen for the overall expression when AVG() is a subexpression.

--
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6038

to look at the new patch set (#3).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The major differences are:
1. output type's scale has a minimum of 6
2. precision will be augmented if the scale is adjusted.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 231 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.

Alex Behm has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6038

to look at the new patch set (#5).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The differences with DECIMAL_V1 are:

1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision
which is always set to 38. We use the smallest precision which
can store the output. A key insight is that the output of AVG()
is no wider than the inputs. Precision only needs to be adjusted
when the scale is augmented. Using a smaller precision avoids
potential loss of precision in subsequent decimal operations
(e.g. division) if AVG() is a subexpression. Please note that
the output type is different from SUM()/COUNT() as the latter
can have a much larger scale.

2. Due to a minimum of 6 decimal places for the output,
AVG() for decimal values whose whole number part exceeds 32
decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will
always overflow as the scale is augmented to 6. Certain
decimal types which work with AVG() in DECIMAL_V1 no longer
work in DECIMAL_V2.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 235 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6038/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 450:     DCHECK_EQ(sum_scale, output_scale);
I don't think we need this DCHECK since it was just an artificial restriction put on the code.


http://gerrit.cloudera.org:8080/#/c/6038/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 408:     size_t from_offset = expr.find("from");
please add a short comment explaining this.
also, expr is no longer an expression..


http://gerrit.cloudera.org:8080/#/c/6038/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 335:         analyzer.getQueryOptions().isDecimal_v2()) {
let's add a comment:
// AVG() result always gets at least MIN_ADJUSTED_SCALE decimal places since it performs an implicit divide.


Line 338:       return ScalarType.createAdjustedDecimalType(resultPrecision, resultScale);
I guess this works out as the same type as falling through and taking the createClippedDecimalType() path, right? if so, i'm fine with either (falling through might be less "tricky", and we are clipping the precision in this case anyway).


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6038

to look at the new patch set (#4).

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................

IMPALA-4821: Update AVG() for DECIMAL_V2

This change implements the DECIMAL_V2's behavior for AVG().
The differences with DECIMAL_V1 are:

1. The output type has a minimum scale of 6. This is similar
to MS SQL's behavior which takes the max of 6 and the input
type's scale. We deviate from MS SQL in the output's precision
which tends to be 38. We use the smallest precision which can
store the output. One key insight is that the output of AVG()
is no wider than the input. Precision only needs to be adjusted
when the scale is augmented. Using a smaller precision avoids
potential loss of precision in subsequent decimal operations
(e.g. division) which take output of AVG() as inputs. Please
note that the output type is different from SUM()/COUNT() as
the latter can have a much larger scale.

2. Due to a minimum of 6 decimal places for the output,
AVG() for decimal types whose whole number part exceeds 32
decimal places (e.g. DECIMAL(38,4), DECIMAL(33,0)) will
always overflow as the scale is augmented to 6. Certain
decimal types which work with AVG() in DECIMAL_V1 no longer
work in DECIMAL_V2.

Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
5 files changed, 231 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6038/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6038/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 450:     DCHECK_EQ(sum_scale, output_scale);
> I don't think we need this DCHECK since it was just an artificial restricti
It was mostly for ensuring the compatibility with DECIMAL_V1. I can convert that to a comment instead.


http://gerrit.cloudera.org:8080/#/c/6038/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 408:     size_t from_offset = expr.find("from");
> please add a short comment explaining this.
Added some comments. I'll it call it a query for the lack of better term.


http://gerrit.cloudera.org:8080/#/c/6038/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 335:         analyzer.getQueryOptions().isDecimal_v2()) {
> let's add a comment:
Done


Line 338:       return ScalarType.createAdjustedDecimalType(resultPrecision, resultScale);
> I guess this works out as the same type as falling through and taking the c
I anticipate the removal of createClippedDecimalType() in some future version. I'll keep it this way if you don't mind.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG
Commit Message:

PS4, Line 25: types
> Done.
I'm not sure how many people deal with numbers larger than 10^32... seems like a reasonable limitation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/291/

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.

Alex Behm has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6038/3//COMMIT_MSG
Commit Message:

Line 12: 2. precision will be augmented if the scale is adjusted.
In addition to my other comment, I think we should mention some of the limitations here, for example, that it won't be possible to compute the AVG() of some decimal types at all.


http://gerrit.cloudera.org:8080/#/c/6038/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 337:       // an implicit divide. This is similar to the MS SQL server's behavior which takes
Suggest these minor fixes and reordering:

AVG() gets at least MIN_ADJUSTED_SCALE decimal places since it performs an implicit divide. However, the output type is not always the same as SUM()/COUNT(). Our behavior is similar to MS SQL Server's which takes the max of the input's scale and MIN_ADJUSTED_SCALE.

The observations are ok, but at a high level:
* the type is not the same as SUM()/COUNT()
* the behavior is not the same as MS SQL Server's

It would be good to provide a justification/intuition for why we chose yet another behavior (also in the commit msg).


http://gerrit.cloudera.org:8080/#/c/6038/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

Line 208: ====
We should also add tests for those decimal types where AVG() cannot be computed (because we always overflow).


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6038/4//COMMIT_MSG
Commit Message:

PS4, Line 25: types
> values
Done.

This may seem like a concern in the long run, isn't it ?


http://gerrit.cloudera.org:8080/#/c/6038/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

PS4, Line 338: Our implementation is similar to MS SQL which takes the max of the input's scale
             :       // and MIN_ADJUSTED_SCALE.
> How about explaining why we deviate from SQL Server and from SUM()/COUNT():
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.

Michael Ho has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6038/3//COMMIT_MSG
Commit Message:

Line 12: 2. precision will be augmented if the scale is adjusted.
> In addition to my other comment, I think we should mention some of the limi
Done


http://gerrit.cloudera.org:8080/#/c/6038/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 337:       // an implicit divide. This is similar to the MS SQL server's behavior which takes
> Suggest these minor fixes and reordering:
Done


http://gerrit.cloudera.org:8080/#/c/6038/3/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

Line 208: ====
> We should also add tests for those decimal types where AVG() cannot be comp
There are cases like these in expr-test.cc.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.

Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

+2 for backend and tests

http://gerrit.cloudera.org:8080/#/c/6038/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 447:   // The scale of the accumulated sum is set to the scale of the output type.
I think this comment would be better in the frontend code since that's where the compatibility needs to be maintained.


http://gerrit.cloudera.org:8080/#/c/6038/2/testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test:

PS2, Line 197: 0
do these results make sense? seems surprising that this decimal place is always 0, but i haven't looked at the data.


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4821: Update AVG() for DECIMAL V2

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.

Alex Behm has posted comments on this change.

Change subject: IMPALA-4821: Update AVG() for DECIMAL_V2
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6038/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 336:       // AVG() always get at least MIN_ADJUSTED_SCALE decimal places since it performs
gets


Line 338:       int resultScale = Math.max(ScalarType.MIN_ADJUSTED_SCALE, digitsAfter);
Is the resulting type the same as if we did (SUM(col) / COUNT(col))? In any case, I think we should briefly explain why we chose this behavior (e.g. "because SQL Server does it that way").


-- 
To view, visit http://gerrit.cloudera.org:8080/6038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28f5ef0370938440eb5b1c6d29b2f24e6f88499f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-HasComments: Yes