You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2022/04/13 11:57:14 UTC

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

pranav.lodha@cloudera.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18413


Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 146 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,075 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/19
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408:     if (isnan(val1) || isnan(val2)) return;
Skipping NaN here seems questionable? Please add comments or handle appropriately.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422
PS18, Line 422:     if (nA == 0) {
Does (nB == 0) need handling?


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543
PS18, Line 543:   if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return;
Skipping Nan?


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575
PS18, Line 575:     if (isnan(val1) || isnan(val2)) return;
Skipping NaN


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589
PS18, Line 589:     if (nA == 0) {
handle (nB == 0) ?


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS18, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store;
This may be non-deterministic (flaky) without an order by on the window clause.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544
PS18, Line 1544: select id, double_col, corr(double_col, int_col) over (partition by month) from functional.alltypes
This may be non-deterministic (flaky) without an order by on the window clause.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578
PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city order by s_city
Check if order by city is strong ordering.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 01 Jun 2022 18:38:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,068 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/22
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 940 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/17
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10622/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 23 May 2022 21:50:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 24 May 2022 02:06:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17:

(2 comments)

> Patch Set 15:
> 
> (3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328
PS15, Line 328:     state->xvar += deltaX * (x - state-
> This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. 
Right, I was skeptical about it too! I've made the required changes.


http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332
PS15, Line 332: }
              : 
              : static inline void CorrRemoveState(double x, d
> I think we need to find a reference for these.
Here are a few useful links: 
https://www.osti.gov/servlets/purl/1028931
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 24 May 2022 07:34:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 23:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10749/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 10:12:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10: Code-Review+1

(5 comments)

LGTM. I'll see if other guys want to review this.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339
PS10, Line 339:  
nit: remove the space


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350
PS10, Line 350:  
nit: remove the space


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364
PS10, Line 364:   
nit: fix the indention


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365
PS10, Line 365: 
nit: remove blank lines


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/uda.test@180
PS10, Line 180: ====
nit: could you revert the changes in this file?



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 27 Apr 2022 09:39:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10664/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 May 2022 21:59:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc@295
PS11, Line 295: // Correlation coefficient: r = (n(Σxy)−(Σx)(Σy))/(√(nΣx^2−(Σx)^2)*(nΣy^2−(Σy)^2))
line too long (97 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 11 May 2022 06:10:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 21:50:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8115/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 18 May 2022 07:54:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10709/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:33:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 18:44:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(4 comments)

I would prefer to see the covar_pop and covar_samp functions implemented in the same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
These and any other computation variables may need to be precise types if the output type is precise.


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
Handle overflow?


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
Handle overflow?


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: 
Add OLAP partition/window testcases. If not supported, they can be negative tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 27 Apr 2022 12:16:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 13:

(12 comments)

> Patch Set 12:
> 
> (12 comments)
> 
> Thanks for adding the other two stats functions!

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301
PS12, Line 301:   
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309
PS12, Line 309: 
              : void AggregateFunctions::CorrInit(FunctionContext* ctx, StringVal* dst) {
              :   dst->is_null = false;
              :   dst->len = sizeof(Co
> These two fields are not used in covar_samp() and covar_pop(). Can we have 
Sure, I added a new state, CovarState, and a few related functions.


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349
PS12, Line 349: rns NULL if
> nit: please fix the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362
PS12, Line 362: pret_cast<C
> nit: fix the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376
PS12, Line 376: return
> nit: add spaces
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392
PS12, Line 392: L);
> nit: add spaces
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347
PS12, Line 1347: //
> nit: add one space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS12, Line 1372: //
> nit: add one space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by s_city) from tpcds.store;
> nit: could you add s_store_sk as the first column? Also we don't need "limi
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619
PS12, Line 1619: 
> Could you leave a commet for this?
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: ====
> nit: add s_store_sk column and remove the limit clause
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700: ---- TYPES
> nit: add s_store_sk column and remove the limit clause
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 12 May 2022 18:17:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/28
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@315
PS1, Line 315: void AggregateFunctions::CorrUpdate(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2,
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@329
PS1, Line 329: void AggregateFunctions::CorrRemove(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2,
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@346
PS1, Line 346: void AggregateFunctions::CorrMerge(FunctionContext* ctx, const StringVal& src, StringVal* dst) {
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@361
PS1, Line 361: const StringVal AggregateFunctions::CorrSerialize(FunctionContext* ctx, const StringVal& src) {
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions-ir.cc@374
PS1, Line 374:   double corr = ((state->prod*state->count) - (state->sumx*state->sumy)) / (pow((((state->count*state->sum_squaredx) - (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) - (state->sumy*state->sumy))), 0.5));
line too long (221 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@69
PS1, Line 69:   static void CorrUpdate(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/18413/1/be/src/exprs/aggregate-functions.h@71
PS1, Line 71:   static void CorrRemove(FunctionContext* ctx, const DoubleVal& src1, const DoubleVal& src2,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Apr 2022 11:58:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 29:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8246/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 29: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 19 Jun 2022 04:35:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
 the Pearson's correlation coefficient between them. This UDAF
 is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 153 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
 the Pearson's correlation coefficient between them. This UDAF
 is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 151 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 18 May 2022 12:26:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 07 Jun 2022 09:07:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/24
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   return result;
> Yeah, we can remove it, it'll get cleared after the function pops from the 
Oh, I think we need this. Otherwise we'll see warnings like

 WARNINGS: UDF WARNING: Memory leaked via FunctionContext::Allocate()

Look at other udas, they use StringValSerializeOrFinalize() for such simple purpose, i.e. copy and free. I think we don't need this CorrSerialize() method, just do the same as other udas, e.g.
https://github.com/apache/impala/blob/1358700740dbeff799f6a6a95f95b1d9fe7281d2/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java#L1376


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   // Calculating correlation coefficient using Pearson Product Moment Correlation formula
> Yeah, we can remove it. I intended to remove it, but probably forgot.
Based on the above comment, I think we do need this.


http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@372
PS5, Line 372:   if (state->count == 0 || state->count == 1) return DoubleVal::null();
Can we add test cases for these?


http://gerrit.cloudera.org:8080/#/c/18413/5/be/src/exprs/aggregate-functions-ir.cc@374
PS5, Line 374: ((state->prod*state->count) - (state->sumx*state->sumy)) /
             :       (pow((((state->count*state->sum_squaredx) -
             :       (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) -
             :                    (state->sumy*state->sumy))), 0.5));
Can we format this a little bit? There are too many brackets. Maybe this is better:

  int64_t n = state->count;
  double r1 = n * state->prod - state->sumx * state->sumy;
  double r2 = (n * state->sum_squaredx - state->sumx * state->sumx) *
      (n * state->sum_squaredy - state->sumy * state->sumy);
  double corr = r1 / sqrt(r2);

Note that we can replace pow(x, 0.5) with sqrt(x).

We should also check if r2 is 0 and return NULL in that case. Hive does it this way:
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366


http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@19
PS5, Line 19: NULL
Is this correct? In Hive, it's 1.0


http://gerrit.cloudera.org:8080/#/c/18413/5/testdata/workloads/functional-query/queries/QueryTest/uda.test@26
PS5, Line 26: corr(cast(timestamp_col as int),cast(timestamp_col as int))
Hive doesn't need to cast timestamp in the query. Can we improve this?



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Apr 2022 03:27:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(14 comments)

> Patch Set 7:
> 
> (16 comments)
> 
> The patch is looking good now!

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291
PS7, Line 291: //
> nit: add a space after "//"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320
PS7, Line 320: st
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321
PS7, Line 321:   state->prod += x * y;
             :   ++state->count;
             : }
             : 
             : static inline void CorrRemoveState(double x, double y, CorrState* state){
             :   state->sumx -= x;
> We can extract these into an update method, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336
PS7, Line 336: if
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337
PS7, Line 337:   CorrState* state = reinterpret_cast<CorrState*>(dst->ptr);
             :   if (!isnan(src1.val) && !isnan(src2.val)) {
             :     CorrUpdateState(src1.val, src2.val , state);
             :   }
             : }
             : 
> We can extract these into a method too.
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355
PS7, Line 355: co
> nit: need a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376
PS7, Line 376: ns
> nit: need a space after if
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401
PS7, Line 401:     return DoubleVal::null();
> We should not free it here since we still use 'state' at following lines. I
Approach 1 worked so implemented that


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408
PS7, Line 408: if 
> nit: add a space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10
PS7, Line 10: select test_count(int_col) from functional.alltypestiny;
> Sorry to be back and forth, but we'd better move these tests to testdata/wo
Sure, done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24
PS7, Line 24: select sum_small_decimal(c3) from functional.decimal_tiny;
> Do we have test coverage on only one side is NULL?
Yes, added


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41
PS7, Line 41: ---- TYPES
> Cool!
Ack


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48
PS7, Line 48: ---- RESULTS
> Could you also add the 'id' column here?
Done


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69
PS7, Line 69: ---- TYPES
> Could you add the 'year' column here?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:41:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. This UDAF is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 153 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 12:

(12 comments)

Thanks for adding the other two stats functions!

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301
PS12, Line 301: //
nit: add a space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309
PS12, Line 309:   // Sum(x) squared
              :   double sum_squaredx;
              :   // Sum(y) squared
              :   double sum_squaredy;
These two fields are not used in covar_samp() and covar_pop(). Can we have a separate State for covar_samp() and covar_pop()?


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349
PS12, Line 349: ,src2.val ,
nit: please fix the space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362
PS12, Line 362: ,src2.val ,
nit: fix the space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376
PS12, Line 376: ,val2,
nit: add spaces


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392
PS12, Line 392: ,val2,
nit: add spaces


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347
PS12, Line 1347: //
nit: add one space


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS12, Line 1372: //
nit: add one space


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_number_employees,s_floor_space) over (partition by s_city) from tpcds.store limit 10;
nit: could you add s_store_sk as the first column? Also we don't need "limit 10" since this table only has 12 rows.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619
PS12, Line 1619: /1E+10
Could you leave a commet for this?


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: select covar_samp(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store limit 10;
nit: add s_store_sk column and remove the limit clause


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700: select covar_pop(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store limit 10;
nit: add s_store_sk column and remove the limit clause



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 12 May 2022 07:48:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 11 May 2022 09:38:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(2 comments)

> Patch Set 10:
> 
> (1 comment)
> 
> > Patch Set 10:
> > 
> > (4 comments)
> > 
> > I would prefer to see the covar_pop and covar_samp functions implemented in the same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
> AFAIK, Hive and other aggregate functions of impala like avg() aren't check
I'll add overflow handling


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
> AFAIK, Hive and other aggregate functions of impala like avg() aren't check
I'll add overflow handling



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 02 May 2022 11:33:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 28: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:49:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 26:

(3 comments)

> Patch Set 25: Code-Review+1
> 
> (3 comments)
> 
> Math seems good to me

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
              :   int64_t count; // number of elements
              :   double xavg; // average of x elements
              :   double yavg; // average of y elements
              :   double xvar; // n times the variance of x elements
              :   double yvar; // n times the variance of y elements
              :   double covar; // n times the covariance
              : };
> As I understand, both CorrState and CovarState calculates xavg,yavg and cov
Yeah, that's right, we did something similar in our 12th patch where we didn't create a new state for covar_samp() and covar_pop() but two fields were not getting used in calculating covar_samp() and covar_pop() and it was causing a lot of unnecessary computation so we separated it into two different states.


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
> nit: do we need this? for first items state->xavg = deltaX = x, so we will 
Yeah, you're right, Quanlong and I had a discussion on this, we decided to add this check as it saves some floating point computation. More computation would means more floating point rounding error.


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331
PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_(n - 1)) * (x_n - mx_n)
> nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1))
Yeah, I just changed it.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:28:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/26
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 26:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10775/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:47:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 26:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7
PS26, Line 7:  
nit: remove the space before ":"


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
> Yeah, you're right, Quanlong and I had a discussion on this, we decided to 
We can improve this later if we do find it impacts the performance. I think most of the group sizes are large in practise. So the cpu prediction correctness rate won't be so bad. Maybe we can add a LIKELY() marcro here.


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343
PS26, Line 343:   }
              :   else {
nit: put "}" and "else {" at the same line


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513
PS26, Line 513:   }
              :   else {
nit: put "}" and "else {" at the same line



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Jun 2022 03:11:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 28: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 18 Jun 2022 17:22:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341
PS20, Line 341:  
nit: remove the space before ")"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342
PS20, Line 342:     state->count = 0;
              :     state->xavg = 0.0;
              :     state->yavg = 0.0;
              :     state->xvar = 0.0;
              :     state->yvar = 0.0;
              :     state->covar = 0.0;
nit: these can be simplified by memset(), just like what we do at line 313.

 memset(state, 0, sizeof(CorrState))


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348
PS20, Line 348:   }
              :   else
nit: put "}" and "else" at the same line, i.e. "} else {"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355
PS20, Line 355: (x_n - mx_(n - 1)) * (y_n - my_n)
This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - state->yavg' is (y_n - my_(n-1)).

BTW, please fix the idention of line 355 ~ 360.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375
PS20, Line 375: the number of calls to Update() or Remove()
This is not clear to me. Do you mean 'count'?


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393
PS20, Line 393:     CorrUpdateState(val1, val2, state);
nit: please add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398
PS20, Line 398: the number of calls to Update() or Remove()
Could you explain more? Why do we need to check the number of Update/Remove calls?


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407
PS20, Line 407:     CorrRemoveState(val1, val2, state);
nit: please add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451
PS20, Line 451: sqrt((state->xvar * state->yvar))
We should check whether 'state->xvar * state->yvar' become negative before calling sqrt(). Please split the if-statement at line 455 into two: One for checking counts and vars before this line, and the other one for checking 'r == 0.0' after this line.

BTW, don't need duplicated parentheses here.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452
PS20, Line 452:   // xvar and yvar become negative in certain cases due to floating point rounding error.
              :   // Since these values are very small, they can be ignored and rounded to 0.
              :   DCHECK(state->xvar >= -1E-8 && state->yvar >= -1E-8);
move this to the begining of this method, and split it into two DCHECKs so we know exactly which one breaks if it happens.

 DCHECK(state->xvar >= -1E-8);
 DCHECK(state->yvar >= -1E-8);


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467
PS20, Line 467:   ctx->Free(src.ptr);
              :   return CorrGetValue(ctx, src);
This is a use-after-free pattern. We should not free the resource if we still want to use it. Chang it to

 DoubleVal r = CorrGetValue(ctx, src);
 ctx->Free(src.ptr);
 return r;


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507
PS20, Line 507:  
nit: remove the space before ")".


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508
PS20, Line 508:     state->count = 0;
              :     state->xavg = 0.0;
              :     state->yavg = 0.0;
              :     state->covar = 0.0;
nit: use memset instead.

 memset(state, 0, sizeof(CovarState))


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512
PS20, Line 512:   }
              :   else {
nit: put them into one line, i.e. "} else {"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535
PS20, Line 535: the number of calls to Update() or Remove()
Need explanation here too


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@553
PS20, Line 553:     CovarUpdateState(val1, val2, state);
nit: add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@567
PS20, Line 567:     CovarRemoveState(val1, val2, state);
nit: add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@625
PS20, Line 625:   ctx->Free(src.ptr);
              :   return CovarSampleGetValue(ctx, src);
This is a use-after-free pattern too. Please fix it.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@636
PS20, Line 636:   ctx->Free(src.ptr);
              :   return CovarPopulationGetValue(ctx, src);
This is a use-after-free pattern too. Please fix it.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 03:15:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10712/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:51:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 23: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 11:07:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10750/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 12:37:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24: Code-Review+1

The test failure is due to a flaky test (IMPALA-1995) which is unrelated to this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 01:32:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 643 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/12
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10554/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 11 May 2022 06:30:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14: Code-Review+1

LGTM


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 12:45:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10591/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 18 May 2022 08:11:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10440/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 13 Apr 2022 12:19:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
5 files changed, 274 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10495/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:52:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 12:

(7 comments)

> Patch Set 11:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296: // r = (n(Σxy)−(Σx)(Σy))/(√(nΣx^2−(Σx)^2)*(nΣy^2−(Σy)^2))
> Seems ok to stick with double as other databases do this and there does not
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339
PS10, Line 339: 
> nit: remove the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350
PS10, Line 350: 
> nit: remove the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364
PS10, Line 364: 
> nit: fix the indention
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365
PS10, Line 365: 
> nit: remove blank lines
Done


http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc@295
PS11, Line 295: // Correlation coefficient:
> line too long (97 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: double
> Also test empty table and all zero values cases.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 11 May 2022 09:18:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Jian Zhang (Code Review)" <ge...@cloudera.org>.
Jian Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296
PS18, Line 296: // r = covar / (√(xvar * yvar)
could you explain more about how the `CorrState` is updated and used to calculate the final correlation of two variables?

it would be helpful for readers unfamiliar with the algorithms if you could add links about the "Pearson's correlation" and "Welford's online algorithm".



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 03 Jun 2022 01:38:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // using a stable one-pass algorithm
nit: please also mention "Welford's online algorithm".


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355: NULL
nit: we prefer nullptr to NULL.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: we can merge this check to line 354, i.e.

  if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return;


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: same as above. We can merge the check to line 367.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412:   if(src.ptr != NULL) {
nit: add a space after "if" and replace NULL with nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:       dst_state->count = src_state->count;
              :       dst_state->xavg = src_state->xavg;
              :       dst_state->yavg = src_state->yavg;
              :       dst_state->xvar = src_state->xvar;
              :       dst_state->yvar = src_state->yvar;
              :       dst_state->covar = src_state->covar;
nit: Can we simplify this by using memcpy? i.e.

  memcpy(dst, &src, sizeof(CorrState)


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423:     if (nA != 0 && nB != 0) {
nit: replace "if" with "else if" to save one check, or add a "return" at the end of the above if-branch.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453: sqrt(state->xvar) / sqrt(state->yvar)
sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493: both terms
nit: both terms? Maybe you want to add another equation after line 491:

 // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1))



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 May 2022 06:57:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 12:35:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408:     if (isnan(val1) || isnan(val2)) return;
> Skipping NaN here seems questionable? Please add comments or handle appropr
Code is still skipping NaN..



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:02:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 23:20:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24:

recheck


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Jun 2022 16:39:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10461/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Apr 2022 07:36:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by s_city) from tpcds.store;
> Done
sorry that I mean changing the query to

 select s_store_sk, corr(s_number_employees,s_floor_space) over (partition by s_city) from tpcds.store

Some result rows are identical. Adding the s_store_sk column helps to distinguish them.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: ====
> Done
sorry, I mean changing the query to

  select s_store_sk, covar_samp(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700: ---- TYPES
> Done
sorry, I mean changing it to

  select s_store_sk, covar_pop(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 10:46:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8101/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 11:24:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8146/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 23 May 2022 21:37:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

(11 comments)

> Patch Set 17:
> 
> (9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // online algorithm. This is calcula
> nit: please also mention "Welford's online algorithm".
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337
PS17, Line 337:   double deltaY = y - state->yavg;
> If state->count is 1, it will be decreased to 0. We will hit divide-by-zero
Yeah, it's been taken care of in the new patch.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342
PS17, Line 342:     state->xvar = 0.0;
> If the original count is 2, it will be decreased to 1 at line 337. Then we 
Sure, updated in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355:  vx_
> nit: we prefer nullptr to NULL.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358:       state->yvar -= deltaY * (y - state->yavg);
> nit: we can merge this check to line 354, i.e.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371: 
> nit: same as above. We can merge the check to line 367.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412: 
> nit: add a space after "if" and replace NULL with nullptr.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   DCHECK(dst->ptr != NULL);
              :   DCHECK_EQ(sizeof(CorrState), dst->len);
              :   CorrState* dst_state = reinterpret_cast<CorrState*>(dst->ptr);
              :   if (src.ptr != nullptr) {
              :     long nA = dst_state->count;
              :     long nB = src_state->count;
> nit: Can we simplify this by using memcpy? i.e.
This was causing impala crash so didn't add this as of now


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423:       dst_state->count = src_state->count;
> nit: replace "if" with "else if" to save one check, or add a "return" at th
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453: 
> sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493: 
> nit: both terms? Maybe you want to add another equation after line 491:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 01 Jun 2022 07:05:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

> Patch Set 18:
> 
> (1 comment)
Hi, sure Jian, I’ll be adding a few details in the new patch after I get a few clarifications, till then you can checkout these links :
Basic overview :
https://docs.google.com/document/d/1TH-907nK5JWIGZ-ePo9CQLQYlW2Y-jEAajfCYHhswus/edit
This contains implementation details :
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
https://www.osti.gov/biblio/1028931
Hive also implements using a similar method :
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 03 Jun 2022 06:32:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 19:

(3 comments)

> Patch Set 19:
> 
> (10 comments)
> 
> > Patch Set 18:
> > 
> > (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   CorrState* src_state = reinterpret_cast<CorrState*>(src.ptr);
              :   DCHECK(dst->ptr != NULL);
              :   DCHECK_EQ(sizeof(CorrState), dst->len);
              :   CorrState* dst_state = reinterpret_cast<CorrState*>(dst->ptr);
              :   if (src.ptr != nullptr) {
              :     long nA = dst_state->count;
> This was causing impala crash so didn't add this as of now
Added


http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@324
PS19, Line 324: count
This check is just added as it saves some floating point computation. More computation would means more floating point rounding error.


http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@458
PS19, Line 458: =
Please let me know if this tolerance value is suitable



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:36:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:26:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 23:

(4 comments)

> Patch Set 22: Code-Review+1
> 
> (4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341
PS22, Line 341: if (state->count <= 1) {
> nit: this is part of a multi-line if-else statement so our code style still
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452
PS22, Line 452:   DCHECK(state->xvar >= -1E-8);
              :   DCHECK(state->yvar >= -1E-8
> nit: add brackets {} for multi-line if-statement
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507
PS22, Line 507:   state->xavg += (x - state->xavg) / state->count;
> nit: this is part of a multi-line if-else statement so our code style still
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324
PS22, Line 1324:         List
> nit: use the same idention as the above metionds.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 09:50:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 28:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10793/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:44:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
              :   int64_t count; // number of elements
              :   double xavg; // average of x elements
              :   double yavg; // average of y elements
              :   double xvar; // n times the variance of x elements
              :   double yvar; // n times the variance of y elements
              :   double covar; // n times the covariance
              : };
> In Patchset12, you used CorrState for Covar calculations as well, which I a
As we discussed in the call, removing the code duplication brings up more unnecessary complications in the code and its cleaner to keep them separate, as it is. Maybe you can add a cross reference in the comment to the structs, e.g. "If you change Corr functions you probably have to change Covar functions too and vice versa.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Jun 2022 07:17:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 29: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
              :   int64_t count; // number of elements
              :   double xavg; // average of x elements
              :   double yavg; // average of y elements
              :   double xvar; // n times the variance of x elements
              :   double yvar; // n times the variance of y elements
              :   double covar; // n times the covariance
              : };
> Yeah, that's right, we did something similar in our 12th patch where we did
In Patchset12, you used CorrState for Covar calculations as well, which I agree is not great, wasteful and confusing

My proposal is to still have 2 separate structs, but CorrState could utilize CovarState.

Some sketch, you can use composition instead of inheritance as well.

struct CovarState {
  int64_t count; // number of elements
  double xavg; // average of x elements
  double yavg; // average of y elements
  double covar; // n times the covariance
  void update(double x, double y) {...}
  void remove(double x, double y) {...}
  void merge(CovarState& other) {...}
};

struct CorrState : public CovarState {\ 
  double xvar; // n times the variance of x elements
  double yvar; // n times the variance of y elements
  void update(double x, double y) {
    double prevXavg = xavg;
    double prevYavg = yavg;
    CovarState::update(x,y);
    xvar = ...
    yvar = ...
  }
  ...
};



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:42:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/10446/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 14 Apr 2022 04:51:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 652 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(1 comment)

> Patch Set 10:
> 
> (4 comments)
> 
> I would prefer to see the covar_pop and covar_samp functions implemented in the same patch as well.

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: 
> Add OLAP partition/window testcases. If not supported, they can be negative
Sure, I'll include those.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:37:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/10447/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 14 Apr 2022 05:02:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 08:06:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 19:

(10 comments)

> Patch Set 18:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296
PS18, Line 296: // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
> could you explain more about how the `CorrState` is updated and used to cal
Added a few links


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408:       tm_src2.ToSubsecondUnixTime(UTCPTR, &val2)) {
> Skipping NaN here seems questionable? Please add comments or handle appropr
Yeah, its fixed now.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422
PS18, Line 422:     long nB = src_state->count;
> Does (nB == 0) need handling?
nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462
PS18, Line 462:   return DoubleVal(corr);
> Add an assert that negative values are small and comment that this can happ
Sure,added, I've taken the tolerance value to be around -1E-8 as it seemed safe to me. We can probably go for smaller values like -1E-10 as well. Please let me know what would be a suitable tolerance value.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543
PS18, Line 543:   DCHECK_EQ(sizeof(CovarState), dst->len);
> Skipping Nan?
Updated in the latest patch.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575
PS18, Line 575:   }
> Skipping NaN
Updated in the latest patch.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589
PS18, Line 589:       return;
> handle (nB == 0) ?
nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS18, Line 1526:  from tpcds.store;
> This may be non-deterministic (flaky) without an order by on the window cla
Yeah, its fixed in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544
PS18, Line 1544: select id, double_col, corr(double_col, int_col) over (partition by month order by id) from functional.alltypes
> This may be non-deterministic (flaky) without an order by on the window cla
Yeah, its fixed in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578
PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk
> Check if order by city is strong ordering.
It wasn't strong ordering, I've changed it to s_store_sk which contains unique values.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:23:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301
PS20, Line 301:   long count; // number of elements
Change all occurrences of long to int64_t


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378
PS20, Line 378:   DCHECK(dst->ptr != NULL);
Change all occurrences of NULL to nullptr



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 07 Jun 2022 13:10:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 15:47:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14:

(2 comments)

I double the tests and find a question we need to address.

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: 1.01233222875e-08
Hive returns NULL in this case. I think the reason is https://issues.apache.org/jira/browse/HIVE-16178:

 If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
 If N * SUMY2 equals SUMY * SUMY , then the result is the null value.

The commit is https://github.com/apache/hive/commit/d97e4874dc0e09c535e8cb908f6b17698e49a5d6

Could you check if indeed N * SUMY2 equals SUMY * SUMY in this case, or it's a precision issue in Hive?


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558
PS14, Line 1558: ====
For tests on window functions, can we also copy Hive's test?
https://github.com/apache/hive/blob/3b3da9ed7f3813bae3e959670df55682fea648d3/ql/src/test/results/clientpositive/llap/windowing.q.out#L853-L890



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 16 May 2022 01:36:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 04:41:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 00:14:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24: Code-Review+2

> Patch Set 24:
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/

The failure is unrelated to this patch
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5845
https://issues.apache.org/jira/browse/IMPALA-10125


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:22:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 07:49:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25: Code-Review+1

(3 comments)

Math seems good to me

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
              :   int64_t count; // number of elements
              :   double xavg; // average of x elements
              :   double yavg; // average of y elements
              :   double xvar; // n times the variance of x elements
              :   double yvar; // n times the variance of y elements
              :   double covar; // n times the covariance
              : };
As I understand, both CorrState and CovarState calculates xavg,yavg and covar the same way. I think we could restructure this a bit to spare code duplication.

E.g.:
CovarState::update, remove and merge could encapsulate the logic regarding the common members, and CorrState could have a CovarState member, and call its functions meanwhile calculating xvar and yvar too


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
nit: do we need this? for first items state->xavg = deltaX = x, so we will add 0 to state->covar, does not change anything. This way we pay a branching for every count unnecessarily


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331
PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_n) * (x_n - mx_(n - 1))
nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1))



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Jun 2022 14:57:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has abandoned this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Abandoned

minor change
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10
PS4, Line 10:  
nit: remove the leading space


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292
PS4, Line 292: // type and returns the correlation coefficient between them.
Could you comment the formula here?


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359
PS4, Line 359: 
nit: remove this blank line


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   ctx->Free(src.ptr);
Could you explain why we need to free it here? It's not allocated by this function.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   ctx->Free(src.ptr);
Why do we free it here? We still use the CorrState below.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375
PS4, Line 375:   // Calculating correlation coefficient using Pearson Product Moment Correlation formula
             :   double corr = ((state->prod*state->count) - (state->sumx*state->sumy)) /
             :       (pow((((state->count*state->sum_squaredx) -
             :       (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) -
             :                    (state->sumy*state->sumy))), 0.5));
I just realized that we are using a different formula than Hive's:
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L49-L60

Are they equivalent?


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12
PS4, Line 12: '0.00032'
I checked the test codes and found that it already handles rounding inaccurary:
https://github.com/apache/impala/blob/31bd2a47036e7be3a0a32f873b60d2f70dcd8c9f/tests/common/test_result_verifier.py#L176-L185

So we are safe to use double results directly.

BTW, can we add more corr() on other columns?


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15
PS4, Line 15: ====
Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. catalog_sales has NULLs in cs_sold_date_sk column.


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32
PS4, Line 32: select corr(double_col,double_col) from functional.alltypes;
Could you add tests for other types as well? E.g.

select
  corr(tinyint_col, tinyint_col),
  corr(smallint_col, smallint_col),
  corr(int_col, int_col),
  corr(bigint_col, bigint_col),
  corr(float_col, float_col),
  corr(double_col, double_col),
  corr(timestamp_col, timestamp_col)
from functional.alltypes;

'alltypes' table doesn't have decimal types. We can use the `decimal_tbl` table.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 15 Apr 2022 01:04:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 14 Apr 2022 08:02:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 7:

(16 comments)

The patch is looking good now!

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@291
PS7, Line 291: //
nit: add a space after "//"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@320
PS7, Line 320: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@321
PS7, Line 321:     state->sumx += src1.val;
             :     state->sumy += src2.val;
             :     state->sum_squaredx += src1.val * src1.val;
             :     state->sum_squaredy += src2.val * src2.val;
             :     state->prod += src1.val * src2.val;
             :     ++state->count;
We can extract these into an update method, e.g.

  static inline void CorrUpdate(double x, double y, CorrState* state)

and then use it in TimestampCorrUpdate() as well.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@336
PS7, Line 336: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@337
PS7, Line 337:     state->sumx -= src1.val;
             :     state->sumy -= src2.val;
             :     state->sum_squaredx -= src1.val * src1.val;
             :     state->sum_squaredy -= src2.val * src2.val;
             :     state->prod -= src1.val * src2.val;
             :     --state->count;
We can extract these into a method too.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@355
PS7, Line 355: if
nit: need a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@376
PS7, Line 376: if
nit: need a space after if


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@401
PS7, Line 401:   ctx->Free(src.ptr);
We should not free it here since we still use 'state' at following lines. I think we should free it before each 'return' statement.

Another solution is getting its values into local variables and then free it.


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions-ir.cc@408
PS7, Line 408: if(
nit: add a space after "if"


http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/18413/7/be/src/exprs/aggregate-functions.h@74
PS7, Line 74:   static const StringVal CorrSerialize(FunctionContext* ctx, const StringVal& src);
We can remove this


http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1322
PS7, Line 1322: Corr
nit: Add a space after "//"


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@10
PS7, Line 10: select corr(ps_availqty,ps_supplycost) from tpch.partsupp;
Sorry to be back and forth, but we'd better move these tests to testdata/workloads/functional-query/queries/QueryTest/aggregation.test
All the builtin aggregate functions are tested there.


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@24
PS7, Line 24: select corr(d,e) from functional.nulltable;
Do we have test coverage on only one side is NULL?


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@41
PS7, Line 41: select corr(utctime,localtime) from functional.alltimezones;
Cool!


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@48
PS7, Line 48: select corr(int_col, int_col)
Could you also add the 'id' column here?

Also please comment why the results are NULLs.


http://gerrit.cloudera.org:8080/#/c/18413/7/testdata/workloads/functional-query/queries/QueryTest/uda.test@69
PS7, Line 69: select corr(double_col,double_col)
Could you add the 'year' column here?



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 26 Apr 2022 07:51:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(3 comments)

> Patch Set 10:
> 
> (4 comments)
> 
> I would prefer to see the covar_pop and covar_samp functions implemented in the same patch as well.

Since this jira was focused on corr(), should we change its description or create a new jira for covariance functions?

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
> These and any other computation variables may need to be precise types if t
AFAIK, a similar approach is used in hive and other aggregate functions of impala, that's why we kept it double. Double supports precision types like Decimal. There's a test included as well just to check this behaviour.


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   state->sumx += x;
> Handle overflow?
AFAIK, Hive and other aggregate functions of impala like avg() aren't checking explicitly for overflow. Do you want any specific check or behavior in case of overflow, probably returning null or giving an error?


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@389
PS10, Line 389:   dst_state->sumx += src_state->sumx;
> Handle overflow?
AFAIK, Hive and other aggregate functions of impala like avg() aren't checking explicitly for overflow. Do you want any specific check or behavior in case of overflow, probably returning null or giving an error?



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:03:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

> Patch Set 10:
> 
> Sure, I'll include those as well.

I mean the OLAP partition tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:06:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
5 files changed, 274 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 12 Jun 2022 13:49:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 12 Jun 2022 18:11:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 24: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 00:39:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8099/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 12 May 2022 18:53:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,078 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/18
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337
PS17, Line 337:   if (state->count > 0) --state->count;
If state->count is 1, it will be decreased to 0. We will hit divide-by-zero errors in the following lines.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342
PS17, Line 342:   if (state->count > 1) {
If the original count is 2, it will be decreased to 1 at line 337. Then we will miss it here which is wrong IIUC.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 May 2022 05:08:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,067 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/20
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328
PS15, Line 328:   state->xavg -= deltaX / state->count;
This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. In CorrUpdateState(), we have
mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n
Reverting the formula we have
mx_(n-1) = (n * mx_n - x_n)/(n-1) = mx_n - (x_n - mx_n)/(n-1)

The above code corresponds to
mx_(n-1) = mx_n - (x_n - mx_n) / n
So it's incorrect. We should decrease state->count before this.

I think the correct code is

  --state->count;
  state->xavg -= deltaX / state->count;
  state->yavg -= deltaY / state->count;

We also need to check if state->count becomes 0.


http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332
PS15, Line 332:     state->covar -= deltaX * (y - state->yavg);
              :     state->xvar -= deltaX * (x - state->xavg);
              :     state->yvar -= deltaY * (y - state->yavg);
I think we need to find a reference for these.


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: NULL
> Resolved
FWIW, this is resolved by changing the formula to use another one with better error precision.
https://www.johndcook.com/blog/standard_deviation/



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 19 May 2022 07:02:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462
PS18, Line 462:     return DoubleVal::null();
Add an assert that negative values are small and comment that this can happen due to floating point rounding error.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 03 Jun 2022 13:42:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14:

(3 comments)

> Patch Set 13:
> 
> (3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store;
> sorry that I mean changing the query to
No problem at all, I've updated the query.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: ====
> sorry, I mean changing the query to
No problem at all, I've updated the query.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700: ---- TYPES
> sorry, I mean changing it to
No problem at all, I've updated the query.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 11:31:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 759 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/13
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10572/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 13 May 2022 11:43:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10564/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 12 May 2022 18:29:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 853 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/15
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 18:43:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341
PS22, Line 341: if (state->count <= 1) memset(state, 0, sizeof(CorrState));
nit: this is part of a multi-line if-else statement so our code style still prefer adding brackets {} here.


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452
PS22, Line 452:   if (state->count == 0 || state->count == 1 || state->xvar <= 0.0 || state->yvar <= 0.0)
              :     return DoubleVal::null();
nit: add brackets {} for multi-line if-statement


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507
PS22, Line 507:   if (state->count <= 1) memset(state, 0, sizeof(CovarState));
nit: this is part of a multi-line if-else statement so our code style still prefer adding brackets {} here.


http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324
PS22, Line 1324:             
nit: use the same idention as the above metionds.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 10 Jun 2022 08:05:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/27
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10623/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 23 May 2022 21:56:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has restored this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Restored
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 07 Jun 2022 04:36:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8200/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 06:32:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296:   double prod;
> AFAIK, a similar approach is used in hive and other aggregate functions of 
Seems ok to stick with double as other databases do this and there does not appear to be a standard. I would recommend comparing results with postgres including cases that are likely to lose precision mixing small and large numbers.


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: 
> Sure, I'll include those.
Also test empty table and all zero values cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 05 May 2022 17:54:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 759 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/14
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 13: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 12 May 2022 23:21:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15:

(2 comments)

> Patch Set 14:
> 
> (2 comments)
> 
> I double the tests and find a question we need to address.

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: NULL
> Hive returns NULL in this case. I think the reason is https://issues.apache
Resolved


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558
PS14, Line 1558: ====
> For tests on window functions, can we also copy Hive's test?
We still don't have support for window clause



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 18 May 2022 07:51:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 173 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 5:

(9 comments)

> Patch Set 4:
> 
> (9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/4//COMMIT_MSG@10
PS4, Line 10: t
> nit: remove the leading space
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@292
PS4, Line 292: // type and returns the correlation coefficient between them.
> Could you comment the formula here?
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@359
PS4, Line 359:   dst_state->count += src_state->count;
> nit: remove this blank line
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@366
PS4, Line 366:   return result;
> Could you explain why we need to free it here? It's not allocated by this f
Yeah, we can remove it, it'll get cleared after the function pops from the stack trace so its not needed.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@373
PS4, Line 373:   // Calculating correlation coefficient using Pearson Product Moment Correlation formula
> Why do we free it here? We still use the CorrState below.
Yeah, we can remove it. I intended to remove it, but probably forgot.


http://gerrit.cloudera.org:8080/#/c/18413/4/be/src/exprs/aggregate-functions-ir.cc@375
PS4, Line 375:       (pow((((state->count*state->sum_squaredx) -
             :       (state->sumx*state->sumx)) * ((state->count*state->sum_squaredy) -
             :                    (state->sumy*state->sumy))), 0.5));
             :   return DoubleVal(corr);
             : }
> I just realized that we are using a different formula than Hive's:
Yes, they're equivalent. Both are using Pearson's correlation coefficient itself. In hive, there are functions for covariance and standard deviation, so the formula might look a little different. But on simplifying its the same.


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test
File testdata/workloads/functional-query/queries/QueryTest/uda.test:

http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@12
PS4, Line 12: 0.000321849166368
> I checked the test codes and found that it already handles rounding inaccur
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@15
PS4, Line 15: ====
> Can we add tests on NULL values? Some tpcds tables contain NULLs, e.g. cata
Done


http://gerrit.cloudera.org:8080/#/c/18413/4/testdata/workloads/functional-query/queries/QueryTest/uda.test@32
PS4, Line 32: ====
> Could you add tests for other types as well? E.g.
Sure, done!



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Apr 2022 09:23:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................

IMPALA-11205: Implement CORR() function

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them. This UDAF
is tested with a few query tests written in uda.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/uda.test
4 files changed, 250 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 25 Apr 2022 10:08:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 27 Apr 2022 08:58:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement CORR() function

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement CORR() function
......................................................................


Patch Set 10:

Sure, I'll include those as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Apr 2022 07:04:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 18: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 01 Jun 2022 01:58:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 20:

(1 comment)

> Patch Set 18:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408: }
> Code is still skipping NaN..
Oh sorry, I missed it, its fixed now.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:34:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 22:

(21 comments)

> Patch Set 20:
> 
> (19 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301
PS20, Line 301:   int64_t count; // number of elements
> Change all occurrences of long to int64_t
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341
PS20, Line 341: )
> nit: remove the space before ")"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342
PS20, Line 342:   else {
              :     --state->count;
              :     // mx_(n - 1) = mx_n - (x_n - mx_n) / (n - 1)
              :     state->xavg -= deltaX / state->count;
              :     // my_(n - 1) = my_n - (y_n - my_n) / (n - 1)
              :     state->yavg -= delt
> nit: these can be simplified by memset(), just like what we do at line 313.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348
PS20, Line 348:     // c_(n - 1) = c_n - (x_n - mx_n) * (y_n - my_(n -1))
              :     st
> nit: put "}" and "else" at the same line, i.e. "} else {"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355
PS20, Line 355: 
> This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - state
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375
PS20, Line 375: 
> This is not clear to me. Do you mean 'count'?
In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have finalize() function, we don't need to explicitly check the number of update() and remove() calls.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378
PS20, Line 378: void AggregateFunctions::TimestampCorrUpdate(FunctionContext* ctx,
> Change all occurrences of NULL to nullptr
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393
PS20, Line 393:   // Remove doesn't need to explicitly 
> nit: please add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398
PS20, Line 398: lue::FromTimestampVal(src1);
> Could you explain more? Why do we need to check the number of Update/Remove
In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have finalize() function, we don't need to explicitly check the number of update() and remove() calls.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407
PS20, Line 407: void AggregateFunctions::CorrMerge(Func
> nit: please add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451
PS20, Line 451: e->yvar >= -1E-8);
> We should check whether 'state->xvar * state->yvar' become negative before 
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452
PS20, Line 452:   if (state->count == 0 || state->count == 1 || state->xvar <= 0.0 || state->yvar <= 0.0)
              :     return DoubleVal::null();
              :   double r = sqrt(state->xvar * state->yvar);
> move this to the begining of this method, and split it into two DCHECKs so 
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467
PS20, Line 467:   ctx->Free(src.ptr);
              :   return r;
> This is a use-after-free pattern. We should not free the resource if we sti
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507
PS20, Line 507: )
> nit: remove the space before ")".
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508
PS20, Line 508:   else {
              :     --state->count;
              :     // my_(n - 1) = my_n - (y_n - my_n) / (n - 1)
              :     state->yavg -= (y -
> nit: use memset instead.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512
PS20, Line 512:     // c_(n - 1) = c_n - (x_n - mx_(n - 1)) * (y_n - my_n)
              :     stat
> nit: put them into one line, i.e. "} else {"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535
PS20, Line 535: 
> Need explanation here too
In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have finalize() function, we don't need to explicitly check the number of update() and remove() calls.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@553
PS20, Line 553: void AggregateFunctions::TimestampCovarR
> nit: add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@567
PS20, Line 567: }
> nit: add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@625
PS20, Line 625:   ctx->Free(src.ptr);
              :   return r;
> This is a use-after-free pattern too. Please fix it.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@636
PS20, Line 636:   DoubleVal r = CovarPopulationGetValue(ctx, src);
              :   ctx->Free(src.ptr);
> This is a use-after-free pattern too. Please fix it.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 18:24:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/23
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 21: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 08 Jun 2022 11:02:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 25:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 940 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/16
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 27:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:40:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
pranav.lodha@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 28:

(4 comments)

> Patch Set 25:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7
PS26, Line 7: :
> nit: remove the space before ":"
Done


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: // Correlation coefficient formula used:
              : // r = covar / (√(xvar * yvar)
              : struct CorrState {
              :   int64_t count; // number of elements
              :   double xavg; // average of x elements
              :   double yavg; // average of y elements
              :   double xvar; // n times the variance of x elements
              :   
> As we discussed in the call, removing the code duplication brings up more u
Sure, I've added that.


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343
PS26, Line 343:   if (state->count <= 1) {
              :     mems
> nit: put "}" and "else {" at the same line
Done


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513
PS26, Line 513:     memset(state, 0, sizeof(CovarState));
              :   } else
> nit: put "}" and "else {" at the same line
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:25:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()  and COVAR_POP()
......................................................................

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Reviewed-on: http://gerrit.cloudera.org:8080/18413
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 30
Gerrit-Owner: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pr...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>