Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/02/04 09:48:09 UTC

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17023 )


Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................

IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

When the Partition-by and Order-by expressions of an analytic are all
constants, it should be evaluated in a single unpartitioned fragment
(same as analytics that have no Partition-by/Order-by exprs).

Tests:
 - Added planner tests
 - Added e2e tests

Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
---
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
4 files changed, 114 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17023/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 11:53:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 2:

(5 comments)

Code changes look good. A few comments about testing.

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3210
PS2, Line 3210: plased
nit: change to 'placed'


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3212
PS2, Line 3212: select row_number() over (order by 'a') from functional.alltypes
One other suggestion for the test: since we are concerned about correctness in the presence of constants, also add a test with constant expressions, e.g. order by 1+3 (which exercises constant folding) or order by CAST(5 as int).
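
To illustrate why expressions such as 1+3 or CAST(5 as int) belong in the same category as a plain literal, here is a minimal sketch of a recursive "is this expression constant" check. It is a toy model, not Impala's implementation: the classes Literal, ColumnRef and Call and the function is_constant are hypothetical names, and the sketch ignores non-deterministic functions.

```python
# Toy expression tree; all names here are hypothetical, not Impala classes.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class ColumnRef:
    name: str


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Literal, ColumnRef, Call]


def is_constant(expr: Expr) -> bool:
    """An expression is constant if it references no column anywhere."""
    if isinstance(expr, Literal):
        return True
    if isinstance(expr, ColumnRef):
        return False
    # A call is constant only if every argument is constant.
    return all(is_constant(arg) for arg in expr.args)


order_by_literal = Literal("a")                           # order by 'a'
order_by_sum = Call("add", (Literal(1), Literal(3)))      # order by 1+3
order_by_cast = Call("cast_int", (Literal(5),))           # order by CAST(5 as int)
order_by_column = Call("add", (ColumnRef("id"), Literal(1)))  # order by id+1

for e in (order_by_literal, order_by_sum, order_by_cast, order_by_column):
    print(e, is_constant(e))
```

Only the last expression references a column, so it is the only one the sketch reports as non-constant.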


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3229
PS2, Line 3229: plased
nit: same as above


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3261
PS2, Line 3261: plased
nit: same as above


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test:

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test@2200
PS2, Line 2200: select row_number() over (order by 'a'), count() over (order by 0)
This query actually produces the right results without the patch, because alltypestiny has only 8 rows, which falls below the 100-row threshold for the small query optimization, so it runs as a single-node plan.
You could perhaps use a medium-size table, reduce the threshold, or (if the exact row_number value does not need to be verified) do a COUNT(*) on top of the subquery with row_number() to verify the row count.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Feb 2021 17:25:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8077/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Feb 2021 10:11:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................

IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

When the Partition-by and Order-by expressions of an analytic are all
constants, it should be evaluated in a single unpartitioned fragment
(same as analytics that have no Partition-by/Order-by exprs). Currently,
it's placed in the same fragment as the child node, which causes it to
be computed locally and to produce incorrect results when that fragment
is partitioned.
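
The failure mode described above can be illustrated with a small simulation. This is only a sketch and not Impala code: the function names and the sample rows are made up, and it models row_number() over a constant ordering, where every row is a peer and only the count of rows seen matters.

```python
# Illustrative simulation only; names and data are hypothetical.
from typing import List


def row_number_local(partitions: List[List[str]]) -> List[int]:
    """Each instance of a partitioned fragment numbers only its own rows."""
    numbers: List[int] = []
    for rows in partitions:
        numbers.extend(range(1, len(rows) + 1))
    return numbers


def row_number_unpartitioned(partitions: List[List[str]]) -> List[int]:
    """All rows are gathered into a single fragment before numbering."""
    gathered = [row for rows in partitions for row in rows]
    return list(range(1, len(gathered) + 1))


# Five rows split across two fragment instances.
partitions = [["r0", "r1", "r2"], ["r3", "r4"]]

local_result = row_number_local(partitions)
merged_result = row_number_unpartitioned(partitions)

print(local_result)   # [1, 2, 3, 1, 2]
print(merged_result)  # [1, 2, 3, 4, 5]
```

The local variant repeats row numbers across instances, while the unpartitioned variant yields each number exactly once.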

Tests:
 - Added planner tests
 - Added e2e tests

Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Reviewed-on: http://gerrit.cloudera.org:8080/17023
Reviewed-by: Aman Sinha <am...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
4 files changed, 171 insertions(+), 5 deletions(-)

Approvals:
  Aman Sinha: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17023

to look at the new patch set (#3).

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................

IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

When the Partition-by and Order-by expressions of an analytic are all
constants, it should be evaluated in a single unpartitioned fragment
(same as analytics that have no Partition-by/Order-by exprs). Currently,
it's placed in the same fragment as the child node, which causes it to
be computed locally and to produce incorrect results when that fragment
is partitioned.

Tests:
 - Added planner tests
 - Added e2e tests

Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
---
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
4 files changed, 171 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17023/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test:

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test@2200
PS2, Line 2200: # Note that this test is ran with "set exec_single_node_rows_threshold=0" which is in the
> Thanks for the suggestion! I guess you are running the query without "set e
Yes, I was running with the default settings via impala shell.  Thanks for clarifying the test behavior.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 02:32:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8082/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 02:00:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17023

to look at the new patch set (#2).

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................

IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

When the Partition-by and Order-by expressions of an analytic are all
constants, it should be evaluated in a single unpartitioned fragment
(same as analytics that have no Partition-by/Order-by exprs).

Tests:
 - Added planner tests
 - Added e2e tests

Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
---
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
4 files changed, 114 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17023/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8076/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Feb 2021 10:09:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 2:

(5 comments)

Thanks for the quick review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3210
PS2, Line 3210: plased
> nit: change to 'placed'
Done


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3212
PS2, Line 3212: select row_number() over (order by 'a') from functional.alltypes
> One other suggestion for the test since we are concerned about correctness 
Good point! Added more tests.


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3229
PS2, Line 3229: plased
> nit: same as above
Done


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test@3261
PS2, Line 3261: plased
> nit: same as above
Done


http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test:

http://gerrit.cloudera.org:8080/#/c/17023/2/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test@2200
PS2, Line 2200: select row_number() over (order by 'a'), count() over (order by 0)
> This query actually produces the right results without the patch because al
Thanks for the suggestion! I guess you are running the query without "set exec_single_node_rows_threshold=0". The test hits the bug as long as the table contains several files, because it is run with exec_single_node_rows_threshold=0, which is in the default test dimension: https://github.com/apache/impala/blob/60f8f87b09a27618df2ac73c1cc6dcd052f8c60d/tests/common/test_dimensions.py#L169

I ran the test using

 impala-py.test tests/query_test/test_queries.py::TestQueries::test_analytic_fns

The output shows that the tests are run with exec_single_node_rows_threshold=0:

 tests/query_test/test_queries.py::TestQueries::test_analytic_fns[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] PASSED
 tests/query_test/test_queries.py::TestQueries::test_analytic_fns[protocol: hs2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] PASSED
 tests/query_test/test_queries.py::TestQueries::test_analytic_fns[protocol: hs2-http | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] PASSED

Reverting the FE changes makes this test fail.
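
The interaction between the two settings discussed in this thread can be sketched as follows. This is a toy model, not Impala's planner: the function name runs_single_node and the exact comparison are assumptions. Only the values (8 rows in alltypestiny, a threshold of 100 under default settings, and 0 in the test dimension) come from the comments in this thread.

```python
# Toy model of the small query optimization; not Impala's actual logic.
def runs_single_node(estimated_rows: int, threshold: int) -> bool:
    """Assumed rule: plan on a single node when the row estimate is below
    the exec_single_node_rows_threshold value."""
    return estimated_rows < threshold


ALLTYPESTINY_ROWS = 8

# Default settings in impala-shell: single-node plan, so the bug stays hidden.
shell_default = runs_single_node(ALLTYPESTINY_ROWS, threshold=100)

# Test dimension sets the threshold to 0: distributed plan, so the bug can show.
test_dimension = runs_single_node(ALLTYPESTINY_ROWS, threshold=0)

print(shell_default, test_dimension)
```

Under this model the same query takes a single-node plan in the shell and a distributed plan in the test run, which matches the difference in behavior reported above.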




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 01:40:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17023 )

Change subject: IMPALA-10473: Fix wrong analytic results on constant partition/order by exprs
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6874/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc88a410dab984ff37e27dc635bee5f289003a2a
Gerrit-Change-Number: 17023
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 06:26:47 +0000
Gerrit-HasComments: No