Posted to reviews@impala.apache.org by "Abhishek Rawat (Code Review)" <ge...@cloudera.org> on 2020/11/12 03:50:47 UTC

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Abhishek Rawat has uploaded this change for review. (http://gerrit.cloudera.org:8080/16712)


Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' for computing the row count.
The 'ROUND()' function returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) are not set.

Adding an explicit CAST to BIGINT around the ROUND function ensures that
the table stats (Extrap #Rows) are set properly.
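A minimal Python model of the type mismatch described above. It is not Impala code: `extrapolate_rows` and `set_table_row_count` are hypothetical stand-ins, Python's `Decimal` stands in for Impala's DECIMAL, a Python `int` stands in for BIGINT, and it assumes `<effective_sample_perc>` is a fraction (0.1 here) rather than a percentage. The sample count 1234 is made up. Like the consumer described in the commit message, the sketch only records an integer row count and otherwise leaves the row count at -1 (unset).

```python
from decimal import Decimal, ROUND_HALF_UP


def extrapolate_rows(sample_count, sample_frac, cast_to_bigint):
    """Hypothetical model of ROUND(COUNT(*) / <effective_sample_perc>),
    optionally wrapped in CAST(... AS BIGINT)."""
    # Decimal arithmetic stands in for Impala's DECIMAL result type;
    # ROUND_HALF_UP rounds halves away from zero, like ROUND() on positives.
    rounded = (Decimal(sample_count) / Decimal(str(sample_frac))).quantize(
        Decimal(1), rounding=ROUND_HALF_UP)
    # The explicit cast is modeled as converting to a Python int (BIGINT).
    return int(rounded) if cast_to_bigint else rounded


def set_table_row_count(stats, value):
    """Hypothetical consumer that only accepts an integer (BIGINT-like)
    row count, mirroring the expectation described in the commit message."""
    if isinstance(value, int):
        stats["num_rows"] = value
    # Any other type (here, a Decimal) is ignored, so the stat stays unset.
    return stats


print(set_table_row_count({"num_rows": -1}, extrapolate_rows(1234, 0.1, False)))
# {'num_rows': -1}
print(set_table_row_count({"num_rows": -1}, extrapolate_rows(1234, 0.1, True)))
# {'num_rows': 12340}
```

Without the cast, the DECIMAL result is silently dropped and the row count stays at -1. With the cast, the same value arrives as an integer and is recorded.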

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_stats_extrapolation.py
4 files changed, 25 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16712/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16712
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' for computing the row count.
The 'ROUND()' function returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) are not set.

Adding an explicit CAST to BIGINT around the ROUND function ensures that
the table stats (Extrap #Rows) are set properly.

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Reviewed-on: http://gerrit.cloudera.org:8080/16712
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_stats_extrapolation.py
4 files changed, 27 insertions(+), 9 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 5: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 03:39:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7642/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 02:24:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 5: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 09:07:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6651/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 03:40:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' for computing the row count.
The 'ROUND()' function returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) are not set.

Adding an explicit CAST to BIGINT around the ROUND function ensures that
the table stats (Extrap #Rows) are set properly.

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_stats_extrapolation.py
4 files changed, 27 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16712/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16712/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/16712/3/tests/common/impala_test_suite.py@933
PS3, Line 933:     # Both a and b must be positive for the following check to make sense.
> I think I will fix appx_equals for negative values, as you suggested.
I changed the diff_perc in test_stats_extrapolation from 2.0 to 1.0. For very small sampling percentages, such as 1 or 3, we do get a large variance between the extrapolated #rows and the actual #rows.

The condition we are trying to catch is any time we get -1, which indicates that the stats are not set.
appx_equals(X, -1, 1.0) returns false since (X + 1) / X > 1.0 for any positive X.

The test case passes with 1.0.

The test case now also checks that the extrapolated #rows is >= 0, so the test should catch this issue.
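The reasoning about appx_equals(X, -1, 1.0) can be checked with a small sketch. The helper's actual code is not shown in this thread, so this is a hypothetical version that assumes the relative-difference formula implied by the comment, abs(a - b) / max(abs(a), abs(b)) <= diff_perc; the row count 1000 is a made-up value.

```python
def appx_equals(a, b, diff_perc):
    """Hypothetical sketch, not the verbatim helper from
    tests/common/impala_test_suite.py: True if 'a' and 'b' differ by at most
    'diff_perc' relative to the larger magnitude of the two."""
    if a == b:
        return True  # Also avoids dividing by zero when both are 0.
    # Comparing magnitudes keeps the ratio meaningful when one side is
    # negative, such as the -1 that marks unset stats.
    return abs(a - b) / float(max(abs(a), abs(b))) <= diff_perc


actual_rows = 1000
print(appx_equals(actual_rows, -1, 1.0))    # False: 1001 / 1000 > 1.0
print(appx_equals(actual_rows, -1, 2.0))    # True: a 2.0 tolerance accepts -1
print(appx_equals(actual_rows, 1900, 1.0))  # True: 900 / 1900 <= 1.0
```

Under this sketch, a tolerance of 2.0 would let an unset (-1) row count pass, while 1.0 still tolerates large sampling variance. The ratio only exceeds 1.0 for a positive actual row count (for 0 vs. -1 it is exactly 1.0), which is where an explicit >= 0 check on the extrapolated value adds coverage.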



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 02:11:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7641/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 20:26:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7635/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 04:13:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' for computing the row count.
The 'ROUND()' function returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) are not set.

Adding an explicit CAST to BIGINT around the ROUND function ensures that
the table stats (Extrap #Rows) are set properly.

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_stats_extrapolation.py
4 files changed, 27 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16712/5
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7643/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 02:35:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16712/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/16712/4/tests/common/impala_test_suite.py@933
PS4, Line 933: ,
flake8: E231 missing whitespace after ','



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Nov 2020 02:03:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16712/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/16712/3/tests/common/impala_test_suite.py@933
PS3, Line 933:     # Both a and b must be positive for the following check to make sense.
> Hmm, do you think it's easier to just not mess with it then?
I think I will fix appx_equals for negative values, as you suggested.

I will also fix the callers to pass a valid diff_perc. It's only called in a couple of places, so that should be okay.

Let me make these changes.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 20:05:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16712/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16712/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@549
PS2, Line 549:       countSql = String.format("CAST(ROUND(COUNT(*) / %.10f) AS BIGINT)", effectiveSamplePerc_);
line too long (96 > 90)
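For reference, this is roughly what the child-query expression built by the flagged Java line looks like once formatted. It is a Python sketch, not the Java code: `count_sql` is a hypothetical helper, 0.05 is a made-up effective sample fraction, and it assumes Java's `String.format` runs in a locale that uses '.' as the decimal separator, in which case `%.10f` renders the same way as Python's %-formatting.

```python
def count_sql(effective_sample_perc):
    # Same format string as the Java line under review; the numeric literal
    # in the divisor is presumably what makes the quotient (and ROUND()'s
    # result) a DECIMAL, hence the outer CAST to BIGINT.
    return "CAST(ROUND(COUNT(*) / %.10f) AS BIGINT)" % effective_sample_perc


print(count_sql(0.05))  # CAST(ROUND(COUNT(*) / 0.0500000000) AS BIGINT)
```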



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 03:51:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

Posted by "Abhishek Rawat (Code Review)" <ge...@cloudera.org>.
Abhishek Rawat has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16712 )

Change subject: IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
......................................................................

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' for computing the row count.
The 'ROUND()' function returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) are not set.

Adding an explicit CAST to BIGINT around the ROUND function ensures that
the table stats (Extrap #Rows) are set properly.

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_stats_extrapolation.py
4 files changed, 26 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16712/3
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Gerrit-Change-Number: 16712
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>