Posted to reviews@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2018/10/17 01:49:34 UTC

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11706 )


Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................

IMPALA-7689: Reduce per column per partition stats estimate size

With the improvements in the incremental stats memory representation
(IMPALA-7424), the per column per partition stats estimate should be
reduced to account for the compressed memory footprint. In experiments
on various test tables, the size is down by 50-70%.

This patch reduces the size estimate by 50% (conservative). Ideally we
don't need to estimate on the Catalog server during serialization since
we can compute the byte sizes by looping through all the partitions.
However this patch retains the current logic to keep it consistent with
"compute incremental stats" analysis.
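
For illustration only, the arithmetic described above can be sketched as
follows. This is not the code in HdfsTable.java: the class, method, and
constant names are hypothetical, and the partitions x columns x constant
formula is an assumption about how the per column per partition estimate
is applied. The 400 and 200 byte values are the ones discussed in the
review of this change. The second method sketches the alternative the
message mentions, summing actual byte sizes over all partitions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; names are illustrative, not taken from HdfsTable.java. */
final class IncrementalStatsSizeSketch {
  // Per column, per partition estimate before this change (from the review thread).
  static final long OLD_BYTES_PER_COLUMN_PER_PARTITION = 400;
  // Estimate after this change: reduced by 50%, the conservative end of 50-70%.
  static final long NEW_BYTES_PER_COLUMN_PER_PARTITION = 200;

  /** Assumed formula: partitions x columns x per-entry estimate. */
  static long estimateBytes(long numPartitions, long numColumns, long bytesPerEntry) {
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    return Math.multiplyExact(Math.multiplyExact(numPartitions, numColumns), bytesPerEntry);
  }

  /** Alternative: loop over partitions and sum the actual stats byte sizes. */
  static long exactBytes(List<byte[]> statsPerPartition) {
    long total = 0;
    for (byte[] stats : statsPerPartition) {
      // Partitions without incremental stats contribute nothing.
      if (stats != null) total += stats.length;
    }
    return total;
  }

  public static void main(String[] args) {
    long before = estimateBytes(1000, 50, OLD_BYTES_PER_COLUMN_PER_PARTITION);
    long after = estimateBytes(1000, 50, NEW_BYTES_PER_COLUMN_PER_PARTITION);
    // Prints: before=20000000 after=10000000
    System.out.println("before=" + before + " after=" + after);
    // Prints: exact=200
    System.out.println("exact=" + exactBytes(Arrays.asList(new byte[120], new byte[80], null)));
  }
}
```

For a hypothetical table with 1000 partitions and 50 columns, the
estimate drops from 20,000,000 to 10,000,000 bytes under these
assumptions.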

Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 1 insertion(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/11706/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

Looks good. The main thing that would help here is an explanation of where the 400 came from (and, from there, the 200).

http://gerrit.cloudera.org:8080/#/c/11706/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11706/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@199
PS1, Line 199: 200
The 400-byte estimate was pretty accurate in the examples I saw. Is there a pointer you can put here to explain how it was derived, and then explain the 50% reduction from compression?



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 03:42:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 2: Code-Review+2

Carrying +2.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 21:05:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 2: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 00:57:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11706

to look at the new patch set (#2).

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................

Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 3 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/11706/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1081/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 21:35:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11706/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11706/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@199
PS1, Line 199: 200
> the 400 byte estimate was pretty accurate from the examples I saw. is there
That was just an empirical estimate from observing some clusters with incremental stats. I don't have any useful pointers to add here.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 06:44:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1072/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 02:24:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................

Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Reviewed-on: http://gerrit.cloudera.org:8080/11706
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 3 insertions(+), 4 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7689: Reduce per column per partition stats estimate size

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11706 )

Change subject: IMPALA-7689: Reduce per column per partition stats estimate size
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3326/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Gerrit-Change-Number: 11706
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 21:05:25 +0000
Gerrit-HasComments: No