Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/01/17 21:21:02 UTC

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5726

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for both ALTER TABLE
and COMPUTE STATS.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
2 files changed, 125 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5726/4/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

Line 182:         in the <codeph>COMPUTE STATS</codeph> statement.
COMPUTE INCREMENTAL STATS only (the PARTITION clause does not apply to COMPUTE STATS)


Line 183:         Some forms of <codeph>ALTER TABLE</codeph> still only apply to one partition
The new partition selection logic also works for SHOW FILES IN <tbl> PARTITION (...)


Line 200:         large numbers of partitions.
Mention IMPALA-4106 as a known scalability limitation. Typically, these bulk operations should be more efficient than issuing multiple ALTER TABLE statements in quick succession.
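
A rough sketch of which statement forms from the comments above take the PARTITION clause. The helper name with_partition, the table name "sales", and the expression "year < 2010" are made up for illustration; only the statement keywords come from this review, and the DROP PARTITION form is taken from the line 144 discussion in this thread.

```python
# Statement forms mentioned in this review that accept a PARTITION clause.
# The keys and the helper below are hypothetical names, not Impala APIs.
TEMPLATES = {
    "drop": "ALTER TABLE {table} DROP PARTITION ({expr})",
    "incremental_stats": "COMPUTE INCREMENTAL STATS {table} PARTITION ({expr})",
    "show_files": "SHOW FILES IN {table} PARTITION ({expr})",
}


def with_partition(kind, table, expr):
    """Build one statement that applies to every partition matching expr."""
    if kind == "compute_stats":
        # Per the review comment: only the INCREMENTAL form takes PARTITION.
        raise ValueError("the PARTITION clause does not apply to COMPUTE STATS")
    return TEMPLATES[kind].format(table=table, expr=expr)


if __name__ == "__main__":
    for kind in TEMPLATES:
        print(with_partition(kind, "sales", "year < 2010"))
```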



Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "Amos Bird (Code Review)" <ge...@cloudera.org>.
Amos Bird has posted comments on this change.

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

Line 144:       For example, you might drop a group of partitions corresponding to a particular date
> Does the scalability bottleneck apply when the number of matching partitions is high, or just generally when the table has a lot of partitions regardless of how many match?

When there are a lot of matching partitions.

> Do we have any guidance to offer (e.g. "don't use this idiom on tables with tens of thousands of partitions", or "under circumstances XYZ, issue multiple ALTER TABLE statements, each for one partition")?

Well, it just acts as if we were issuing multiple single DROP statements, so there isn't any guidance to give. The user has to pay that latency for now if those partitions need to be dropped.

> Does any issue manifest itself just as a single slow DDL statement, or does the hms traffic potentially cause other statements to wait, or other symptoms?

It holds a lock on the target table, so other DDL statements on that table will be blocked.
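
To illustrate the first two answers, here is a minimal sketch (a toy model, not Impala's implementation) of how one bulk DROP with a comparison predicate corresponds to one single-partition DROP per matching partition, so the cost grows with the number of matches. The table name "sales", the column "year", the partition values, and the helper expand_drop are all hypothetical.

```python
import operator

# Hypothetical partition key values for a table partitioned by `year`.
PARTITIONS = [2008, 2009, 2010, 2011, 2012]

# Comparison operators supported by this toy model.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
}


def expand_drop(table, column, op, value, partitions):
    """Return the single-partition DROP statements that one bulk
    ALTER TABLE ... DROP PARTITION (column op value) behaves like."""
    matches = [p for p in partitions if OPS[op](p, value)]
    return [
        f"ALTER TABLE {table} DROP PARTITION ({column}={p})" for p in matches
    ]


# One bulk statement, one equivalent single DROP per matching partition.
stmts = expand_drop("sales", "year", "<", 2010, PARTITIONS)

if __name__ == "__main__":
    for s in stmts:
        print(s)
```

In this model the work is proportional to len(stmts), the number of matching partitions, not to the total number of partitions in the table.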



Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1654: [DOCS] DDL for multiple partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1654: [DOCS] DDL for multiple partitions
......................................................................


IMPALA-1654: [DOCS] DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Added note about performance/scalability of IMPALA-1654.

Added new Known Issue item for IMPALA-4106 under Performance category.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Reviewed-on: http://gerrit.cloudera.org:8080/5726
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_show.xml
4 files changed, 186 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: [DOCS] DDL for multiple partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-1654: [DOCS] DDL for multiple partitions
......................................................................


Patch Set 5: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_show.xml
3 files changed, 158 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: [DOCS] DDL for multiple partitions

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#5).

Change subject: IMPALA-1654: [DOCS] DDL for multiple partitions
......................................................................

IMPALA-1654: [DOCS] DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Added note about performance/scalability of IMPALA-1654.

Added new Known Issue item for IMPALA-4106 under Performance category.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_show.xml
4 files changed, 186 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "Amos Bird (Code Review)" <ge...@cloudera.org>.
Amos Bird has posted comments on this change.

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

Line 144:       For example, you might drop a group of partitions corresponding to a particular date
Should we warn users about the scalability limit (multiple HMS RPCs) of the current bulk drop?


http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_compute_stats.xml
File docs/topics/impala_compute_stats.xml:

Line 130:       The following <codeph>COMPUTE INCREMENTAL STATS</codeph> statements affect some but not all 
trailing space.



Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

Line 144:       For example, you might drop a group of partitions corresponding to a particular date
> should we warn user about the scalability limit (multiple hms RPCs) of curr
Does the scalability bottleneck apply when the number of matching partitions is high, or just generally when the table has a lot of partitions regardless of how many match?

Do we have any guidance to offer (e.g. "don't use this idiom on tables with tens of thousands of partitions", or "under circumstances XYZ, issue multiple ALTER TABLE statements, each for one partition")?

Does any issue manifest itself just as a single slow DDL statement, or does the hms traffic potentially cause other statements to wait, or other symptoms?


http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_compute_stats.xml
File docs/topics/impala_compute_stats.xml:

Line 130:       The following <codeph>COMPUTE INCREMENTAL STATS</codeph> statements affect some but not all 
> trailing space.
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_show.xml
3 files changed, 139 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#4).

Change subject: IMPALA-1654: DDL for multiple partitions
......................................................................

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Added note about performance/scalability of IMPALA-1654.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_show.xml
3 files changed, 167 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-1654: [DOCS] DDL for multiple partitions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-1654: [DOCS] DDL for multiple partitions
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/21/


Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1654: [DOCS] DDL for multiple partitions

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1654: [DOCS] DDL for multiple partitions
......................................................................


Patch Set 5: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Amos Bird <am...@gmail.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: No