Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/09/07 18:15:41 UTC

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7999

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 157 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7999/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 5: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:33:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1226
PS1, Line 1226:         and the statistics are computed again from the beginning. Therefore, expect a one-time
> from scratch
Done


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1241
PS1, Line 1241: -- by -1 under #Rows and false under Incremental stats.
> I suggest you leave out the -1 under #Rows part since that may be confusing
Done. The extra details could make good additions to the 'DROP STATS' and 'background on incremental stats' topics, but let's save that for a followup gerrit.


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@611
PS1, Line 611:         Because the <codeph>COMPUTE STATS</codeph> statement can be resource-intensive to run frequently
> This advice isn't prescriptive enough for my taste. We should state very cl
OK, why don't I fold that into the 'incremental_stats_after_full' note below, and rearrange the text so that note comes earlier. Support is always asking for advice to be in "big red boxes" and the <note> idiom is the most eye-catching way we have to do that. (The original text in the note box is reused on 3 pages: "Partitioning", "Performance - Statistics", and "COMPUTE STATS". I'll include the expanded version of the note in all 3 places.)


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@613
PS1, Line 613:         that is optimized for processing partitioned tables.
> I wouldn't say that incremental stats is "optimized" for partitioned tables
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 05:21:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Silvius Rus, Alex Behm, Mostafa Mokhtar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7999

to look at the new patch set (#2).

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 182 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7999/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1227
PS4, Line 1227:         <codeph>COMPUTE INCREMENTAL STATS</codeph> during the lifetime of a table,
> (or vice versa)
Done


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1243
PS4, Line 1243:         be cached on every <cmdname>impalad</cmdname> host that is eligible to be a coordinator.
> as it must be cached on the catalogd and on every ...
Done


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1244
PS4, Line 1244:         If this metadata for a table exceeds 2 GB, you might experience service downtime.
> It's worse than that. If the aggregate metadata of *all* tables combined ge
Done


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml@612
PS4, Line 612:         as new partitions are added, Impala includes a variation of this statement that is intended for use with
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml@361
PS4, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. For such tables, use
> Sorry, I disagree with the "For such tables, use COMPUTE INCREMENTAL STATS"
OK, since this bit is just a pointer to the real writeup elsewhere, why don't I delete the <note> entirely rather than trying to find some softer wording.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 22:22:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Reviewed-on: http://gerrit.cloudera.org:8080/7999
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 172 insertions(+), 110 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 6
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1226
PS1, Line 1226:         and the statistics are computed again from the beginning. Therefore, expect a one-time
from scratch


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1241
PS1, Line 1241: -- by -1 under #Rows and false under Incremental stats.
I suggest you leave out the -1 under #Rows part since that may be confusing. The reason is that DROP INCREMENTAL STATS will *not* modify the #Rows.

Here's how you can think about incremental stats:
COMPUTE INCREMENTAL STATS populates the same "regular" stats that COMPUTE STATS does, such as the #rows and column NDVs, and in addition stores "incremental stats" to speed up the next COMPUTE INCREMENTAL STATS. So the "incremental" part is really this extra information, which you can drop separately from the "regular" stats.

One nice thing is that you can safely DROP INCREMENTAL STATS everywhere to reduce the size of table metadata without impacting query plans because the "regular" stats are preserved.


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@611
PS1, Line 611:         Because the <codeph>COMPUTE STATS</codeph> statement can be resource-intensive to run frequently
This advice isn't prescriptive enough for my taste. We should state very clearly that you should use either COMPUTE STATS xor COMPUTE INCREMENTAL STATS but never both. Switching during the lifetime of a table is *not* recommended, but if you really must do so then we recommend you first drop all stats before the switch (using DROP STATS and DROP INCREMENTAL STATS).


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@613
PS1, Line 613:         that is optimized for processing partitioned tables.
I wouldn't say that incremental stats is "optimized" for partitioned tables. Foremost, incremental stats allow you to compute stats in a partition-by-partition fashion, which might be a better fit for a user's data ingestion pattern. However, we should be very clear about the cost of incremental stats. Incremental stats need ~400 bytes per column per partition in the table metadata (which gets disseminated and cached everywhere), so incremental stats is not a good fit for tables with a huge number of columns and partitions. If you have a partitioned table and only a few of the partitions are "active", then you can compute incremental stats for new partitions coming in and drop incremental stats for those partitions "phased" out to limit your exposure to the metadata size problems.

You can even state that the huge table metadata can crash the catalog and/or impalads due to the Java 2GB array size limit. (We're working on fixing that)

Basically I want to be sure that users understand the cost of incremental stats and the impact (crash) of when they go overboard with incremental stats. There is no graceful degradation here.
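
To make the cost above concrete, here is a rough back-of-the-envelope sketch that multiplies out the ~400 bytes per column per partition figure. The function name, the example table shapes, and the reading of "2GB" as 2 GiB are assumptions made for illustration only; the actual footprint depends on Impala internals not described in this thread.

```python
# Rough estimate of incremental-stats metadata, based on the approximate
# ~400 bytes per column per partition figure quoted in the comment above.
BYTES_PER_COLUMN_PER_PARTITION = 400   # approximate figure from the comment
JAVA_ARRAY_LIMIT_BYTES = 2 * 1024**3   # assumption: "2GB" read as 2 GiB


def estimate_incremental_stats_bytes(num_columns: int, num_partitions: int) -> int:
    """Return the approximate incremental-stats size for one table, in bytes."""
    return BYTES_PER_COLUMN_PER_PARTITION * num_columns * num_partitions


if __name__ == "__main__":
    # Hypothetical table shapes, chosen only to show how the estimate scales.
    for cols, parts in [(50, 100), (200, 10_000), (500, 20_000)]:
        size = estimate_incremental_stats_bytes(cols, parts)
        share = size / JAVA_ARRAY_LIMIT_BYTES
        print(f"{cols} cols x {parts} partitions: "
              f"{size / 1024**2:.1f} MiB ({share:.1%} of 2 GiB)")
```

By this estimate, a single hypothetical table with 500 columns and 20,000 partitions would carry about 4,000,000,000 bytes of incremental stats, which is already past the limit on its own, before counting any other table.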




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 04:20:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 5: Code-Review+2

I'm still working with Bharath on getting diagnostics for the table sizes, but this patch looks good.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:07:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Please see my explanation on what "incremental" stats is in previous patch 
fair pointer for me, but my comment is about whether this wording is clear for the reader.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:38:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE STATS</codeph> or
Yes!


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233
PS2, Line 1233:         When you run <codeph>COMPUTE INCREMENTAL STATS</codeph> on a table for the first time,
I suggest some minor rephrasing to drive home the "don't switch mantra" a little more, see comments.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234:         the statistics are computed again from scratch regardless of whether you previously ran
regardless of whether the table has existing stats.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236:         for scanning the entire table when switching from <codeph>COMPUTE STATS</codeph> to
when running COMPUTE INCREMENTAL STATS for the first time on a given table.

(do not mention switching... not supposed to do that)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions are actively being
If the aggregate metadata of all tables exceeds 2 GB you may experience service downtime (daemon crashes).

("serious error" really isn't clear to me)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245
PS2, Line 1245:         added or inserted into, you can run <codeph>COMPUTE INCREMENTAL STATS</codeph> for the active
Sorry my phrasing might have been misleading. By "active" partitions I meant those partitions that are being queried (i.e. read)... if you query some partitions very infrequently then there is no point in keeping incremental stats for them.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
such as partition pruning or join ordering.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@624
PS2, Line 624:         subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
Need to be careful here because "large tables" could be misinterpreted to mean "tables with many partitions".

I'd prefer to avoid the word "suitable" and instead use a phrasing that states it enables updating the stats as partitions are added. Whether incremental stats is "suitable" for anything is questionable because of the huge memory downside.

I'd agree that incremental stats could be suitable in situations where you have a huge partitioned table with a small rolling window of "active" partitions, so you only ever need to keep incremental stats on let's say <100 partitions.
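
The rolling-window pattern described above could be scripted roughly as follows. This is only a sketch, not a recommendation to use incremental stats: the helper name, the table name `sales`, the partition column `day`, and the single string-valued partition key are all hypothetical. The generated statements use the PARTITION clause of COMPUTE INCREMENTAL STATS and DROP INCREMENTAL STATS.

```python
# Sketch: build the Impala statements for a rolling window of "active"
# partitions. All names here are hypothetical, for illustration only.
from typing import List


def rolling_window_statements(table: str, partition_col: str,
                              new_partitions: List[str],
                              phased_out_partitions: List[str]) -> List[str]:
    """Return statements that compute incremental stats for newly added
    partitions and drop them for partitions that left the active window.

    Assumes a single partition key column with string values.
    """
    statements = []
    for value in new_partitions:
        statements.append(
            f"COMPUTE INCREMENTAL STATS {table} "
            f"PARTITION ({partition_col}='{value}');")
    for value in phased_out_partitions:
        statements.append(
            f"DROP INCREMENTAL STATS {table} "
            f"PARTITION ({partition_col}='{value}');")
    return statements


if __name__ == "__main__":
    for stmt in rolling_window_statements(
            "sales", "day", ["2017-10-06"], ["2017-06-28"]):
        print(stmt)
```

Per the earlier explanation in this thread, dropping incremental stats for a phased-out partition removes only the extra per-partition information; the "regular" stats are preserved.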


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
Rephrase to avoid "switch" since switching is bad




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(19 comments)

Almost finished with the comments. I'll touch base with Alex to get a little more clarification about which stats are safe to, or make sense to, DROP INCREMENTAL STATS for.

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE STATS</codeph> or
> Yes!
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> They are not required if you know *exactly* what you are doing, but that does no
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> are these drops required?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234:         the statistics are computed again from scratch regardless of whether you previously ran
> regardless of whether the table has existing stats.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236:         for scanning the entire table when switching from <codeph>COMPUTE STATS</codeph> to
> when running COMPUTE INCREMENTAL STATS for the first time on a given table.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
> Yes
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions are actively being
> If the aggregate metadata of all tables exceeds 2 GB you may experience ser
Done. "Serious error" was the compromise I always used for MySQL, where the open source tradition leaned towards saying "crash" but the enterprise focus suggested something more euphemistic.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Fine with me to expand this to add my earlier explanation of what the "incr
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has no effect on optimization or something el
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Please see my explanation on what "incremental" stats is in previous patch 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> such as partition pruning or join ordering.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> Actually I would remove partition pruning because stats have nothing to do 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@611
PS2, Line 611: frequently
> remove
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@623
PS2, Line 623: is a shortcut
> I don't know what "shortcut" means here. I'd remove it.
I'm looking for a way to convey that it's faster to do COMPUTE INCREMENTAL STATS on a partitioned table than COMPUTE STATS. But the time savings only happen if you do C.I.S. multiple times, that is, when the table keeps getting new partitions.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
> Rephrase to avoid "switch" since switching is bad
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: That situation is where you switch
> I'd reword this part ("That situation is where ..."). Suggestion:
I used wording similar to Vuk's suggestion, but without saying "do a CTAS into a whole new table and throw away the old table", the user is likely to follow their intuition into switching from C.S. to C.I.S. on the same table.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@412
PS2, Line 412: >COMPUTE INCREMENTAL STAT
> docs in impala_common mention "drop stats" before making a switch. that's n
The conref= lines in the <note> above will pull in the same text as in impala_common.xml with all the extra warnings and instructions.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:09:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1227
PS4, Line 1227:         <codeph>COMPUTE INCREMENTAL STATS</codeph> during the lifetime of a table,
(or vice versa)


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1243
PS4, Line 1243:         be cached on every <cmdname>impalad</cmdname> host that is eligible to be a coordinator.
as it must be cached on the catalogd and on every ...


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1244
PS4, Line 1244:         If this metadata for a table exceeds 2 GB, you might experience service downtime.
It's worse than that. If the aggregate metadata of *all* tables combined gets to 2GB you may experience downtime.


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml@612
PS4, Line 612:         as new partitions are added, Impala includes a variation of this statement that is intended for use with
How about:

includes a variation of this statement that allows computing statistics on a per-partition basis such that stats can be incrementally updated when new partitions are added.


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml@361
PS4, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. For such tables, use
Sorry, I disagree with the "For such tables, use COMPUTE INCREMENTAL STATS" part. I think we need to be very careful about recommending incremental stats. We can document what it does, but I think we should go out of our way to not explicitly recommend it for any reason.



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 21:53:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/165/


-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:25:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Silvius Rus, Alex Behm, Mostafa Mokhtar, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7999

to look at the new patch set (#5).

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 172 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7999/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Silvius Rus, Alex Behm, Mostafa Mokhtar, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7999

to look at the new patch set (#3).

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 184 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7999/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7999/3/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/3/docs/topics/impala_partitioning.xml@623
PS3, Line 623: is a shortcut for partitioned tables that works on a
             :         subset of partitions rather than the entire table.
How about:

"that works only on the subset of partitions of a partitioned table that have changed."



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:21:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear 
Fine with me to expand this to add my earlier explanation of what the "incremental" part of the stats actually is (to explain why dropping the "incremental" portion is fine).


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> such as partition pruning or join ordering.
Actually I would remove partition pruning because stats have nothing to do with that (and we don't want to imply that they do)



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:46:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(3 comments)

OK, I think this is about as far as we can go given the scope of the original request, to clarify that you use CS or CIS but not both. I got more details consulting with Alex, but that would require something like a tutorial or deep dive to cover what's happening and the tradeoffs for each aspect.

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233
PS2, Line 1233:         When you run <codeph>COMPUTE INCREMENTAL STATS</codeph> on a table for the first time,
> I suggest some minor rephrasing to drive home the "don't switch mantra" a l
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245
PS2, Line 1245:         added or inserted into, you can run <codeph>COMPUTE INCREMENTAL STATS</codeph> for the active
> Sorry my phrasing might have been misleading. By "active" partitions I mean
OK, after consulting with Alex I'm paring this wording way back. Just too many ways that someone could do extra work that was counterproductive or didn't have the benefit that they assumed.


http://gerrit.cloudera.org:8080/#/c/7999/3/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/3/docs/topics/impala_partitioning.xml@623
PS3, Line 623: is a shortcut for partitioned tables that works on a
             :         subset of partitions rather than the entire table.
> How about:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 21:26:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions are actively being
> Done. "Serious error" was my compromise I always used for MySQL, where the 
Sorry, I just realized the ask here. Yes, we do need some numbers like #partitions, #files, and #blocks to come up with an estimate of a particular table's contribution to the memory footprint. Roughly something on the order of

#partitions * 2kB + #files * 750B + #file_blocks * 300B

The problem is that there is no easy way a user can get these numbers for a given table.
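
For illustration only, the rough formula above could be applied like this. The helper name and the sample counts are made up, "2kB" is assumed to mean 2048 bytes, and the per-object costs are the approximate figures quoted above, so treat the result as an order-of-magnitude guess rather than a measurement.

```python
# Rough per-object costs taken from the formula above; order of magnitude only.
BYTES_PER_PARTITION = 2 * 1024  # "2kB", assumed here to mean 2048 bytes
BYTES_PER_FILE = 750
BYTES_PER_FILE_BLOCK = 300


def estimate_table_metadata_bytes(partitions: int, files: int, file_blocks: int) -> int:
    """Hypothetical helper: apply the rough formula to a single table."""
    return (
        partitions * BYTES_PER_PARTITION
        + files * BYTES_PER_FILE
        + file_blocks * BYTES_PER_FILE_BLOCK
    )


# Made-up example table: 1,000 partitions, 10,000 files, 20,000 file blocks.
estimate = estimate_table_metadata_bytes(1_000, 10_000, 20_000)
print(f"approx. {estimate / 1e6:.1f} MB of metadata")
```

Summing this estimate over all tables would give a rough figure to compare against the 2 GB aggregate mentioned elsewhere in this review.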

Similarly for incremental stats we can use the HMS dump to get an estimate (or directly connect to the HMS db)

select sum(length(PARTITION_PARAMS.PARAM_KEY) + length(PARTITION_PARAMS.PARAM_VALUE)),
       PARTITIONS.TBL_ID
from PARTITIONS, PARTITION_PARAMS
where PARTITIONS.PART_ID = PARTITION_PARAMS.PART_ID
  and PARTITION_PARAMS.PARAM_KEY LIKE 'impala_intermediate%'
group by PARTITIONS.TBL_ID;

But in most serious setups, end users can't do this. IMPALA-4870 aims to expose most of this data (WIP). Maybe we can revisit this doc once it is done.
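
To show what the HMS query above adds up, here is a small sketch that runs the same query against an in-memory SQLite stand-in. The two toy tables carry only the columns the query touches, and the parameter key and row values are invented for the example; this is not a real metastore schema or connection.

```python
import sqlite3

# Invented key that merely matches the 'impala_intermediate%' pattern.
KEY = "impala_intermediate_example"

conn = sqlite3.connect(":memory:")
# Toy versions of the two HMS tables, limited to the columns the query uses.
conn.executescript(
    """
    CREATE TABLE PARTITIONS (PART_ID INTEGER, TBL_ID INTEGER);
    CREATE TABLE PARTITION_PARAMS (PART_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
    """
)
conn.executemany("INSERT INTO PARTITIONS VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.executemany(
    "INSERT INTO PARTITION_PARAMS VALUES (?, ?, ?)",
    [
        (1, KEY, "a" * 100),
        (2, KEY, "b" * 50),
        (2, "numRows", "12345"),  # other parameters are filtered out
        (3, KEY, "c" * 10),
    ],
)

rows = conn.execute(
    """
    select sum(length(PARTITION_PARAMS.PARAM_KEY) + length(PARTITION_PARAMS.PARAM_VALUE)),
           PARTITIONS.TBL_ID
    from PARTITIONS, PARTITION_PARAMS
    where PARTITIONS.PART_ID = PARTITION_PARAMS.PART_ID
      and PARTITION_PARAMS.PARAM_KEY LIKE 'impala_intermediate%'
    group by PARTITIONS.TBL_ID
    """
).fetchall()

# Map each table id to the character count of its matching keys and values.
sizes = {tbl_id: total for total, tbl_id in rows}
print(sizes)
```

The totals are character counts of the stored keys and values per table, which is only a proxy for the memory the incremental stats occupy once loaded.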



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:46:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
are these drops required?


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
more specifically, impalads that are also coordinators?


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: metadata for a table exceeds
              :         2 GB
is there a diagnostic page that we can point to here that explains how to find the size of metadata (either via a sql query or a monitoring webpage)?


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
Does that mean lack of stats has no effect on optimization, or something else?


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@611
PS2, Line 611: frequently
remove


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@623
PS2, Line 623: is a shortcut
I don't know what "shortcut" means here. I'd remove it.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: That situation is where you switch
I'd reword this part ("That situation is where ..."). Suggestion:

From <keyword keyref="impala21_full"/> and higher, use the new feature to compute statistics incrementally on just the partitions that changed. See ...


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@412
PS2, Line 412: >COMPUTE INCREMENTAL STAT
docs in impala_common mention "drop stats" before making a switch. that's not mentioned here. what is the required/suggested usage? perhaps call this out as a "switch" and link to why the user should avoid this.



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:11:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> are these drops required?
They are not required if you know *exactly* what you are doing, but that does not apply to most people. Dropping first is always safe and definitely the recommended practice.

If you do not drop and switch back-and-forth between compute stats and compute incremental stats, then you may end up with "unexpected" table metadata.
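
As a sketch of the "drop first, then switch" practice, the statement order might look like the output of the helper below. The helper and the table name are hypothetical; only the Impala statements DROP STATS, COMPUTE STATS, and COMPUTE INCREMENTAL STATS are real, and the sketch assumes that a table-level DROP STATS is the drop you want before switching.

```python
from typing import List


def switch_stats_statements(table: str, to_incremental: bool) -> List[str]:
    """Hypothetical helper: statements for changing a table's stats method.

    DROP STATS comes first so nothing from the previous method is left behind.
    """
    compute = "COMPUTE INCREMENTAL STATS" if to_incremental else "COMPUTE STATS"
    return [f"DROP STATS {table};", f"{compute} {table};"]


# The table name is made up for illustration.
for stmt in switch_stats_statements("sales_db.events", to_incremental=True):
    print(stmt)
```

The helper only builds the statement text; running the statements (for example through impala-shell) is left to the reader.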


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Yes


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: metadata for a table exceeds
              :         2 GB
> is there a diagnostic page that we can point to here that explains how to f
This is a longer story. I asked Bharath to help here.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has not affect on optimization or something el
Please see my explanation on what "incremental" stats is in previous patch sets. Dropping *incremental* stats has no effect on the row counts and NDVs used in query optimization.



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:34:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Silvius Rus, Alex Behm, Mostafa Mokhtar, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7999

to look at the new patch set (#4).

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................

[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
---
M docs/shared/impala_common.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_perf_stats.xml
4 files changed, 180 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7999/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Silvius Rus <sr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>