Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/08/15 21:34:54 UTC
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
John Russell has uploaded a new change for review.
http://gerrit.cloudera.org:8080/7680
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 595 insertions(+), 0 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7680/1
--
To view, visit http://gerrit.cloudera.org:8080/7680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 1:
For my TABLESAMPLE examples, I used a trivial amount of data just to make it obvious how the percentages relate to the number and sizes of the data files. Perhaps Alex or Mostafa has a good example of a "beefy" query that illustrates a big speedup or reduced overhead when using sampling...
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 1:
(8 comments)
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:
Line 863: queries to understand the data distribution and plan a partitioning strategy,
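> I'd leave out the "to understand the data distribution and plan a partitioning strategy" because that already supposes a certain use case in the user's mind. I'd not make any assumptions about what the user wants to do with TABLESAMPLE.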
Done
Line 865: to only a percentage of data within the table. This technique reduces the overhead
> Nice!
Done
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:
Line 175: clause immediately after a table reference, to specify that the query only processes an
> a certain percentage of the table data? An "arbitrary portion" sounds strange, and it's not really completely arbitrary
Done
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:
Line 57: The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
> table name or alias, e.g.
Done
Line 69: processing a particular set of data files, the proportion of sampled data from the
> suggest "selecting a random set of data files" instead of "processing a particular set of data files"
Done
Line 77: sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
> suggest "selects" instead of "considers"
Done
Line 172: by itself, because all phases of query execution use less data overall.
> This is not necessarily true, depending on whether the small query optimization kicks in with LIMIT.
I'll soften the wording just a little. "...often makes..."
Line 257: table metadata is not updated by a <codeph>REFRESH</codeph>
> whitespace
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 2: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Reviewed-on: http://gerrit.cloudera.org:8080/7680
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 611 insertions(+), 17 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Alex Behm: Looks good to me, approved
--
Gerrit-MessageType: merged
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 2: Verified+1
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 2:
Build started: https://jenkins.impala.io/job/gerrit-docs-submit/151/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
Patch Set 1:
(8 comments)
Looks good, just minor comments
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:
Line 863: queries to understand the data distribution and plan a partitioning strategy,
I'd leave out the "to understand the data distribution and plan a partitioning strategy" because that already supposes a certain use case in the user's mind. I'd not make any assumptions about what the user wants to do with TABLESAMPLE.
Line 865: to only a percentage of data within the table. This technique reduces the overhead
Nice!
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:
Line 175: clause immediately after a table reference, to specify that the query only processes an
a certain percentage of the table data? An "arbitrary portion" sounds strange, and it's not really completely arbitrary
http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:
Line 57: The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
table name or alias, e.g.
from mytable t tablesample ...
Line 69: processing a particular set of data files, the proportion of sampled data from the
suggest "selecting a random set of data files" instead of "processing a particular set of data files"
Line 77: sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
suggest "selects" instead of "considers"
Line 172: by itself, because all phases of query execution use less data overall.
This is not necessarily true, depending on whether the small query optimization kicks in with LIMIT.
Line 257: table metadata is not updated by a <codeph>REFRESH</codeph>
whitespace
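The review comments about "selecting a random set of data files" and about REPEATABLE selecting the same files each time can be illustrated with a small Python model. This is a sketch under stated assumptions, not Impala's planner logic: it assumes whole data files are drawn in random order until their combined size reaches the requested percentage of the table, and that a fixed seed stands in for `REPEATABLE(seed)`. The function name `sample_files` is hypothetical. In Impala SQL the clause itself has the form `TABLESAMPLE SYSTEM(percentage) [REPEATABLE(seed)]` and follows the table name or alias, for example `FROM mytable t TABLESAMPLE SYSTEM(30) REPEATABLE(42)`.

```python
import random
from typing import List, Optional, Tuple


def sample_files(file_sizes: List[int], percent: float,
                 seed: Optional[int] = None) -> Tuple[List[int], float]:
    """Hypothetical model of file-level sampling (not Impala's code).

    Picks whole data files in random order until their combined size
    reaches `percent` of the table's total size. Returns the sorted
    indexes of the chosen files and the actual percentage of bytes chosen.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    total = sum(file_sizes)
    target = total * percent / 100.0
    # In this model, a fixed seed plays the role of REPEATABLE(seed):
    # the same seed produces the same shuffle, so the same files are chosen.
    # seed=None seeds from system entropy, so each call can differ.
    rng = random.Random(seed)
    order = list(range(len(file_sizes)))
    rng.shuffle(order)
    chosen: List[int] = []
    selected_bytes = 0
    for idx in order:
        if selected_bytes >= target:
            break
        # Files are taken whole, so the total can overshoot the target.
        chosen.append(idx)
        selected_bytes += file_sizes[idx]
    actual = 100.0 * selected_bytes / total if total else 0.0
    return sorted(chosen), actual


if __name__ == "__main__":
    sizes = [100, 100, 100, 100, 100]  # five equal-size files, 500 bytes total
    files, actual = sample_files(sizes, 30, seed=42)
    print(len(files), actual)  # prints: 2 40.0
    # Same seed, same files: the REPEATABLE behavior in this model.
    print(files == sample_files(sizes, 30, seed=42)[0])  # prints: True
```

In this model a 30% request against five equal-size files selects two whole files, so 40% of the bytes are read; because sampling works on whole files, the proportion of sampled data can differ from the requested percentage, and the gap grows when a table has only a few large files.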
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).
Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................
IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 611 insertions(+), 17 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7680/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>