Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/08/15 21:34:54 UTC

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7680

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................

IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 595 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7680/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 1:

For my TABLESAMPLE examples, I used a trivial amount of data just to make it obvious what was happening based on percentages vs. number and sizes of data files. Perhaps Alex or Mostafa has a good example of a "beefy" query that illustrates a big speedup or less overhead when using sampling...


Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 863:   queries to understand the data distribution and plan a partitioning strategy,
> I'd leave out the "to understand the data distribution and plan a partition
Done


Line 865:   to only a percentage of data within the table. This technique reduces the overhead
> Nice!
Done


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:

Line 175:         clause immediately after a table reference, to specify that the query only processes an
> a certain percentage of the table data? an "arbitrary portion" sounds stran
Done


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:

Line 57:       The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
> table name or alias, e.g.
Done


Line 69:       processing a particular set of data files, the proportion of sampled data from the
> suggest "selecting a random set of data files" instead of "processing a par
Done


Line 77:       sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
> suggest "selects" instead of "considers"
Done


Line 172:       by itself, because all phases of query execution use less data overall.
> This is not necessarily true, depending on whether the small query optimiza
I'll soften the wording just a little. "...often makes..."


Line 257:       table metadata is not updated by a <codeph>REFRESH</codeph> 
> whitespace
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 2: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Reviewed-on: http://gerrit.cloudera.org:8080/7680
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 611 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 2: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/151/


Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................


Patch Set 1:

(8 comments)

Looks good, just minor comments

http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 863:   queries to understand the data distribution and plan a partitioning strategy,
I'd leave out the "to understand the data distribution and plan a partitioning strategy" because that presupposes a particular use case on the user's part. I wouldn't make any assumptions about what the user wants to do with TABLESAMPLE.


Line 865:   to only a percentage of data within the table. This technique reduces the overhead
Nice!


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_select.xml
File docs/topics/impala_select.xml:

Line 175:         clause immediately after a table reference, to specify that the query only processes an
a certain percentage of the table data? an "arbitrary portion" sounds strange and it's not really completely arbitrary


http://gerrit.cloudera.org:8080/#/c/7680/1/docs/topics/impala_tablesample.xml
File docs/topics/impala_tablesample.xml:

Line 57:       The <codeph>TABLESAMPLE</codeph> clause comes immediately after a table name.
table name or alias, e.g.

from mytable t tablesample ...


Line 69:       processing a particular set of data files, the proportion of sampled data from the
suggest "selecting a random set of data files" instead of "processing a particular set of data files"


Line 77:       sampling considers the same set of data files each time. <codeph>REPEATABLE</codeph>
suggest "selects" instead of "considers"


Line 172:       by itself, because all phases of query execution use less data overall.
This is not necessarily true, depending on whether the small query optimization kicks in with LIMIT.


Line 257:       table metadata is not updated by a <codeph>REFRESH</codeph> 
whitespace



Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement
......................................................................

IMPALA-5309: [DOCS] Add TABLESAMPLE clause to SELECT statement

Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_select.xml
M docs/topics/impala_subqueries.xml
A docs/topics/impala_tablesample.xml
M docs/topics/impala_views.xml
10 files changed, 611 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7680/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd7e5b7cfe11c986348bc6c8d1b11921f34df336
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>