Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/06/26 22:50:13 UTC

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7300

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7300/3/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 52:       on the right-hand side of the join is broadcast. This behavior
> I believe the word right-hand side is reserved to the position of the table
I knew 'RHS' was going to get me in trouble. :-)


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:           <topicref rev="2.9.0 IMPALA-5381 IMPALA-5583" href="topics/impala_default_join_distribution_mode.xml"/>
> Why mention IMPALA-5583 also?
In the past I've referred to both the "code implementation" JIRA and the "document the new feature" JIRA in this kind of context, just for ease of future maintenance and tracing if something is wrong or missing on the doc side. I guess that's less important when the doc one is a subtask of the code one. I'll take it out.


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:       This option determines the join strategy that Impala uses when any of the tables
> We deliberately did not use "join strategy" in the option name because stra
Can you elaborate a little on the meaning of "join distribution mode" then? That's not terminology we've used elsewhere in the docs.


Line 47:       Hive <codeph>ANALYZE TABLE</codeph> statement.
> Sure you want to keep the ANALYZE TABLE part? In most situations we cannot 
Done


Line 48:       By default, when a table involved in the join query does not have statistics,
> Accuracy could be improved. What if both tables do not have stats? Clarify 
What is the answer if both tables are missing stats? Does Impala make a deduction about which is smaller and that one gets broadcast while the other doesn't?


Line 58:       might be missing statistics due to the overhead involved in calculating them,
> I wouldn't suppose a particular reason for not having stats.
Done


Line 61:       of a table involved in a join query and only transmits a portion of the table
> Not very accurate, both tables are transferred across the network. Not sure
I'd prefer to prepare and fine-tune a brief explanation so I could reuse that wording in places where such terminology is mentioned to a reader that might not have seen it before. Anyone who needs detailed background info can follow the "related info" links at the end of the page.


Line 67:       recommended when setting up and deploying new clusters. This setting is
> We should mention why we recommend this. SHUFFLE is generally a safer optio
Done


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 48:       Impala uses the <q>broadcast</q> technique that transmits the entire contents
> If both tables are missing stats the table listed first in the query will b
Done


Line 61:       the setting <codeph>DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</codeph> lets you
> This is the description for the SHUFFLE join, we should use similar wording
Done


http://gerrit.cloudera.org:8080/#/c/7300/2/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:       This option determines the join distribution that Impala uses when any of the tables
> Alex's comment around not using "Join strategy" hasn't been addressed. 
Done


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:           <topicref rev="2.9.0 IMPALA-5381 IMPALA-5583" href="topics/impala_default_join_distribution_mode.xml"/>
Why mention IMPALA-5583 also?


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:       This option determines the join strategy that Impala uses when any of the tables
We deliberately did not use "join strategy" in the option name because strategy is too generic.


Line 47:       Hive <codeph>ANALYZE TABLE</codeph> statement.
Sure you want to keep the ANALYZE TABLE part? In most situations we cannot effectively use what Hive produces.


Line 48:       By default, when a table involved in the join query does not have statistics,
Accuracy could be improved. What if both tables do not have stats? Clarify that one table is going to be broadcast. It might even be worth explicitly listing what happens if one table has stats and the other doesn't (the one without stats will be broadcast).


Line 58:       might be missing statistics due to the overhead involved in calculating them,
I wouldn't suppose a particular reason for not having stats.


Line 61:       of a table involved in a join query and only transmits a portion of the table
Not very accurate: both tables are transferred across the network. Not sure if we need to explain the differences between broadcast and shuffle here; maybe provide a link to their explanation/definition?


Line 67:       recommended when setting up and deploying new clusters. This setting is
We should mention why we recommend this. SHUFFLE is generally a safer option because the join build will be less prone to spilling and/or OOM.
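
To make the memory argument concrete, here is a minimal sketch of the build-side footprint per node under each mode. It is an illustration only, not Impala's planner or memory accounting: the function name `build_bytes_per_node` is hypothetical, and the SHUFFLE figure assumes join keys hash evenly across nodes.

```python
def build_bytes_per_node(build_table_bytes: int, num_nodes: int, mode: str) -> int:
    """Rough build-side data each node holds for one join (illustrative only)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if mode == "BROADCAST":
        # Every node receives a full copy of the build-side table.
        return build_table_bytes
    if mode == "SHUFFLE":
        # Assumption: rows are hash-partitioned evenly, so each node
        # holds roughly 1/num_nodes of the build side (rounded up).
        return -(-build_table_bytes // num_nodes)
    raise ValueError(f"unknown mode: {mode}")
```

With a build table of unknown size (no stats), the broadcast footprint grows with the whole table on every node, while the partitioned footprint grows with only its per-node share, which is why a wrong guess is less likely to spill or run out of memory under SHUFFLE.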


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 4: Verified+1

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#4).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/140/

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Mostafa Mokhtar (Code Review)" <ge...@cloudera.org>.
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7300/3/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 52:       on the right-hand side of the join is broadcast. This behavior
I believe the term "right-hand side" is reserved for the position of the table in the plan rather than the order of the tables in the SQL query.

Consider "table referenced later in the join order is broadcast"


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 1:

This gerrit is a "stake in the ground" with basic details about the query option. If there are other places where the option could be mentioned (under joins, compute stats, etc.), I'll handle those in a separate gerrit.

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Reviewed-on: http://gerrit.cloudera.org:8080/7300
Reviewed-by: Mostafa Mokhtar <mm...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Mostafa Mokhtar: Looks good to me, approved



Gerrit-MessageType: merged
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Mostafa Mokhtar (Code Review)" <ge...@cloudera.org>.
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 48:       Impala uses the <q>broadcast</q> technique that transmits the entire contents
> What is the answer if both tables are missing stats? Does Impala make a ded
If both tables are missing stats, the table listed first in the query will be the probe side, while the second table will be broadcast.


Line 61:       from each table to each executor node.
> I'd prefer to prepare and fine-tune a brief explanation so I could reuse th
This is the description for the SHUFFLE join; we should use similar wording:

[SHUFFLE] - Makes that join operation use the "partitioned" technique, which divides up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other nodes for processing. (The keyword SHUFFLE is used to indicate a "partitioned join", because that type of join is not related to "partitioned tables".) Since the alternative "broadcast" join mechanism is the default when table and index statistics are unavailable, you might use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more efficient for joins between large tables of similar size.
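
The "partitioned" technique described above can be sketched roughly as follows. This is a toy, single-process model and not Impala's implementation: the names `partition_rows` and `partitioned_join`, the use of CRC32 as the hash, and rows-as-tuples are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition_rows(rows, key_index, num_nodes):
    """Assign each row to a node by hashing its join key (CRC32 is an assumption)."""
    parts = defaultdict(list)
    for row in rows:
        key = str(row[key_index]).encode("utf-8")
        parts[zlib.crc32(key) % num_nodes].append(row)
    return dict(parts)


def partitioned_join(left, right, left_key, right_key, num_nodes):
    """Equi-join two row lists by partitioning both sides, then joining per node."""
    left_parts = partition_rows(left, left_key, num_nodes)
    right_parts = partition_rows(right, right_key, num_nodes)
    joined = []
    for node in range(num_nodes):
        # Each node builds a hash table from only its share of the right side.
        build = defaultdict(list)
        for r in right_parts.get(node, []):
            build[r[right_key]].append(r)
        # Matching left rows hashed to the same node, so the probe stays local.
        for l in left_parts.get(node, []):
            for r in build.get(l[left_key], []):
                joined.append(l + r)
    return joined
```

Because both sides use the same hash on the join key, rows that can match always land on the same node, and no node needs a full copy of either table.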


http://gerrit.cloudera.org:8080/#/c/7300/2/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:       This option determines the join strategy that Impala uses when any of the tables
Alex's comment about not using "join strategy" hasn't been addressed.

Can you please use "join distribution" instead?


Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "Mostafa Mokhtar (Code Review)" <ge...@cloudera.org>.
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................


Patch Set 4: Code-Review+2

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
......................................................................

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>