You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Dan Hecht (Code Review)" <ge...@cloudera.org> on 2016/05/02 20:30:19 UTC

[Impala-CR](cdh5-trunk) IMPALA-1583: Simplify PartitionedHashJoinNode::ProcessProbeBatch()

Dan Hecht has posted comments on this change.

Change subject: IMPALA-1583: Simplify PartitionedHashJoinNode::ProcessProbeBatch()
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/2893/5/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 132:     // At this point the probe is considered matched.
confusing because in ProcessProbeRowRightSemiJoins() we had the same comment but it was the hash table node that has the match flag. clarify these two comments.


Line 134: // We can safely ignore this probe row for left anti joins.
this comment sounds like we should only do it for left anti join, but we always do SetAtEnd(). clarify/fix.


Line 156:                          current_probe_row_, status))) {
DCHECK(!status.ok()); ?


Line 197: outter
outer


Line 244:           if (UNLIKELY(!AppendRow(null_probe_rows_, current_probe_row_, status))) {
DCHECK(!status.ok());


Line 270:     if (UNLIKELY(!AppendRow(partition->probe_rows(), current_probe_row_, status))) {
DCHECK(!status.ok());


Line 333:     DCHECK(hash_tbl_iterator_.AtEnd());
DCHECK_GE(num_rows_added, 0); ? i.e. it can't be -1 on this path, right?


Line 341:   DCHECK_LE(num_rows_added, max_rows);
DCHECK_EQ(num_rows_added == -1, !status.ok());


http://gerrit.cloudera.org:8080/#/c/2893/5/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 177: out_batch
here and elsewhere - update comment for out_batch_iterator


Line 181: remaining
this is confusing because it's only the remaining capacity on the first iteration. rather than passing both num_rows_added and max_rows, could we just pass a *remaining_capacity which is decremented for each row (and then you have to stop when it hits 0)?

If that doesn't work (maybe because of the -1 case for num_rows_added, let's document that 'max_rows' is the stopping point for num_rows_added.


Line 189: '.
what about the RIGHT_ANTI_JOIN case?  could word it similar to ProcessProbeRowLeftSemiJoins() comment instead.


Line 230:   /// next probe row and its corresponding partition.
here and others: document num_rows_added == -1 case and relationship to status.


-- 
To view, visit http://gerrit.cloudera.org:8080/2893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2091bdf97ab34c5cdc84e84394c579a5b36afc0
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes