You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/09/19 18:40:27 UTC

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4448

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
10 files changed, 51 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 758:     if (row_batch->num_blocks() > 0) row_batch->MarkFlushResources();
> Can we move this logic into BufferedTupleStream::Close()?
I deliberately avoided doing that because the decision to flush should be the responsibility of the ExecNode, rather than something that is hidden in a utility class. 

The BufferedTupleStream doesn't really know what the ExecNode's memory requirements are for the next phase of its processing, so shouldn't be making the decision to flush resources.


http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 356:   bool flush_resources_;
> I don't see flush_resources_ being read/returned. Could we replace FlushRes
The bit is transferred in RowBatch::TransferResourceOwnership(), which is the key difference from MarkAtCapacity(). We need the extra bit of information to distinguish between a batch that's just full and a request to flush the resources all the way up the operator pipeline. The old AddTupleStream() API essentially did that, but in an obfuscated way.

E.g. if you have a pipeline of joins, and attach a Block at the bottom join node then call FlushResources(), you want resources to be flushed all the way up before pulling the next row from the bottom join node. The resources should be transferred from the probe batch to the output batch at each join node, so it will propagate all the way up.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8: Code-Review+1

Carry

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 758:     if (row_batch->num_blocks() > 0) row_batch->MarkFlushResources();
> I deliberately avoided doing that because the decision to flush should be t
Thanks for explaining, that makes sense to me.

I do have a slight preference for adding a flag to Close() to indicate whether MarkFlushResources() should be called on the batch for the following reasons:
* it forces callers of Call() to think about the desired behavior
* we only want to MarkFlushResources() if the stream being closed attached buffers, but in the current code we may mark  flush resources even if a a block had already been attached to the batch from elsewhere, so that is slightly clearer


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 11: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4448/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS11, Line 385: copied out or sent up the plan
              :   // tree via MarkNeedsDeepCopy()
copied out now or sent up the tree and copied out by a blocking ancestor.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4448/9/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

Line 653:     // the caller can pass the rows up the operator tree.
> let's make this comment actually be useful.  how about something like:
Done


http://gerrit.cloudera.org:8080/#/c/4448/9/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS9, Line 77: Different modes for flushing resources
> these are really different modes for flushing resources.  isn't it just a f
Done


PS9, Line 227: as soon as possible
> this is pretty vague.  how about saying what the rule is specifically:
That's clearer, thanks.


PS9, Line 242:  
> (that have not been attached to the batch)
Done


PS9, Line 243: .
> and deep copied.
Done. Had a few tries at wording this paragraph hopefully it's clearer now.


PS9, Line 244: This is used to control memory management for streaming rows
> this is kind of misleading and confusing now especially since "flush" contr
Just deleted it, I think it's unnecessary.


Line 246:   /// are not allowed to accumulate batches with the 'needs_deep_copy' flag.
> how about saying this is deprecated?
Added a TODO and reference to the JIRA.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 7: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4448/7/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 723:   DCHECK(batch->AtCapacity()); // Flush resources flash should have been set.
typo: flash -> flag


http://gerrit.cloudera.org:8080/#/c/4448/7/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 219:   /// the original owner, even when the ownership of batches s transferred. If the
typo: is


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4448/8//COMMIT_MSG
Commit Message:

Did the need_to_return/needs_deep_copy rename


http://gerrit.cloudera.org:8080/#/c/4448/8/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

Line 196:       batch->AddBlock(block, flush);
> when would we ever want to have batch != NULL and no flush?  Shouldn't we a
I don't think the BufferedTupleStream abstraction show have to know anything about the buffer management algorithm in the ExecNode implementations. Always having Close() flush assumes that that's what the ExecNode wants, but I don't think that would always be the case. E.g. it could make sense to have an ExecNode that flushes only when it needs more memory rather than every time it calls a particular BufferedTupleStream method.

This is also to make it explicit in the calling code that a flush is occurring (since it's normally an important part of the buffer management algorithm and it's pretty obscure if it's hidden in a Close() function).

We also have callsites where we call Close(batch, FlushMode::FLUSH_RESOURCES) where batch is sometimes NULL. E.g. Partition::Close(RowBatch* batch). In the case it makes sense to ignore just the flag if the batch is NULL.


http://gerrit.cloudera.org:8080/#/c/4448/8/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 828:       output_batch->MarkFlushResources();
> if we need to expose MarkFlushResources() too, how about not also having th
I added the flag mainly as a way to force the caller to think about whether they should be flushing resources or not. Happy to remove it if you think it's unnecessary.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 2:

> (2 comments)

What if the blocksize is set to something other than 8MB by startup option? Won't that defeat the auxiliary check potentially leading to issues?

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8:

Not really - flush_resources without need_to_return implies 2a, whereas flush_resources with need_to_return implies 2b. Without an additional flag how does the operator know whether the batch is referencing non-attached resources and whether it is dealing with 2a or 2b?

In case 2a blocking operators should acquire ownership if it hasn't already acquired it (the wrinkle is that there's no way to acquire a Block now so the accounting is wrong, but that will be addressed by the buffer pool). In case 2b, a copy-out is required.

This behaviour was essentially already present, it was just implicit and some operators depended on it in subtle ways. Before this patch, attaching some resources (buffered tuple streams) forced a flush but attaching others did not force a flush (blocks, I/O buffers). Some of the operators relied on this implicit behaviour. E.g. the hash join node assumes that attaching a buffered tuple stream would mean that its reservations get freed up (this currently is true unless its feeding into the build of a NLJ).

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#3).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
10 files changed, 107 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#7).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorter.cc
12 files changed, 191 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#8).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorter.cc
12 files changed, 191 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 1:

This depends on https://gerrit.cloudera.org/#/c/3873/ (I based it on that to avoid a later messy rebase).

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 4: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/252/

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Reviewed-on: http://gerrit.cloudera.org:8080/4448
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
16 files changed, 240 insertions(+), 159 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 11: Code-Review+1

Rebase, carry Alex's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 1:

(2 comments)

Changes make a lot of sense to me.

http://gerrit.cloudera.org:8080/#/c/4448/1/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 1185: }
Add a simple test that checks pinned blocks are attached to a non-NULL batch when calling Close() on the stream.


http://gerrit.cloudera.org:8080/#/c/4448/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 129:   bool ALWAYS_INLINE AtCapacity() {
The meaning of AtCapacity() has slightly changed and I think it might be a problem, but you can convince me otherwise :)

1. I think we still need to check need_to_return_
2. Before, when attaching a stream to a batch, that batch would always be considered at capacity. Now we are attaching pinned blocks, but those don't imply AtCapacity(). I'm worried about running into a scenario where we call GetNext() on an exec node before all those pinned blocks are  unpinned/freed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8:

(2 comments)

Yeah, I was thinking we could always attach resources to make it unambiguous. But I guess we can't necessarily do that on the non-close paths yet, until we have reservations?

I think one thing that makes the row-batch code tough to follow is that "needs to return" is such vague terminology (it already was), and that just gets worse since "flush" also means (streaming) operators need to return immediately.

So, what do you think about renaming "needs to return" with "needs deep copy" or something like that?  In my mind, "needs deep copy" more clearly implies that the batch needs to be flushed up through the streaming operators and then copied at the blocking operator, which also makes the relationship with "flushing" more clear.

http://gerrit.cloudera.org:8080/#/c/4448/8/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

Line 196:       batch->AddBlock(block, flush);
when would we ever want to have batch != NULL and no flush?  Shouldn't we always flush resources on closing if we give it a batch to attach to?

Likewise, if there is no batch, then we must have no flush (or rather, the flush mode doesn't make sense).

So, why both having the caller pass in 'flush'?  Why not just always flush in this branch?


http://gerrit.cloudera.org:8080/#/c/4448/8/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 828:       output_batch->MarkFlushResources();
if we need to expose MarkFlushResources() too, how about not also having the flag to AddBlock() but instead just calling this routine.  That more clearly separates attaching and flushing concept (which you commit message says was a goal and I agree it seems better).


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 13: Code-Review+2

(1 comment)

Carry +2

http://gerrit.cloudera.org:8080/#/c/4448/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS11, Line 385: copied out now or sent up the
              :   // plan and copied out by a blo
> copied out now or sent up the tree and copied out by a blocking ancestor.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 3:

Rebase, carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#9).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
16 files changed, 230 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#5).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
11 files changed, 158 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#10).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
16 files changed, 239 insertions(+), 158 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4448/1/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 1185: }
> Add a simple test that checks pinned blocks are attached to a non-NULL batc
Done


http://gerrit.cloudera.org:8080/#/c/4448/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 129:   bool ALWAYS_INLINE AtCapacity() {
> The meaning of AtCapacity() has slightly changed and I think it might be a 
We didn't check need_to_return_ before, it was sufficient to check num_rows_ == capacity_. The DCHECK I removed was redundant with the remaining one.

I think you're right that it's changed slightly, but I think the new behaviour makes more sense. The cases are:
1. The stream is empty - it doesn't make sense to consider it AtCapacity()
2. The stream has only small buffers < 8MB - in this case I think it's fine to not consider the batch AtCapacity() until the memory adds up to >= 8MB.
3. The stream has at least one large buffer >= 8MB - the aux memory usage check will cause this to be AtCapacity().

There's a bigger preexisting problem with the accounting of the attached blocks: the accounting isn't updated for the purpose of MemTrackers and BufferedBlockMgr reservations, so it's possible for another exec node to accumulate batches that are accounted against a different node. I'm going to need to fix that for the BufferPool changes, but I'm punting on it for now. This change will make that change simpler.


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#12).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
16 files changed, 240 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#4).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
10 files changed, 110 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Internal Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4448

to look at the new patch set (#6).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

This patch also separates the concepts of attaching resources and
flushing resources. Previously RowBatch::AddTupleStream() implicitly
flushed resources from the ExecNode pipeline, which various ExecNodes
relied on to free up memory reservations for subsequent processing. In a
subsequent patch I want the FlushResources() API to become stronger: it
will force streaming ExecNodes to flush their batches or forces
blocking ExecNodes to acquire ownership of the memory resources.
We can't do this right now since we don't have a way to transfer
ownership of BufferedBlockMgr Blocks.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/sorter.cc
12 files changed, 162 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8:

Another way of looking at it is that ignoring flush_resources is a resource management bug, since the resources will be accounted incorrectly or not freed, but ignoring need_to_return is a correctness/crash bug, since the memory referenced will be freed or repurposed.

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 4: Code-Review+2

Missed updating a DCHECK for the nested types codepath.

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 5:

I ran into some problems with spilling_test, where it does run with the read_size < 8mb configuration we had spoken about.

As a solution I added in a new API that I think is the direction we'll need to go with buffer management, where ExecNodes have more control over when resources are flushed.

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 2:

I agree that it could make a difference but I don't think we want to preserve the old behaviour. I think we should be consistent between Blocks attached by the sorter and Blocks attached by Join/Agg nodes. I looked at the history and the motivation seems to be in the join node:     

    // We want to return as soon as we have attached a tuple stream to the out_batch
    // (before preparing a new partition). The attached tuple stream will be recycled
    // by the caller, freeing up more memory when we prepare the next partition.
    if (out_batch->AtCapacity()) break;

So it seems to be a heuristic to break out of the loop in the join node. I think if we want to preserve that behaviour we should change the PHJ node to explicitly check whether there were blocks attached to the batch, since that seems to be the actual intention. It's unclear if it's worth adding that special-case to avoid accumulating a littl extra memory before we flush out the batch.

It's somewhat broken anyway since there's nothing to prevent the other exec node from holding onto the memory (e.g. if it's a NLJ node).

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 3: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/247/

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 758:     if (row_batch->num_blocks() > 0) row_batch->MarkFlushResources();
> Thanks for explaining, that makes sense to me.
Added the flag to RowBatch::AddBlock(), which is forwarded from BufferedTupleStream::CLose()


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................

IMPALA-4023: don't attach buffered tuple streams to batches

This simplifies the memory transfer model by eliminating one category of
resources that can be attached.

Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
10 files changed, 107 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4448/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 2: Code-Review+2

Please check if Alex's concerns are addressed before committing.

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 13: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 8:

It's not clear to me how this is really any different from needs-to-return.  Don't they both mean:
1) For streaming operators, return.
2) For blocking operators:
  a) If the resource can be transferred, then take ownership.
  b) if the resource can't be transferred, then copy out.

And case 2b will go away once all resources are transferrable.  But, why do we need both mechanisms in the mean time?

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 9:

(7 comments)

Thanks, the renaming helps a lot.

My feeling is that the separation of attaching resources and flushing resources would be clearest if the exec node just used MarkFlushResources() directly rather than threading through the parameter.  But if you and Alex feel otherwise, I'm okay with leaving it this way.

http://gerrit.cloudera.org:8080/#/c/4448/9/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

Line 653:     // the caller can pass the rows up the operator tree.
let's make this comment actually be useful.  how about something like:

No more data in this block.  The batch must be immediately returned up the operator tree and deep copied so that NextReadBlock() can reuse the read block's buffer.


http://gerrit.cloudera.org:8080/#/c/4448/9/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS9, Line 77: Different modes for flushing resources
these are really different modes for flushing resources.  isn't it just a flag to indicate whether resources need to be flushed or not?


PS9, Line 227: as soon as possible
this is pretty vague.  how about saying what the rule is specifically:

Used by an operator to indicate that it cannot produce more rows until the resources that it has attached to the row batch are freed or acquired by an ancestor operator.


PS9, Line 242:  
(that have not been attached to the batch)


PS9, Line 243: .
and deep copied.

(to differentiate from flush where the resources can be acquired).


PS9, Line 244: This is used to control memory management for streaming rows
this is kind of misleading and confusing now especially since "flush" controls this.


Line 246:   /// are not allowed to accumulate batches with the 'needs_deep_copy' flag.
how about saying this is deprecated?


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 9:

I feel like it's a bit less error-prone with the explicit flags, so I'll stick with that approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 758:     if (row_batch->num_blocks() > 0) row_batch->MarkFlushResources();
Can we move this logic into BufferedTupleStream::Close()?


http://gerrit.cloudera.org:8080/#/c/4448/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 356:   bool flush_resources_;
I don't see flush_resources_ being read/returned. Could we replace FlushResources() with MarkAtCapacity()?


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4448/7/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 723:   DCHECK(batch->AtCapacity()); // Flush resources flash should have been set.
> typo: flash -> flag
Done


http://gerrit.cloudera.org:8080/#/c/4448/7/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 219:   /// the original owner, even when the ownership of batches s transferred. If the
> typo: is
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes