You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2017/06/30 19:03:10 UTC

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7346

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

KuduPartitionExpr never calls ScalarExprEvaluator::FreeLocalAllocations,
causing it to hang on to memory longer than it needs it.

Since we only need the value of the row for the call into the Kudu
client, we can call FreeLocalAllocations after that.

Testing:
- Manually verified that large Kudu inserts with string columns can pass
  with much lower mem limits now.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
---
M be/src/exprs/kudu-partition-expr.cc
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7346/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

KuduPartitionExpr never calls ScalarExprEvaluator::FreeLocalAllocations,
causing it to hang on to memory longer than it needs it.

Since we only need the value of the row for the call into the Kudu
client, we can call FreeLocalAllocations after that.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Manually verified that large Kudu inserts with string columns can pass
  with much lower mem limits now.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
---
M be/src/runtime/data-stream-sender.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
2 files changed, 14 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7346/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 1:

FYI, while you're fixing bugs in this area, I noticed recently that DataStreamSender::Send() drops the status from AddRow() for Kudu exchanges. I found it as part of compiling wiht gcc7 and I was going to put together a bunch of fixes, but that one might be worth rolling into this patchset

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7346/4/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 482:   RETURN_IF_ERROR(state->CheckQueryState());
> Done
thanks, looks good to me.  we really have to clean up how memory management works with UDFs to make it less brittle, but this is good for now.


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

Please see if Michael wants to look also.

http://gerrit.cloudera.org:8080/#/c/7346/4/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 482:   COUNTER_ADD(total_sent_rows_counter_, batch->num_rows());
this is fine, but any reason to not just hoist it from the if-stmt to here, which is the pattern we usually use (adjacent with CheckQueryState(), which also has to happen after evaluating udfs)?


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/825/

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 5: Code-Review+1

LGTM.

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

KuduPartitionExpr never calls ScalarExprEvaluator::FreeLocalAllocations,
causing it to hang on to memory longer than it needs it.

Since we only need the value of the row for the call into the Kudu
client, we can call FreeLocalAllocations after that.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Manually verified that large Kudu inserts with string columns can pass
  with much lower mem limits now.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
---
M be/src/exprs/kudu-partition-expr.cc
M be/src/runtime/data-stream-sender.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
3 files changed, 14 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7346/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7346/1/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 87:   eval->FreeLocalAllocations();
> Can you free the allocations outside of the expr? i.e. in DataStreamSender:
Two problems I see with that:
1. I'm not sure how we get the ScalarExprEvaluator object to call it on, since its a parameter to KuduPartitionExpr::GetIntVal. Is it safe to store a pointer to that in KuduPartitionExpr? Are we always guaranteed that it will be passed the same KuduPartiitonExpr?
2. This also needs to work for the Sort case, where we're just passing the KuduPartitionExpr as a regular Expr to sort on.


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

The DataStreamSender never frees the local allocations for the Kudu
partition exprs causing it to hang on to memory longer than it needs to.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Added an e2e test that runs a large insert with a mem limit that
  failed with oom previously.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Reviewed-on: http://gerrit.cloudera.org:8080/7346
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/runtime/data-stream-sender.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
2 files changed, 13 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 3:

(2 comments)

nice! After this change, do you see that there isn't that much memory accumulating in either the data sinks or in the sort?

http://gerrit.cloudera.org:8080/#/c/7346/3//COMMIT_MSG
Commit Message:

PS3, Line 12: KuduPartitionExpr
The DataStreamSender never frees the local allocations for the Kudu partition exprs causing it to hang on to memory longer than it needs to.


PS3, Line 15: Since we only need the value of the row for the call into the Kudu
            : client, we can call FreeLocalAllocations after that.
remove - we don't wanna do this per row


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 4: Code-Review+1

looks good, thanks

Let's see if Tim or Dan want to take a look too

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 4:

I think Michael should take a look at this, he's intimately familiar with the expr code.

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 1:

Thomas, wrt my concern earlier about the hash partitioning case doing the call every row, I checked with Tim and he agreed he didn't think we should do that. So when you're making the change to DataStreamSender can you move the FreeLocalAllocations call for the hash partitioning case down a few lines so it's per-batch and not per-row?

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7346/2/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 456:     }
> you can just call
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Matthew Jacobs, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7346

to look at the new patch set (#5).

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

The DataStreamSender never frees the local allocations for the Kudu
partition exprs causing it to hang on to memory longer than it needs to.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Added an e2e test that runs a large insert with a mem limit that
  failed with oom previously.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
---
M be/src/runtime/data-stream-sender.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
2 files changed, 13 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7346/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7346/4/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 482:   RETURN_IF_ERROR(state->CheckQueryState());
> this is fine, but any reason to not just hoist it from the if-stmt to here,
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 1:

(1 comment)

Can you add a test case that runs a query w/ a mem limit such that the case failed before and passes now?

http://gerrit.cloudera.org:8080/#/c/7346/1/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 87:   eval->FreeLocalAllocations();
Can you free the allocations outside of the expr? i.e. in DataStreamSender::Send(). I think we can just do it there on a per batch basis. You'll see we do free the expr allocs for the HASH_PARTITION case, though there it seems to do it every row and I think that may be unnecessarily expensive wrt CPU.


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partitionit belongs to.

The DataStreamSender never frees the local allocations for the Kudu
partition exprs causing it to hang on to memory longer than it needs to.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Added an e2e test that runs a large insert with a mem limit that
  failed with oom previously.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
---
M be/src/runtime/data-stream-sender.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
2 files changed, 14 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7346/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7346/1/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 87:   eval->FreeLocalAllocations();
> Two problems I see with that:
1. I don't understand the concern about where to get it? This class is already a ScalarExpr sublass, so you can call ScalarExprEvaluator::FreeLocalAllocations(e). See my comment in DSS about the code to write. If you're concerned about freeing the child expr memory, take a look at how the ScalarExprEvaluator creation works. You'll see in DSS prepare we call Create(), the function contexts get created there. Those are what ScalarExprEvaluator::FreeLocalAllocations(e) iterates over.

2. The sort should already be freeing local allocations, should be easy to tell by looking at the memory in the profile.


http://gerrit.cloudera.org:8080/#/c/7346/2/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 456:     }
you can just call

    ScalarExprEvaluator::FreeLocalAllocations(partition_expr_evals_);


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily
......................................................................


Patch Set 3:

(2 comments)

> (2 comments)
 > 
 > nice! After this change, do you see that there isn't that much
 > memory accumulating in either the data sinks or in the sort?

yes

http://gerrit.cloudera.org:8080/#/c/7346/3//COMMIT_MSG
Commit Message:

PS3, Line 12: KuduPartitionExpr
> The DataStreamSender never frees the local allocations for the Kudu partiti
Done


PS3, Line 15: Since we only need the value of the row for the call into the Kudu
            : client, we can call FreeLocalAllocations after that.
> remove - we don't wanna do this per row
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes