You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/06/06 14:57:37 UTC

[Impala-CR](cdh5-trunk) IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3315

Change subject: IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests
......................................................................

IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests

Fix a couple of resource management bugs in the sorter.

Add a sort test that consumes the entire output of the sorter, executing
with various amounts of memory. This test reproduced one of the sorter
bugs, where var-len blocks were not always attached to the output batch.

Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bugs
that occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
---
M be/src/runtime/sorter.cc
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_sort.py
3 files changed, 45 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/15/3315/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3315

to look at the new patch set (#5).

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................

IMPALA-3670: fix sorter buffer mgmt bugs

Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bugs
that occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other test was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 45 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/15/3315/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 6: Verified+1

http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/3321/

-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3315/3/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 816:   if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR);
I think this is clearer/simpler if these two DCHECKs are reduced to:

DCHECK_EQ(CONVERT_OFFSET_TO_PTR, HasVarLenBlocks() && !is_pinned_);


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 2:

Separated out the IMPALA-3699 test fix, since that will need to go onto the release branch.

-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


IMPALA-3670: fix sorter buffer mgmt bugs

Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bugs
that occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other test was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Reviewed-on: http://gerrit.cloudera.org:8080/3315
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Tim Armstrong <ta...@cloudera.com>
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 45 insertions(+), 25 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3315/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 894:       // Also attach the var-len blocks at eos, since we no longer need them.
> why was this needed given block at line 867?
That's the false assumption that lead to the bug.

Consider the case when we have a single in-memory run, and we sorted it in-place. Then the var-len data is out-of-order relative to the fixed-len data. Any fixed-len tuple can reference any var-len block, so we can't delete var-len blocks as we go.

In that case CONVERT_OFFSET_TO_PTR is (correctly) false and line 867 is (correctly) never executed.


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3315/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 894:       // Also attach the var-len blocks at eos, since we no longer need them.
> let's add a brief comment explaining that.  
We already have a DCHECK for CONVERT_OFFSET_TO_PTR and is_pinned_ up the top, so there was some dead code there. Cleaned up the code above.


Line 896:         for (int i = 0; i < var_len_blocks_.size(); ++i) {
Cleaned up this loop a little and added a DCHECK.


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3315/3/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 816:   if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR);
> I think this is clearer/simpler if these two DCHECKs are reduced to:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................

IMPALA-3670: fix sorter buffer mgmt bugs

Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bugs
that occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other test was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 37 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/15/3315/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3315/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 894:       // Also attach the var-len blocks at eos, since we no longer need them.
why was this needed given block at line 867?


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................


Patch Set 6: Code-Review+2

Rebase and address comment. Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3670,IMPALA-3669: fix sorter buffer mgmt bugs and tests
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3315/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 894:       // Also attach the var-len blocks at eos, since we no longer need them.
> That's the false assumption that lead to the bug.
let's add a brief comment explaining that.  

What's the current relationship between CONVER_OFFSET_TO_PTR and is_pinned_ now? Is that something we should/could tighten up? e.g. is (CONVERT_OFFSET_TO_PTR && is_pinned_) still possible?  

If so, won't this code pass NULL to AddBlock() in that case?
If not, we should clarify that invariant and clean up line 866.


-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3670: fix sorter buffer mgmt bugs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-3670: fix sorter buffer mgmt bugs
......................................................................

IMPALA-3670: fix sorter buffer mgmt bugs

Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bugs
that occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other test was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
---
M be/src/runtime/sorter.cc
M tests/query_test/test_sort.py
2 files changed, 44 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/15/3315/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>