Posted to dev@impala.apache.org by "Huaisi Xu (Code Review)" <ge...@cloudera.org> on 2016/06/21 21:50:32 UTC

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Huaisi Xu has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3429

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................

Revert "IMPALA-2473: reduce scanner memory usage"

This reverts commit 044652a2400ecbafc3bcb153af65a0ee6334d425.

Conflicts:
	be/src/exec/hdfs-parquet-scanner.cc
	be/src/exec/hdfs-scanner.cc
	be/src/runtime/row-batch.h
	testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test

Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/row-batch.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
6 files changed, 32 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/29/3429/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-2.5.0 5.7.x) CDH-41243: Parquet scanner regression on wide tables

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: CDH-41243: Parquet scanner regression on wide tables
......................................................................


Patch Set 2:

Tim, the release build looks good. It is similar to 5.4.10 after the revert.


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

(4 comments)

I think we should only backport the changes to hdfs-parquet-scanner.cc and nested-types-tpch.test.

http://gerrit.cloudera.org:8080/#/c/3429/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1646
I think this is the only change we need to backport.


http://gerrit.cloudera.org:8080/#/c/3429/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 173
Don't need to revert this change.


http://gerrit.cloudera.org:8080/#/c/3429/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Let's not revert the changes in this file.


http://gerrit.cloudera.org:8080/#/c/3429/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 139:   bool AtCapacity(MemPool* tuple_pool) {
We don't want to backport this change.



Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0 5.7.x) CDH-41243: Parquet scanner regression on wide tables

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: CDH-41243: Parquet scanner regression on wide tables
......................................................................


Patch Set 2: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

Local testing in a VM shows that with this revert, 5.7.x is 2x slower than 5.4.10. I verified that the returned row batch contains 1024 rows in the aggregation node.

Most of the time is still spent in the scanner. Tim, do you know what may have gone wrong?

I listed the conflicts in the commit message to give you an idea for the code review. I am not that sure about the revert of row-batch.h. Could you take a look?


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) CDH-41243: Parquet scanner regression on wide tables

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: CDH-41243: Parquet scanner regression on wide tables
......................................................................


Patch Set 2: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) CDH-41243: Parquet scanner regression on wide tables

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has uploaded a new patch set (#2).

Change subject: CDH-41243: Parquet scanner regression on wide tables
......................................................................

CDH-41243: Parquet scanner regression on wide tables

IMPALA-2473 introduced a check that prevents row batches from growing
beyond 8MB, but it has a corner case: when an empty row batch is
already larger than 8MB, the scanner returns the batch immediately
after it materializes one row, essentially setting batch_size=1.
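
A minimal sketch of the corner case described above. This is not the
Impala code: AtCapacity() and RowsPerBatch() here are hypothetical
stand-ins, and the exact form of the check (including the num_rows > 0
guard) is an assumption made only to reproduce the behavior the commit
message describes.

```cpp
#include <cassert>
#include <cstdint>

// 8MB limit on the memory attributed to one row batch (from the commit message).
constexpr int64_t kMaxBatchBytes = 8LL * 1024 * 1024;

// Hypothetical capacity check: the batch is full when it holds batch_size
// rows, or when it holds at least one row and its memory exceeds the limit.
bool AtCapacity(int num_rows, int batch_size, int64_t total_bytes) {
  return num_rows >= batch_size ||
         (num_rows > 0 && total_bytes >= kMaxBatchBytes);
}

// Counts how many rows fit before the batch is handed off.
// empty_batch_bytes: memory already attributed to the batch with zero rows.
// row_bytes: additional memory per materialized row.
int RowsPerBatch(int64_t empty_batch_bytes, int64_t row_bytes, int batch_size) {
  assert(batch_size > 0);
  int num_rows = 0;
  while (!AtCapacity(num_rows, batch_size,
                     empty_batch_bytes + num_rows * row_bytes)) {
    ++num_rows;  // materialize one more row
  }
  return num_rows;
}
```

With these assumed numbers, RowsPerBatch(64 * 1024, 100, 1024) fills the
batch to 1024 rows, while RowsPerBatch(9LL * 1024 * 1024, 100, 1024)
returns after a single row, because the empty batch is already over the
limit.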

Revert part of "IMPALA-2473: reduce scanner memory usage":
   be/src/exec/hdfs-parquet-scanner.cc
   testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test

Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
---
M be/src/exec/hdfs-parquet-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
2 files changed, 0 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/29/3429/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-2.5.0 5.7.x) CDH-41243: Parquet scanner regression on wide tables

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has submitted this change and it was merged.

Change subject: CDH-41243: Parquet scanner regression on wide tables
......................................................................


CDH-41243: Parquet scanner regression on wide tables

IMPALA-2473 introduced a check that prevents row batches from growing
beyond 8MB, but it has a corner case: when an empty row batch is
already larger than 8MB, the scanner returns the batch immediately
after it materializes one row, essentially setting batch_size=1.

Revert part of "IMPALA-2473: reduce scanner memory usage":
   be/src/exec/hdfs-parquet-scanner.cc
   testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test

Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Reviewed-on: http://gerrit.cloudera.org:8080/3429
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Huaisi Xu <hx...@cloudera.com>
---
M be/src/exec/hdfs-parquet-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
2 files changed, 0 insertions(+), 38 deletions(-)

Approvals:
  Huaisi Xu: Verified
  Tim Armstrong: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3429/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 125:   bool AtCapacity() {
Before IMPALA-2473, the original AtCapacity() was one line, but other commits (IMPALA-2757) added more code to it. I am not sure whether they are related to IMPALA-2473. http://github.mtv.cloudera.com/CDH/Impala/commit/eae1fa16f0f7daf119d306cf554e21d67983970e



Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

Reverting the whole commit makes sense in 5.4.x, but there are other changes in 5.7.x that build on top of it. We don't want to revert other improvements.


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

> Reverting the whole commit makes sense in 5.4.x, but there are
> other changes in 5.7.x that build on top of it. We don't want to
> revert other improvements.

Got it! That was what I suspected. Thanks.


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

What about in 5.4.x?


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-2.5.0 5.7.x) Revert "IMPALA-2473: reduce scanner memory usage"

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: Revert "IMPALA-2473: reduce scanner memory usage"
......................................................................


Patch Set 1:

> (4 comments)
>
> I think we should only backport the changes to hdfs-parquet-scanner.cc
> and nested-types-tpch.test.

OK, that is clearer, since our original plan was to revert the whole commit.


Gerrit-MessageType: comment
Gerrit-Change-Id: I55a94363840c2b3804ed070c23f57f2117b4fab3
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No