Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2023/05/12 13:55:00 UTC

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19879 )


Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.
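
The policy described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code in hs2-util.cc: the
names ExpectedResultCount and CountReallocations are hypothetical, a plain
std::vector<int> stands in for an HS2 result column, and "the old way" is
modelled as reserving room for one RowBatch at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Impala API): how many result slots to reserve
// before appending one RowBatch to a result column.
std::size_t ExpectedResultCount(bool is_first_fetch, std::size_t rows_in_batch,
                                std::size_t fetch_size) {
  // First fetch: size by the incoming batch, as before the patch, so that
  // queries with small result sets do not over-reserve.
  if (is_first_fetch) return rows_in_batch;
  // Later fetches: reserve room for the whole fetch in one step.
  return std::max(rows_in_batch, fetch_size);
}

// Simulates serving one fetch of 'fetch_size' rows from batches of
// 'batch_size' rows and counts how often the column's capacity changes.
int CountReallocations(bool is_first_fetch, std::size_t fetch_size,
                       std::size_t batch_size) {
  if (batch_size == 0) return 0;  // Nothing can be appended.
  std::vector<int> column;        // Stand-in for one HS2 result column.
  int reallocations = 0;
  for (std::size_t done = 0; done < fetch_size; done += batch_size) {
    const std::size_t rows = std::min(batch_size, fetch_size - done);
    const std::size_t expected =
        ExpectedResultCount(is_first_fetch, rows, fetch_size);
    const std::size_t old_capacity = column.capacity();
    // Reserve at least enough for this batch, or the expected total if larger.
    column.reserve(std::max(column.size() + rows, expected));
    column.insert(column.end(), rows, 0);
    if (column.capacity() != old_capacity) ++reallocations;
  }
  return reallocations;
}
```

With the defaults quoted above (batch_size=1024, fetch_size=10240),
CountReallocations(false, 10240, 1024) reserves once for the whole fetch,
while the first-fetch path grows the vector batch by batch on common
standard library implementations.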

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not power of 2 and leads to overallocation.
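
To make the suspected overallocation concrete, here is a small sketch that
assumes buffer capacities end up rounded to the next power of two. That
rounding is an assumption used for illustration only (the message above says
"probably"), and RoundUpToPowerOfTwo and WastedSlots are hypothetical names.

```cpp
#include <cstddef>

// Rounds 'n' up to the nearest power of two (returns 1 for n == 0).
// Assumes 'n' is small enough that the result fits in std::size_t.
constexpr std::size_t RoundUpToPowerOfTwo(std::size_t n) {
  std::size_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Slots that would be allocated but never used for a given fetch_size,
// under the power-of-two rounding assumption above.
constexpr std::size_t WastedSlots(std::size_t fetch_size) {
  return RoundUpToPowerOfTwo(fetch_size) - fetch_size;
}

// 8192 is already a power of two, so nothing is wasted.
static_assert(WastedSlots(8192) == 0, "8192 needs no rounding");
// 10240 rounds up to 16384, leaving 6144 unused slots per column.
static_assert(WastedSlots(10240) == 6144, "10240 rounds up to 16384");
```

Under that assumption, fetch_size=10240 would leave 6144 of 16384 slots
unused in each column, while fetch_size=8192 would leave none.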

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 49 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/19879/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13031/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 May 2023 15:58:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19879

to look at the new patch set (#4).

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not a power of 2 and leads to overallocation.
Created IMPALA-12142 for the potential default fetch_size change.

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 51 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/19879/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13032/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 May 2023 16:16:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG@21
PS3, Line 21: Also noticed that the current default fetch_size=10240 in impala-shell
> Should mention that the fetch_size has been changed.
This patch doesn't change the default - created IMPALA-12142 to track this.


http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG@23
PS3, Line 23: not a pow
> Nit: not a power
Done


http://gerrit.cloudera.org:8080/#/c/19879/3/be/src/service/hs2-util.h
File be/src/service/hs2-util.h:

http://gerrit.cloudera.org:8080/#/c/19879/3/be/src/service/hs2-util.h@36
PS3, Line 36: /// Evaluate 'expr_eval' over the row [start_idx, start_idx + num_rows) from 'batch' into
> Should mention 'expected_result_count' in the comment.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:22:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13045/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:45:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG@21
PS3, Line 21: Also noticed that the current default fetch_size=10240 in impala-shell
Should mention that the fetch_size has been changed.


http://gerrit.cloudera.org:8080/#/c/19879/3//COMMIT_MSG@23
PS3, Line 23: not power
Nit: not a power


http://gerrit.cloudera.org:8080/#/c/19879/3/be/src/service/hs2-util.h
File be/src/service/hs2-util.h:

http://gerrit.cloudera.org:8080/#/c/19879/3/be/src/service/hs2-util.h@36
PS3, Line 36: /// Evaluate 'expr_eval' over the row [start_idx, start_idx + num_rows) from 'batch' into
Should mention 'expected_result_count' in the comment.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 10:53:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9317/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:37:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19879

to look at the new patch set (#5).

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not a power of 2 and leads to overallocation.
Created IMPALA-12142 for the potential default fetch_size change.

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 50 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/19879/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not a power of 2 and leads to overallocation.
Created IMPALA-12142 for the potential default fetch_size change.

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Reviewed-on: http://gerrit.cloudera.org:8080/19879
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 50 insertions(+), 31 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 3: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 01:52:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 6: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 23:30:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has removed a vote on this change.

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19879

to look at the new patch set (#2).

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not power of 2 and leads to overallocation.

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 49 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/19879/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13046/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:56:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 6: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:37:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13021/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 14:13:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19879

to look at the new patch set (#3).

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch the reservation sizes were based on the
number of rows in the RowBatches - as batch_size has lower default
than fetch_size (1024 vs 10240), one fetch is served by multiple row
batches leading to reserving vectors in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result set should not regress
while in large ones only the first and the last fetches will be
suboptimal.

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not power of 2 and leads to overallocation.

Tested with select * from tpch_parquet.lineitem, and
RowMaterializationTimer was decreased around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
---
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
5 files changed, 49 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/19879/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 5: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 12:36:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9319/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 18:12:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12138: Optimize HS2 result vector allocations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19879 )

Change subject: IMPALA-12138: Optimize HS2 result vector allocations
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9317/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Gerrit-Change-Number: 19879
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 17:52:30 +0000
Gerrit-HasComments: No