You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sahil Takiar (Code Review)" <ge...@cloudera.org> on 2019/08/23 16:47:25 UTC

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14129


Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M tests/query_test/test_result_spooling.py
3 files changed, 191 insertions(+), 42 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 19:38:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
> I did some profiling, and I think the answer is that it depends on the netw
Works for me (or we could hardcode 10000 or whatever too).



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Aug 2019 00:44:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14129

to look at the new patch set (#6).

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/query-state.h
M tests/query_test/test_result_spooling.py
5 files changed, 306 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4415/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Aug 2019 16:59:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14129

to look at the new patch set (#4).

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/query-state.h
M tests/query_test/test_result_spooling.py
5 files changed, 313 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 5:

Re-based patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Aug 2019 15:50:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4346/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 17:27:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Reviewed-on: http://gerrit.cloudera.org:8080/14129
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/query-state.h
M tests/query_test/test_result_spooling.py
5 files changed, 306 insertions(+), 56 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 15:33:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14129

to look at the new patch set (#5).

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/query-state.h
M tests/query_test/test_result_spooling.py
5 files changed, 312 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 4:

Added additional tests to buffered-tuple-stream-test based on the conversation in IMPALA-8890.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 21:48:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
> Another thought about the batch sizes...
I did some profiling, and I think the answer is that it depends on the network between the client and the server. Ran a few experiments comparing small vs. large fetch sizes:

* JDBC client running on the same host as a the minicluster - didn't see much perf change
* JDBC client running on a different EC2 machine as the minicluster (both in the same region) - depends on the # of rows and # of columns, but makes up to a 20% for a full table scan of TPCH 'orders'.
* JDBC client running on my laptop against a minicluster running on an EC2 host - makes a huge difference

So I think it really depends on how the client is deployed in relation to the Impala coordinator. I would guess that most clients (e.g. BI tools) would at least run in the same datacenter (or if on AWS same region), but maybe not the same host.

However, I don't think it hurts performance to increase the default fetch size. The only thing that would be an issue, as you mentioned, would be if the client was fetching faster than Impala was producing rows (maybe that can happen in highly selective scans). As you mentioned below, that might be solve-able by IMPALA-7312.

According to https://www.simba.com/products/Impala/doc/v2/ODBC_InstallGuide/win/content/odbc/options/rowsfetchedperblock.htm "testing has shown that performance gains are marginal beyond the default value of 10000 rows" - so maybe around 10 RowBatches is the sweet spot.

So I think setting the default to 10x BATCH_SIZE would be reasonable?



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Aug 2019 00:09:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4371/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 22:24:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

LGTM, might be able to simplify further so just wanted to mention that.

http://gerrit.cloudera.org:8080/#/c/14129/4/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/4/be/src/exec/buffered-plan-root-sink.h@145
PS4, Line 145:   bool current_batch_empty_ = true;
Is it sufficient to check that current_batch_row_ == current_batch_->num_rows()? I think this might be redundant with that state.



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 00:32:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.h@147
PS3, Line 147: current_batch_ == nullptr
> !current_batch_;
Replaced with 'current_batch_empty_'.


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@114
PS3, Line 114: current_batch_ != nullptr)
> !current_batch_
Replaced this with a new instance variable 'current_batch_empty_' which tracks the state of 'current_batch_' (new variable is necessary to support re-using current_batch_ across reads from the SpillableRowBatchQueue.


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@138
PS3, Line 138: bool* eos
> We should initialise it.
Done


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@197
PS3, Line 197:           current_batch_.reset();
> It seems a bit unnecessary to create and destroy the batch repeatedly. It's
Changed it so 'current_batch_' is initialized in Open() and destroyed in Close(). Added 'current_batch_empty_' to track the state of 'current_batch_' rather than relying on nullptr checks.



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 21:47:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
> I think it might be better to return > 1024 rows at a time by default - e.g
Another thought about the batch sizes...

The reason for the 1024 row internal batch size was so that RowBatches would fit in L1 cache (or close to it), while also being large enough to amortise fixed overhead of query execution.

I think the calculation for returning rows to the client is a bit different - the network overheads for an RPC are pretty high so we care a lot about amortising that.



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 17:34:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 6: Code-Review+2

Carrying +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 03:41:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14129

to look at the new patch set (#3).

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/runtime/query-state.h
M tests/query_test/test_result_spooling.py
4 files changed, 219 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4364/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 01:50:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/4/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/4/be/src/exec/buffered-plan-root-sink.h@145
PS4, Line 145:   bool current_batch_empty_ = true;
> Is it sufficient to check that current_batch_row_ == current_batch_->num_ro
Yeah, good point. It is sufficient to check if 'current_batch_row_' equals / not equals 0. Updated the patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 03:41:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14129

to look at the new patch set (#2).

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested assuming there are that many rows left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M tests/query_test/test_result_spooling.py
3 files changed, 189 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/14129/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 3: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.h@147
PS3, Line 147: current_batch_ == nullptr
!current_batch_;


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@114
PS3, Line 114: current_batch_ != nullptr)
!current_batch_


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@138
PS3, Line 138: bool* eos
Is it assumed that the caller will initialize this to false or should we explicitly set it to false before entering the while loop below ?



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 16:35:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4874/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 15:33:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@85
PS2, Line 85: 1024
QueryState::DEFAULT_BATCH_SIZE ?


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@136
PS2, Line 136:  = nullptr;
nit: unnecessary for unique_ptr


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@144
PS2, Line 144:   /// 'current_batch' points to.
This needs to be called with 'lock_' held right ?


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@145
PS2, Line 145: bool IsQueueEmpty() {
bool IsQueueEmpty const {


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
> I did some profiling, and I think the answer is that it depends on the netw
Thanks for doing the experiment. 10x BATCH_SIZE seems like a reasonable starting point.


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@162
PS2, Line 162: if (current_batch_ == nullptr)
if (!current_batch_)

http://www.cplusplus.com/reference/memory/unique_ptr/operator%20bool/


http://gerrit.cloudera.org:8080/#/c/14129/2/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/14129/2/tests/query_test/test_result_spooling.py@264
PS2, Line 264: for _ in range(int(ceil(float(self._num_rows) / float(fetch_size)))):
Given the assert at line 268, why not just do:

   while rows_fetched < self._num_rows:



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Aug 2019 01:08:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@138
PS3, Line 138: bool* eos
> Is it assumed that the caller will initialize this to false or should we ex
We should initialise it.

I feel pretty strongly that callees should not assume output parameters are initialised, unless it's explicitly documented in the method comment. It's pretty easy to introduce subtle bugs this way.


http://gerrit.cloudera.org:8080/#/c/14129/3/be/src/exec/buffered-plan-root-sink.cc@197
PS3, Line 197:           current_batch_.reset();
It seems a bit unnecessary to create and destroy the batch repeatedly. It's going to cause a bunch of unnecessary malloc()/free() calls. That might not be a problem, but we could just create it in Open() and reuse it until Close().



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 20:24:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

I wasn't sure what the ideal behavior should be w.r.t. the fetch size behavior. So open to suggestions if we want to change the behavior described in the commit message.

Looking through the JDBC spec the fetch size is just a "hint":

"Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero."

https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#setFetchSize-int-


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 16:58:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4347/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 17:35:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
I think it might be better to return > 1024 rows at a time by default - e.g. if you consider a client fetching a largish result set over a connection with high latency but reasonable bandwidth, then throughput is mostly going to be related to the # of RPCs. I haven't looked at the behaviour of different clients. Actually I think generally if the server is running faster than the client (very common), returning larger batches is good.

There's a trade-off between minimising round-trips and also responsiveness, but having a timeout on fetches might help if the client is running faster than the server, so that if the server is returning rows slower than the client can fetch them, you get some output back.

This actually depends a lot on the behaviour of clients - I don't know how many depend on the default fetch size.


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@152
PS2, Line 152:         rows_available_.Wait(l);
Don't need to solve in this PS, but I think we could solve IMPALA-7312 with a timed wait here



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 17:32:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4428/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Aug 2019 04:18:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@85
PS2, Line 85: 
> QueryState::DEFAULT_BATCH_SIZE ?
Done


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@136
PS2, Line 136: atch.
> nit: unnecessary for unique_ptr
Done


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@144
PS2, Line 144:   /// the logical queue of RowBatches and thus includes any RowBatch that
> This needs to be called with 'lock_' held right ?
Yeah, updated the code comments above.


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.h@145
PS2, Line 145: /// 'current_batch' p
> bool IsQueueEmpty const {
Done


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     num_results = min(num_results, MAX_FETCH_SIZE);
> Thanks for doing the experiment. 10x BATCH_SIZE seems like a reasonable sta
Done


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@152
PS2, Line 152:     while (!*eos && num_rows_read < num_rows_to_read) {
> Don't need to solve in this PS, but I think we could solve IMPALA-7312 with
Sounds good. Will work on IMPALA-7312, but in a different PS.


http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@162
PS2, Line 162:  eos and then return. The queu
> if (!current_batch_)
Done


http://gerrit.cloudera.org:8080/#/c/14129/2/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/14129/2/tests/query_test/test_result_spooling.py@264
PS2, Line 264: 
> Given the assert at line 268, why not just do:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Aug 2019 01:09:45 +0000
Gerrit-HasComments: Yes