Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/08/29 19:43:00 UTC

[jira] [Commented] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes

    [ https://issues.apache.org/jira/browse/IMPALA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918912#comment-16918912 ] 

ASF subversion and git services commented on IMPALA-8819:
---------------------------------------------------------

Commit 6308915a66d837b5545b601ef7f97caa5703c30f in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6308915 ]

IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes

Adds support for non-default fetch sizes when result spooling is enabled
(the default is to return BATCH_SIZE rows for each fetch request). When
result spooling is disabled, Impala can only return up to BATCH_SIZE
rows because it only buffers a single RowBatch at a time. When result
spooling is enabled, each fetch request returns exactly the number of
rows requested, provided that many rows are left in the result set.
There is also an upper limit on the fetch size to prevent the resulting
QueryResultSet from getting too big.

Unlike the behavior when result spooling is disabled, fetches do not
break on RowBatch boundaries. For example, when result spooling is
disabled, if the fetch size is 10 and the batch size is 15, the second
fetch will return 5 rows. However, when result spooling is enabled the
second fetch will return 10 rows (assuming there is another RowBatch to
read).
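
The difference in fetch boundaries described above can be illustrated
with a small Python simulation. This is only a sketch, not Impala code:
fetch_sizes_blocking and fetch_sizes_spooled are hypothetical helper
names, and the model assumes every batch is already available and
ignores the upper limit on the fetch size.

```python
def fetch_sizes_blocking(batch_sizes, fetch_size):
    """Rows returned per fetch when only one RowBatch is buffered at a time.

    Hypothetical model: each fetch stops at the end of the current batch.
    """
    returned = []
    for rows_in_batch in batch_sizes:
        remaining = rows_in_batch
        while remaining > 0:
            n = min(fetch_size, remaining)
            returned.append(n)
            remaining -= n
    return returned


def fetch_sizes_spooled(batch_sizes, fetch_size):
    """Rows returned per fetch when a fetch may span RowBatch boundaries."""
    returned = []
    remaining = sum(batch_sizes)
    while remaining > 0:
        n = min(fetch_size, remaining)
        returned.append(n)
        remaining -= n
    return returned


batches = [15, 15]  # two batches of 15 rows each
print(fetch_sizes_blocking(batches, 10))  # [10, 5, 10, 5]
print(fetch_sizes_spooled(batches, 10))   # [10, 10, 10]
```

With a fetch size of 10 and a batch size of 15, the second fetch returns
5 rows in the first model and 10 rows in the second, matching the example
in the commit message.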

Testing:
* Ran core tests
* Added new tests to test_result_spooling.py
* Added new tests to buffered-tuple-stream-test to validate writing to a
BufferedTupleStream before releasing row batches with 'attach_on_read'
set to true.

Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Reviewed-on: http://gerrit.cloudera.org:8080/14129
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> BufferedPlanRootSink should handle non-default fetch sizes
> ----------------------------------------------------------
>
>                 Key: IMPALA-8819
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8819
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a client sets the fetch size to a value lower than {{BATCH_SIZE}}. The issue is that when reading a {{RowBatch}} from the queue, the batch might contain more rows than the client requested. So the {{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} and remember the index of the last row it read. Furthermore, {{num_results}} in {{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the query results cache in {{ClientRequestState}} has a cache hit (which only happens if the client cursor is reset).
> Another issue is that the {{BufferedPlanRootSink}} can only read up to a single {{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} is specified, only {{BATCH_SIZE}} rows will be written to the given {{QueryResultSet}}. This is consistent with the legacy behavior of {{PlanRootSink}} (now {{BlockingPlanRootSink}}), but is not ideal because it means clients can only read {{BATCH_SIZE}} rows at a time. A higher fetch size could reduce the number of round-trips needed between the client and the coordinator, which could improve fetch performance (but only if the sink is capable of filling all the requested rows).
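
The partial-read requirement in the description can be sketched in
Python as follows. This is an illustrative model, not the Impala
implementation: BatchQueueReader, get_next and the _offset field are
hypothetical names, and plain Python sequences stand in for RowBatch
objects.

```python
from collections import deque


class BatchQueueReader:
    """Hypothetical reader that serves fetches from a queue of row batches."""

    def __init__(self, batches):
        self._queue = deque(list(b) for b in batches)
        self._offset = 0  # index of the next unread row in the front batch

    def get_next(self, num_results):
        """Return up to num_results rows, crossing batch boundaries if needed."""
        rows = []
        while self._queue and len(rows) < num_results:
            front = self._queue[0]
            take = min(num_results - len(rows), len(front) - self._offset)
            rows.extend(front[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(front):
                # Front batch fully consumed: drop it and reset the index.
                self._queue.popleft()
                self._offset = 0
        return rows


reader = BatchQueueReader([range(0, 15), range(15, 30)])
first = reader.get_next(10)    # rows 0..9, first batch partially read
second = reader.get_next(10)   # rows 10..19, spans both batches
third = reader.get_next(100)   # only the 10 remaining rows are returned
print(len(first), len(second), len(third))
```

Keeping the offset next to the queue lets a fetch smaller than a batch
resume where the previous one stopped, and lets a fetch larger than a
batch continue into the next one instead of returning early.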


