Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/09/16 19:38:00 UTC

[jira] [Commented] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes

    [ https://issues.apache.org/jira/browse/IMPALA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930807#comment-16930807 ] 

ASF subversion and git services commented on IMPALA-8819:
---------------------------------------------------------

Commit 34d132c513bfe5dc46478d9cb780a93200301b91 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=34d132c ]

IMPALA-8825: Add additional counters to PlanRootSink

Adds the counters RowsSent and RowsSentRate to the PLAN_ROOT_SINK
section of the profile:

  PLAN_ROOT_SINK:
     - PeakMemoryUsage: 4.01 MB (4202496)
     - RowBatchGetWaitTime: 0.000ns
     - RowBatchSendWaitTime: 0.000ns
     - RowsSent: 10 (10)
     - RowsSentRate: 416.00 /sec

RowsSent tracks the number of rows sent to the PlanRootSink via
PlanRootSink::Send. RowsSentRate tracks the rate at which rows are sent
to the PlanRootSink.
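The relationship between the two counters can be sketched as a running row total plus a rate derived from elapsed time. This is a hypothetical, simplified stand-in (the class name RowsSentCounters and its methods are illustrative assumptions), not Impala's RuntimeProfile code, which maintains rate counters through its own profile machinery.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the RowsSent / RowsSentRate counter pair.
// NOT Impala's RuntimeProfile API; it only illustrates the arithmetic:
// RowsSent is a running total, RowsSentRate is that total divided by the
// elapsed wall-clock time.
class RowsSentCounters {
 public:
  using Clock = std::chrono::steady_clock;

  explicit RowsSentCounters(Clock::time_point start = Clock::now())
      : start_(start) {}

  // Called once per batch, mirroring where PlanRootSink::Send would account
  // for the rows it receives.
  void OnSend(int64_t num_rows_in_batch) { rows_sent_ += num_rows_in_batch; }

  int64_t rows_sent() const { return rows_sent_; }

  // Rows per second between construction and `now` (0 if no time elapsed).
  double RowsSentRate(Clock::time_point now) const {
    double secs = std::chrono::duration<double>(now - start_).count();
    return secs > 0 ? static_cast<double>(rows_sent_) / secs : 0.0;
  }

 private:
  Clock::time_point start_;
  int64_t rows_sent_ = 0;
};
```

Passing the timestamps in explicitly keeps the sketch deterministic; a real profile would sample the clock itself.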

Adds the counters NumRowsFetched, NumRowsFetchedFromCache, and
RowMaterializationRate to the ImpalaServer section of the profile.

  ImpalaServer:
     - ClientFetchWaitTimer: 11.999ms
     - NumRowsFetched: 10 (10)
     - NumRowsFetchedFromCache: 10 (10)
     - RowMaterializationRate: 9.00 /sec
     - RowMaterializationTimer: 1s007ms

NumRowsFetched tracks the total number of rows fetched by the query,
but does not include rows fetched from the cache. NumRowsFetchedFromCache
tracks the total number of rows fetched from the query results cache.
RowMaterializationRate tracks the rate at which rows are materialized.
RowMaterializationTimer already existed and tracks how much time is
spent materializing rows.
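The split between NumRowsFetched and NumRowsFetchedFromCache can be illustrated with a toy fetcher that replays cached rows after a cursor reset and pulls new rows from the sink otherwise. Everything here is a hypothetical sketch: ResultFetcher, FetchCounters and ResetCursor are illustrative names, rows are plain ints, and the cache is assumed to be unbounded, unlike the size-limited query results cache in ClientRequestState.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of the two ImpalaServer fetch counters.
struct FetchCounters {
  int64_t num_rows_fetched = 0;             // rows pulled from the sink
  int64_t num_rows_fetched_from_cache = 0;  // rows replayed from the cache
};

class ResultFetcher {
 public:
  explicit ResultFetcher(std::vector<int> sink_rows)
      : sink_(std::move(sink_rows)) {}

  // Returns up to `fetch_size` rows. Cached rows (only reachable after a
  // cursor reset) are served first; only rows read from the sink count
  // towards num_rows_fetched.
  std::vector<int> Fetch(size_t fetch_size) {
    std::vector<int> out;
    while (out.size() < fetch_size && cache_pos_ < cache_.size()) {
      out.push_back(cache_[cache_pos_++]);
      ++counters_.num_rows_fetched_from_cache;
    }
    while (out.size() < fetch_size && sink_pos_ < sink_.size()) {
      int row = sink_[sink_pos_++];
      cache_.push_back(row);  // remember it so a later reset can replay it
      cache_pos_ = cache_.size();
      out.push_back(row);
      ++counters_.num_rows_fetched;
    }
    return out;
  }

  // Models a client resetting its cursor back to the first row.
  void ResetCursor() { cache_pos_ = 0; }

  const FetchCounters& counters() const { return counters_; }

 private:
  std::vector<int> sink_;
  size_t sink_pos_ = 0;
  std::vector<int> cache_;
  size_t cache_pos_ = 0;
  FetchCounters counters_;
};
```

Fetching 10 rows, resetting the cursor, and fetching 10 rows again leaves both counters at 10, the same values shown in the ImpalaServer profile excerpt.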

Testing:
* Added tests to test_fetch_first.py and query_test/test_fetch.py
* Enabled some tests in test_fetch_first.py that were pending
the completion of IMPALA-8819
* Ran core tests

Change-Id: Id9e101e2f3e2bf8324e149c780d35825ceecc036
Reviewed-on: http://gerrit.cloudera.org:8080/14180
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Sahil Takiar <st...@cloudera.com>


> BufferedPlanRootSink should handle non-default fetch sizes
> ----------------------------------------------------------
>
>                 Key: IMPALA-8819
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8819
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a client sets the fetch size to a value lower than the {{BATCH_SIZE}}. The issue is that when reading a {{RowBatch}} from the queue, the batch might contain more rows than the number requested by the client. So the {{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} and remember the index of the next unread row. Furthermore, {{num_results}} in {{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the query results cache in {{ClientRequestState}} has a cache hit (this only happens if the client cursor is reset).
> Another issue is that the {{BufferedPlanRootSink}} can only read up to a single {{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} is specified, only {{BATCH_SIZE}} rows will be written to the given {{QueryResultSet}}. This is consistent with the legacy behavior of {{PlanRootSink}} (now {{BlockingPlanRootSink}}), but it is not ideal because clients can then read at most {{BATCH_SIZE}} rows per fetch. A higher fetch size could reduce the number of round-trips between the client and the coordinator, which could improve fetch performance (but only if the {{BufferedPlanRootSink}} is capable of filling all the requested rows).
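Both behaviors requested in the issue description above (partial reads of a batch, and filling a fetch from several batches) can be sketched with a queue of batches plus a saved row index. This is a hypothetical illustration, not the actual BufferedPlanRootSink code: std::vector<int> stands in for a RowBatch, std::deque for the sink's queue, and GetNext simply returns whatever rows are available instead of blocking for more batches as a real sink would.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RowBatch.
using Batch = std::vector<int>;

class BufferedSinkSketch {
 public:
  void Send(Batch batch) { queue_.push_back(std::move(batch)); }

  // Returns up to `num_results` rows.
  // 1. A fetch smaller than the batch size reads only part of the front
  //    batch; `row_idx_` remembers where the next fetch should resume.
  // 2. A fetch larger than the batch size keeps draining batches until the
  //    request is filled or the queue is empty.
  std::vector<int> GetNext(size_t num_results) {
    std::vector<int> results;
    while (results.size() < num_results && !queue_.empty()) {
      const Batch& front = queue_.front();
      while (results.size() < num_results && row_idx_ < front.size()) {
        results.push_back(front[row_idx_++]);
      }
      if (row_idx_ == front.size()) {  // batch fully consumed
        queue_.pop_front();
        row_idx_ = 0;
      }
    }
    return results;
  }

 private:
  std::deque<Batch> queue_;
  size_t row_idx_ = 0;  // index of the next unread row in queue_.front()
};
```

With two 4-row batches queued, fetches of 3, 3 and 10 rows return {1,2,3}, {4,5,6} and {7,8}: the second fetch spans a batch boundary, and the last one is capped by the rows actually available.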



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
