Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/09/16 19:38:00 UTC
[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink

    [ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930806#comment-16930806 ] 

ASF subversion and git services commented on IMPALA-8825:
---------------------------------------------------------

Commit 34d132c513bfe5dc46478d9cb780a93200301b91 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=34d132c ]

IMPALA-8825: Add additional counters to PlanRootSink

Adds the counters RowsSent and RowsSentRate to the PLAN_ROOT_SINK
section of the profile:

  PLAN_ROOT_SINK:
     - PeakMemoryUsage: 4.01 MB (4202496)
     - RowBatchGetWaitTime: 0.000ns
     - RowBatchSendWaitTime: 0.000ns
     - RowsSent: 10 (10)
     - RowsSentRate: 416.00 /sec

RowsSent tracks the number of rows sent to the PlanRootSink via
PlanRootSink::Send. RowsSentRate tracks the rate at which rows are sent
to the PlanRootSink.
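
A minimal Python sketch of the arithmetic behind these two counters. It assumes that RowsSentRate is simply the running RowsSent total divided by the wall-clock time since the sink started. The class and method names are hypothetical, and this is not how Impala's runtime profile maintains rate counters internally. The clock is injectable so the example is deterministic.

```python
import time


class RowsSentCounters:
    """Hypothetical stand-in for the RowsSent and RowsSentRate counters.

    Not Impala's implementation: RowsSent is modeled as a running total of
    the rows passed to send(), and RowsSentRate as that total divided by
    the wall-clock seconds elapsed since the sink was created.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so examples can fake time
        self._start = clock()
        self.rows_sent = 0

    def send(self, batch):
        # Each send() call adds the number of rows in its batch.
        self.rows_sent += len(batch)

    def rows_sent_rate(self):
        # Rows per second since creation; 0.0 if no time has elapsed.
        elapsed = self._clock() - self._start
        return self.rows_sent / elapsed if elapsed > 0 else 0.0


# Usage with a fake clock: created at t=0.0 s, rate read at t=0.5 s.
ticks = iter([0.0, 0.5])
sink = RowsSentCounters(clock=lambda: next(ticks))
sink.send([1, 2, 3])
sink.send(list(range(7)))
print(sink.rows_sent, sink.rows_sent_rate())  # 10 20.0
```

Sending batches of 3 and 7 rows over half a second gives RowsSent = 10 and a rate of 20 rows/sec.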

Adds the counters NumRowsFetched, NumRowsFetchedFromCache, and
RowMaterializationRate to the ImpalaServer section of the profile:

  ImpalaServer:
     - ClientFetchWaitTimer: 11.999ms
     - NumRowsFetched: 10 (10)
     - NumRowsFetchedFromCache: 10 (10)
     - RowMaterializationRate: 9.00 /sec
     - RowMaterializationTimer: 1s007ms

NumRowsFetched tracks the total number of rows fetched by the query,
but does not include rows fetched from the cache. NumRowsFetchedFromCache
tracks the total number of rows fetched from the query results cache.
RowMaterializationRate tracks the rate at which rows are materialized.
RowMaterializationTimer already existed and tracks how much time is
spent materializing rows.
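
The split between the two fetch counters can be sketched as follows. This assumes a client that can rewind to the start of the result set, and a results cache that is filled as rows are first fetched. All names here are hypothetical and are not Impala's ClientRequestState API.

```python
class ResultFetcher:
    """Hypothetical sketch of NumRowsFetched vs. NumRowsFetchedFromCache.

    Rows read from the sink for the first time count toward
    num_rows_fetched and are appended to the cache; rows re-read from the
    cache after a rewind count only toward num_rows_fetched_from_cache.
    """

    def __init__(self, source_rows):
        self._source = iter(source_rows)  # rows coming out of the sink
        self._cache = []                  # query results cache
        self._cursor = 0                  # index of the client's next row
        self.num_rows_fetched = 0
        self.num_rows_fetched_from_cache = 0

    def restart(self):
        # Rewind to the first row; later fetches are served from the cache.
        self._cursor = 0

    def fetch(self, max_rows):
        result = []
        while len(result) < max_rows:
            if self._cursor < len(self._cache):
                # Served from the cache: counted only in the cache counter.
                result.append(self._cache[self._cursor])
                self.num_rows_fetched_from_cache += 1
            else:
                row = next(self._source, None)
                if row is None:
                    break  # sink exhausted
                self._cache.append(row)
                self.num_rows_fetched += 1
                result.append(row)
            self._cursor += 1
        return result


fetcher = ResultFetcher(range(10))
first_pass = fetcher.fetch(4) + fetcher.fetch(100)
fetcher.restart()
second_pass = fetcher.fetch(100)
print(fetcher.num_rows_fetched, fetcher.num_rows_fetched_from_cache)  # 10 10
```

Fetching all ten rows once, then rewinding and fetching them again, leaves both counters at 10. The first pass counts toward NumRowsFetched only, and the second toward NumRowsFetchedFromCache only.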

Testing:
* Added tests to test_fetch_first.py and query_test/test_fetch.py
* Enabled some tests in test_fetch_first.py that were pending
the completion of IMPALA-8819
* Ran core tests
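
A test that checks these counters has to extract them from the text profile. Below is a hypothetical helper for that. It assumes the indentation layout shown in the excerpts above, with a "SECTION:" header followed by "- Name: value" lines. It is not taken from test_fetch_first.py or test_fetch.py, and the sample profile is just the two excerpts from this commit message placed one after the other.

```python
import re

# Matches counter lines such as "- RowsSent: 10 (10)" or "- RowsSentRate: 416.00 /sec".
COUNTER_RE = re.compile(r"^\s*- (\w+): (.+)$")


def parse_section_counters(profile_text, section):
    """Return {counter_name: raw_value} for the counters listed directly
    under `section` (e.g. "PLAN_ROOT_SINK") in a text runtime profile."""
    counters = {}
    in_section = False
    for line in profile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(section + ":"):
            in_section = True
            continue
        if in_section:
            match = COUNTER_RE.match(line)
            if match:
                counters[match.group(1)] = match.group(2)
            elif stripped:
                break  # any other non-blank line ends the section


    return counters


def raw_int(value):
    """Return the raw integer shown in parentheses, e.g. "10 (10)" -> 10."""
    match = re.search(r"\((\d+)\)", value)
    return int(match.group(1)) if match else None


PROFILE = """\
  PLAN_ROOT_SINK:
     - PeakMemoryUsage: 4.01 MB (4202496)
     - RowsSent: 10 (10)
     - RowsSentRate: 416.00 /sec
  ImpalaServer:
     - NumRowsFetched: 10 (10)
     - NumRowsFetchedFromCache: 10 (10)
"""

sink = parse_section_counters(PROFILE, "PLAN_ROOT_SINK")
print(raw_int(sink["RowsSent"]))  # 10
```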

Change-Id: Id9e101e2f3e2bf8324e149c780d35825ceecc036
Reviewed-on: http://gerrit.cloudera.org:8080/14180
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Sahil Takiar <st...@cloudera.com>


> Add additional counters to PlanRootSink
> ---------------------------------------
>
>                 Key: IMPALA-8825
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8825
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
>     - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}} (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
>  * Amount of time spent blocking inside the {{PlanRootSink}} - both the time spent by the client thread waiting for rows to become available and the time spent by the impala thread waiting for the client to consume rows
>  ** Similar to the {{RowBatchQueueGetWaitTime}} and {{RowBatchQueuePutWaitTime}} counters in the scan nodes
>  ** The difference between these counters and the ones in {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and {{RowMaterializationTimer}}) should be documented
>  * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} counters, we should make sure they are exposed in the {{PLAN_ROOT_SINK}} section
>  * Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}}) and the number of rows fetched (might need to be tracked in the {{ClientRequestState}})
>  ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty much the same, but for {{BufferedPlanRootSink}} this is more useful
>  ** Similar to {{RowsReturned}} in each exec node
>  * The rate at which rows are sent and fetched
>  ** Should be useful when debugging the performance of fetching rows (e.g. if the send rate is much higher than the fetch rate, there may be something wrong with the client)
>  ** Similar to {{RowsReturnedRate}} in each exec node
> Open to other suggestions for counters that folks think are useful.
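
A minimal Python sketch of what the two proposed blocking timers measure. It assumes a single producer (the Impala thread calling Send) and a single consumer (the client fetch thread) sharing a bounded queue. These timers roughly correspond to RowBatchSendWaitTime and RowBatchGetWaitTime in the PLAN_ROOT_SINK profile excerpt. The class is hypothetical and is not how BlockingPlanRootSink or BufferedPlanRootSink are implemented.

```python
import queue
import threading
import time


class TimedRowBatchQueue:
    """Hypothetical bounded hand-off between one producer and one consumer.

    send_wait_time: seconds the producing thread spent blocked because the
        queue was full (the client had not consumed earlier batches yet).
    get_wait_time: seconds the consuming (client) thread spent blocked
        because no batch was available yet.
    Each timer is updated only by its own thread, so no extra lock is used.
    """

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.send_wait_time = 0.0
        self.get_wait_time = 0.0

    def send(self, batch):
        try:
            self._queue.put_nowait(batch)  # fast path: room available, no wait
        except queue.Full:
            start = time.monotonic()
            self._queue.put(batch)  # block until the consumer frees a slot
            self.send_wait_time += time.monotonic() - start

    def get(self):
        try:
            return self._queue.get_nowait()  # fast path: a batch is ready
        except queue.Empty:
            start = time.monotonic()
            batch = self._queue.get()  # block until the producer sends
            self.get_wait_time += time.monotonic() - start
            return batch


def demo_slow_client(delay=0.2):
    """Producer sends two batches into a capacity-1 queue while the client
    waits `delay` seconds before fetching; returns (send_wait, get_wait)."""
    q = TimedRowBatchQueue(capacity=1)

    def slow_client():
        time.sleep(delay)
        q.get()
        q.get()

    client = threading.Thread(target=slow_client)
    client.start()
    q.send("batch-1")  # fits: the queue is empty
    q.send("batch-2")  # blocks until the client takes batch-1
    client.join()
    return q.send_wait_time, q.get_wait_time
```

Only time spent actually blocked is charged, because the non-blocking attempt runs first. In this model, a large send wait alongside a small get wait means the client is the slow side, which is the case the send-rate vs. fetch-rate bullet above is meant to expose.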



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
