Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/09/03 21:32:00 UTC

[jira] [Commented] (IMPALA-7551) Inaccurate timeline for "Rows Available"

    [ https://issues.apache.org/jira/browse/IMPALA-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921735#comment-16921735 ] 

Sahil Takiar commented on IMPALA-7551:
--------------------------------------

I started to dig deeper into this. A few things to note about this issue:

For any query with an exchange node in the Coordinator fragment, this shouldn't be a major issue: IMPALA-924 changed the {{ExchangeNode}} so that {{Open}} blocks until rows are actually available. IMPALA-924 also added some test coverage for this in {{test_rows_availability.py}}, but it seems all of those queries run with {{num_nodes=0}}, so there is no coverage for {{num_nodes=1}}. I think that is where this issue was seen, which makes sense since there is no exchange node when {{num_nodes=1}}.
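
To make the difference concrete, here is a toy Python model. It is not Impala code: every class and function name is made up for illustration, and it assumes that "Rows Available" is recorded when {{Open}} on the top-most node returns, as the issue description says. It compares a plan whose root returns from {{Open}} immediately with one whose root is an exchange-like node that blocks in {{Open}} until a batch exists.

```python
# Toy model only: these classes are hypothetical and do not mirror Impala's C++ code.

class FakeClock:
    """Simulated clock so the example runs instantly."""
    def __init__(self):
        self.now = 0

    def advance(self, seconds):
        self.now += seconds


class SlowScanNode:
    """open() returns immediately; the expensive work happens in get_next()."""
    def __init__(self, clock, work_seconds):
        self.clock = clock
        self.work_seconds = work_seconds

    def open(self):
        pass

    def get_next(self):
        self.clock.advance(self.work_seconds)
        return ["row"]


class BlockingExchangeNode:
    """open() blocks until the child has produced its first batch."""
    def __init__(self, child):
        self.child = child
        self.buffered = None

    def open(self):
        self.child.open()
        self.buffered = self.child.get_next()

    def get_next(self):
        batch, self.buffered = self.buffered, None
        return batch


def run_query(root, clock):
    """Record when open() on the root returns and when the first batch exists."""
    timeline = {}
    root.open()
    timeline["Rows Available"] = clock.now
    root.get_next()
    timeline["First Batch Produced"] = clock.now
    return timeline


def timeline_without_exchange():
    clock = FakeClock()
    return run_query(SlowScanNode(clock, 3600), clock)


def timeline_with_exchange():
    clock = FakeClock()
    return run_query(BlockingExchangeNode(SlowScanNode(clock, 3600)), clock)
```

In this model the plan without the exchange-like root reports "Rows Available" at time 0 even though the first batch only exists after 3600 simulated seconds, while the plan with the blocking root reports both at 3600.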

Another thing to note in the context of result spooling:
 * The Coordinator emits a "First Batch Sent" event after {{PlanRootSink::Send}} is called
 ** When result spooling is disabled, this is the time that the client actually fetched the first batch
 ** When result spooling is enabled, this is the time that the first batch was spooled
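
As a rough illustration of the two meanings above, here is a small Python sketch. The function and its parameters are hypothetical and only model the behavior described in this comment. It assumes that without spooling the event cannot be recorded before the batch is ready, hence the {{max}}.

```python
# Toy model only: the function name and parameters are hypothetical.

def first_batch_sent_time(spooling_enabled, batch_ready_at, client_fetch_at):
    """Return the time at which "First Batch Sent" is recorded in this model."""
    if spooling_enabled:
        # The batch is spooled as soon as it is ready, regardless of the client.
        return batch_ready_at
    # Without spooling, the event reflects when the client fetched the batch,
    # which cannot happen before the batch is ready.
    return max(batch_ready_at, client_fetch_at)
```

For a batch that is ready at t=5 and a client that fetches at t=60, this model records the event at 5 with spooling enabled and at 60 with spooling disabled, so the same event name describes two different moments.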

I'm not sure if there is any documentation about query timeline events, but I think we should add some, especially if the meaning changes depending on the Impala configuration.

> Inaccurate timeline for "Rows Available" 
> -----------------------------------------
>
>                 Key: IMPALA-7551
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7551
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Pooja Nilangekar
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: observability, query-lifecycle, ramp-up
>
> While debugging IMPALA-6932, it was noticed that the "Rows Available" metric in the query profile was a short duration (~ 1 second) for a long running limit 1 query (~ 1 hour).
> Currently, it tracks when Open() from the top-most node in the plan returns, not when the first row is actually produced. This can be misleading. A better timeline would be to return true when the first non-empty batch was added to the PlanRootSink. 
> We should consider changing the definition of the FINISHED state accordingly as well, so that we don't transition to FINISHED until a row is actually available to fetch immediately.


