Posted to issues-all@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2019/08/12 21:32:00 UTC

[jira] [Comment Edited] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

    [ https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905594#comment-16905594 ] 

Michael Ho edited comment on IMPALA-8845 at 8/12/19 9:31 PM:
-------------------------------------------------------------

{quote} I haven't traced the issue exactly, but what I think is happening is that the MERGING-EXCHANGE operator in the coordinator fragment hits eos whenever it has received enough rows to reach the limit defined in the query, which could occur before the DATASTREAM SINK sends all the rows from the TopN / Scan Node fragment. {quote}

If I understand the above correctly, your observation is that the Merging-Exchange has already been closed while the other fragment instance is stuck in an RPC call. Usually, when the receiving fragment is closed, its receiver is put into a "closed receiver cache". Incoming traffic probes against this cache, notices that the receiver is already closed, and short-circuits the reply to the DataStreamSender, at which point the DataStreamSender should skip issuing the RPC (see [code here|https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411]). However, entries in the cache expire after 5 minutes, so they are eventually removed. Traffic arriving for that receiver after expiration may be stuck for {{--datastream_sender_timeout_ms}} before returning with an error.
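To make the mechanism above concrete, here is a minimal, self-contained sketch of the closed-receiver-cache idea. It is not Impala's actual implementation; the names ({{ClosedReceiverCache}}, {{MarkClosed}}, {{IsClosed}}) are hypothetical stand-ins for the real code linked above.

{code:cpp}
// Illustrative sketch only -- not Impala's actual code; all names are hypothetical.
#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

class ClosedReceiverCache {
 public:
  explicit ClosedReceiverCache(std::chrono::seconds ttl) : ttl_(ttl) {}

  // Called when a receiving fragment instance closes its receiver.
  void MarkClosed(const std::string& receiver_id) {
    std::lock_guard<std::mutex> l(lock_);
    closed_[receiver_id] = Clock::now() + ttl_;
  }

  // Senders probe the cache before issuing a TransmitData() RPC and skip the
  // RPC if the receiver is known to be closed. Returns false once the entry
  // has expired, at which point the sender falls back to the normal RPC path
  // and may block for up to --datastream_sender_timeout_ms.
  bool IsClosed(const std::string& receiver_id) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = closed_.find(receiver_id);
    if (it == closed_.end()) return false;
    if (Clock::now() > it->second) {
      closed_.erase(it);  // expired entry is removed
      return false;
    }
    return true;
  }

 private:
  std::mutex lock_;
  const std::chrono::seconds ttl_;
  std::unordered_map<std::string, Clock::time_point> closed_;
};

int main() {
  ClosedReceiverCache cache(std::chrono::minutes(5));  // matches the 5-minute expiration
  cache.MarkClosed("fragment-instance-42");

  if (cache.IsClosed("fragment-instance-42")) {
    std::cout << "receiver already closed: short-circuit, skip TransmitData()\n";
  } else {
    std::cout << "issue TransmitData() and wait for the reply\n";
  }
  return 0;
}
{code}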

I probably need to look at the log to confirm whether the latter case is what's happening there. Please also see https://issues.apache.org/jira/browse/IMPALA-6818

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-8845
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8845
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all non-coordinator fragments to shut down. In certain setups, TopN queries ({{select * from [table] order by [col] limit [limit]}}) where all results are successfully spooled still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan Node fragment ends up blocked waiting for a response to a {{TransmitData()}} RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} whenever it has received enough rows to reach the limit defined in the query, which could occur before the {{DATASTREAM SINK}} sends all the rows from the TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as eagerly as possible. Moving the close call to before the call to {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it shuts down and releases all {{ExecNode}} resources as soon as it can. When result spooling is enabled, this is particularly important because {{FlushFinal}} might block until the consumer reads all rows.
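For illustration, here is a rough sketch of the reordering described above, using simplified stand-in types rather than the real {{FragmentInstanceState}}, {{ExecNode}}, and {{DataSink}} classes.

{code:cpp}
// Simplified stand-ins; not the real Impala classes.
#include <iostream>

struct ExecNodeTree {
  // Releases the resources held by the fragment's ExecNode tree.
  void Close() { std::cout << "ExecNode tree closed\n"; }
};

struct DataSink {
  // With result spooling enabled, this can block until the consumer
  // has read all rows.
  void FlushFinal() { std::cout << "FlushFinal() returned\n"; }
};

// Before: FlushFinal() ran first, so a blocked flush kept the whole
// ExecNode tree (and the upstream fragments feeding it) alive.
void ExecOldOrder(ExecNodeTree& tree, DataSink& sink) {
  sink.FlushFinal();
  tree.Close();
}

// After: close the ExecNode tree as eagerly as possible, then flush,
// so a blocking FlushFinal() no longer pins ExecNode resources.
void ExecNewOrder(ExecNodeTree& tree, DataSink& sink) {
  tree.Close();
  sink.FlushFinal();
}

int main() {
  ExecNodeTree tree;
  DataSink sink;
  ExecNewOrder(tree, sink);
  return 0;
}
{code}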



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org