You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/04/15 17:18:00 UTC

[jira] [Commented] (IMPALA-8411) A race during query cancellation by a client can make the client see it as still running

    [ https://issues.apache.org/jira/browse/IMPALA-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818194#comment-16818194 ] 

Tim Armstrong commented on IMPALA-8411:
---------------------------------------

I think there's two related issues here - there is the eos behaviour tracked by IMPALA-7561 and 

I think what [~lv] is suggesting is defining the invariant that the Cancel() RPC *always* transitions the query into a EXCEPTION or CANCELLED state before it returns. (We don't currently use the CANCELLED state, but we intend to in future -  IMPALA-1262)

I'm not sure why the transition is asynchronous. It does actually seem like a bug that's masked because the query transitions into EXCEPTION via another mechanism. I may be missing something - maybe [~bikram.sngh91] has an idea?

Here's the code that causes the behaviour in ClientRequestState::Cancel()
{code}
    // If the query has reached a terminal state, no need to update the state.
    bool already_done = eos_ || operation_state_ == TOperationState::ERROR_STATE;
    if (!already_done && cause != NULL) {
      DCHECK(!cause->ok());
      discard_result(UpdateQueryStatus(*cause));
      query_events_->MarkEvent("Cancelled");
      DCHECK_EQ(operation_state_, TOperationState::ERROR_STATE);
    }
{code}

The method comment needs updating too:
{code}
  /// Cancels the child queries and the coordinator with the given cause.
  /// If cause is NULL, it assume this was deliberately cancelled by the user while in
  /// FINISHED operation state. Otherwise, sets state to ERROR_STATE (TODO: IMPALA-1262:
  /// use CANCELED_STATE). Does nothing if the query has reached EOS or already cancelled.
  ///
  /// Only returns an error if 'check_inflight' is true and the query is not yet
  /// in-flight. Otherwise, proceed and return Status::OK() even if the query isn't
  /// in-flight (for cleaning up after an error on the query issuing path).
  Status Cancel(bool check_inflight, const Status* cause) WARN_UNUSED_RESULT;
{code}


> A race during query cancellation by a client can make the client see it as still running
> ----------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8411
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8411
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Lars Volker
>            Priority: Major
>              Labels: query-lifecycle
>
> I observed that under load the check in https://gerrit.cloudera.org/#/c/12530/8/tests/query_test/test_cancellation.py@229 would sometimes fail. After some poking it appears that the client's cancellation request does not update the operation state (because "cause", the third parameter to Cancelnternal, defaults to nullptr). Then the client disconnects and the connection close triggers another call to CancelInternal, this time with cause = "Session closed". This is also what shows up in the profile. If the server is under load, handling the closing connection can take long enough so that a concurrent request for the state still returns RUNNING. The aim of this fix was to make CancelOperation synchronous, i.e. when it returns the subsequent call to GetOperationStatus would not return RUNNING.
> The comment on ClientRequestState::Cancel() suggests that this is omitted when handling client cancellation because we expect that to happen regularly for finished queries. In that case however, eos_ would be set and we wouldn't update the state either, no?
> /// Cancels the child queries and the coordinator with the given cause.
> /// If cause is NULL, it assume this was deliberately cancelled by the user while in
> /// FINISHED operation state. Otherwise, sets state to ERROR_STATE (TODO: IMPALA-1262:
> /// use CANCELED_STATE). Does nothing if the query has reached EOS or already cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org