Posted to issues@kudu.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2022/05/06 17:43:00 UTC

[jira] [Commented] (KUDU-3366) KRPC callback function not called when cancelling KRPC

    [ https://issues.apache.org/jira/browse/KUDU-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533006#comment-17533006 ] 

Wenzhe Zhou commented on KUDU-3366:
-----------------------------------

Here is a detailed code analysis:

The Impala coordinator calls RpcController::Cancel() to schedule an RPC cancellation task on the reactor thread pool. When a reactor thread executes the cancellation task in ReactorThread::CancelOutboundCall(), that function calls Connection::CancelOutboundCall() and then OutboundCall::Cancel(). Connection::CancelOutboundCall() resets car->call to a null pointer, which causes Connection::HandleOutboundCallTimeout() to skip calling OutboundCall::SetTimedOut(). OutboundCall::Cancel() does not call OutboundCall::SetCancelled() if the OutboundCall object is in the SENDING state. In that case OutboundCall::SetCancelled() is not called until OutboundCall::SetSent() is called, when the state moves from SENDING to SENT. So once an RPC is cancelled, OutboundCall::SetTimedOut() will not be called for its OutboundCall object when the timeout is handled in Connection::HandleOutboundCallTimeout(), and if the object is in the SENDING state, OutboundCall::SetCancelled() will not be called until OutboundCall::SetSent() is called.
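The deferred cancellation described above can be sketched with a small model. This is not the Kudu source: OutboundCallModel, its reduced set of states, and the helper CallbacksAfterCancelWhileSending() are hypothetical names used only for illustration, and the real OutboundCall class has more states and locking.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for the OutboundCall state machine.
// Only the states needed to show the deferred cancellation are modelled.
class OutboundCallModel {
 public:
  enum class State { READY, SENDING, SENT, CANCELLED };

  explicit OutboundCallModel(std::function<void()> callback)
      : callback_(std::move(callback)) {}

  // The transfer has started; the call is now on its way to the socket.
  void SetSending() { state_ = State::SENDING; }

  // The transfer finished. A cancellation requested while SENDING is
  // applied only now.
  void SetSent() {
    state_ = State::SENT;
    if (cancel_requested_) SetCancelled();
  }

  // Cancellation request. While SENDING it is only recorded, not applied.
  void Cancel() {
    cancel_requested_ = true;
    if (state_ == State::SENDING) return;  // deferred until SetSent()
    if (state_ != State::CANCELLED) SetCancelled();
  }

  State state() const { return state_; }

 private:
  // Moves the call to a finished state and runs the user callback.
  void SetCancelled() {
    state_ = State::CANCELLED;
    callback_();
  }

  State state_ = State::READY;
  bool cancel_requested_ = false;
  std::function<void()> callback_;
};

// Returns how many times the callback ran after cancelling a call that
// was in the SENDING state.
int CallbacksAfterCancelWhileSending(bool transfer_completes) {
  int callbacks = 0;
  OutboundCallModel call([&callbacks] { ++callbacks; });
  call.SetSending();
  call.Cancel();
  assert(call.state() == OutboundCallModel::State::SENDING);
  if (transfer_completes) call.SetSent();
  return callbacks;
}
```

In this model the callback runs once if the transfer completes, and never runs if SetSent() is never reached.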

OutboundCall::SetSent() is called by CallTransferCallbacks::NotifyTransferFinished() when the notification that the transfer has finished is received, after the RPC call has been sent on the wire.
Connection::ProcessOutboundTransfers() calls OutboundCall::SetSending() to set the OutboundCall's state to SENDING when it starts transferring the RPC. It then calls OutboundTransfer::SendBuffer() to send the data through the socket.
OutboundTransfer::SendBuffer() calls socket->Writev() to send the data. If socket->Writev() returns an error, SendBuffer() returns the error without calling CallTransferCallbacks::NotifyTransferFinished(), so OutboundCall::SetSent() is not called. As a result, OutboundCall::SetCancelled() is not called for the OutboundCall object.
Connection::ProcessOutboundTransfers() then calls ReactorThread::DestroyConnection() to destroy the connection. ReactorThread::DestroyConnection() calls Connection::Shutdown() to clear all outbound calls which have been sent and are awaiting a response. But for an RPC that is being cancelled, car->call has already been reset to a null pointer, so OutboundCall::SetFailed() is not called for the OutboundCall object.

To summarize, Connection::CancelOutboundCall() resets car->call to a null pointer, which causes Connection::HandleOutboundCallTimeout() to skip calling OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling OutboundCall::SetFailed(). A socket->Writev() error means OutboundCall::SetSent() is never called, and hence OutboundCall::SetCancelled() is never called either.
Since none of OutboundCall::SetFailed(), OutboundCall::SetCancelled() and OutboundCall::SetTimedOut() is called for the OutboundCall object, the object can never move from the SENDING state to a finished state, so the RPC callback function is never called.
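The whole sequence can be reproduced in a toy model, shown here as a sketch under assumptions rather than as the actual Kudu code. CallModel, CallAwaitingResponseModel, ConnectionModel, Outcome and RunScenario() are hypothetical names; the method names mirror the functions mentioned in this analysis, but their bodies are reduced to the branches that matter, and the write_ok flag stands in for the result of socket->Writev().

```cpp
#include <cassert>
#include <memory>

enum class CallState { SENDING, SENT, CANCELLED, TIMED_OUT, FAILED };

// Hypothetical model of an outbound call that starts out in SENDING.
struct CallModel {
  CallState state = CallState::SENDING;
  bool cancel_requested = false;
  int callback_count = 0;  // number of times the user callback ran

  bool finished() const {
    return state != CallState::SENDING && state != CallState::SENT;
  }
  // Every finished state runs the callback exactly once.
  void Finish(CallState s) {
    if (finished()) return;
    state = s;
    ++callback_count;
  }
  void SetSent() {
    state = CallState::SENT;
    if (cancel_requested) Finish(CallState::CANCELLED);
  }
  void Cancel() {
    cancel_requested = true;
    if (state != CallState::SENDING) Finish(CallState::CANCELLED);
  }
  void SetTimedOut() { Finish(CallState::TIMED_OUT); }
  void SetFailed() { Finish(CallState::FAILED); }
};

// Stand-in for the per-call bookkeeping entry referred to as "car".
struct CallAwaitingResponseModel {
  std::shared_ptr<CallModel> call;
};

struct ConnectionModel {
  CallAwaitingResponseModel car;

  void CancelOutboundCall() { car.call.reset(); }  // car->call becomes null
  void HandleOutboundCallTimeout() {
    if (car.call) car.call->SetTimedOut();  // skipped when car->call is null
  }
  void Shutdown() {
    if (car.call) car.call->SetFailed();  // skipped when car->call is null
    car.call.reset();
  }
  // write_ok models the result of the socket write. On error the
  // transfer-finished notification, and therefore SetSent(), is skipped.
  bool SendBuffer(const std::shared_ptr<CallModel>& call, bool write_ok) {
    if (!write_ok) return false;
    call->SetSent();
    return true;
  }
};

struct Outcome {
  CallState state;
  int callbacks;
};

// Runs one call through the model and reports where it ended up.
Outcome RunScenario(bool cancel, bool write_ok) {
  auto call = std::make_shared<CallModel>();
  ConnectionModel conn;
  conn.car.call = call;
  if (cancel) {
    conn.CancelOutboundCall();
    call->Cancel();  // deferred, because the call is in SENDING
  }
  if (!conn.SendBuffer(call, write_ok)) conn.Shutdown();
  conn.HandleOutboundCallTimeout();
  return Outcome{call->state, call->callback_count};
}
```

In this model, RunScenario(true, false) leaves the call in SENDING with zero callbacks, which corresponds to the hang. Either cancelling without a write error or hitting a write error without cancelling ends in a finished state with exactly one callback.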

> KRPC callback function not called when cancelling KRPC
> ------------------------------------------------------
>
>                 Key: KUDU-3366
>                 URL: https://issues.apache.org/jira/browse/KUDU-3366
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Wenzhe Zhou
>            Priority: Major
>
> Impala ran into an issue which caused a thread to hang when cancelling a query. Impala log messages show that the Impala coordinator called RpcController::Cancel() to cancel an RPC, then waited for the RPC callback function to be called. But the KRPC callback function was never called, which caused the Impala thread to wait forever. See IMPALA-11263.
> KRPC cancellation was implemented in KUDU-2065 with patch https://gerrit.cloudera.org/#/c/7455/. According to the comments on KUDU-2065, the authors decided not to cancel outbound requests in the SENDING state, since cancelling calls in that state seemed too complicated, and they expected most calls to be drained quickly so that the outbound request would move from SENDING to SENT.
> But the reactor thread function ReactorThread::CancelOutboundCall() calls Connection::CancelOutboundCall() before calling OutboundCall::Cancel(). Connection::CancelOutboundCall() resets car->call to a null pointer, which causes Connection::HandleOutboundCallTimeout() to skip calling OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling OutboundCall::SetFailed(). If socket->Writev() fails while the outbound request is in the SENDING state, CallTransferCallbacks::NotifyTransferFinished() is not called, hence OutboundCall::SetSent() is not called. The outbound request then cannot move from the SENDING state to the SENT state, so the KRPC callback function is not called in this corner case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)