Posted to issues@mesos.apache.org by "Chun-Hung Hsiao (JIRA)" <ji...@apache.org> on 2018/01/05 04:34:00 UTC

[jira] [Created] (MESOS-8400) Retry logic for CSI calls when plugin crashes

Chun-Hung Hsiao created MESOS-8400:
--------------------------------------

             Summary: Retry logic for CSI calls when plugin crashes
                 Key: MESOS-8400
                 URL: https://issues.apache.org/jira/browse/MESOS-8400
             Project: Mesos
          Issue Type: Improvement
            Reporter: Chun-Hung Hsiao
            Assignee: Chun-Hung Hsiao


When a CSI plugin crashes, the container daemon in SLRP resets the corresponding {{csi::Client}} service future. However, a CSI call that races with the crash may be issued before the future is reset, in which case that call fails. This could be avoided by introducing retry logic. There are two possibilities:

1. If a gRPC channel can continue to work after its underlying domain socket is unbound, removed, and bound again with the same filename (but a different fd), then we can consider implementing the retry logic in {{csi::Client}}. The downside is that the racy call would go to the old future, while all subsequent calls would go to the new future set up by the container daemon.

2. If the gRPC channel is bound to the domain socket fd, then we need to implement the retry logic in SLRP.
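
As a rough illustration of the second option, the sketch below re-acquires the client before every attempt, so a retry issued after a crash goes to whatever client the container daemon has set up for the restarted plugin. It does not use the Mesos or gRPC APIs: {{FakeClient}}, {{Result}}, {{callWithRetry}}, and the fixed attempt limit are all hypothetical names and choices made for this example only.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a client bound to one plugin incarnation.
// This is NOT the real csi::Client; it only models "which plugin instance".
struct FakeClient {
  int generation;
};

// Hypothetical call result: `ok` is false when the call failed, e.g. because
// the plugin behind the client had already crashed.
struct Result {
  bool ok;
  std::string value;
};

// Issues `call` at most `maxAttempts` times. The client is fetched again
// before each attempt (assumption: `getClient` returns the latest client),
// so a failed call is retried against the newest client, not the stale one.
Result callWithRetry(
    const std::function<FakeClient()>& getClient,
    const std::function<Result(const FakeClient&)>& call,
    int maxAttempts)
{
  Result last{false, ""};
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    FakeClient client = getClient();
    last = call(client);
    if (last.ok) {
      return last;
    }
  }
  // Every attempt failed (or maxAttempts <= 0); report the last failure.
  return last;
}
```

A real implementation would likely also need to retry only on errors that indicate a lost connection, wait for the service future to be reset rather than looping immediately, and take into account whether the CSI call in question is safe to repeat.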



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)