Posted to commits@mesos.apache.org by ab...@apache.org on 2020/05/07 09:57:52 UTC

[mesos] branch master updated (8682b5d -> 515b239)

This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


    from 8682b5d  Added MESOS-10118 to the 1.9.1 CHANGELOG.
     new 63ca5c1  Logged connection error message before shutting down the executor.
     new 515b239  Changed permissions for domain sockets to allow non-root executors.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/common/domain_sockets.hpp | 2 +-
 src/executor/executor.cpp     | 9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)


[mesos] 02/02: Changed permissions for domain sockets to allow non-root executors.

Posted by ab...@apache.org.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 515b239983e51dd06c8b5c347b4b739644113f8f
Author: Andrei Budnik <ab...@apache.org>
AuthorDate: Wed May 6 19:48:38 2020 +0200

    Changed permissions for domain sockets to allow non-root executors.
    
    Previously, the default permissions for domain sockets (mode 0600)
    allowed r/w access only for the file's owner, so an executor launched
    under a non-privileged user could not open the agent's socket. This
    patch adds r/w permissions for the group and other users (mode 0666)
    to address the access problem.
    
    Review: https://reviews.apache.org/r/72478
---
 src/common/domain_sockets.hpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/common/domain_sockets.hpp b/src/common/domain_sockets.hpp
index 6d2b0ab..630ea93 100644
--- a/src/common/domain_sockets.hpp
+++ b/src/common/domain_sockets.hpp
@@ -33,7 +33,7 @@ namespace internal {
 namespace common {
 
 constexpr size_t DOMAIN_SOCKET_MAX_PATH_LENGTH = 108;
-constexpr int DOMAIN_SOCKET_DEFAULT_MODE = 0600;
+constexpr int DOMAIN_SOCKET_DEFAULT_MODE = 0666;
 
 
 inline Try<process::network::unix::Socket> createDomainSocket(
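
For illustration, the effect of the mode change can be reproduced outside
Mesos with plain POSIX calls. This is a minimal sketch, not the Mesos
`createDomainSocket` implementation: `bindWithMode` and `permissionBits` are
hypothetical helpers, and the sketch assumes the mode is applied to the socket
file with `chmod` after `bind`. On Linux, connecting to a Unix domain socket
requires write permission on the socket file, which is why 0600 shuts out a
process running as a different user.

```cpp
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Mesos): binds a listening Unix domain
// socket at `path` and applies `mode` to the socket file.
// Returns the listening file descriptor, or -1 on error.
int bindWithMode(const std::string& path, mode_t mode)
{
  sockaddr_un addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;

  // `sun_path` is a fixed-size buffer, so reject paths that do not fit
  // together with the terminating NUL byte.
  if (path.size() >= sizeof(addr.sun_path)) {
    return -1;
  }
  std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);

  int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  ::unlink(path.c_str()); // Remove a stale socket file, if any.

  // `chmod` is not subject to the umask, so the file ends up with exactly
  // the requested permission bits.
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      ::chmod(path.c_str(), mode) < 0 ||
      ::listen(fd, 16) < 0) {
    ::close(fd);
    return -1;
  }

  return fd;
}

// Returns the permission bits of the file at `path`, or -1 on error.
int permissionBits(const std::string& path)
{
  struct stat s;
  if (::stat(path.c_str(), &s) < 0) {
    return -1;
  }
  return static_cast<int>(s.st_mode & 0777);
}
```

Calling `bindWithMode(path, 0666)` and then `permissionBits(path)` on a
writable path should report 0666, meaning any local user that can reach the
containing directory may connect. Access can still be restricted through the
permissions of that directory.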


[mesos] 01/02: Logged connection error message before shutting down the executor.

Posted by ab...@apache.org.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 63ca5c159ef0272fc1d2add6ff73a88c8e16ea0c
Author: Andrei Budnik <ab...@apache.org>
AuthorDate: Wed May 6 13:13:51 2020 +0200

    Logged connection error message before shutting down the executor.
    
    Previously, if an executor failed to connect to the agent, it would
    silently shut itself down without writing an error message to the log.
    After we added support for domain sockets, the set of potential
    failures during `connect` increased. In this patch, we log
    the connection failures to help with debugging.
    
    Review: https://reviews.apache.org/r/72475
---
 src/executor/executor.cpp | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/executor/executor.cpp b/src/executor/executor.cpp
index 213f38e..974049c 100644
--- a/src/executor/executor.cpp
+++ b/src/executor/executor.cpp
@@ -561,11 +561,13 @@ protected:
       recoveryTimer = delay(
           recoveryTimeout.get(),
           self(),
-          &Self::_recoveryTimeout);
+          &Self::_recoveryTimeout,
+          failure);
 
       // Backoff and reconnect only if framework checkpointing is enabled.
       backoff();
     } else {
+      LOG(INFO) << "Disconnected from agent: " << failure << "; Shutting down";
       shutdown();
     }
   }
@@ -599,7 +601,7 @@ protected:
     return future;
   }
 
-  void _recoveryTimeout()
+  void _recoveryTimeout(const string& failure)
   {
     // It's possible that a new connection was established since the timeout
     // fired and we were unable to cancel this timeout. If this occurs, don't
@@ -612,7 +614,8 @@ protected:
 
     CHECK_SOME(recoveryTimeout);
     LOG(INFO) << "Recovery timeout of " << recoveryTimeout.get()
-              << " exceeded; Shutting down";
+              << " exceeded following the first connection failure: " << failure
+              << "; Shutting down";
 
     shutdown();
   }
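
The pattern in this patch, carrying the original failure message into a
callback that runs later, can be sketched without libprocess. The example
below is a simplified stand-in under stated assumptions: `Reconnector`,
`disconnected`, `fireTimers`, and the in-memory `log` are hypothetical names,
and a lambda capture takes the place of the extra argument passed to `delay`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the executor's connection handling.
class Reconnector
{
public:
  // Called when the connection drops. The failure message is captured by
  // value so that it is still valid when the deferred callback runs.
  void disconnected(const std::string& failure)
  {
    pending.push_back([this, failure]() { recoveryTimeout(failure); });
  }

  // Stand-in for the timer firing: runs every deferred callback once.
  void fireTimers()
  {
    std::vector<std::function<void()>> callbacks;
    callbacks.swap(pending);

    for (const std::function<void()>& callback : callbacks) {
      callback();
    }
  }

  // Collected log lines; a real executor would write to its logger instead.
  std::vector<std::string> log;

private:
  void recoveryTimeout(const std::string& failure)
  {
    log.push_back(
        "Recovery timeout exceeded following the first connection failure: " +
        failure + "; Shutting down");
  }

  std::vector<std::function<void()>> pending;
};
```

Capturing the string by value matters here: the callback outlives the call
that observed the failure, so a reference to the caller's argument could
dangle by the time the timeout fires.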