Posted to issues@mesos.apache.org by "Till Toenshoff (JIRA)" <ji...@apache.org> on 2016/02/05 02:45:39 UTC

[jira] [Comment Edited] (MESOS-4598) Logrotate ContainerLogger should not remove IP from environment.

    [ https://issues.apache.org/jira/browse/MESOS-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133466#comment-15133466 ] 

Till Toenshoff edited comment on MESOS-4598 at 2/5/16 1:44 AM:
---------------------------------------------------------------

It seems we have two related problems, each of which has so far been solved only locally:

1. Any executor forked by an agent should share the agent's {{LIBPROCESS_IP}} to prevent reverse DNS failures.
See https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265

2. The loggers (pipe-processing subprocesses) forked by a logger module should not inherit the parent's {{LIBPROCESS_PORT}}, to prevent bind failures.
See your patch.

The underlying problem is that the logger initially fixed the bind errors in its subprocesses by exec'ing them with all {{LIBPROCESS_}} variables unset. The above fix then keeps the IP, as per issue #1.
I currently believe that #1 and #2 actually hold for ANY libprocess parent and child process.
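
To make the combined behaviour concrete, here is a minimal sketch of the environment filtering described above: every {{LIBPROCESS_}} variable is dropped for the child except {{LIBPROCESS_IP}}. The function name {{childEnvironment}} is hypothetical and the code is not taken from Mesos or the patch under review; it only illustrates the idea with a plain std::map.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not Mesos or libprocess code): builds a child
// environment from the parent's, dropping every LIBPROCESS_* variable
// except LIBPROCESS_IP.
std::map<std::string, std::string> childEnvironment(
    const std::map<std::string, std::string>& parent)
{
  const std::string prefix = "LIBPROCESS_";
  std::map<std::string, std::string> child;

  for (const auto& entry : parent) {
    // True if the variable name starts with "LIBPROCESS_".
    const bool isLibprocess =
      entry.first.compare(0, prefix.size(), prefix) == 0;

    // Keep unrelated variables, and keep the IP so that the child
    // does not have to resolve it from the hostname.
    if (!isLibprocess || entry.first == "LIBPROCESS_IP") {
      child.insert(entry);
    }
  }

  return child;
}
```

With a parent environment containing {{LIBPROCESS_IP}}, {{LIBPROCESS_PORT}} and {{PATH}}, the child would only see {{LIBPROCESS_IP}} and {{PATH}}.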

While the above RR might be a fine workaround, I am not sure that we should regard it as a proper fix.

I believe the problem is a rather central one, not specific to the logger at all.

Any libprocess OS process forked by another libprocess OS process runs into these issues. We should consider a central solution.

There seem to be options such as:

- removing {{LIBPROCESS_PORT}} from the environment of a libprocess process once it has read that value;
this could be inserted at https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L872

- inheriting {{LIBPROCESS_IP}} by default when forking via libprocess's subprocess
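
As a rough illustration of the first option, the sketch below reads {{LIBPROCESS_PORT}} once and then removes it from the process's own environment, so that any child forked later no longer inherits it. The function name {{consumeLibprocessPort}} and the convention of returning 0 for "not specified" are assumptions made for this example; this is not the actual libprocess initialization code.

```cpp
#include <stdlib.h>

// Hypothetical helper (not libprocess code): reads LIBPROCESS_PORT once,
// then removes it from this process's environment so that children
// forked afterwards do not inherit it.
// Returns 0 if the variable is unset or is not a valid port number.
int consumeLibprocessPort()
{
  const char* value = getenv("LIBPROCESS_PORT");
  int port = 0;

  if (value != nullptr) {
    char* end = nullptr;
    const long parsed = strtol(value, &end, 10);

    // Accept only a fully numeric value in the valid port range.
    if (end != value && *end == '\0' && parsed > 0 && parsed <= 65535) {
      port = static_cast<int>(parsed);
    }

    // POSIX call; must happen after parsing because 'value' points
    // into the environment.
    unsetenv("LIBPROCESS_PORT");
  }

  return port;
}
```

The second option would not need such a step in each child: the subprocess helper would simply copy {{LIBPROCESS_IP}} into the child's environment unless the caller overrides it.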

Are my assumptions wrong? Any opinions or suggestions?


was (Author: tillt):
It seems we have two related problems, each of which has so far been solved only locally:

1. Any executor forked by an agent should share the agent's {{LIBPROCESS_IP}} to prevent reverse DNS failures.
See https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265

2. The logger forked by an agent should not inherit the agent's {{LIBPROCESS_PORT}}, to prevent bind failures.
See your patch.

The underlying problem is that the logger initially fixed the bind errors by unsetting all LIBPROCESS_ variables when exec'ing. The above fix then keeps the IP, as per issue #1.
I currently believe that #1 and #2 actually hold for ANY libprocess parent and child process.

While the above RR might be a fine workaround, I am not sure that we should regard it as a proper fix.

I believe the problem is a rather central one, not specific to the logger at all.

Any libprocess OS process forked by another libprocess OS process runs into these issues. We should consider a central solution.

There seem to be options such as:

- removing LIBPROCESS_PORT from the environment of a libprocess process once it has read that value;
this could be inserted at https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L872

- inheriting LIBPROCESS_IP by default when forking via libprocess's subprocess

Are my assumptions wrong? Any opinions or suggestions?

> Logrotate ContainerLogger should not remove IP from environment.
> ----------------------------------------------------------------
>
>                 Key: MESOS-4598
>                 URL: https://issues.apache.org/jira/browse/MESOS-4598
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.0
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>              Labels: mesosphere
>
> The {{LogrotateContainerLogger}} starts libprocess-using subprocesses.  Libprocess initialization will attempt to resolve the IP from the hostname.  If a DNS service is not available, this step will fail, which terminates the logger subprocess prematurely.
> Since the logger subprocesses live on the agent, they should use the same {{LIBPROCESS_IP}} supplied to the agent.


