Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2019/01/11 23:21:17 UTC

Review Request 69727: Compared the device number of namespace handle instead of /proc.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69727/
-----------------------------------------------------------

Review request for mesos, Deepak Goel and Gilbert Song.


Bugs: MESOS-9518
    https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
-------

On recent kernels, the device number of '/proc/<pid>/ns/net' differs
from that of '/proc': the namespace handle is on the "nsfs" filesystem
rather than on "proc", as it was on older kernels. For instance:

Newer kernel:

```
$ uname -nr
ubuntu-xenial 4.4.0-83-generic
$ stat -L -c %d /proc/self/ns/net
3
$ stat -L -c %d /proc
4
```

Older kernel:

```
$ uname -nr
core-dev 3.10.0-693.5.2.el7.x86_64
$ stat -L -c %d /proc/self/ns/net
3
$ stat -L -c %d /proc
3
```

As a result, we should compare the device number against that of the
namespace handle itself, rather than that of `/proc`.
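
The comparison described above can be sketched as follows. This is a
minimal illustration and not the code from the patch: `sameDevice` and
`onNetnsDevice` are hypothetical helper names, and using
'/proc/self/ns/net' as the reference handle is an assumption made for
the example.

```cpp
#include <sys/stat.h>

#include <string>

// Hypothetical helper (not part of the Mesos patch). Returns 1 if both
// paths are on the same device, 0 if they are on different devices, and
// -1 if either path cannot be stat'ed. stat() follows symlinks, which
// matches `stat -L` in the shell examples above.
int sameDevice(const std::string& a, const std::string& b)
{
  struct stat sa;
  struct stat sb;

  if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0) {
    return -1;
  }

  return sa.st_dev == sb.st_dev ? 1 : 0;
}

// Hypothetical helper: compares `path` against a namespace handle
// rather than against '/proc', so the result does not depend on which
// filesystem ("proc" or "nsfs") the kernel uses for the handles.
// Assumption: '/proc/self/ns/net' is an acceptable reference handle.
int onNetnsDevice(const std::string& path)
{
  return sameDevice(path, "/proc/self/ns/net");
}
```

With this shape, a bind-mounted network namespace handle would yield 1
from `onNetnsDevice` on both older and newer kernels, whereas a
comparison against '/proc' would only do so on the older ones.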


Diffs
-----

  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp a1130a58553fb67bd8c212498b98978f116d7b0c 


Diff: https://reviews.apache.org/r/69727/diff/1/


Testing
-------

sudo make check


Thanks,

Jie Yu


Re: Review Request 69727: Compared the device number of namespace handle instead of /proc.

Posted by Deepak Goel <de...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69727/#review211921
-----------------------------------------------------------


Ship it!

- Deepak Goel




Re: Review Request 69727: Compared the device number of namespace handle instead of /proc.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69727/#review211918
-----------------------------------------------------------


Ship it!

- Gilbert Song




Re: Review Request 69727: Compared the device number of namespace handle instead of /proc.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69727/#review211919
-----------------------------------------------------------



Very interesting. I could not find any kernel documentation that mentions this semantic change.

- Gilbert Song




Re: Review Request 69727: Compared the device number of namespace handle instead of /proc.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69727/#review211925
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69727']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2766/mesos-review-69727

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2766/mesos-review-69727/logs/mesos-tests.log):

```
I0112 00:35:51.245151 12600 exec.cpp:518] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent c93e353e-3adc-4ead-ba70-5f6617088bd3-S0
I0112 00:45:52.213604 12440 executor.cpp:1007] Command exited with status 0 (pid: 3328)
I0112 00:45:53.217759 12308 process.cpp:927] Stopped the socket accept loop
W0112 00:45:53.218771 12308 process.cpp:1890] Failed to send 'mesos.internal.StatusUpdateMessage' to '192.10.1.4:57363', connect: IO failed with error code: The remote computer refused the network connection.

I0112 00:50:51.212862 13984 executor.cpp:568] Recovery timeout of 15mins exceeded; Shutting down
I0112 00:50:51.213852 13828 default_executor.cpp:204] Received SHUTDOWN event
I0112 00:50:51.213852 13828 default_executor.cpp:1025] Shutting down
I0112 00:50:51.213852 13828 default_executor.cpp:1081] Terminating after 1secs
I0112 00:23:28.699481  5480 exec.cpp:162] Version: 1.8.0
I0112 00:23:28.730485  8620 exec.cpp:236] Executor registered on agent 5996a15c-a50f-4ef6-ac17-8544a704fe5e-S0
I0112 00:23:28.734446 12836 executor.cpp:184] Received SUBSCRIBED event
I0112 00:23:28.738476 12836 executor.cpp:188] Subscribed executor on windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net
W0112 00:35:51.232164  8288 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=416 to peer '192.10.1.4:57824': IO failed with error code: The specified network name is no longer available.

W0112 00:35:51.233156  8288 process.cpp:1423] Failed to recv on socket WindowsFD::Type::SOCKET=460 to peer '192.10.1.4:57363': IO failed with error code: The specified network name is no longer available.

I0112 00:35:51.234169 10660 exec.cpp:518] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent 5996a15c-a50f-4ef6-ac17-8544a704fe5e-S0
I0112 00:50:51.236862  8620 exec.cpp:499] Recovery timeout of 15mins exceeded; Shutting down
I0112 00:50:51.236862  8740 executor.cpp:568] Recovery timeout of 15mins exceeded; Shutting down
I0112 00:50:51.237905 10432 default_executor.cpp:204] Received SHUTDOWN event
I0112 00:50:51.237905 10432 default_executor.cpp:1025] Shutting down
I0112 00:50:51.237905 10432 default_executor.cpp:1081] Terminating after 1secs
I0112 00:50:51.236862  8620 exec.cpp:445] Executor asked to shutdown
I0112 00:50:51.237905 12836 executor.cpp:184] Received SHUTDOWN event
I0112 00:50:51.237905 12836 executor.cpp:809] Shutting down
I0112 00:50:51.238862  8288 process.cpp:927] Stopped the socket accept loop
I0112 00:50:52.216568 14044 process.cpp:927] Stopped the socket accept loop
I0112 00:50:52.241585 11348 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows

