Posted to reviews@mesos.apache.org by Cong Wang <cw...@twopensource.com> on 2015/12/10 00:12:02 UTC

Review Request 41158: Turn off rx checksum offloading for veth in container

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/
-----------------------------------------------------------

Review request for mesos, Ian Downes and Jie Yu.


Repository: mesos


Description
-------

We noticed that in some cases corrupt packets were delivered to applications running in our containers. This is clearly wrong.

Here is what happens:

1) We receive a corrupt packet from outside the host
2) The hardware driver is able to verify the checksum and notices that it is bad
3) The driver delivers the packet anyway, leaving it to the TCP layer to verify the checksum again and drop it
4) Because the packet is destined for a container, it is moved to a veth interface
5) Both sides of the veth pair have RX checksum offloading enabled by default
6) veth_xmit() marks the packet's checksum as CHECKSUM_UNNECESSARY because its peer device has RX checksum offloading enabled
7) The packet is moved into the container's TCP/IP stack
8) The TCP layer does not verify the checksum, since it is marked as unnecessary
9) The packet is delivered to the application layer
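Steps 3 to 9 can be sketched as a small userspace model. This is an illustration only, not kernel code: `IpSummed`, `ModelPacket` and the `model*` helpers are hypothetical names, and `modelVethXmit()` assumes the veth_xmit() rule described in step 6 (a packet with no checksum information is promoted to CHECKSUM_UNNECESSARY when the receiving peer has RX checksum offloading enabled).

```cpp
// Simplified stand-ins for two of the kernel's skb->ip_summed states; the
// other states are omitted. This is a userspace model, not kernel code.
enum class IpSummed { None, Unnecessary };

struct ModelPacket {
  IpSummed ipSummed;
  bool badChecksum;  // Whether the packet's checksum is actually wrong.
};

// Models the veth_xmit() rule from step 6: a packet with no checksum
// information is marked "unnecessary" if the receiving peer has RX
// checksum offloading enabled.
void modelVethXmit(ModelPacket& packet, bool peerRxChecksumOffload) {
  if (packet.ipSummed == IpSummed::None && peerRxChecksumOffload) {
    packet.ipSummed = IpSummed::Unnecessary;
  }
}

// Models the receive side: verification is skipped for packets marked
// "unnecessary" (step 8); otherwise a bad checksum causes a drop.
// Returns true if the packet reaches the application.
bool modelTcpReceive(const ModelPacket& packet) {
  if (packet.ipSummed == IpSummed::Unnecessary) {
    return true;
  }
  return !packet.badChecksum;
}

// Walks a corrupt packet through steps 3-9. `containerVethRxOffload` is the
// RX checksum offloading setting of the container-side veth.
bool corruptPacketDelivered(bool containerVethRxOffload) {
  ModelPacket packet{IpSummed::None, true};       // Step 3: no checksum info.
  modelVethXmit(packet, containerVethRxOffload);  // Steps 4-6.
  return modelTcpReceive(packet);                 // Steps 7-9.
}
```

In this model, `corruptPacketDelivered(true)` returns true (the bug), while `corruptPacketDelivered(false)`, i.e. RX checksum offloading turned off on the container-side veth as this patch does, leaves the packet at CHECKSUM_NONE so the receive path verifies and drops it.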


Diffs
-----

  src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 89bb36f936417de8169a2442729fbd7c9d60acb7 

Diff: https://reviews.apache.org/r/41158/diff/


Testing
-------

1) Turning RX checksum offloading off manually makes the bug go away
2) Tested this patch and verified that RX checksum offloading is turned off as expected
3) I don't see any noticeable performance impact after turning it off
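The RX checksum state checked in step 2 can also be read and set programmatically. From the thread, the patch appears to drive the `ethtool` binary; the sketch below swaps in a different technique, the `SIOCETHTOOL` ioctl with the legacy `ETHTOOL_GRXCSUM`/`ETHTOOL_SRXCSUM` commands, which does not depend on which ethtool version is installed. The helper names are hypothetical. The calls must run in the network namespace that owns the interface, and setting requires CAP_NET_ADMIN.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Issues one ethtool_value command against `iface` in the caller's network
// namespace. Returns 0 on success or an errno value; for GET commands the
// kernel's answer is left in `value.data`.
int ethtoolValueCommand(const std::string& iface, ethtool_value& value) {
  if (iface.empty() || iface.size() >= IFNAMSIZ) {
    return EINVAL;
  }

  int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return errno;
  }

  struct ifreq ifr;
  std::memset(&ifr, 0, sizeof(ifr));  // Also NUL-terminates ifr_name.
  std::memcpy(ifr.ifr_name, iface.c_str(), iface.size());
  ifr.ifr_data = reinterpret_cast<char*>(&value);

  // Capture errno before close() can overwrite it.
  int error = ::ioctl(fd, SIOCETHTOOL, &ifr) == 0 ? 0 : errno;
  ::close(fd);
  return error;
}

// Turns RX checksum offloading on or off (needs CAP_NET_ADMIN).
int setRxChecksumOffload(const std::string& iface, bool enable) {
  ethtool_value value{};
  value.cmd = ETHTOOL_SRXCSUM;
  value.data = enable ? 1 : 0;
  return ethtoolValueCommand(iface, value);
}

// Reads the current RX checksum offloading state; `enabled` is only
// written on success.
int getRxChecksumOffload(const std::string& iface, bool& enabled) {
  ethtool_value value{};
  value.cmd = ETHTOOL_GRXCSUM;
  int error = ethtoolValueCommand(iface, value);
  if (error == 0) {
    enabled = value.data != 0;
  }
  return error;
}
```

Calling `setRxChecksumOffload(name, false)` and then `getRxChecksumOffload(name, enabled)` mirrors testing step 2; the command-line equivalents are `ethtool -K <iface> rx off` and `ethtool -k <iface>`.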


Thanks,

Cong Wang


Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109641
-----------------------------------------------------------

Ship it!


Ship It!

- Jie Yu




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109627
-----------------------------------------------------------

Ship it!



src/slave/containerizer/mesos/isolators/network/port_mapping.cpp (line 3572)
<https://reviews.apache.org/r/41158/#comment169231>

    s/kernel/the kernel/



src/slave/containerizer/mesos/isolators/network/port_mapping.cpp (line 3574)
<https://reviews.apache.org/r/41158/#comment169230>

    s/could/to


- Ian Downes




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109719
-----------------------------------------------------------


Patch looks great!

Reviews applied: [41158]

Passed command: export OS=ubuntu:14.04;export CONFIGURATION="--verbose";export COMPILER=gcc; ./support/docker_build.sh

- Mesos ReviewBot




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Cong Wang <cw...@twopensource.com>.

> On Dec. 11, 2015, 1:16 a.m., David Robinson wrote:
> > src/slave/containerizer/mesos/isolators/network/port_mapping.cpp, lines 1097-1098
> > <https://reviews.apache.org/r/41158/diff/1/?file=1157658#file1157658line1097>
> >
> >     This writes to stderr, which can end up in the logs.
> >     
> >     [root@server ~]# ethtool --version 1> /dev/null
> >     ethtool version 6
> >     Usage:
> >     ethtool DEVNAME	Display standard information about device
> >     
> >     
> >     Log snippet:
> >     
> >     I1211 01:05:13.215730 10885 main.cpp:190] Build: 2015-12-10 22:54:33 by mockbuild
> >     I1211 01:05:13.215859 10885 main.cpp:192] Version: 0.26.0-tw5
> >     I1211 01:05:13.215996 10885 containerizer.cpp:142] Using isolation: cgroups/cpu,cgroups/mem,network/port_mapping,posix/disk,cgroups/perf_event,filesystem/posix
> >     ethtool version 6
> >     Usage:
> >     ethtool DEVNAME Display standard information about device
> >     I1211 01:05:13.251729 10885 port_mapping.cpp:1255] Using eth0 as the public interface
> >     I1211 01:05:13.252707 10885 port_mapping.cpp:1280] Using lo as the loopback interface

Ah, yet another difference on Fedora...

$ ethtool --version | grep v
ethtool version 3.8

Let me fix it.
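One way to keep such a probe out of the agent's logs is to discard both output streams, not just stdout. A minimal sketch, assuming the probe is run through `/bin/sh` via `std::system()`; `quietly()` and `commandSucceeds()` are hypothetical helpers, not the fix that was actually committed.

```cpp
#include <cstdlib>
#include <string>

#include <sys/wait.h>

// Hypothetical helper: redirects stdout and stderr of a shell command to
// /dev/null. For a pipeline, only the last command's streams are redirected.
std::string quietly(const std::string& command) {
  return command + " > /dev/null 2>&1";
}

// Hypothetical helper: runs `command` through /bin/sh with its output
// discarded and reports whether it exited normally with status 0.
bool commandSucceeds(const std::string& command) {
  int status = std::system(quietly(command).c_str());
  return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

For example, `commandSucceeds("ethtool --version")` runs the probe silently. The thread does not establish what exit status an old ethtool (such as version 6 above) returns for `--version`, so a real check may need to inspect the output instead of relying on the exit status.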


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109903
-----------------------------------------------------------




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by David Robinson <dr...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109903
-----------------------------------------------------------



src/slave/containerizer/mesos/isolators/network/port_mapping.cpp (lines 1097 - 1098)
<https://reviews.apache.org/r/41158/#comment169582>

    This writes to stderr, which can end up in the logs.
    
    [root@server ~]# ethtool --version 1> /dev/null
    ethtool version 6
    Usage:
    ethtool DEVNAME	Display standard information about device
    
    Log snippet:
    
    I1211 01:05:13.215730 10885 main.cpp:190] Build: 2015-12-10 22:54:33 by mockbuild
    I1211 01:05:13.215859 10885 main.cpp:192] Version: 0.26.0-tw5
    I1211 01:05:13.215996 10885 containerizer.cpp:142] Using isolation: cgroups/cpu,cgroups/mem,network/port_mapping,posix/disk,cgroups/perf_event,filesystem/posix
    ethtool version 6
    Usage:
    ethtool DEVNAME Display standard information about device
    I1211 01:05:13.251729 10885 port_mapping.cpp:1255] Using eth0 as the public interface
    I1211 01:05:13.252707 10885 port_mapping.cpp:1280] Using lo as the loopback interface


- David Robinson




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Cong Wang <cw...@twopensource.com>.

> On Dec. 10, 2015, 12:27 a.m., Jie Yu wrote:
> > This is more like a question: do we need to turn off tx side as well?

This is not needed, because 1) once the packet leaves the container through the gateway interface, the physical interface can compute the checksum; 2) if the physical interface is not able to do it, the kernel computes it in software right before sending the packet out.
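The software fallback in point 2) is the same computation that step 8 of the description skips on receive. A sketch of the RFC 1071 Internet checksum used by TCP, for illustration only: the TCP pseudo-header is left out, and this is not the kernel's optimized implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
// words (an odd trailing byte is padded with zero), folded and inverted.
uint16_t internetChecksum(const std::vector<uint8_t>& data) {
  uint64_t sum = 0;  // 64 bits so large inputs cannot overflow.
  std::size_t i = 0;
  for (; i + 1 < data.size(); i += 2) {
    sum += (static_cast<uint64_t>(data[i]) << 8) | data[i + 1];
  }
  if (i < data.size()) {
    sum += static_cast<uint64_t>(data[i]) << 8;  // Pad the odd byte.
  }
  while (sum >> 16) {
    sum = (sum & 0xFFFF) + (sum >> 16);  // Fold carries back in.
  }
  return static_cast<uint16_t>(~sum);
}
```

A receiver runs the same sum over the data including the transmitted checksum field and accepts it only if the result is 0; that is the check bypassed once a packet is marked CHECKSUM_UNNECESSARY. With the RFC 1071 sample bytes `00 01 f2 03 f4 f5 f6 f7`, the checksum is `0x220d`.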


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109644
-----------------------------------------------------------




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/#review109644
-----------------------------------------------------------


This is more like a question: do we need to turn off tx side as well?

- Jie Yu




Re: Review Request 41158: Turn off rx checksum offloading for veth in container

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41158/
-----------------------------------------------------------

(Updated Dec. 9, 2015, 3:15 p.m.)


Review request for mesos, Ian Downes and Jie Yu.


Bugs: MESOS-4105
    https://issues.apache.org/jira/browse/MESOS-4105


Repository: mesos


Description
-------

We noticed that in some cases corrupt packets were delivered to applications running in our containers. This is clearly wrong.

Here is what happens:

1) We receive a corrupt packet from outside the host
2) The hardware driver is able to verify the checksum and notices that it is bad
3) The driver delivers the packet anyway, leaving it to the TCP layer to verify the checksum again and drop it
4) Because the packet is destined for a container, it is moved to a veth interface
5) Both sides of the veth pair have RX checksum offloading enabled by default
6) veth_xmit() marks the packet's checksum as CHECKSUM_UNNECESSARY because its peer device has RX checksum offloading enabled
7) The packet is moved into the container's TCP/IP stack
8) The TCP layer does not verify the checksum, since it is marked as unnecessary
9) The packet is delivered to the application layer


Diffs
-----

  src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 89bb36f936417de8169a2442729fbd7c9d60acb7 

Diff: https://reviews.apache.org/r/41158/diff/


Testing
-------

1) Turning RX checksum offloading off manually makes the bug go away
2) Tested this patch and verified that RX checksum offloading is turned off as expected
3) I don't see any noticeable performance impact after turning it off


Thanks,

Cong Wang