Posted to issues@mesos.apache.org by "Ian Downes (JIRA)" <ji...@apache.org> on 2015/12/10 00:15:11 UTC

[jira] [Created] (MESOS-4105) Network isolator causes corrupt packets to reach application

Ian Downes created MESOS-4105:
---------------------------------

             Summary: Network isolator causes corrupt packets to reach application
                 Key: MESOS-4105
                 URL: https://issues.apache.org/jira/browse/MESOS-4105
             Project: Mesos
          Issue Type: Bug
          Components: isolation
    Affects Versions: 0.25.0, 0.24.1, 0.24.0, 0.23.1, 0.23.0, 0.22.2, 0.22.1, 0.22.0, 0.21.2, 0.21.1, 0.21.0, 0.20.1, 0.20.0
            Reporter: Ian Downes
            Priority: Critical


The optional network isolator (network/port_mapping) lets corrupt TCP packets reach the application, which could lead to data corruption. Normally the network stack drops such packets immediately and they never reach the application.

Networks may have a very low level of corrupt packets (a few per million), or may have very high levels if there are hardware or software errors in networking equipment.

Investigation is ongoing but an initial hypothesis is being tested:
1) The checksum error is correctly detected by the host interface.
2) The Mesos tc filters used by the network isolator redirect the packet to the virtual interface, even when a checksum error has occurred.
3) The checksum flag is cleared, either when the packet is copied to the veth device or when it passes across the veth pipe.
4) The veth inside the container does not verify the checksum, even though TCP RX checksum offloading is supposedly on. [This is hypothesized to be acceptable normally because it's receiving packets over the virtual link where corruption should not occur]
5) The container network stack accepts the packet and delivers it to the application.
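
For reference, the check that steps 1 and 4 refer to is the standard 16-bit one's-complement Internet checksum (RFC 1071). The following is only a minimal Python sketch of that arithmetic, not Mesos or kernel code: the 8-byte "segment" layout is hypothetical, and real TCP verification also covers a pseudo-header that is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verifies(segment: bytes) -> bool:
    """A segment that carries a correct checksum field sums to zero."""
    return internet_checksum(segment) == 0


# Hypothetical layout: 6 bytes of payload followed by a 2-byte checksum field.
payload = b"\x12\x34\x56\x78\x9a\xbc"
good = payload + internet_checksum(payload).to_bytes(2, "big")
corrupt = bytes([good[0] ^ 0x01]) + good[1:]  # one bit flipped in transit

print(verifies(good), verifies(corrupt))
```

A receiver that skips this computation, as hypothesized for the container veth, has no way to tell the two segments apart.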

Disabling TCP RX checksum offloading (CSO) on the container veth appears to fix this: it forces the container network stack to compute the packet checksums in software, so it detects the checksum errors and does not deliver the corrupt packets to the application.
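
One way to apply the workaround described above by hand is `ethtool -K <interface> rx off`, run with root privileges inside the container's network namespace. The Python helper below is a hedged sketch: the function name and the interface name `eth0` are assumptions, not taken from Mesos, and by default it only builds the command without running it.

```python
import subprocess
from typing import List


def disable_rx_checksum_offload(interface: str, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the ethtool command that turns off
    RX checksum offloading on the given interface."""
    cmd = ["ethtool", "-K", interface, "rx", "off"]
    if not dry_run:
        # Needs root and must run in the container's network namespace.
        subprocess.run(cmd, check=True)
    return cmd


# "eth0" is an assumed name for the veth end inside the container.
print(" ".join(disable_rx_checksum_offload("eth0")))
```

The current offload settings can be inspected with `ethtool -k <interface>` before and after the change.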



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)