You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Cong Wang <cw...@twopensource.com> on 2015/03/09 19:57:52 UTC

Re: Review Request 31505: (5/5) Add flow classifiers for fq_codel on egress

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated March 9, 2015, 6:57 p.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Address review comments.


Summary (updated)
-----------------

(5/5) Add flow classifiers for fq_codel on egress


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a 
  src/slave/containerizer/isolators/network/port_mapping.cpp 7a38735 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review76024
-----------------------------------------------------------


Patch looks great!

Reviews applied: [31502, 31503, 31504, 31505]

All tests passed.

- Mesos ReviewBot


On March 10, 2015, 11:28 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated March 10, 2015, 11:28 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a83fe970546b3eddcb29b413ee2a7888b9 
>   src/slave/containerizer/isolators/network/port_mapping.cpp aebc528b12e4afdf49367aa01926c05fa37da0e0 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review83334
-----------------------------------------------------------


Patch looks great!

Reviews applied: [31502, 31503, 31504, 31505]

All tests passed.

- Mesos ReviewBot


On May 12, 2015, 12:14 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 12, 2015, 12:14 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review83982
-----------------------------------------------------------


Patch looks great!

Reviews applied: [31502, 31503, 31504, 31505]

All tests passed.

- Mesos ReviewBot


On May 15, 2015, 9:32 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 15, 2015, 9:32 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84049
-----------------------------------------------------------


Patch looks great!

Reviews applied: [31502, 31503, 31504, 31505]

All tests passed.

- Mesos ReviewBot


On May 17, 2015, 12:27 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 17, 2015, 12:27 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.
> 
> Cong Wang wrote:
>     1) That depends on what is our fallback solution in case of bugs, potentially every line of new code needs a flag if not reverting to an old binary. :) So generally speaking, what is our solution when we hit some bug during deloyment? If it is reverting to the old version, then adding a flag just for deloyment is useless. Or if it is turning off some flag, then as I said every new code would potentially need a flag and some new code can't have a flag, that means this way doesn't scale.
>     
>     2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create() which should cover these new code too, no?
> 
> Jie Yu wrote:
>     For 1), if the fallback solution is to just to restart the slave, then we usually don't require a new flag. However, in this case, we will leave filters on host eth0, and a simple restart of the slave won't clean up those filters.
>     
>     For 2), Yes, we exercise the code of creating those filters, but do we have ASSERT or EXPECT to check if those filters are properly setup? AFAICT, even if the filters are not properly setup, the existing tests will still finish, no?
> 
> Cong Wang wrote:
>     1) We only leave the default filters, which is used to catch all non-matched packets (essentially a kenrel issue as mentioned in comments), so nothing harms here.
>     
>     
>     2) Test cases like ContainerICMPExternal check the return value of ::create() and ::isolate(), so if any of these filters fails to setup, then they should fail, right? I mean:
>     
>     
>       // Isolate the forked child.
>       AWAIT_READY(isolator.get()->isolate(containerId, pid.get()));
>     
>     Even if not, we should fix these test cases rather adding a new one? Since they are supposed to catch all tc setup failures?

1) What if the isolator creates some filters for the container and we decided to rollback after that due to bug. Do we leave other filters as well?

2) How do you capture the case where two filters associated the same container are assigned to different flows in host eth0 (I remembered that was a bug in the code previously) using the existing tests?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.
> 
> Cong Wang wrote:
>     1) That depends on what is our fallback solution in case of bugs, potentially every line of new code needs a flag if not reverting to an old binary. :) So generally speaking, what is our solution when we hit some bug during deloyment? If it is reverting to the old version, then adding a flag just for deloyment is useless. Or if it is turning off some flag, then as I said every new code would potentially need a flag and some new code can't have a flag, that means this way doesn't scale.
>     
>     2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create() which should cover these new code too, no?
> 
> Jie Yu wrote:
>     For 1), if the fallback solution is to just to restart the slave, then we usually don't require a new flag. However, in this case, we will leave filters on host eth0, and a simple restart of the slave won't clean up those filters.
>     
>     For 2), Yes, we exercise the code of creating those filters, but do we have ASSERT or EXPECT to check if those filters are properly setup? AFAICT, even if the filters are not properly setup, the existing tests will still finish, no?
> 
> Cong Wang wrote:
>     1) We only leave the default filters, which is used to catch all non-matched packets (essentially a kenrel issue as mentioned in comments), so nothing harms here.
>     
>     
>     2) Test cases like ContainerICMPExternal check the return value of ::create() and ::isolate(), so if any of these filters fails to setup, then they should fail, right? I mean:
>     
>     
>       // Isolate the forked child.
>       AWAIT_READY(isolator.get()->isolate(containerId, pid.get()));
>     
>     Even if not, we should fix these test cases rather adding a new one? Since they are supposed to catch all tc setup failures?
> 
> Jie Yu wrote:
>     1) What if the isolator creates some filters for the container and we decided to rollback after that due to bug. Do we leave other filters as well?
>     
>     2) How do you capture the case where two filters associated the same container are assigned to different flows in host eth0 (I remembered that was a bug in the code previously) using the existing tests?

Oh, you should mention the flag is just for recovery test, it is clearly not for deployment. :) But that means every new feature added to isolate() would have to get a new flag, otherwise not possible to do unit test for recovery with current code, perhaps we just need some internal flags for unit tests only.

Note, that also means we have to disable this feature at run-time and continue when kernel doesn't support fq_codel (currently we just fail).


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?

Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.
> 
> Cong Wang wrote:
>     1) That depends on what is our fallback solution in case of bugs, potentially every line of new code needs a flag if not reverting to an old binary. :) So generally speaking, what is our solution when we hit some bug during deloyment? If it is reverting to the old version, then adding a flag just for deloyment is useless. Or if it is turning off some flag, then as I said every new code would potentially need a flag and some new code can't have a flag, that means this way doesn't scale.
>     
>     2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create() which should cover these new code too, no?
> 
> Jie Yu wrote:
>     For 1), if the fallback solution is to just to restart the slave, then we usually don't require a new flag. However, in this case, we will leave filters on host eth0, and a simple restart of the slave won't clean up those filters.
>     
>     For 2), Yes, we exercise the code of creating those filters, but do we have ASSERT or EXPECT to check if those filters are properly setup? AFAICT, even if the filters are not properly setup, the existing tests will still finish, no?

1) We only leave the default filters, which is used to catch all non-matched packets (essentially a kenrel issue as mentioned in comments), so nothing harms here.


2) Test cases like ContainerICMPExternal check the return value of ::create() and ::isolate(), so if any of these filters fails to setup, then they should fail, right? I mean:


  // Isolate the forked child.
  AWAIT_READY(isolator.get()->isolate(containerId, pid.get()));

Even if not, we should fix these test cases rather adding a new one? Since they are supposed to catch all tc setup failures?


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.
> 
> Cong Wang wrote:
>     1) That depends on what is our fallback solution in case of bugs, potentially every line of new code needs a flag if not reverting to an old binary. :) So generally speaking, what is our solution when we hit some bug during deloyment? If it is reverting to the old version, then adding a flag just for deloyment is useless. Or if it is turning off some flag, then as I said every new code would potentially need a flag and some new code can't have a flag, that means this way doesn't scale.
>     
>     2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create() which should cover these new code too, no?

For 1), if the fallback solution is to just to restart the slave, then we usually don't require a new flag. However, in this case, we will leave filters on host eth0, and a simple restart of the slave won't clean up those filters.

For 2), Yes, we exercise the code of creating those filters, but do we have ASSERT or EXPECT to check if those filters are properly setup? AFAICT, even if the filters are not properly setup, the existing tests will still finish, no?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.

1) That depends on what is our fallback solution in case of bugs, potentially every line of new code needs a flag if not reverting to an old binary. :) So generally speaking, what is our solution when we hit some bug during deloyment? If it is reverting to the old version, then adding a flag just for deloyment is useless. Or if it is turning off some flag, then as I said every new code would potentially need a flag and some new code can't have a flag, that means this way doesn't scale.

2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create() which should cover these new code too, no?


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.

For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise, if there's a bug, all slave needs to be rebooted.

For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise the code path in the isolator. The test you mentioned is a unit test for the routing library stuff.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these two issues.

Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior; 2) We already have a test case, FqCodelClassifier.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


Two high level comments:

1) We need a slave flag to control if turning on this feature or not
2) Do you think we should at least add some unit test for this? For example, verify that the filters are properly set up?


src/linux/routing/filter/filter.hpp
<https://reviews.apache.org/r/31505/#comment136460>

    These two can be merged since `classid` is already Optional. So please remove the first constructor and change the callsites accordingly.



src/linux/routing/filter/filter.hpp
<https://reviews.apache.org/r/31505/#comment136461>

    These two can be merged too. Remove the latter and updates the callsites.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136470>

    Please add a NOTE: prefix here for the comments.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136465>

    We should call it flowIds:
    ```
    flowIds[sourcePorts.get()] = classid.get().secondary();
    ```



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136468>

    s/classid/flowId/



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136466>

    'classid' here could be None, right? It'll cause classid.get() to crash. So please move it to the end of this function:
    
    ```
    if (flowId.isSome()) {
      info->flowId = flowId.get();
      freeFlowIds.erase(flowId.get());
    }
    ```



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136469>

    Kill this blank line.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136476>

    Rest of the host packets...



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136480>

    Could you please move this right above `set<string> targets` and add a LOG as well?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136482>

    s/Host/host/
    
    We use camelCase for varibles. Please fix it here and everywhere else.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136481>

    Please add a blank line above.


- Jie Yu


On May 17, 2015, 12:27 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 17, 2015, 12:27 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 26, 2015, 9:54 p.m., Chi Zhang wrote:
> > src/slave/containerizer/isolators/network/port_mapping.cpp, lines 1843-1850
> > <https://reviews.apache.org/r/31505/diff/8/?file=972152#file972152line1843>
> >
> >     this could also happen if we are recovering a container not using this feature right?

Probably. It hard to mention all the cases in comments.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review85268
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Chi Zhang <ch...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review85268
-----------------------------------------------------------



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment136803>

    this could also happen if we are recovering a container not using this feature right?


- Chi Zhang


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review85328
-----------------------------------------------------------


Patch looks great!

Reviews applied: [31502, 31503, 31504, 31505]

All tests passed.

- Mesos ReviewBot


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review86155
-----------------------------------------------------------

Ship it!


I added the slave flag for you.

Please follow up with tests:
https://issues.apache.org/jira/browse/MESOS-2799

- Jie Yu


On June 1, 2015, 10:45 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 10:45 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
>   src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
>   src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
>   src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
>   src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On June 2, 2015, 12:57 a.m., Chi Zhang wrote:
> > Cong, is this the patch the requires code from 3.2.26 for the first time? If so, should we add a compile-time flag to config.ac? That way Paul can safely back off the code for the "stat string conversion" from the other patch you commented on.
> 
> Cong Wang wrote:
>     No, https://reviews.apache.org/r/31503/ this one should be the first commit requires 3.2.26.
> 
> Chi Zhang wrote:
>     hmm looks like both are committed? could you submit a fix for that please?

Done: https://issues.apache.org/jira/browse/MESOS-2803 . Feel free to take it if you have time.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review86133
-----------------------------------------------------------


On June 1, 2015, 10:45 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 10:45 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
>   src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
>   src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
>   src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
>   src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On June 2, 2015, 12:57 a.m., Chi Zhang wrote:
> > Cong, is this the patch the requires code from 3.2.26 for the first time? If so, should we add a compile-time flag to config.ac? That way Paul can safely back off the code for the "stat string conversion" from the other patch you commented on.

No, https://reviews.apache.org/r/31503/ this one should be the first commit requires 3.2.26.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review86133
-----------------------------------------------------------


On June 1, 2015, 10:45 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 10:45 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
>   src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
>   src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
>   src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
>   src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Chi Zhang <ch...@gmail.com>.

> On June 2, 2015, 12:57 a.m., Chi Zhang wrote:
> > Cong, is this the patch the requires code from 3.2.26 for the first time? If so, should we add a compile-time flag to config.ac? That way Paul can safely back off the code for the "stat string conversion" from the other patch you commented on.
> 
> Cong Wang wrote:
>     No, https://reviews.apache.org/r/31503/ this one should be the first commit requires 3.2.26.

hmm looks like both are committed? could you submit a fix for that please?


- Chi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review86133
-----------------------------------------------------------


On June 1, 2015, 10:45 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 10:45 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
>   src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
>   src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
>   src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
>   src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Chi Zhang <ch...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review86133
-----------------------------------------------------------


Cong, is this the patch the requires code from 3.2.26 for the first time? If so, should we add a compile-time flag to config.ac? That way Paul can safely back off the code for the "stat string conversion" from the other patch you commented on.

- Chi Zhang


On June 1, 2015, 10:45 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 10:45 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
>   src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
>   src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
>   src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
>   src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated June 1, 2015, 10:45 p.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Rebase the patch per dicussion with Jie.


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/linux/routing/filter/basic.cpp 4ce8acb7040f2ed8ec3834dd7702a20f5316c20f 
  src/linux/routing/filter/filter.hpp aaca57fbe80e3ffa3dd2c2bbed93849013b7a382 
  src/linux/routing/filter/icmp.cpp 76877fb94a1c1e79133b8975419d2ea7d0300650 
  src/linux/routing/filter/ip.hpp 9645f9488938c55fec253b36d9fa30eae72a8ca2 
  src/linux/routing/filter/ip.cpp 0f3b856bd04f6a881e35452a24399715cd8a174f 
  src/slave/containerizer/isolators/network/port_mapping.hpp 6579286b779882bec493a8e0f10486adc316dc6e 
  src/slave/containerizer/isolators/network/port_mapping.cpp 843e52d6f9923d0ee0a0297cd5c464b8b72f5de3 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated May 26, 2015, 8:41 p.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Address review comments


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
  src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
  src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
  src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
  src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
  src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
  src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated May 17, 2015, 12:27 a.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Set terminal flag for u32 filters.


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
  src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
  src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
  src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
  src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated May 15, 2015, 9:32 p.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Address review comments from Jie


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
  src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 14, 2015, 7:44 p.m., Jie Yu wrote:
> > src/slave/containerizer/isolators/network/port_mapping.cpp, line 1767
> > <https://reviews.apache.org/r/31505/diff/5/?file=955872#file955872line1767>
> >
> >     No need to use `filter::` prefix for `filter::Filter` as it's already included.

Without it I get:

../../src/slave/containerizer/isolators/network/port_mapping.cpp: In member function 'Try<mesos::internal::slave::PortMappingIsolatorProcess::Info*> mesos::internal::slave::PortMappingIsolatorProcess::_recover(pid_t)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1766:38: error: template argument 1 is invalid
   Result<vector<Filter<ip::Classifier>>> eth0EgressFilters =


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review83690
-----------------------------------------------------------


On May 12, 2015, 12:14 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 12, 2015, 12:14 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On May 14, 2015, 7:44 p.m., Jie Yu wrote:
> > src/slave/containerizer/isolators/network/port_mapping.hpp, line 329
> > <https://reviews.apache.org/r/31505/diff/5/?file=955871#file955871line329>
> >
> >     Please add some comments about what this is. Consider using hashset instead of std::set.

We don't need a hash here, we only need to get an unused ID from this set.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review83690
-----------------------------------------------------------


On May 12, 2015, 12:14 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 12, 2015, 12:14 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review83690
-----------------------------------------------------------



src/slave/containerizer/isolators/network/port_mapping.hpp
<https://reviews.apache.org/r/31505/#comment134860>

    Please add some comments about what this is. Consider using hashset instead of std::set.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134858>

    Could you please add some comments about what this is? For example, what is container min flowid? What is host flowid? Why ICMP and ARP are sharing the same flowid?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134733>

    What does that mean? What do you want to revise? You probably want to restructure the TODO like the following:
    
    ```
    TODO(cwang): Maybe we can continue .... See details in MESOS-2370.
    ```



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134859>

    The comments here is a little confusing. The flowIDs here is actually freeFlowIds, right? How about just calling it freeFlowIds?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134863>

    We define variables when we want to use them. So you don't have to declare this variable here.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134866>

    Can you move this code block after getting the vethClassifiers?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134862>

    No need to use `filter::` prefix for `filter::Filter` as it's already included.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134864>

    s/flowClassifiers/eth0EgressFilters/



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134865>

    s/classifiers/vethIngressClassifiers/



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134875>

    You probably also want to do some sanity check here to make sure all port ranges from a container points to the same flow on eth0 egress. Also, PortRange supports hashing, so you can probably pre-create a hashmap<PortRange, classid> mapping and lookup the hashmap later:
    
    ```
    // Construct a port range to classid mapping from host eth0 egress.
    // TODO: Consider moving the following to a common place so that we
    // don't duplicate this logic for each container.
    Result<vector<Filter<ip::Classifier>>> eth0EgressFilters =
      ip::filters(eth0, fq_codel::HANDLE);
      
    ...
    
    hashmap<PortRange, uint16_t> classids;
    foreach (const Filter<ip::Classifier>& filter, eth0EgressFilters.get()) {
      const Option<PortRange> sourcePorts = filter.classifier().sourcePorts();
      const Option<uint16_t> classid = filter.classifier().classid(); 
      
      if (sourcePorts.isNone()) {
        return Error("Missing source ports for filters on egress of " + eth0);
      }
      
      if (classid.isNone()) {
        return Error("Missing classid for filters on egress of " + eth0);
      }
      
      if (classids.contains(range) {
        return Error(
          "Duplicated port range " + stringify(range) +
          " detected on egress of " + eth0);
      }
      
      classids[range.get()] = classid.get();
    }
    
    ...
    
    IntervalSet<uint16_t> nonEphemeralPorts;
    IntervalSet<uint16_t> ephemeralPorts;
    Option<uint16_t> classid;
    
    foreach (...) {
      ...
      if (classids.contains(sourcePorts.get()) {
        if (classid.isNone()) {
          classid = classids.get(sourcePorts.get());
        } else if (classid != classids.get(sourcePorts.get())) {
          return Error("A container is associated with multiple flows on eth0 egress");
        }
      } else if (classid.isSome()) {
        // This is the case where some port range of a container is assigned
        // to a flow while some isn't. This could happen if slave crashes
        // while those filters are created. However, this is OK for us because
        // ...
        LOG(WARNING) << "...";
      }
      ...
    }
    ```



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134885>

    info->flowId = getNextFlowId();



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134886>

    s/flowId/info->flowId/



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134889>

    Please passing in flowId and make 'addHostIPFilters' accept an Option<uint16_t> flowId.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134890>

    Shouldn't this be
    
    Priority(ICMP_FILTER_PRIORITY, NORMAL)?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134892>

    Add a space before '='



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134891>

    Ditto here.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134894>

    Priority(DEFAULT_FILTER_PRIORITY, NORMAL); ?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134901>

    Do you need a new flow here? I would imagine to reuse the existing flow if exists?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134871>

    Could you please add a comment here saying that all IP packets from a container will be assigned a  *single* flow on host eth0.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134899>

    Why is that? I would imagine some port range still points the original flow?



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134870>

    s/const//
    
    We typically don't use const for primitive type parameters.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment134896>

    Priority(IP_FILTER_PRIORITY, NORMAL)?


- Jie Yu


On May 12, 2015, 12:14 a.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 12, 2015, 12:14 a.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated May 12, 2015, 12:14 a.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Move to new API's


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e 
  src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/
-----------------------------------------------------------

(Updated March 10, 2015, 11:28 p.m.)


Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.


Changes
-------

Move to new API's.


Summary (updated)
-----------------

Add flow classifiers for fq_codel on egress


Bugs: MESOS-2422
    https://issues.apache.org/jira/browse/MESOS-2422


Repository: mesos


Description
-------

Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.

We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.

TODO: get some performance numbers


Diffs (updated)
-----

  src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a83fe970546b3eddcb29b413ee2a7888b9 
  src/slave/containerizer/isolators/network/port_mapping.cpp aebc528b12e4afdf49367aa01926c05fa37da0e0 

Diff: https://reviews.apache.org/r/31505/diff/


Testing
-------

Manually start two mesos containers with netperf running side.


Thanks,

Cong Wang


Re: Review Request 31505: (5/5) Add flow classifiers for fq_codel on egress

Posted by Cong Wang <cw...@twopensource.com>.

> On March 9, 2015, 7:37 p.m., Jie Yu wrote:
> > src/slave/containerizer/isolators/network/port_mapping.cpp, lines 1528-1533
> > <https://reviews.apache.org/r/31505/diff/2/?file=884517#file884517line1528>
> >
> >     I agree with Chi that the API is a little confusing and the code is hard to read and follow here.
> >     
> >     IMO, `flowId()` shouldn't be part of `ip:Classifier`'s interfaces because it has nothing to do with "classifying a packet".
> >     
> >     Here are my suggestions. We need to expose a new interface `vector<Filter<ip::Classifier>> filters(link, handle);` (it's already there in the internal namesapce, we just need to expose it).
> >     
> >     We can get the action attached to the filter and then we can get the flow id from the action.
> >     
> >     Let's chat about this.

It is historical that flowid is stored in each filter probably because TC action was not even invented at that time. This matches TC definition at least. Here we only read the flowid when recover. As you said in some comment, libnl doesn't support to read action. Not sure if it worthes the effort to fix it, since it may need more work.


- Cong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review75738
-----------------------------------------------------------


On March 9, 2015, 6:57 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 6:57 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 7a38735 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: (5/5) Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.

> On March 9, 2015, 7:37 p.m., Jie Yu wrote:
> > src/slave/containerizer/isolators/network/port_mapping.cpp, lines 1528-1533
> > <https://reviews.apache.org/r/31505/diff/2/?file=884517#file884517line1528>
> >
> >     I agree with Chi that the API is a little confusing and the code is hard to read and follow here.
> >     
> >     IMO, `flowId()` shouldn't be part of `ip:Classifier`'s interfaces because it has nothing to do with "classifying a packet".
> >     
> >     Here are my suggestions. We need to expose a new interface `vector<Filter<ip::Classifier>> filters(link, handle);` (it's already there in the internal namesapce, we just need to expose it).
> >     
> >     We can get the action attached to the filter and then we can get the flow id from the action.
> >     
> >     Let's chat about this.
> 
> Cong Wang wrote:
>     It is historical that flowid is stored in each filter probably because TC action was not even invented at that time. This matches TC definition at least. Here we only read the flowid when recover. As you said in some comment, libnl doesn't support to read action. Not sure if it worthes the effort to fix it, since it may need more work.

I added a review to get all filters:
https://reviews.apache.org/r/31870/

Once your action::Flow patch is in, we can add the decode logic in `decodeFilter`.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review75738
-----------------------------------------------------------


On March 9, 2015, 6:57 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 6:57 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 7a38735 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: (5/5) Add flow classifiers for fq_codel on egress

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review75738
-----------------------------------------------------------


Some high level comments. Let's chat about that.


src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment122974>

    I am rethinking about the API about flows and here are my thoughts. I think this is pretty important for readability.
    
    Each `Flow` action needs to have a parent and an ID. So the interface is like the following:
    ```
    class Flow
    {
      // NOTE: 'id' must be great than zero.
      Flow(const Handle& parent, uint16_t id);
      
      // Construct a flow from 32 bit integer. This
      // constructor is only used by filter internals.
      Flow(uint32_t classid);
      
      uint16_t id();
    };
    ```
    
    So `parent` is the qdisc handle, and `id` is any 16 bit integer greater than 0.



src/slave/containerizer/isolators/network/port_mapping.cpp
<https://reviews.apache.org/r/31505/#comment122976>

    I agree with Chi that the API is a little confusing and the code is hard to read and follow here.
    
    IMO, `flowId()` shouldn't be part of `ip:Classifier`'s interfaces because it has nothing to do with "classifying a packet".
    
    Here are my suggestions. We need to expose a new interface `vector<Filter<ip::Classifier>> filters(link, handle);` (it's already there in the internal namesapce, we just need to expose it).
    
    We can get the action attached to the filter and then we can get the flow id from the action.
    
    Let's chat about this.


- Jie Yu


On March 9, 2015, 6:57 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 6:57 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 7a38735 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Re: Review Request 31505: (5/5) Add flow classifiers for fq_codel on egress

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review75751
-----------------------------------------------------------


Bad patch!

Reviews applied: [31502, 31503, 31504]

Failed command: ./support/apply-review.sh -n -r 31504

Error:
 2015-03-09 19:34:55 URL:https://reviews.apache.org/r/31504/diff/raw/ [8942/8942] -> "31504.patch" [1]
error: patch failed: src/linux/routing/filter/filter.hpp:42
error: src/linux/routing/filter/filter.hpp: patch does not apply
Failed to apply patch

- Mesos ReviewBot


On March 9, 2015, 6:57 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 6:57 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/network/port_mapping.hpp 0f9ad4a 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 7a38735 
> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>