Posted to issues@mesos.apache.org by "Niklas Quarfot Nielsen (JIRA)" <ji...@apache.org> on 2014/08/15 18:41:18 UTC

[jira] [Created] (MESOS-1706) Introduce socket / connection pooling to libprocess

Niklas Quarfot Nielsen created MESOS-1706:
---------------------------------------------

             Summary: Introduce socket / connection pooling to libprocess
                 Key: MESOS-1706
                 URL: https://issues.apache.org/jira/browse/MESOS-1706
             Project: Mesos
          Issue Type: Improvement
          Components: libprocess
            Reporter: Niklas Quarfot Nielsen


I just wrote a libprocess connection-throughput stress test (basically two libprocess programs sending messages back and forth). One end is multihomed, so we can scale up the number of clients.
With a single client (10 "concurrent" connections, or rather, sending up to 10 messages before awaiting responses), the throughput is roughly 8000-9000 requests per second.
I think I (accidentally) produced more load (around 30,000 requests per second), but in both cases I run into the same error: `Failed to send, connect: Cannot assign requested address`. According to http://khanna111.com/articles/TCPAAIU.html, it seems the only way around it is some kind of connection pooling (we already use SO_REUSEADDR).

The error is raised by connect() and suggests that the sending machine is running out of available local ports (the randomly assigned ephemeral ports used for outgoing connections).
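
As a rough back-of-the-envelope check, assuming Linux defaults: a connection closed by this side keeps its local port in TIME_WAIT for 60 seconds, so the rate of new outgoing connections to a single remote address and port that can be sustained is roughly the size of the ephemeral port range divided by 60. The following is a minimal, illustrative C++ sketch; the function names are hypothetical and not part of libprocess.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: parses the contents of
// /proc/sys/net/ipv4/ip_local_port_range (e.g. "32768\t60999") and returns
// the number of ephemeral ports in the range, or 0 if it cannot be parsed.
int ephemeral_port_count(const std::string& range) {
  std::istringstream in(range);
  int low = 0;
  int high = 0;
  if (!(in >> low >> high) || high < low) {
    return 0;
  }
  return high - low + 1;
}

// Hypothetical helper: rough upper bound on new outgoing connections per
// second to one remote address/port, assuming every closed connection keeps
// its local port in TIME_WAIT for time_wait_secs (60 s on Linux).
double max_new_connections_per_sec(int ports, double time_wait_secs) {
  assert(time_wait_secs > 0.0);
  return ports / time_wait_secs;
}

// Reads the live range on Linux; returns 0 if the file is unavailable.
int local_ephemeral_port_count() {
  std::ifstream file("/proc/sys/net/ipv4/ip_local_port_range");
  std::string line;
  std::getline(file, line);  // leaves `line` empty if the file can't be read
  return ephemeral_port_count(line);
}
```

For example, a range of 32768 to 60999 (28232 ports) gives about 470 new connections per second, far below the 8000-9000 requests per second measured here, which is why reusing connections rather than opening new ones matters at this load.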

{code}
I0815 07:03:49.348409 30317 main.cpp:109] 8984.79 requests / second (delta: 1.000356864secs)
I0815 07:03:50.348898 30320 main.cpp:109] 8715.88 requests / second (delta: 1.000473088secs)
I0815 07:03:51.349040 30317 main.cpp:109] 8622.64 requests / second (delta: 1.000157184secs)
I0815 07:03:52.349184 30320 main.cpp:109] 9039.69 requests / second (delta: 1.000144896secs)
I0815 07:03:53.349478 30319 main.cpp:109] 8768.42 requests / second (delta: 1.000293888secs)
I0815 07:03:54.349954 30322 main.cpp:109] 8728.9 requests / second (delta: 1.000470016secs)
I0815 07:03:55.350334 30316 main.cpp:109] 8628.79 requests / second (delta: 1.000371968secs)
I0815 07:03:56.350957 30320 main.cpp:109] 8726.57 requests / second (delta: 1.000621824secs)
I0815 07:03:57.351474 30318 main.cpp:109] 8587.46 requests / second (delta: 1.000529152secs)
I0815 07:03:58.351805 30314 main.cpp:109] 8475.16 requests / second (delta: 1.000335104secs)
F0815 07:03:59.092653 30323 process.cpp:2197] Failed to send, connect: Cannot assign requested address [99]
*** Check failure stack trace: ***
Aborted

{code}

One way to deal with it could be to introduce the notion of connection pooling.
Any thoughts?
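
A minimal sketch of what per-endpoint connection pooling could look like, assuming a simple fd-based design. Endpoint, ConnectionPool, acquire(), release() and discard() are hypothetical names, not existing libprocess APIs. Connecting and closing are injected as functions so the pooling logic stands on its own; in practice they would wrap socket()/connect() and close(). Idle sockets are kept per remote endpoint and connect() is only called when none is free, so the number of local ports in use is bounded by the number of live connections rather than by the message rate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical endpoint key (libprocess has its own address types).
struct Endpoint {
  std::string ip;
  uint16_t port;
  bool operator<(const Endpoint& other) const {
    return std::tie(ip, port) < std::tie(other.ip, other.port);
  }
};

// Keeps idle connected sockets per endpoint and reuses them instead of
// calling connect() (and allocating a new ephemeral port) for every send.
class ConnectionPool {
 public:
  using ConnectFn = std::function<int(const Endpoint&)>;  // fd, or -1 on error
  using CloseFn = std::function<void(int)>;

  ConnectionPool(ConnectFn connect_fn, CloseFn close_fn, std::size_t max_idle)
      : connect_(std::move(connect_fn)),
        close_(std::move(close_fn)),
        max_idle_(max_idle) {}

  ~ConnectionPool() {
    for (auto& entry : idle_) {
      for (int fd : entry.second) close_(fd);
    }
  }

  // Returns a connected socket to `to`, reusing an idle one when possible.
  int acquire(const Endpoint& to) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = idle_.find(to);
      if (it != idle_.end() && !it->second.empty()) {
        int fd = it->second.back();
        it->second.pop_back();
        return fd;
      }
    }
    return connect_(to);  // only here is a new local port consumed
  }

  // Hands a healthy socket back; closes it if the endpoint's pool is full.
  void release(const Endpoint& to, int fd) {
    assert(fd >= 0 && "only release sockets obtained from acquire()");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto& fds = idle_[to];
      if (fds.size() < max_idle_) {
        fds.push_back(fd);
        return;
      }
    }
    close_(fd);
  }

  // Use instead of release() when the socket failed or the peer closed it.
  void discard(int fd) { close_(fd); }

 private:
  ConnectFn connect_;
  CloseFn close_;
  std::size_t max_idle_;
  std::mutex mutex_;
  std::map<Endpoint, std::vector<int>> idle_;
};
```

This sketch does not health-check idle sockets; a real implementation would also have to notice peers that closed an idle connection and would need to fit into libprocess's non-blocking event loop.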



--
This message was sent by Atlassian JIRA
(v6.2#6252)