Posted to dev@httpd.apache.org by Yann Ylavic <yl...@gmail.com> on 2014/10/08 01:18:49 UTC

Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Hi,

some notes about the current implementation of this (trunk only).

First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
This, I think, is not the intention of Yingqi Lu's original proposal,
and probably my fault since I asked for the patch to be split in
two for a better understanding (in the end only the SO_REUSEPORT patch
has been committed).
The fact is that without SO_REUSEPORT, this serves no purpose, and we'd
better use one listener per bucket (as originally proposed), or a
single bucket with no duplication (as before) if the performance gains
are not worth it.
WDYT?

Also, there is no opt-in/out for this functionality, nor a way to
configure the ratio of buckets to the number of CPUs (cores).
For example, SO_REUSEPORT also exists on *BSD, but I doubt it would
work as expected since AFAICT it is not the same thing as on Linux
(DragonFly's implementation seems to be close to Linux's though).
Yet, the dynamic setsockopt() check will also succeed on BSD, and the
functionality will be enabled.
So opt in (my preference) or out?
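
For reference, the runtime check boils down to something like the sketch
below (plain C for illustration only, not the actual listen.c code); note
that it only tells us the kernel accepts the option, not how it balances
connections between sockets sharing the port:

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

static int so_reuseport_available(void)
{
#ifdef SO_REUSEPORT
    int one = 1, rc;
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0)
        return 0;
    /* Succeeds on Linux and *BSD alike, regardless of how the kernel
     * actually distributes connections among the sharing sockets. */
    rc = setsockopt(sd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
    close(sd);
    return (rc == 0);
#else
    return 0;
#endif
}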

Finally, some global variables (not even ap_ prefixed) are used to
communicate between listen.c and the MPM. This avoids breaking the
API, but this is trunk...
I guess we can fix it, this is just a (self or anyone's) reminder :)

Regards,
Yann.

Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Oct 8, 2014 at 2:26 AM, Yann Ylavic <yl...@gmail.com> wrote:
> On Wed, Oct 8, 2014 at 2:03 AM, Yann Ylavic <yl...@gmail.com> wrote:
>> On Wed, Oct 8, 2014 at 1:50 AM, Lu, Yingqi <yi...@intel.com> wrote:
>>> 3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?
>>
>> Maybe defining new functions with more arguments (to be used by the
>> existing ones with NULL or default values) is a better alternative.
>
> For example, ap_duplicate_listeners could be modified to provide
> mpm_listen and even do the computation of num_buckets and provide it
> (this is not an API change since it is trunk only for now).
>
> ap_close_listeners() could be then restored as before (work on
> ap_listeners only) and ap_close_duplicated_listeners(mpm_listen) be
> introduced and used in the MPMs instead.
>
> Hence ap_listen_rec *mpm_listeners could be MPM local, which would
> then call ap_duplicate_listeners(..., &mpm_listeners, &num_buckets)
> and ap_close_duplicated_listeners(mpm_listeners)

All these (new) fields could also be in a struct so that future
changes won't require a new function.
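
For illustration, such a struct-based API might look roughly like the
header-style sketch below; only ap_duplicate_listeners and
ap_close_duplicated_listeners are names discussed in this thread, the
struct itself and its fields are purely hypothetical:

/* Hypothetical shape only, not the trunk API. */
typedef struct ap_listen_buckets_t {
    ap_listen_rec *listeners;   /* duplicated listeners, kept MPM local */
    int num_buckets;            /* computed from cores and the ratio    */
    /* room for future fields without adding new function arguments     */
} ap_listen_buckets_t;

AP_DECLARE(apr_status_t) ap_duplicate_listeners(apr_pool_t *p, server_rec *s,
                                                ap_listen_buckets_t *buckets);
AP_DECLARE(void) ap_close_duplicated_listeners(ap_listen_buckets_t *buckets);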

>
> That's just a quick thought...
>
>>
>> Please be aware that existing AP_DECLAREd functions API must not change though.
>>
>> Regards,
>> Yann.
>>
>>>
>>> Thanks,
>>> Yingqi
>>>
>>>
>>> -----Original Message-----
>>> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
>>> Sent: Tuesday, October 07, 2014 4:19 PM
>>> To: httpd
>>> Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk
>>>
>>> Hi,
>>>
>>> some notes about the current implementation of this (trunk only).
>>>
>>> First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
>>> This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
>>> The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
>>> WDYT?
>>>
>>> Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
>>> For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
>>> Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
>>> So opt in (my preference) or out?
>>>
>>> Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
>>> I guess we can fix it, this is just a (self or anyone's) reminder :)
>>>
>>> Regards,
>>> Yann.

RE: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by "Lu, Yingqi" <yi...@intel.com>.
Hi Yann,

Thanks for your quick email.

Yes, with the current implementation the accept mutex is not removed, just cut into smaller ones. My point was that on a smaller system the hardware resources are smaller too, so the maximum traffic it can drive is not as high as on the big systems. In that sense, the contention of child processes per bucket may not increase much compared to a big system. Running at peak performance, the total number of child processes should scale with the size of the system if there are no other hardware resource limitations. Then the number of child processes per bucket should stay at a similar rate regardless of system size, if we use a reasonable ListenCoresBucketsRatio.

Regarding the "timeout" issue, I think I did not explain it clearly in my last email. Testing the trunk version with ServerLimit = Number_buckets = StartServers, I did not see any connection timeouts or connection losses. I only saw performance regressions.

The "timeout or connection losses" issues only occur when I test the approach that creates the listen socket inside the child process. In that case, the master process does not control any listen sockets any more; each child does it on its own. If I remember correctly, that was your quick prototype a while back, after I posted the original patch. In the original discussion thread, I mentioned the connection issues and the performance degradation as well.

Again, thank you very much for your help!

Yingqi


-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com] 
Sent: Friday, November 07, 2014 7:49 AM
To: httpd
Subject: Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Hi Yingqi,

thanks for sharing your results.

On Thu, Nov 6, 2014 at 9:12 PM, Lu, Yingqi <yi...@intel.com> wrote:
> I do not see any documentation regarding this new configurable flag 
> ListenCoresBucketsRatio (maybe I missed it)

Will do it when possible, good point.

> Regarding how to make small systems take advantage of this patch, I actually did some testing on systems with fewer cores. The data show that when a system has fewer than 16 cores, more than 1 bucket does not bring any throughput or response time benefit. The patch is mainly meant for big systems, to resolve the scalability issue. That is the reason why we previously hard coded the ratio to 8 (impacting only systems with 16 cores or more).
>
> The accept_mutex is not much of a bottleneck anymore with the current patch implementation. The current implementation already cuts 1 big mutex into multiple smaller mutexes in the multiple-listen-statements case (each bucket has its dedicated accept_mutex). To prove this, our data show performance parity between 1 listen statement (listen 80, no accept_mutex) and 2 listen statements (listen 192.168.1.1 80, listen 192.168.1.2 80, with accept_mutex) with the current trunk version. Compared with the version without the SO_REUSEPORT patch, we see a 28% performance gain in the 1-listen-statement case and a 69% gain in the 2-listen-statements case.

With the current implementation and a reasonable number of servers
(children) started, this is surely true, your numbers prove it.
However, the fewer the buckets (CPU cores), the more contention on each bucket (ie. listeners waiting on the same socket(s)/mutex).
So the results with fewer cores are quite expected IMHO.

But we can't remove the accept mutex since there will always be more servers than the number of buckets.

>
> Regarding the approach that lets each child have its own listen socket, I did some testing with the current trunk version, increasing the number of buckets to be equal to a reasonable ServerLimit (this avoids changes in the number of child processes). I also verified that MaxClients and ThreadsPerChild were set properly. I used a single listen statement so that accept_mutex was disabled. Compared with the current approach, this has ~25% less throughput with significantly higher response time.
>
> In addition to this, implementing the listen socket for each child separately performs worse and has connection loss/timeout issues with the current Linux kernel. Below is more information/data we collected with the "each child process has its own listen socket" approach:
> 1. During the run, we noticed that there are tons of “read timed out” errors. These errors happen not only when the system is highly utilized; they even happen when the system is only 10% utilized. The response time was high.
> 2. Compared to the current trunk implementation, we found that the "each child has its own listen socket" approach results in significantly higher (up to 10X) response times at different CPU utilization levels. At peak performance level, it has 20+% less throughput with tons of “connection reset” errors in addition to “read timed out” errors. The current trunk implementation does not have errors.
> 3. During the graceful restart, there are tons of connection losses.

Did you also set StartServers = ServerLimit?
One bucket per child implies that all the children are up to receive connections or the system may distribute connections to buckets waiting for a child to handle them.
Linux may distribute the connections based on the listen()ing sockets, not the ones currently being accept()ed by some child.

I don't know your configuration regarding ServerLimit, or more accurately the number of children really started during the steady state of the stress test: let that number be S.

I suppose that S >= num_buckets in your tests with the current implementation, so there is always at least one child to accept() connections on a bucket, so this cannot happen.

I expect that with one bucket per child (listen()ed in the parent process), any number of cores, no accept mutex, and StartServers = ServerLimit = S, the system distributes the connections evenly across all the children, without any "read timeout" or graceful restart issue.
Otherwise there is a(nother) kernel bug not worked around by the current implementation, and the same thing may happen when (S /
num_buckets) reaches some limit...

>
> Based on the above findings, I think we may want to keep the current 
> approach. It is a clean, working and better performing one :-)

My point is not (at all) to replace the current approach, but maybe have another ListenBuckets* directive for systems with any number of cores. This would not change the current ListenCoresBucketsRatio behaviour, just looking at another way to configure/exploit listeners buckets ;)

Regards,
Yann.

Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by Yann Ylavic <yl...@gmail.com>.
Hi Yingqi,

thanks for sharing your results.

On Thu, Nov 6, 2014 at 9:12 PM, Lu, Yingqi <yi...@intel.com> wrote:
> I do not see any documentation regarding this new configurable flag ListenCoresBucketsRatio (maybe I missed it)

Will do it when possible, good point.

> Regarding how to make small systems take advantage of this patch, I actually did some testing on systems with fewer cores. The data show that when a system has fewer than 16 cores, more than 1 bucket does not bring any throughput or response time benefit. The patch is mainly meant for big systems, to resolve the scalability issue. That is the reason why we previously hard coded the ratio to 8 (impacting only systems with 16 cores or more).
>
> The accept_mutex is not much of a bottleneck anymore with the current patch implementation. The current implementation already cuts 1 big mutex into multiple smaller mutexes in the multiple-listen-statements case (each bucket has its dedicated accept_mutex). To prove this, our data show performance parity between 1 listen statement (listen 80, no accept_mutex) and 2 listen statements (listen 192.168.1.1 80, listen 192.168.1.2 80, with accept_mutex) with the current trunk version. Compared with the version without the SO_REUSEPORT patch, we see a 28% performance gain in the 1-listen-statement case and a 69% gain in the 2-listen-statements case.

With the current implementation and a reasonable number of servers
(children) started, this is surely true, your numbers prove it.
However, the fewer the buckets (CPU cores), the more contention on each
bucket (ie. listeners waiting on the same socket(s)/mutex).
So the results with fewer cores are quite expected IMHO.

But we can't remove the accept mutex since there will always be more
servers than the number of buckets.

>
> Regarding the approach that lets each child have its own listen socket, I did some testing with the current trunk version, increasing the number of buckets to be equal to a reasonable ServerLimit (this avoids changes in the number of child processes). I also verified that MaxClients and ThreadsPerChild were set properly. I used a single listen statement so that accept_mutex was disabled. Compared with the current approach, this has ~25% less throughput with significantly higher response time.
>
> In addition to this, implementing the listen socket for each child separately performs worse and has connection loss/timeout issues with the current Linux kernel. Below is more information/data we collected with the "each child process has its own listen socket" approach:
> 1. During the run, we noticed that there are tons of “read timed out” errors. These errors happen not only when the system is highly utilized; they even happen when the system is only 10% utilized. The response time was high.
> 2. Compared to the current trunk implementation, we found that the "each child has its own listen socket" approach results in significantly higher (up to 10X) response times at different CPU utilization levels. At peak performance level, it has 20+% less throughput with tons of “connection reset” errors in addition to “read timed out” errors. The current trunk implementation does not have errors.
> 3. During the graceful restart, there are tons of connection losses.

Did you also set StartServers = ServerLimit?
One bucket per child implies that all the children are up to receive
connections or the system may distribute connections to buckets
waiting for a child to handle them.
Linux may distribute the connections based on the listen()ing sockets,
not the ones currently being accept()ed by some child.

I don't know your configuration regarding ServerLimit, or more
accurately the number of children really started during the steady
state of the stress test: let that number be S.

I suppose that S >= num_buckets in your tests with the current
implementation, so there is always at least one child to accept()
connections on a bucket, so this cannot happen.

I expect that with one bucket per child (listen()ed in the parent
process), any number of cores, no accept mutex, and StartServers =
ServerLimit = S, the system distributes the connections evenly across
all the children, without any "read timeout" or graceful restart
issue.
Otherwise there is a(nother) kernel bug not worked around by the
current implementation, and the same thing may happen when (S /
num_buckets) reaches some limit...

>
> Based on the above findings, I think we may want to keep the current approach. It is a clean, working and better performing one :-)

My point is not (at all) to replace the current approach, but maybe
have another ListenBuckets* directive for systems with any number of
cores. This would not change the current ListenCoresBucketsRatio
behaviour, just looking at another way to configure/exploit listeners
buckets ;)

Regards,
Yann.

RE: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by "Lu, Yingqi" <yi...@intel.com>.
Hi Yann,

I do not see any documentation regarding this new configurable flag ListenCoresBucketsRatio (maybe I missed it), and since users may not be familiar with it, I still think it may be better to keep the default at 8, at least in trunk.

Regarding how to make small systems take advantage of this patch, I actually did some testing on systems with fewer cores. The data show that when a system has fewer than 16 cores, more than 1 bucket does not bring any throughput or response time benefit. The patch is mainly meant for big systems, to resolve the scalability issue. That is the reason why we previously hard coded the ratio to 8 (impacting only systems with 16 cores or more).

The accept_mutex is not much of a bottleneck anymore with the current patch implementation. The current implementation already cuts 1 big mutex into multiple smaller mutexes in the multiple-listen-statements case (each bucket has its dedicated accept_mutex). To prove this, our data show performance parity between 1 listen statement (listen 80, no accept_mutex) and 2 listen statements (listen 192.168.1.1 80, listen 192.168.1.2 80, with accept_mutex) with the current trunk version. Compared with the version without the SO_REUSEPORT patch, we see a 28% performance gain in the 1-listen-statement case and a 69% gain in the 2-listen-statements case.
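
To make the "smaller mutexes" idea concrete, the per-bucket pattern looks
roughly like the sketch below (illustrative C only, not the actual MPM code;
the bucket struct and function names are hypothetical). Each child is bound
to one bucket and only contends with the other children of that same bucket;
in the single-Listen case the mutex is skipped entirely, as noted above.

typedef struct {
    ap_listen_rec *listeners;        /* this bucket's duplicated sockets    */
    apr_proc_mutex_t *accept_mutex;  /* serializes accept() for this bucket */
} bucket_t;

/* Run by every child; my_bucket is the single bucket this child belongs to. */
static void child_accept_loop(bucket_t *my_bucket)
{
    for (;;) {
        apr_proc_mutex_lock(my_bucket->accept_mutex);   /* per-bucket lock */
        /* ... poll()/accept() on my_bucket->listeners ... */
        apr_proc_mutex_unlock(my_bucket->accept_mutex);
        /* ... handle the accepted connection ... */
    }
}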

Regarding the approach that lets each child have its own listen socket, I did some testing with the current trunk version, increasing the number of buckets to be equal to a reasonable ServerLimit (this avoids changes in the number of child processes). I also verified that MaxClients and ThreadsPerChild were set properly. I used a single listen statement so that accept_mutex was disabled. Compared with the current approach, this has ~25% less throughput with significantly higher response time.

In addition to this, implementing the listen socket for each child separately performs worse and has connection loss/timeout issues with the current Linux kernel. Below is more information/data we collected with the "each child process has its own listen socket" approach:
1. During the run, we noticed that there are tons of “read timed out” errors. These errors happen not only when the system is highly utilized; they even happen when the system is only 10% utilized. The response time was high.
2. Compared to the current trunk implementation, we found that the "each child has its own listen socket" approach results in significantly higher (up to 10X) response times at different CPU utilization levels. At peak performance level, it has 20+% less throughput with tons of “connection reset” errors in addition to “read timed out” errors. The current trunk implementation does not have errors.
3. During the graceful restart, there are tons of connection losses. 

Based on the above findings, I think we may want to keep the current approach. It is a clean, working and better performing one :-)

Thanks,
Yingqi


-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
Sent: Thursday, November 06, 2014 4:59 AM
To: httpd
Subject: Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Rebasing discussion here since this thread seems to be referenced in PR55897, and the discussion has somehow been forked and continued in [1].

[1]. http://mail-archives.apache.org/mod_mbox/httpd-dev/201410.mbox/%3C9ACD5B67AAC5594CB6268234CF29CF9AA37E994E@ORSMSX113.amr.corp.intel.com%3E

On Sat, Oct 11, 2014 at 1:55 AM, Lu, Yingqi <yi...@intel.com> wrote:
> Attached patch is generated based on current trunk. It covers for prefork/worker/event/eventopt MPM.

The patch (modified) has now been applied to trunk with r1635521.

On Thu, Oct 30, 2014 at 5:10 PM, Lu, Yingqi <yi...@intel.com> wrote:
> As this is getting better, I am wondering if you guys have plan to put this SO_REUSEPORT patch into the stable version.
> If yes, do you have a rough timeline?

The whole feature could certainly be proposed for 2.4.x since there is no (MAJOR) API change.

On Thu, Nov 6, 2014 at 6:52 AM, Lu, Yingqi <yi...@intel.com> wrote:
> I just took some testing on the most recent trunk version.
> I found out that num_buckets is default to 1 (ListenCoresBucketsRatio is default to 0).
> Adding ListenCoresBucketsRatio is great since user can have control over this.
> However, I am thinking it may be better to make this default at 8. 
> This will make the SO_REUSEPORT support to be default enabled (8 buckets).
(8 buckets with 64 CPU cores, lucky you...).

Yes, this change wrt your original patch is documented in the commit message, including how to change it to an opt-out.
I chose the opt-in way because I almost always find it safer, especially for backports to stable.
I have no strong opinion on this regarding trunk, though, could be an opt-out (easily) there.

Let's see what others say on this and the backport to 2.4.x.
Anyone?

> In case users are not aware of this new ListenCoresBucketsRatio 
> configurable flag, they can still enjoy the performance benefits.

Users with 64 cores available should always care about performance tuning ;)

Btw, I wonder if there are other ways to take advantage of the listeners buckets (even with fewer cores).
The other advantage of SO_REUSEPORT is that, provided that each child has its own listeners bucket, we can avoid the accept mutex lock (which also seemed to be a bottleneck, if I recall your original proposal's discussion correctly).

Did you do any testing without the accept mutex and with a number of buckets equal to some reasonable ServerLimit (and then play with ThreadsPerChild to reach the MaxClients/MaxRequestWorkers goal)?

Regards,
Yann.

Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by Yann Ylavic <yl...@gmail.com>.
Rebasing discussion here since this thread seems to be referenced in
PR55897, and the discussion has somehow been forked and continued in
[1].

[1]. http://mail-archives.apache.org/mod_mbox/httpd-dev/201410.mbox/%3C9ACD5B67AAC5594CB6268234CF29CF9AA37E994E@ORSMSX113.amr.corp.intel.com%3E

On Sat, Oct 11, 2014 at 1:55 AM, Lu, Yingqi <yi...@intel.com> wrote:
> Attached patch is generated based on current trunk. It covers for prefork/worker/event/eventopt MPM.

The patch (modified) has now been applied to trunk with r1635521.

On Thu, Oct 30, 2014 at 5:10 PM, Lu, Yingqi <yi...@intel.com> wrote:
> As this is getting better, I am wondering if you guys have plan to put this SO_REUSEPORT patch into the stable version.
> If yes, do you have a rough timeline?

The whole feature could certainly be proposed for 2.4.x since there is
no (MAJOR) API change.

On Thu, Nov 6, 2014 at 6:52 AM, Lu, Yingqi <yi...@intel.com> wrote:
> I just took some testing on the most recent trunk version.
> I found out that num_buckets is default to 1 (ListenCoresBucketsRatio is default to 0).
> Adding ListenCoresBucketsRatio is great since user can have control over this.
> However, I am thinking it may be better to make this default at 8. This will make the SO_REUSEPORT
> support to be default enabled (8 buckets).
(8 buckets with 64 CPU cores, lucky you...).

Yes, this change wrt your original patch is documented in the commit
message, including how to change it to an opt-out.
I chose the opt-in way because I almost always find it safer,
especially for backports to stable.
I have no strong opinion on this regarding trunk, though, could be an
opt-out (easily) there.

Let's see what others say on this and the backport to 2.4.x.
Anyone?

> In case users are not aware of this new ListenCoresBucketsRatio
> configurable flag, they can still enjoy the performance benefits.

Users with 64 cores available should always care about performance tuning ;)

Btw, I wonder if there are other ways to take advantage of the
listeners buckets (even with fewer cores).
The other advantage of SO_REUSEPORT is that, provided that each child
has its own listeners bucket, we can avoid the accept mutex lock
(which also seemed to be a bottleneck, if I recall your original
proposal's discussion correctly).

Did you do any testing without the accept mutex and with a number of
buckets equal to some reasonable ServerLimit (and then play with
ThreadsPerChild to reach the MaxClients/MaxRequestWorkers goal)?
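
For what it's worth, the kind of test configuration being asked about might
look like the sketch below. It is purely illustrative: it assumes the event
MPM on a 64-core machine, where ListenCoresBucketsRatio 8 yields 8 buckets,
and a single Listen so that the accept mutex is disabled.

# Purely illustrative: 64 cores / ratio 8 = 8 buckets.
Listen 80
ListenCoresBucketsRatio 8

# One child per bucket, all started up front.
ServerLimit        8
StartServers       8

# 8 children * 50 threads = 400 workers.
ThreadsPerChild    50
MaxRequestWorkers  400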

Regards,
Yann.

RE: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by "Lu, Yingqi" <yi...@intel.com>.
Dear All,

The attached patch is generated based on current trunk. It covers the prefork/worker/event/eventopt MPMs. It is supposed to address the following issues regarding SO_REUSEPORT support vs. the current trunk version:

1. Same as the current trunk implementation: when active_CPU_num <= 8 or when SO_REUSEPORT is not supported by the kernel, ap_num_buckets is set to 1. In any case, there is 1 dedicated listener per bucket.

2. Remove the global variables (mpm_listen, enable_default_listeners and num_buckets). mpm_listen is changed to be MPM local. enable_default_listeners is completely removed. num_buckets is changed to be MPM local (ap_num_buckets). I renamed have_so_reuseport to ap_have_so_reuseport. The reason for keeping that one global is that it is used by ap_log_common(). Based on the feedback here, I think it may not be a good idea to change that function's interface.

3. Change ap_duplicate_listeners to take more parameters. This function is called from each MPM (prefork.c/worker.c/event.c/eventopt.c). In this function, the prefork_listener (or worker_listener/event_listener/etc.) array is initialized and populated. ap_num_buckets is also calculated inside this function. In addition, this version solves the issue with the "one_process" case (the current trunk version has an issue with one_process enabled).

4. Change ap_close_listeners() back to its previous (2.4.x) version.

5. Change dummy_connection back to previous (2.4.X version).

6. Add ap_close_duplicated_listeners(). This is called from mpms when stopping httpd.

7. Add ap_close_child_listener(). When the listener_thread (the child process in prefork) exits, only the dedicated listener needs to be closed (the rest are already closed in child_main when the child process starts).

8. Remove the duplication of listeners when ap_num_buckets = 1 or without SO_REUSEPORT support (ap_num_buckets = 1). With SO_REUSEPORT, only (ap_num_buckets - 1) listeners are duplicated (one duplication less than the current trunk implementation); see the sketch after this list.

9. Inside each mpm, move child_bucket, child_pod and child_mutex (worker/prefork only) to a struct. Also, add member bucket to the same struct. 
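
In other words, the duplication scheme of items 1 and 8 boils down to
something like this rough C sketch (apart from ap_listeners, ap_listen_rec
and ap_num_buckets, the names here are hypothetical, not the actual patch):

/* Hypothetical helper: create a new socket for the same Listen address,
 * set SO_REUSEPORT on it, then bind()/listen() it. */
static ap_listen_rec *duplicate_and_bind(apr_pool_t *p, ap_listen_rec *orig);

static ap_listen_rec **make_buckets(apr_pool_t *p, ap_listen_rec *ap_listeners,
                                    int ap_num_buckets)
{
    ap_listen_rec **buckets = apr_pcalloc(p, ap_num_buckets * sizeof(*buckets));
    int i;

    /* Bucket 0 reuses the original listeners: no duplication at all when
     * ap_num_buckets == 1 (few cores, or no SO_REUSEPORT support). */
    buckets[0] = ap_listeners;

    /* Only (ap_num_buckets - 1) duplications are needed. */
    for (i = 1; i < ap_num_buckets; i++) {
        buckets[i] = duplicate_and_bind(p, ap_listeners);
    }
    return buckets;
}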

Please review and let me know your feedback. 

Thanks,
Yingqi

-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com] 
Sent: Tuesday, October 07, 2014 5:26 PM
To: httpd
Subject: Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

On Wed, Oct 8, 2014 at 2:03 AM, Yann Ylavic <yl...@gmail.com> wrote:
> On Wed, Oct 8, 2014 at 1:50 AM, Lu, Yingqi <yi...@intel.com> wrote:
>> 3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?
>
> Maybe defining new functions with more arguments (to be used by the 
> existing ones with NULL or default values) is a better alternative.

For example, ap_duplicate_listeners could be modified to provide mpm_listen and even do the computation of num_buckets and provide it (this is not an API change since it is trunk only for now).

ap_close_listeners() could be then restored as before (work on ap_listeners only) and ap_close_duplicated_listeners(mpm_listen) be introduced and used in the MPMs instead.

Hence ap_listen_rec *mpm_listeners could be MPM local, which would then call ap_duplicate_listeners(..., &mpm_listeners, &num_buckets) and ap_close_duplicated_listeners(mpm_listeners)

That's just a quick thought...

>
> Please be aware that existing AP_DECLAREd functions API must not change though.
>
> Regards,
> Yann.
>
>>
>> Thanks,
>> Yingqi
>>
>>
>> -----Original Message-----
>> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
>> Sent: Tuesday, October 07, 2014 4:19 PM
>> To: httpd
>> Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on 
>> trunk
>>
>> Hi,
>>
>> some notes about the current implementation of this (trunk only).
>>
>> First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
>> This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
>> The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
>> WDYT?
>>
>> Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
>> For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
>> Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
>> So opt in (my preference) or out?
>>
>> Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
>> I guess we can fix it, this is just a (self or anyone's) reminder :)
>>
>> Regards,
>> Yann.

Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Oct 8, 2014 at 2:03 AM, Yann Ylavic <yl...@gmail.com> wrote:
> On Wed, Oct 8, 2014 at 1:50 AM, Lu, Yingqi <yi...@intel.com> wrote:
>> 3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?
>
> Maybe defining new functions with more arguments (to be used by the
> existing ones with NULL or default values) is a better alternative.

For example, ap_duplicate_listeners could be modified to provide
mpm_listen and even do the computation of num_buckets and provide it
(this is not an API change since it is trunk only for now).

ap_close_listeners() could be then restored as before (work on
ap_listeners only) and ap_close_duplicated_listeners(mpm_listen) be
introduced and used in the MPMs instead.

Hence ap_listen_rec *mpm_listeners could be MPM local, which would
then call ap_duplicate_listeners(..., &mpm_listeners, &num_buckets)
and ap_close_duplicated_listeners(mpm_listeners)

That's just a quick thought...

>
> Please be aware that existing AP_DECLAREd functions API must not change though.
>
> Regards,
> Yann.
>
>>
>> Thanks,
>> Yingqi
>>
>>
>> -----Original Message-----
>> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
>> Sent: Tuesday, October 07, 2014 4:19 PM
>> To: httpd
>> Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk
>>
>> Hi,
>>
>> some notes about the current implementation of this (trunk only).
>>
>> First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
>> This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
>> The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
>> WDYT?
>>
>> Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
>> For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
>> Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
>> So opt in (my preference) or out?
>>
>> Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
>> I guess we can fix it, this is just a (self or anyone's) reminder :)
>>
>> Regards,
>> Yann.

RE: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by "Lu, Yingqi" <yi...@intel.com>.
Regarding your comment #2, we tested on a 16-thread system and it did not bring any performance benefit. That is why I calculate it this way.

Thanks for the comments below. I will try to send out a fix soon.

Thanks,
Yingqi

-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com] 
Sent: Tuesday, October 07, 2014 5:04 PM
To: httpd
Subject: Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

On Wed, Oct 8, 2014 at 1:50 AM, Lu, Yingqi <yi...@intel.com> wrote:
> Here is what I think. Currently (trunk version as well as my original 
> patch),
>
> 1. Without SO_REUSEPORT or when available CPU number < 8, num_bucket = 1 anyway. It duplicates 1 listener and use that for this single bucket. If folks think we should not duplicate in this case, I can modify the code to do that.

Yes I think the duplication should be avoided.

But is one listener per bucket an interesting alternative to num_buckets = 1?

>
> 2. num_buckets is calculated = available_CPU_num/8. When available CPU is less than 8, num_buckets = 1. It checks if SO_REUSEPORT is enabled in the kernel. If yes, it will enable it. I guess that is opt-in? Maybe I misunderstood you here, Yann. Please correct me if I do.

Why a fixed 8? Wouldn't someone with fewer than 16 cores want the feature?

>
> 3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?

Maybe defining new functions with more arguments (to be used by the existing ones with NULL or default values) is a better alternative.

Please be aware that existing AP_DECLAREd functions API must not change though.

Regards,
Yann.

>
> Thanks,
> Yingqi
>
>
> -----Original Message-----
> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
> Sent: Tuesday, October 07, 2014 4:19 PM
> To: httpd
> Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on 
> trunk
>
> Hi,
>
> some notes about the current implementation of this (trunk only).
>
> First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
> This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
> The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
> WDYT?
>
> Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
> For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
> Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
> So opt in (my preference) or out?
>
> Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
> I guess we can fix it, this is just a (self or anyone's) reminder :)
>
> Regards,
> Yann.

Re: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Oct 8, 2014 at 1:50 AM, Lu, Yingqi <yi...@intel.com> wrote:
> Here is what I think. Currently (trunk version as well as my original patch),
>
> 1. Without SO_REUSEPORT or when available CPU number < 8, num_bucket = 1 anyway. It duplicates 1 listener and use that for this single bucket. If folks think we should not duplicate in this case, I can modify the code to do that.

Yes I think the duplication should be avoided.

But is one listener per bucket an interesting alternative to num_buckets = 1?

>
> 2. num_buckets is calculated = available_CPU_num/8. When available CPU is less than 8, num_buckets = 1. It checks if SO_REUSEPORT is enabled in the kernel. If yes, it will enable it. I guess that is opt-in? Maybe I misunderstood you here, Yann. Please correct me if I do.

Why a fixed 8? Wouldn't someone with fewer than 16 cores want the feature?

>
> 3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?

Maybe defining new functions with more arguments (to be used by the
existing ones with NULL or default values) is a better alternative.

Please be aware that existing AP_DECLAREd functions API must not change though.

Regards,
Yann.

>
> Thanks,
> Yingqi
>
>
> -----Original Message-----
> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
> Sent: Tuesday, October 07, 2014 4:19 PM
> To: httpd
> Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk
>
> Hi,
>
> some notes about the current implementation of this (trunk only).
>
> First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
> This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
> The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
> WDYT?
>
> Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
> For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
> Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
> So opt in (my preference) or out?
>
> Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
> I guess we can fix it, this is just a (self or anyone's) reminder :)
>
> Regards,
> Yann.

RE: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Posted by "Lu, Yingqi" <yi...@intel.com>.
Here is what I think. Currently (trunk version as well as my original patch),

1. Without SO_REUSEPORT or when the available CPU number is < 8, num_bucket = 1 anyway. It duplicates 1 listener and uses that for this single bucket. If folks think we should not duplicate in this case, I can modify the code to do that.

2. num_buckets is calculated as available_CPU_num/8 (see the sketch after this list). When the available CPU count is less than 8, num_buckets = 1. It checks if SO_REUSEPORT is enabled in the kernel. If yes, it will enable it. I guess that is opt-in? Maybe I misunderstood you here, Yann. Please correct me if I did.

3. Yes, I did use some extern variables. I can change their names to better follow the variable naming convention. Should we do something with an ap_ prefix? Is there anything else I should consider when I rename the variables?
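
As a worked example of the calculation in item 2 (the names here are
illustrative only, not the patch's actual code):

/* num_buckets from the number of online CPUs, with a fixed ratio of 8. */
static int compute_num_buckets(int online_cpus, int so_reuseport_ok)
{
    int num_buckets = online_cpus / 8;
    if (num_buckets < 1 || !so_reuseport_ok)
        num_buckets = 1;     /* fewer than 8 CPUs, or no SO_REUSEPORT */
    return num_buckets;      /* e.g. 8..15 -> 1, 16 -> 2, 32 -> 4, 64 -> 8 */
}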

Thanks,
Yingqi


-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com] 
Sent: Tuesday, October 07, 2014 4:19 PM
To: httpd
Subject: Listeners buckets and duplication w/ and w/o SO_REUSEPORT on trunk

Hi,

some notes about the current implementation of this (trunk only).

First, whether or not SO_REUSEPORT is available, we do duplicate the listeners.
This, I think, is not the intention of Yingqi Lu's original proposal, and probably my fault since I asked for the patch to be split in two for a better understanding (in the end only the SO_REUSEPORT patch has been committed).
The fact is that without SO_REUSEPORT, this serves no purpose, and we'd better use one listener per bucket (as originally proposed), or a single bucket with no duplication (as before) if the performance gains are not worth it.
WDYT?

Also, there is no opt-in/out for this functionality, nor a way to configure the ratio of buckets to the number of CPUs (cores).
For example, SO_REUSEPORT also exists on *BSD, but I doubt it would work as expected since AFAICT it is not the same thing as on Linux (DragonFly's implementation seems to be close to Linux's though).
Yet, the dynamic setsockopt() check will also succeed on BSD, and the functionality will be enabled.
So opt in (my preference) or out?

Finally, some global variables (not even ap_ prefixed) are used to communicate between listen.c and the MPM. This avoids breaking the API, but this is trunk...
I guess we can fix it, this is just a (self or anyone's) reminder :)

Regards,
Yann.