You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by PSUdaemon <gi...@git.apache.org> on 2016/09/01 22:04:12 UTC

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

GitHub user PSUdaemon opened a pull request:

    https://github.com/apache/trafficserver/pull/956

    TS-4806: Fix up event processor thread stacks

    Fix event processor to create stacks on the appropriate numa node and with the appropriate page size. Also, stop using the main thread as ET_NET 0 since we can't control any of these aspects of it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PSUdaemon/trafficserver numa_stacks

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #956
    
----
commit 9df0dcb23f120d9932077a347bb57ce2652be0b9
Author: Phil Sorber <so...@apache.org>
Date:   2016-08-30T21:43:23Z

    TS-4806: Add ability to pass a new stack to thread creation.

commit 6bdbbd713d83c0fc40590d0377cb4e1bd62c0bb6
Author: Phil Sorber <so...@apache.org>
Date:   2016-08-31T16:51:46Z

    TS-4806: Normalize stacksize

commit 5ff928a165b77e97c42cdad2a870a5dfd429f8e1
Author: Phil Sorber <so...@apache.org>
Date:   2016-08-31T21:01:01Z

    TS-4806: Allocate thread stacks on corresponding NUMA nodes.

commit a94587bf16ab3fd30038b48fcd37fc7593cac4d9
Author: Phil Sorber <so...@apache.org>
Date:   2016-08-31T22:00:12Z

    TS-4806: Stop re-using main thread as net thread.

commit 0297bfc60a9960d7632f6dc8b3b9955dda1182a2
Author: Phil Sorber <so...@apache.org>
Date:   2016-09-01T21:55:51Z

    TS-4806: Make stacks use huge pages if enabled.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by PSUdaemon <gi...@git.apache.org>.
Github user PSUdaemon commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77461624
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    --- End diff --
    
    So each thread has a CPU set that we convert into a node set. Node in this case is a NUMA memory node. So the likely case is that it returns one node. But if you chose to bind your threads to the `machine` or `system` in `records.config` then it might cover multiple nodes. In which case we want to interleave.
    
    I don't think this depends at all on how `malloc(3)` chooses to allocate. It's all about what memory the OS (kernel/libc) returns for a given thread. So we tell the OS, please give this thread memory from a specific NUMA node.
    
    Each thread will likely be on a different node so we need to look at the CPU set for each thread. The PU's are also usually interleaved. so PU0 might be on socket 0 but PU1 is on socket 1 and then PU2 is back on socket 0. So a reasonable CPU set might be PU{0,2,4,6} on a dual socket quad core system. And that might then reasonably translate to NUMA node 0.
    
    `BIND` means used the specified node. So when one CPU set falls into one NUMA node (the likely and preferred case) then we tell it to just use that one node.
    
    Also, I will be happy to comment this up a bit more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77443787
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    --- End diff --
    
    If I understand this correctly, we swizzle the memory allocation policy for the thread so that the subsequent allocations for the stacks will be interleaved across nodes.
    
    Does this depend on the mechanics of how ``malloc(3)`` chooses to allocate?
    
    Why do we have to figure this out for each thread? Can we just do it once then allocate all the stacks?
    
    Why do we even bother with ``HWLOC_MEMBIND_BIND``? If there's only 1 node, it seems like it would have to bind memory to that node?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77443720
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    --- End diff --
    
    Commenting this would be really helpful. I'd also recommend breaking it into helper functions so that it can be understood without necessarily having the ``hwloc`` man page in hand.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by PSUdaemon <gi...@git.apache.org>.
Github user PSUdaemon commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77461688
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    +
    +      if (ats_hugepage_enabled()) {
    +        stack = ats_alloc_hugepage(stacksize);
    --- End diff --
    
    What do you mean by "smaller than a huge page?" Further up I align the stack size to the chosen page alignment. So it should be able to use the entire huge page we give it.
    
    The reason is the default 1MB stack size would use 256 4k pages. So 1 huge page seems better, and it's likely what the user wants if they have chose to enable huge pages in the first place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by PSUdaemon <gi...@git.apache.org>.
Github user PSUdaemon commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77467242
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    --- End diff --
    
    One thing to clarify, that I think this may have been confusing on. We don't set the memory binding for the threads that we are going to spawn for the eventprocessor, we are just setting it for the current thread for the time we need to allocate a new stack. We leave the newly spawned threads to have the default memory binding, which is basically to allocate where they run.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    @SolidWallOfCode for a condition variable approach, there is the ``EventNotify`` class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77443808
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    --- End diff --
    
    Can ``obj_count`` ever be 0? In that case we would not allocate a stack?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by SolidWallOfCode <gi...@git.apache.org>.
Github user SolidWallOfCode commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    Yeah, we should look at using a condition variable instead of the ``sleep`` at some  point but I don't see that making any significant difference for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by PSUdaemon <gi...@git.apache.org>.
Github user PSUdaemon closed the pull request at:

    https://github.com/apache/trafficserver/pull/956


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    FreeBSD build *successful*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/680/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    Linux build *successful*! See https://ci.trafficserver.apache.org/job/Github-Linux/590/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77443795
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    +
    +      if (ats_hugepage_enabled()) {
    +        stack = ats_alloc_hugepage(stacksize);
    --- End diff --
    
    Why is is helpful to use huge pages for stacks? Especially if the stack is smaller than a huge page?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    @PSUdaemon this is much cleaner \U0001f44d 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    Linux build *successful*! See https://ci.trafficserver.apache.org/job/Github-Linux/562/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    Linux build *successful*! See https://ci.trafficserver.apache.org/job/Github-Linux/574/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by zwoop <gi...@git.apache.org>.
Github user zwoop commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    A cookie for @PSUdaemon  for properly setting all Github fields!!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    Linux build *successful*! See https://ci.trafficserver.apache.org/job/Github-Linux/576/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    FreeBSD build *successful*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/694/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77443664
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    +
    +      if (ats_hugepage_enabled()) {
    +        stack = ats_alloc_hugepage(stacksize);
    +      }
    +
    +      if (stack == NULL) {
    +        stack = ats_memalign(ats_pagesize(), stacksize);
    +      }
    --- End diff --
    
    Break the actual stack allocation into a helper function. Add a ``Debug`` log to show the actual stack size allocation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    FreeBSD build *successful*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/678/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by zwoop <gi...@git.apache.org>.
Github user zwoop commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    I'm ok with this, the sleep(1) in the main thread looks clunky as we discussed, but whatevs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    FreeBSD build *successful*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/667/ for details.
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by PSUdaemon <gi...@git.apache.org>.
Github user PSUdaemon commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77461718
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    --- End diff --
    
    It can be, and it's basically an error case from HWLOC in that it couldn't figure out your topology. If you pass a `NULL`stack it just allocates the way it did before, which is to leave it up to the OS.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver pull request #956: TS-4806: Fix up event processor thread stac...

Posted by gtenev <gi...@git.apache.org>.
Github user gtenev commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r78275728
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -129,34 +197,58 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
         obj_name = (char *)"Machine";
       }
     
    +  // How many of the above `obj_type` do we have in our topology?
       obj_count = hwloc_get_nbobjs_by_type(ink_get_topology(), obj_type);
       Debug("iocore_thread", "Affinity: %d %ss: %d PU: %d", affinity, obj_name, obj_count, ink_number_of_processors());
     
     #endif
       for (i = 0; i < n_ethreads; i++) {
         ink_thread tid;
    -    if (i > 0) {
    -      snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    -      tid = all_ethreads[i]->start(thr_name, stacksize);
    -    } else {
    -      tid = ink_thread_self();
    -    }
    +
     #if TS_USE_HWLOC
         if (obj_count > 0) {
    +      // Get our `obj` instance with index based on the thread number we are on.
           obj = hwloc_get_obj_by_type(ink_get_topology(), obj_type, i % obj_count);
    --- End diff --
    
    We could defensively check/handle ``NULL`` to "protect" ``obj->cpuset`` later.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #956: TS-4806: Fix up event processor thread stacks

Posted by gtenev <gi...@git.apache.org>.
Github user gtenev commented on the issue:

    https://github.com/apache/trafficserver/pull/956
  
    @PSUdaemon, the patch looks good, +1 on fixing the sleep(1).
    
    Built and run it in production with ``exec_thread.affinity: 1`` and it run fine. 
    
    Here some of the things I checked.
    
    ``numastat`` output looked pretty much the same like before the patch was applied
    
    ```
    $ sudo numastat -p $(pgrep -f traffic_server)
    
    Per-node process memory usage (in MBs) for PID 31075 ([TS_MAIN])
                               Node 0          Node 1           Total
                      --------------- --------------- ---------------
    Huge                         0.00            0.00            0.00
    Heap                         0.00            0.00            0.00
    Stack                        1.52            0.94            2.46
    Private                 118751.97       119729.43       238481.40
    ----------------  --------------- --------------- ---------------
    Total                   118753.49       119730.37       238483.86
    ```
    
    "Stop using the main thread as ET_NET 0 ..." (from the Jira)
    Here is a new thread showing now: TS_MAIN
    
    ```
    $ ps -e -T -o 'pid,ucmd'|grep $(pgrep -f traffic_server)|cut -d" " -f2|sort |uniq -c
          5 [ACCEPT
         24 [ET_AIO
         24 [ET_NET
          1 [ET_OCSP
          2 [ET_TASK
          1 [LOG_FLUSH]
          1 [LOG_PREPROC
          2 traffic_server
          1 [TS_MAIN]
    ```
    
    ``traffic_server`` ``ET_NET`` threads are distributed evenly over the 2 NUMA nodes on the machine where I tested (running on nodesets and bound to the corresponding cpusets as expected)
    
    ```
    $  sudo lstopo --top --no-io -.xml|grep traffic_server|awk '{match($12, /(.*)"/, a); printf("%s %s ", $6, $9); system("ps -e -T -o pid,spid,ucmd|grep " a[1]);}'
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31101 [ET_NET 22]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31099 [ET_NET 20]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31097 [ET_NET 18]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31095 [ET_NET 16]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31093 [ET_NET 14]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31091 [ET_NET 12]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31089 [ET_NET 10]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31087 [ET_NET 8]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31085 [ET_NET 6]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31083 [ET_NET 4]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31081 [ET_NET 2]
    allowed_cpuset="0x3ff003ff" allowed_nodeset="0x00000001" 31075 31079 [ET_NET 0]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31102 [ET_NET 23]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31100 [ET_NET 21]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31098 [ET_NET 19]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31096 [ET_NET 17]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31094 [ET_NET 15]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31092 [ET_NET 13]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31090 [ET_NET 11]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31088 [ET_NET 9]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31086 [ET_NET 7]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31084 [ET_NET 5]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31082 [ET_NET 3]
    allowed_cpuset="0x000000ff,0xc00ffc00" allowed_nodeset="0x00000002" 31075 31080 [ET_NET 1]
    ```
    
    
    The NUMA policy is ``preferred=node0`` and ``preferred=node1`` for the corresponding stack segments.
    
    ```
    $ for pid in `ps -e -T -o 'spid,ucmd'|grep ET_NET |cut -d" " -f1 `; do sudo grep stack /proc/${pid}/numa_maps; done| awk '{match($3, /.*:(.*)/, a); printf("%s ", $2); system("ps -e -T -o pid,spid,ucmd|grep " a[1])}' |sort -u
    . . .
    prefer:0 31075 31079 [ET_NET 0]
    prefer:0 31075 31080 [ET_NET 1]
    prefer:0 31075 31081 [ET_NET 2]
    prefer:0 31075 31082 [ET_NET 3]
    . . .
    prefer:0 31075 31100 [ET_NET 21]
    prefer:0 31075 31101 [ET_NET 22]
    prefer:0 31075 31102 [ET_NET 23]
    . . .
    prefer:1 31075 31079 [ET_NET 0]
    prefer:1 31075 31080 [ET_NET 1]
    prefer:1 31075 31081 [ET_NET 2]
    prefer:1 31075 31082 [ET_NET 3]
    . . .
    prefer:1 31075 31100 [ET_NET 21]
    prefer:1 31075 31101 [ET_NET 22]
    prefer:1 31075 31102 [ET_NET 23]
    ```
    
    Checked a few ``ET_NET`` stack segment sizes and they look OK 
    (as configured by ``proxy.config.thread.default.stacksize: 1048576``) 
    
    ```
    $ for pid in `ps -e -T -o 'spid,ucmd'|grep ET_NET |cut -d" " -f1 `; do sudo grep stack /proc/${pid}/maps; done| awk '{print $1}' |head
    2aaaaaf66000-2aaaab066000
    2aaaab76b000-2aaaab86b000
    2aaab0c01000-2aaab0e01000
    2aaab0e02000-2aaab0f02000
    2aaab160f000-2aaab170f000
    2aaab4001000-2aaab4101000
    2aaab4102000-2aaab4202000
    2aaab4203000-2aaab4303000
    2aaab4801000-2aaab4901000
    2aaab4902000-2aaab4a02000
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---