You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@trafficserver.apache.org by "B Wyatt (JIRA)" <ji...@apache.org> on 2011/09/07 22:07:10 UTC

[jira] [Created] (TS-947) AIO Race condition on non NT systems

AIO Race condition on non NT systems
------------------------------------

                 Key: TS-947
                 URL: https://issues.apache.org/jira/browse/TS-947
             Project: Traffic Server
          Issue Type: Bug
          Components: Core
         Environment: stock build with static libts, running on a 4 core server
            Reporter: B Wyatt


Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.

In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 

{code:title=iocore/aio/AIO.cc(annotated)}
void *
aio_thread_main(void *arg)
{
  ...
  ink_mutex_acquire(&my_aio_req->aio_mutex);
  for (;;) {
    do {
      current_req = my_aio_req;
      /* check if any pending requests on the atomic list */
A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
        aio_move(my_aio_req);
      if (!(op = my_aio_req->aio_todo.pop()) && !(op =
my_aio_req->http_aio_todo.pop()))
B>>>    break;
      ...
      <<service request>>
      ...
    } while (1);
C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
  }
  ...
}

static void
aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
{
  ...
  if (!ink_mutex_try_acquire(&req->aio_mutex)) {
D>>>ink_atomiclist_push(&req->aio_temp_list, op);
  } else {
    /* check if any pending requests on the atomic list */
    if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
      aio_move(req);
    /* now put the new request */
    aio_insert(op, req);
    ink_cond_signal(&req->aio_cond);
    ink_mutex_release(&req->aio_mutex);
  }
  ...
}
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (TS-947) AIO Race condition on non NT systems

Posted by "Alan M. Carroll (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan M. Carroll reassigned TS-947:
----------------------------------

    Assignee: Alan M. Carroll
    
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: Alan M. Carroll
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-947) AIO Race condition on non NT systems

Posted by "B Wyatt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099273#comment-13099273 ] 

B Wyatt commented on TS-947:
----------------------------

I am testing a patch that removes the use of the atomic temp_list entirely.  Instead it relies on acquiring the aio_mutex for every request enqueue.  This may be sub-optimal, but it is preferable to occasional soft-locking on initialization or request delays.

I have attached the patch.

> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-947) AIO Race condition on non NT systems

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-947:
-----------------------------

    Fix Version/s:     (was: 3.1.2)
                   sometime

Moving this out for later, we might want John Plevyak to look at this, and see if this needs a more thorough redesign.
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: sometime
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-947) AIO Race condition on non NT systems

Posted by "Alan M. Carroll (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131804#comment-13131804 ] 

Alan M. Carroll commented on TS-947:
------------------------------------

[Mailing list discussion|http://mail-archives.apache.org/mod_mbox/trafficserver-dev/201109.mbox/%3CCAMkwMBEEU8UQa+ymaQVSoqGy0hAJjbuevdEx17eQjBOG9v7+YA@mail.gmail.com%3E]

Key quote by John Pleyvak

{quote}So the situation is that the existing design dates from a time when
threads and scheduling for Unix were primitive.  It was not uncommon in
those
days for threads to "go away" for half a second, a second or more in a
loaded system.

To deal with this, the current design is non-blocking all the way down.
This means that no code (correctly written) ever blocks on a mutex (we use try_lock) which
is the origin of that AIO code.

Now, threading and scheduling has dramatically improved and the problems
that motivated
that design decision have (most probably :) gone away.

It would be dramatically easier to do away with this requirement and write
the code with
small critical sections and just block if we miss a mutex.  First, that is
the way most
programs are written these days and second, having to back out of the entire
stack, saving
context is a huge burden.  It makes the code hard to read, write and debug.

The upshot is that my suggestion to just add a 10 msec timeout is for the
short term.
I'd like to see a blocking, small critical section AIO/Cache/RamCache patch
performance
tested against the existing code *then* pull the trigger on such a new
design and fix the
offending code (including the AIO race).

I plan to work on such a patch and would be very interested in talking to
anyone who might be interested in such a thing as well.
{quote}
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-947) AIO Race condition on non NT systems

Posted by "Alan M. Carroll (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131692#comment-13131692 ] 

Alan M. Carroll commented on TS-947:
------------------------------------

The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvack has volunteered to take lead on that although he welcomes contributions from others.
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: Alan M. Carroll
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TS-947) AIO Race condition on non NT systems

Posted by "mohan_zl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107719#comment-13107719 ] 

mohan_zl commented on TS-947:
-----------------------------

One question:
The variable sleep_wait in struct AIOThreadInfo is not used now, why not nuke it away or use it for Jplevyak's patch, looks like:

timespec ten_msec_timespec = ink_based_hrtime_to_timespec(ink_get_hrtime() + HRTIME_MSECONDS(thr_info->sleep_wait));

> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>             Fix For: 3.1.1
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-947) AIO Race condition on non NT systems

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-947:
-----------------------------

    Fix Version/s: 3.1.1

> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>             Fix For: 3.1.1
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-947) AIO Race condition on non NT systems

Posted by "B Wyatt (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

B Wyatt updated TS-947:
-----------------------

    Attachment: lock-safe-AIO.patch

> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (TS-947) AIO Race condition on non NT systems

Posted by "Alan M. Carroll (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan M. Carroll reassigned TS-947:
----------------------------------

    Assignee: John Plevyak  (was: Alan M. Carroll)
    
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (TS-947) AIO Race condition on non NT systems

Posted by "Alan M. Carroll (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131692#comment-13131692 ] 

Alan M. Carroll edited comment on TS-947 at 10/20/11 3:34 PM:
--------------------------------------------------------------

The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvak has volunteered to take lead on that although he welcomes contributions from others.
                
      was (Author: amc):
    The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvack has volunteered to take lead on that although he welcomes contributions from others.
                  
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-947) AIO Race condition on non NT systems

Posted by "B Wyatt (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

B Wyatt updated TS-947:
-----------------------

    Attachment: timed-wait-AIO.patch

Added JPlevyak's timed wait solution to the ticket, it is the current solution in the source tree. 
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: sometime
>
>         Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-947) AIO Race condition on non NT systems

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-947:
-----------------------------

    Fix Version/s:     (was: 3.1.1)
                   3.1.2

Moving all unassigned bugs out to 3.1.2
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>             Fix For: 3.1.2
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to code below.  The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D).  As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks. 
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira