Posted to issues@trafficserver.apache.org by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org> on 2011/12/05 22:32:42 UTC
[jira] [Updated] (TS-947) AIO Race condition on non NT systems

     [ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-947:
-----------------------------

    Fix Version/s:     (was: 3.1.2)
                   sometime

Moving this out for later. We might want John Plevyak to look at this and see whether it needs a more thorough redesign.
                
> AIO Race condition on non NT systems
> ------------------------------------
>
>                 Key: TS-947
>                 URL: https://issues.apache.org/jira/browse/TS-947
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: stock build with static libts, running on a 4 core server
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: sometime
>
>         Attachments: lock-safe-AIO.patch
>
>
> Refer to the annotated code below.  The window that starts when a consumer thread finds the temp_list empty (A) and ends when it releases the aio_mutex inside ink_cond_wait (C) is unsafe if the work queues are empty and the thread leaves the inner loop at B.  Throughout this window (A-C) the consumer holds the aio_mutex, so a request producer fails ink_mutex_try_acquire and enqueues its item on the temporary atomic list (D) without signaling aio_cond.  Because a consumer in this state waits for a signal on aio_cond before it processes the temp_list again, any requests on the temp_list are stalled until a future request either produces that signal or moves the temp_list itself.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}
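
One way to bound the stall described above is to replace the indefinite wait at C with a timed wait, so the consumer rechecks the temp list on its own even if no signal ever arrives. The following is a minimal standalone sketch of that idea, not the contents of lock-safe-AIO.patch and not Traffic Server code. It assumes standard C++11 primitives in place of the ink_* wrappers, uses a small mutex-protected vector as a stand-in for the lock-free atomic list, and every name in it (SketchQueue, sketch_enqueue, sketch_consumer, run_sketch, the 10 ms interval) is hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one AIO request queue; all names are illustrative.
struct SketchQueue {
  std::mutex aio_mutex;
  std::condition_variable aio_cond;
  std::deque<int> todo;        // protected by aio_mutex
  bool done = false;           // protected by aio_mutex
  std::mutex temp_mutex;       // assumption: replaces the lock-free atomic list
  std::vector<int> temp_list;  // protected by temp_mutex
};

// Move parked requests onto the todo queue. Caller must hold aio_mutex.
static void sketch_move(SketchQueue &q) {
  std::lock_guard<std::mutex> g(q.temp_mutex);
  for (int op : q.temp_list)
    q.todo.push_back(op);
  q.temp_list.clear();
}

// Producer side, same shape as aio_queue_req in the annotated code.
static void sketch_enqueue(SketchQueue &q, int op) {
  std::unique_lock<std::mutex> lk(q.aio_mutex, std::try_to_lock);
  if (!lk.owns_lock()) {
    // (D) the consumer holds aio_mutex: park the request, no signal is sent
    std::lock_guard<std::mutex> g(q.temp_mutex);
    q.temp_list.push_back(op);
    return;
  }
  sketch_move(q);
  q.todo.push_back(op);
  q.aio_cond.notify_one();
}

// Consumer side. Returns the number of requests it serviced.
static int sketch_consumer(SketchQueue &q) {
  int serviced = 0;
  std::unique_lock<std::mutex> lk(q.aio_mutex);
  for (;;) {
    sketch_move(q);  // (A) pick up anything parked on the temp list
    while (!q.todo.empty()) {
      q.todo.pop_front();  // "service" the request
      ++serviced;
    }
    if (q.done)
      return serviced;  // producers finished before done was set
    // (C) bounded wait: a request parked at D during the A-C window is
    // picked up after at most one interval, with or without a signal.
    q.aio_cond.wait_for(lk, std::chrono::milliseconds(10));
  }
}

// Driver: runs the producers to completion, then stops the consumer.
int run_sketch(int producers, int per_producer) {
  SketchQueue q;
  int serviced = 0;
  std::thread consumer([&] { serviced = sketch_consumer(q); });
  std::vector<std::thread> threads;
  for (int p = 0; p < producers; ++p)
    threads.emplace_back([&q, per_producer] {
      for (int i = 0; i < per_producer; ++i)
        sketch_enqueue(q, i);
    });
  for (auto &t : threads)
    t.join();
  {
    std::lock_guard<std::mutex> g(q.aio_mutex);
    q.done = true;
  }
  q.aio_cond.notify_one();
  consumer.join();
  return serviced;
}
```

Calling run_sketch(4, 1000) should return 4000: every request is serviced even when some producers lose the try-lock and never signal. The trade-off is added latency of up to one interval for a parked request plus periodic wakeups on an idle queue, which is part of why a more thorough redesign may still be preferable.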

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira