Posted to issues@trafficserver.apache.org by "Alan M. Carroll (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/10/20 17:35:12 UTC
[jira] [Issue Comment Edited] (TS-947) AIO Race condition on non NT systems
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131692#comment-13131692 ]
Alan M. Carroll edited comment on TS-947 at 10/20/11 3:34 PM:
--------------------------------------------------------------
The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and that the queuing logic should be fixed to be correct. John Plevyak has volunteered to take the lead on that, although he welcomes contributions from others.
was (Author: amc):
The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvack has volunteered to take lead on that although he welcomes contributions from others.
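For readers unfamiliar with the idea, the timed wait mitigation mentioned above can be sketched roughly as follows. This is a minimal illustration using standard C++ primitives, not the patch that went on trunk: `RequestQueue`, `push_temp`, `drain_temp`, `consumer`, `run_demo`, and the 10 ms timeout are all hypothetical names and values chosen for the example. The consumer wakes up periodically and re-checks the temporary list, so a request pushed without a signal is delayed rather than stalled indefinitely.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the AIO request queue; not the Traffic Server types.
struct Node {
  int value;
  Node *next;
};

struct RequestQueue {
  std::mutex mutex;                        // plays the role of aio_mutex
  std::condition_variable cond;            // plays the role of aio_cond
  std::atomic<Node *> temp_list{nullptr};  // plays the role of aio_temp_list
  std::atomic<bool> stop{false};
  int serviced = 0;                        // protected by mutex
};

// Producer path taken when the try-lock fails: push, but never signal.
void push_temp(RequestQueue &q, int value) {
  Node *n = new Node{value, q.temp_list.load(std::memory_order_relaxed)};
  while (!q.temp_list.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
  }
}

// Detach and count everything on the temp list; caller holds q.mutex.
int drain_temp(RequestQueue &q) {
  Node *n = q.temp_list.exchange(nullptr, std::memory_order_acquire);
  int count = 0;
  while (n != nullptr) {
    Node *next = n->next;
    delete n;
    n = next;
    ++count;
  }
  return count;
}

void consumer(RequestQueue &q) {
  std::unique_lock<std::mutex> lock(q.mutex);
  while (!q.stop.load()) {
    q.serviced += drain_temp(q);
    // Timed wait: wake up and re-check even if no signal ever arrives.
    q.cond.wait_for(lock, std::chrono::milliseconds(10));
  }
  q.serviced += drain_temp(q);  // free anything left at shutdown
}

// Pushes five unsignalled requests and reports how many the consumer
// serviced before shutdown was requested (polls for up to two seconds).
int run_demo() {
  RequestQueue q;
  std::thread worker(consumer, std::ref(q));
  for (int i = 0; i < 5; ++i) {
    push_temp(q, i);
  }
  int seen = 0;
  auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(2);
  while (seen < 5 && std::chrono::steady_clock::now() < deadline) {
    {
      std::lock_guard<std::mutex> guard(q.mutex);
      seen = q.serviced;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  q.stop.store(true);
  q.cond.notify_one();
  worker.join();
  return seen;
}
```

On a normally scheduled system `run_demo()` is expected to return 5 even though no producer ever signals the condition variable. With an untimed wait in `consumer` the same requests would sit on the temporary list, which is the stall described in the issue. The cost is periodic wakeups and up to one timeout of added latency, which is presumably why the list regarded this approach as a stopgap.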
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to the code below. The window that starts when a consumer thread determines that the temp_list is empty (A) and ends when it releases the aio_mutex (C) is unsafe if the work queues are empty and the thread breaks out of the loop at B. During this window (A-C) the consumer holds the aio_mutex, so request producers fail to acquire it and enqueue items on the temporary atomic list (D) without signaling. Because a consumer in this state waits for a signal on aio_cond before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or processes the temp_list itself.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
>   ...
>   ink_mutex_acquire(&my_aio_req->aio_mutex);
>   for (;;) {
>     do {
>       current_req = my_aio_req;
>       /* check if any pending requests on the atomic list */
> A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>         aio_move(my_aio_req);
>       if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
> B>>>    break;
>       ...
>       <<service request>>
>       ...
>     } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>   }
>   ...
> }
>
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
>   ...
>   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
>   } else {
>     /* check if any pending requests on the atomic list */
>     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>       aio_move(req);
>     /* now put the new request */
>     aio_insert(op, req);
>     ink_cond_signal(&req->aio_cond);
>     ink_mutex_release(&req->aio_mutex);
>   }
>   ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira