You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "B Wyatt (JIRA)" <ji...@apache.org> on 2011/09/07 22:07:10 UTC
[jira] [Created] (TS-947) AIO Race condition on non NT systems
AIO Race condition on non NT systems
------------------------------------
Key: TS-947
URL: https://issues.apache.org/jira/browse/TS-947
Project: Traffic Server
Issue Type: Bug
Components: Core
Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
{code:title=iocore/aio/AIO.cc(annotated)}
void *
aio_thread_main(void *arg)
{
...
ink_mutex_acquire(&my_aio_req->aio_mutex);
for (;;) {
do {
current_req = my_aio_req;
/* check if any pending requests on the atomic list */
A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
aio_move(my_aio_req);
if (!(op = my_aio_req->aio_todo.pop()) && !(op =
my_aio_req->http_aio_todo.pop()))
B>>> break;
...
<<service request>>
...
} while (1);
C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
}
...
}
static void
aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
{
...
if (!ink_mutex_try_acquire(&req->aio_mutex)) {
D>>>ink_atomiclist_push(&req->aio_temp_list, op);
} else {
/* check if any pending requests on the atomic list */
if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
aio_move(req);
/* now put the new request */
aio_insert(op, req);
ink_cond_signal(&req->aio_cond);
ink_mutex_release(&req->aio_mutex);
}
...
}
{code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (TS-947) AIO Race condition on non NT systems
Posted by "Alan M. Carroll (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan M. Carroll reassigned TS-947:
----------------------------------
Assignee: Alan M. Carroll
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: Alan M. Carroll
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-947) AIO Race condition on non NT systems
Posted by "B Wyatt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099273#comment-13099273 ]
B Wyatt commented on TS-947:
----------------------------
I am testing a patch that removes the use of the atomic temp_list entirely. Instead it relies on acquiring the aio_mutex for every request enqueue. This may be sub-optimal, but it is preferable to occasional soft-locking on initialization or request delays.
I have attached the patch.
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-947) AIO Race condition on non NT systems
Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leif Hedstrom updated TS-947:
-----------------------------
Fix Version/s: (was: 3.1.2)
sometime
Moving this out for later, we might want John Plevyak to look at this, and see if this needs a more thorough redesign.
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: sometime
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-947) AIO Race condition on non NT systems
Posted by "Alan M. Carroll (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131804#comment-13131804 ]
Alan M. Carroll commented on TS-947:
------------------------------------
[Mailing list discussion|http://mail-archives.apache.org/mod_mbox/trafficserver-dev/201109.mbox/%3CCAMkwMBEEU8UQa+ymaQVSoqGy0hAJjbuevdEx17eQjBOG9v7+YA@mail.gmail.com%3E]
Key quote by John Pleyvak
{quote}So the situation is that the existing design dates from a time when
threads and scheduling for Unix were primitive. It was not uncommon in
those
days for threads to "go away" for half a second, a second or more in a
loaded system.
To deal with this, the current design is non-blocking all the way down.
This means that no code (correctly written) ever blocks on a mutex (we use try_lock) which
is the origin of that AIO code.
Now, threading and scheduling has dramatically improved and the problems
that motivated
that design decision have (most probably :) gone away.
It would be dramatically easier to do away with this requirement and write
the code with
small critical sections and just block if we miss a mutex. First, that is
the way most
programs are written these days and second, having to back out of the entire
stack, saving
context is a huge burden. It makes the code hard to read, write and debug.
The upshot is that my suggestion to just add a 10 msec timeout is for the
short term.
I'd like to see a blocking, small critical section AIO/Cache/RamCache patch
performance
tested against the existing code *then* pull the trigger on such a new
design and fix the
offending code (including the AIO race).
I plan to work on such a patch and would be very interested in talking to
anyone who might be interested in such a thing as well.
{quote}
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-947) AIO Race condition on non NT systems
Posted by "Alan M. Carroll (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131692#comment-13131692 ]
Alan M. Carroll commented on TS-947:
------------------------------------
The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvack has volunteered to take lead on that although he welcomes contributions from others.
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: Alan M. Carroll
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-947) AIO Race condition on non NT systems
Posted by "mohan_zl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107719#comment-13107719 ]
mohan_zl commented on TS-947:
-----------------------------
One question:
The variable sleep_wait in struct AIOThreadInfo is not used now, why not nuke it away or use it for Jplevyak's patch, looks like:
timespec ten_msec_timespec = ink_based_hrtime_to_timespec(ink_get_hrtime() + HRTIME_MSECONDS(thr_info->sleep_wait));
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Fix For: 3.1.1
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-947) AIO Race condition on non NT systems
Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leif Hedstrom updated TS-947:
-----------------------------
Fix Version/s: 3.1.1
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Fix For: 3.1.1
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-947) AIO Race condition on non NT systems
Posted by "B Wyatt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
B Wyatt updated TS-947:
-----------------------
Attachment: lock-safe-AIO.patch
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (TS-947) AIO Race condition on non NT systems
Posted by "Alan M. Carroll (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan M. Carroll reassigned TS-947:
----------------------------------
Assignee: John Plevyak (was: Alan M. Carroll)
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (TS-947) AIO Race condition on non NT
systems
Posted by "Alan M. Carroll (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131692#comment-13131692 ]
Alan M. Carroll edited comment on TS-947 at 10/20/11 3:34 PM:
--------------------------------------------------------------
The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvak has volunteered to take lead on that although he welcomes contributions from others.
was (Author: amc):
The timed wait fix was put on trunk, which ameliorates this problem. However, the mailing list consensus was that this is a hack and the queuing logic should be fixed to be correct. John Pleyvack has volunteered to take lead on that although he welcomes contributions from others.
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-947) AIO Race condition on non NT systems
Posted by "B Wyatt (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
B Wyatt updated TS-947:
-----------------------
Attachment: timed-wait-AIO.patch
Added JPlevyak's timed wait solution to the ticket, it is the current solution in the source tree.
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: sometime
>
> Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-947) AIO Race condition on non NT systems
Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leif Hedstrom updated TS-947:
-----------------------------
Fix Version/s: (was: 3.1.1)
3.1.2
Moving all unassigned bugs out to 3.1.2
> AIO Race condition on non NT systems
> ------------------------------------
>
> Key: TS-947
> URL: https://issues.apache.org/jira/browse/TS-947
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Environment: stock build with static libts, running on a 4 core server
> Reporter: B Wyatt
> Fix For: 3.1.2
>
> Attachments: lock-safe-AIO.patch
>
>
> Refer to code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex(C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list.
> In the case of cache volume initialization, there is no "future request" and the initialization sequence soft locks.
> {code:title=iocore/aio/AIO.cc(annotated)}
> void *
> aio_thread_main(void *arg)
> {
> ...
> ink_mutex_acquire(&my_aio_req->aio_mutex);
> for (;;) {
> do {
> current_req = my_aio_req;
> /* check if any pending requests on the atomic list */
> A>>> if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> aio_move(my_aio_req);
> if (!(op = my_aio_req->aio_todo.pop()) && !(op =
> my_aio_req->http_aio_todo.pop()))
> B>>> break;
> ...
> <<service request>>
> ...
> } while (1);
> C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> }
> ...
> }
> static void
> aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
> {
> ...
> if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> D>>>ink_atomiclist_push(&req->aio_temp_list, op);
> } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> ...
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira