Posted to dev@httpd.apache.org by Travis CI <bu...@travis-ci.org> on 2020/08/29 20:43:14 UTC

Broken: apache/httpd#1091 (trunk - 74d32d7)

Build Update for apache/httpd
-------------------------------------

Build: #1091
Status: Broken

Duration: 12 mins and 2 secs
Commit: 74d32d7 (trunk)
Author: Christophe Jaillet
Message: Fix a few warnings on 64 bits windows compilation

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1881303 13f79535-47bb-0310-9956-ffa450edef68

View the changeset: https://github.com/apache/httpd/compare/e602f55d734a...74d32d7bafd3

View the full build log and details: https://travis-ci.org/github/apache/httpd/builds/722360185?utm_medium=notification&utm_source=email


--

You can unsubscribe from build emails from the apache/httpd repository by going to https://travis-ci.org/account/preferences/unsubscribe?repository=69847&utm_medium=notification&utm_source=email.
Or unsubscribe from *all* build emails by updating your settings at https://travis-ci.org/account/preferences/unsubscribe?utm_medium=notification&utm_source=email.
Or configure specific recipients for build notifications in your .travis.yml file. See https://docs.travis-ci.com/user/notifications.


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 9/2/20 4:51 PM, Joe Orton wrote:
> On Wed, Sep 02, 2020 at 12:36:26PM +0200, Ruediger Pluem wrote:
>> will always fail, since node->endp[index] should be zero. Hence it should not find a pool. Or am I off here?
> 
> I think we get very few ordering constraints without memory
> synchronisation; regardless of the order of the C statements, another
> thread can see a different/inconsistent memory state, but I'm not really
> sure.

You mean due to CPU caches and because some of the values might be stored in registers? Would declaring these vars volatile help?
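
(As an aside: volatile only keeps the compiler from caching the value; it imposes no inter-thread ordering. What ordering would actually require is a release/acquire pair, sketched below with C11 atomics on a stand-in for the debug node quoted later in this thread. This is purely illustrative; the types and names are invented, and it is not the actual APR code.)

    #include <stdatomic.h>
    #include <stddef.h>

    /* stand-in for APR's debug_node_t, for illustration only */
    typedef struct {
        void *beginp[64];
        void *endp[64];
        _Atomic unsigned int index;      /* number of published entries */
    } entry_node_t;

    static void publish(entry_node_t *node, unsigned int i,
                        void *mem, size_t size)
    {
        node->beginp[i] = mem;
        node->endp[i]   = (char *)mem + size;
        /* release store: a reader that observes i + 1 also observes
         * the two stores above */
        atomic_store_explicit(&node->index, i + 1, memory_order_release);
    }

    static int contains(entry_node_t *node, const void *mem)
    {
        /* acquire load pairs with the writer's release store */
        unsigned int idx =
            atomic_load_explicit(&node->index, memory_order_acquire);
        for (unsigned int k = 0; k < idx; k++) {
            if ((const void *)node->beginp[k] <= mem
                && mem < (const void *)node->endp[k])
                return 1;
        }
        return 0;
    }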

> 
> I tried reproducing locally and can trigger crashes in these code paths
> quite easily by running ab against a pool-debug/event build with
> memcache or shmcb configured as the session cache.
> 
> In this case one thread is walking the pool hierarchy and trying to lock
> a pool mutex which has already been deleted, which triggers a libc
> assert() instead.  The mutex is in ptrans, I assume from another event
> thread.

It looks like there is some mutex protection in apr_pool_clear_debug and apr_pool_destroy_debug, and
comments that point to apr_pool_walk_tree (e.g.
https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1916). Maybe the issue is that
pool_destroy_debug doesn't take the lock before calling pool_clear_debug.
Can you provide a gdb print of the pool parameters passed to apr_pool_walk_tree in frames 15 through 8?
I am not sure how pool became zero in the mutex struct you dumped, as I see no such action in the mutex cleanup.
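
(For concreteness, a rough sketch of what that locking could look like, modelled on apr_pool_clear_debug's own mutex handling; untested, and the surrounding destroy logic is elided:)

    /* sketch only: take the parent pool's mutex around the clear,
     * mirroring what apr_pool_clear_debug already does */
    #if APR_HAS_THREADS
        apr_thread_mutex_t *mutex =
            (pool->parent != NULL) ? pool->parent->mutex : NULL;

        if (mutex != NULL)
            apr_thread_mutex_lock(mutex);
    #endif

        pool_clear_debug(pool, file_line);

    #if APR_HAS_THREADS
        if (mutex != NULL)
            apr_thread_mutex_unlock(mutex);
    #endif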

> 
> If we see this regularly I'd conclude the apr_pool_find() stuff is 
> simply unsafe and should be turned off.

As the crashes seem to be caused by an APR problem, shall we continue this discussion on dev@apr, or should we work this out in
more detail here?

> 
> (gdb) where
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f842445f895 in __GI_abort () at abort.c:79
> #2  0x00007f842445f769 in __assert_fail_base (fmt=0x7f84245cd0b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
>     assertion=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", 
>     file=0x7f8424652104 "tpp.c", line=82, function=<optimized out>) at assert.c:92
> #3  0x00007f842446ee76 in __GI___assert_fail (
>     assertion=assertion@entry=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", 
>     file=file@entry=0x7f8424652104 "tpp.c", line=line@entry=82, 
>     function=function@entry=0x7f84246521c0 <__PRETTY_FUNCTION__.0> "__pthread_tpp_change_priority") at assert.c:101
> #4  0x00007f842464f849 in __pthread_tpp_change_priority (previous_prio=previous_prio@entry=-1, new_prio=new_prio@entry=2088)
>     at tpp.c:82
> #5  0x00007f84246458a0 in __pthread_mutex_lock_full (mutex=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:545
> #6  0x00007f8424645e05 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:73
> #7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
> #8  0x00007f84246f437e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x7f839000d680)
>     at memory/unix/apr_pools.c:1505
> #9  apr_pool_walk_tree (pool=0x7f839000d680, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #10 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x1482e30)
>     at memory/unix/apr_pools.c:1511
> #11 apr_pool_walk_tree (pool=0x1482e30, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #12 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14828b0)
>     at memory/unix/apr_pools.c:1511
> #13 apr_pool_walk_tree (pool=0x14828b0, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #14 0x00007f84246f54d8 in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0)
>     at memory/unix/apr_pools.c:1511
> #15 apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0) at memory/unix/apr_pools.c:1492
> #16 apr_pool_find (mem=mem@entry=0x7f83a4023f06) at memory/unix/apr_pools.c:2291
> #17 0x00007f84246e72cd in apr_table_addn (t=0x7f83a4023ab0, key=key@entry=0x7f83a4023f00 "Host", 
>     val=val@entry=0x7f83a4023f06 "localhost:8532") at tables/apr_tables.c:823
> #18 0x0000000000455bb4 in ap_get_mime_headers_core (r=r@entry=0x7f83a4002370, bb=bb@entry=0x7f83a40230a0) at protocol.c:1353
> #19 0x0000000000458a3f in ap_read_request (conn=conn@entry=0x7f83a401b340) at protocol.c:1513
> #20 0x0000000000434f6f in ap_process_http_async_connection (c=0x7f83a401b340) at http_core.c:148
> #21 ap_process_http_connection (c=0x7f83a401b340) at http_core.c:252
> #22 0x000000000044cb00 in ap_run_process_connection (c=c@entry=0x7f83a401b340) at connection.c:42
> #23 0x00007f842488c8f1 in process_socket (thd=thd@entry=0x7f8408004920, p=<optimized out>, sock=0x7f8390001480, 
>     cs=0x7f83a4015ef0, my_child_num=my_child_num@entry=3, my_thread_num=my_thread_num@entry=23) at event.c:1106
> #24 0x00007f842488cfd8 in worker_thread (thd=0x7f8408004920, dummy=<optimized out>) at event.c:2396
> #25 0x00007f84246ff57e in dummy_worker (opaque=0x7f8408004920) at threadproc/unix/thread.c:145
> #26 0x00007f8424643432 in start_thread (arg=<optimized out>) at pthread_create.c:477
> #27 0x00007f842453b913 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) up 7
> #7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
> 127	        rv = pthread_mutex_lock(&mutex->mutex);
> (gdb) print *mutex
> $1 = {pool = 0x0, mutex = {__data = {__lock = 1094795585, __count = 1094795585, __owner = 1094795585, __nusers = 1094795585, 
>       __kind = 1094795585, __spins = 16705, __elision = 16705, __list = {__prev = 0x4141414141414141, 
>         __next = 0x4141414141414141}}, __size = 'A' <repeats 40 times>, __align = 4702111234474983745}, 
>   cond = 0x4141414141414141, locked = 1094795585, num_waiters = 1094795585}
> 
> 

Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Joe Orton <jo...@redhat.com>.
On Wed, Sep 02, 2020 at 12:36:26PM +0200, Ruediger Pluem wrote:
> will always fail, since node->endp[index] should be zero. Hence it should not find a pool. Or am I off here?

I think we get very few ordering constraints without memory
synchronisation; regardless of the order of the C statements, another
thread can see a different/inconsistent memory state, but I'm not really
sure.

I tried reproducing locally and can trigger crashes in these code paths
quite easily by running ab against a pool-debug/event build with
memcache or shmcb configured as the session cache.

In this case one thread is walking the pool hierarchy and trying to lock
a pool mutex which has already been deleted, which triggers a libc
assert() instead.  The mutex is in ptrans, I assume from another event
thread.

If we see this regularly I'd conclude the apr_pool_find() stuff is 
simply unsafe and should be turned off.

(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f842445f895 in __GI_abort () at abort.c:79
#2  0x00007f842445f769 in __assert_fail_base (fmt=0x7f84245cd0b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", 
    file=0x7f8424652104 "tpp.c", line=82, function=<optimized out>) at assert.c:92
#3  0x00007f842446ee76 in __GI___assert_fail (
    assertion=assertion@entry=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)", 
    file=file@entry=0x7f8424652104 "tpp.c", line=line@entry=82, 
    function=function@entry=0x7f84246521c0 <__PRETTY_FUNCTION__.0> "__pthread_tpp_change_priority") at assert.c:101
#4  0x00007f842464f849 in __pthread_tpp_change_priority (previous_prio=previous_prio@entry=-1, new_prio=new_prio@entry=2088)
    at tpp.c:82
#5  0x00007f84246458a0 in __pthread_mutex_lock_full (mutex=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:545
#6  0x00007f8424645e05 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:73
#7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
#8  0x00007f84246f437e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x7f839000d680)
    at memory/unix/apr_pools.c:1505
#9  apr_pool_walk_tree (pool=0x7f839000d680, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#10 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x1482e30)
    at memory/unix/apr_pools.c:1511
#11 apr_pool_walk_tree (pool=0x1482e30, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#12 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14828b0)
    at memory/unix/apr_pools.c:1511
#13 apr_pool_walk_tree (pool=0x14828b0, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#14 0x00007f84246f54d8 in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0)
    at memory/unix/apr_pools.c:1511
#15 apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0) at memory/unix/apr_pools.c:1492
#16 apr_pool_find (mem=mem@entry=0x7f83a4023f06) at memory/unix/apr_pools.c:2291
#17 0x00007f84246e72cd in apr_table_addn (t=0x7f83a4023ab0, key=key@entry=0x7f83a4023f00 "Host", 
    val=val@entry=0x7f83a4023f06 "localhost:8532") at tables/apr_tables.c:823
#18 0x0000000000455bb4 in ap_get_mime_headers_core (r=r@entry=0x7f83a4002370, bb=bb@entry=0x7f83a40230a0) at protocol.c:1353
#19 0x0000000000458a3f in ap_read_request (conn=conn@entry=0x7f83a401b340) at protocol.c:1513
#20 0x0000000000434f6f in ap_process_http_async_connection (c=0x7f83a401b340) at http_core.c:148
#21 ap_process_http_connection (c=0x7f83a401b340) at http_core.c:252
#22 0x000000000044cb00 in ap_run_process_connection (c=c@entry=0x7f83a401b340) at connection.c:42
#23 0x00007f842488c8f1 in process_socket (thd=thd@entry=0x7f8408004920, p=<optimized out>, sock=0x7f8390001480, 
    cs=0x7f83a4015ef0, my_child_num=my_child_num@entry=3, my_thread_num=my_thread_num@entry=23) at event.c:1106
#24 0x00007f842488cfd8 in worker_thread (thd=0x7f8408004920, dummy=<optimized out>) at event.c:2396
#25 0x00007f84246ff57e in dummy_worker (opaque=0x7f8408004920) at threadproc/unix/thread.c:145
#26 0x00007f8424643432 in start_thread (arg=<optimized out>) at pthread_create.c:477
#27 0x00007f842453b913 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) up 7
#7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
127	        rv = pthread_mutex_lock(&mutex->mutex);
(gdb) print *mutex
$1 = {pool = 0x0, mutex = {__data = {__lock = 1094795585, __count = 1094795585, __owner = 1094795585, __nusers = 1094795585, 
      __kind = 1094795585, __spins = 16705, __elision = 16705, __list = {__prev = 0x4141414141414141, 
        __next = 0x4141414141414141}}, __size = 'A' <repeats 40 times>, __align = 4702111234474983745}, 
  cond = 0x4141414141414141, locked = 1094795585, num_waiters = 1094795585}
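
(For context, frames #16/#17 above come from the POOL_DEBUG consistency check in apr_tables.c; paraphrased, it has roughly this shape, so "turning it off" would mean building without POOL_DEBUG or compiling this block out:)

    #ifdef POOL_DEBUG
        /* paraphrased -- the exact wording in apr_tables.c differs */
        if (!apr_pool_is_ancestor(apr_pool_find(key), t->a.pool)) {
            fprintf(stderr, "apr_table_addn: key not in ancestor pool of t\n");
            abort();
        }
    #endif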


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 9/2/20 10:47 AM, Joe Orton wrote:
> On Wed, Sep 02, 2020 at 10:11:39AM +0200, Ruediger Pluem wrote:
>>
>>
>> On 9/2/20 9:10 AM, Joe Orton wrote:
>>> On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
>>>> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
>> If yes, I agree that this is the case, that I missed it, and hence my point is moot.
>>>
>>> I think that's right.  I don't want to disappear too far down the rabbit 
>>> hole for this, given that the *only* symptom we are seeing is the APR 
>>> pool check failing.
>>>
>>> Is there any way in which APR is going to guarantee that the internal 
>>> pool state is always consistent across threads if one thread is 
>>> simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another 
>>> calls apr_pool_find()?  I can't see it.
>>
>> I can't see it either. It looks like apr_pool_find / apr_pool_walk_tree uses a mutex,
>> but this mutex is not used by apr_palloc_debug. I am not sure how many races we would
>> have with apr_palloc_debug, though. I think it could race when we add a new node to the top of the linked list, but
>> the worst thing that could happen in that case is that it does not find a pool for a piece of memory even though it was
>> allocated from a pool.
> 
> In this case:
> 
> https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1798
> 
>     node->beginp[node->index] = mem;
>     node->endp[node->index] = (char *)mem + size;
>     node->index++;
> 
> without any memory ordering/consistency guarantee, is it not possible 
> for another thread to see
> 
>     node->endp[node->index] = (char *)mem + size;
>     node->index++;
> 
> without the ->beginp update, and hence the node would appear to
> "contain" memory in the range [0, endp), since the node struct is
> zero-initialized?

I think this can only happen for fresh nodes, as

node->index++;

happens after

node->beginp[node->index] = mem;
node->endp[node->index] = (char *)mem + size;

and even in this case I think

&& node->endp[index] > *pmem) {

(https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l2275)

will always fail, since node->endp[index] should be zero. Hence it should not find a pool. Or am I off here?
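
(For reference, the full condition around l2275, of which the quoted line is the second half; on a fresh, zero-initialized node, endp[index] is NULL, so that comparison can never hold:)

    /* paraphrased from pool_find in apr_pools.c (debug mode) */
    if (node->beginp[index] <= *pmem
        && node->endp[index] > *pmem) {
        /* *pmem belongs to this node's pool */
    }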



Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Joe Orton <jo...@redhat.com>.
On Wed, Sep 02, 2020 at 10:11:39AM +0200, Ruediger Pluem wrote:
> 
> 
> On 9/2/20 9:10 AM, Joe Orton wrote:
> > On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
> >> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
> >> If yes, I agree that this is the case, that I missed it, and hence my point is moot.
> > 
> > I think that's right.  I don't want to disappear too far down the rabbit 
> > hole for this, given that the *only* symptom we are seeing is the APR 
> > pool check failing.
> > 
> > Is there any way in which APR is going to guarantee that the internal 
> > pool state is always consistent across threads if one thread is 
> > simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another 
> > calls apr_pool_find()?  I can't see it.
> 
> I can't see it either. It looks like apr_pool_find / apr_pool_walk_tree uses a mutex,
> but this mutex is not used by apr_palloc_debug. I am not sure how many races we would
> have with apr_palloc_debug, though. I think it could race when we add a new node to the top of the linked list, but
> the worst thing that could happen in that case is that it does not find a pool for a piece of memory even though it was
> allocated from a pool.

In this case:

https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1798

    node->beginp[node->index] = mem;
    node->endp[node->index] = (char *)mem + size;
    node->index++;

without any memory ordering/consistency guarantee, is it not possible 
for another thread to see

    node->endp[node->index] = (char *)mem + size;
    node->index++;

without the ->beginp update, and hence the node would appear to
"contain" memory in the range [0, endp), since the node struct is
zero-initialized?

...or am I overthinking this?

We could add a "yes this is consistent" atomic int to the debug_node_t
and make pool-debug mode *even slower* to avoid possible races!?
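
(A hypothetical sketch of that idea; "entry_ready" is an invented field. Note also that the classic apr_atomic_set32/apr_atomic_read32 functions make no release/acquire ordering promise of their own, which is part of why this is dubious:)

    /* writer (apr_palloc_debug), sketch only */
    node->beginp[i] = mem;
    node->endp[i]   = (char *)mem + size;
    apr_atomic_set32(&node->entry_ready[i], 1);   /* "yes this is consistent" */
    node->index++;

    /* reader (pool_find), sketch only */
    if (apr_atomic_read32(&node->entry_ready[index])
        && node->beginp[index] <= *pmem
        && node->endp[index] > *pmem) {
        /* treat the entry as valid */
    }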


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 9/2/20 9:10 AM, Joe Orton wrote:
> On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
>> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
>> If yes, I agree that this is the case, that I missed it, and hence my point is moot.
> 
> I think that's right.  I don't want to disappear too far down the rabbit 
> hole for this, given that the *only* symptom we are seeing is the APR 
> pool check failing.
> 
> Is there any way in which APR is going to guarantee that the internal 
> pool state is always consistent across threads if one thread is 
> simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another 
> calls apr_pool_find()?  I can't see it.

I can't see it either. It looks like apr_pool_find / apr_pool_walk_tree uses a mutex,
but this mutex is not used by apr_palloc_debug. I am not sure how many races we would
have with apr_palloc_debug, though. I think it could race when we add a new node to the top of the linked list, but
the worst thing that could happen in that case is that it does not find a pool for a piece of memory even though it was
allocated from a pool.

Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Joe Orton <jo...@redhat.com>.
On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
> If yes, I agree that this is the case, that I missed it, and hence my point is moot.

I think that's right.  I don't want to disappear too far down the rabbit 
hole for this, given that the *only* symptom we are seeing is the APR 
pool check failing.

Is there any way in which APR is going to guarantee that the internal 
pool state is always consistent across threads if one thread is 
simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another 
calls apr_pool_find()?  I can't see it.

Regards, Joe


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 9/1/20 1:43 PM, Joe Orton wrote:
> On Tue, Sep 01, 2020 at 01:04:14PM +0200, Ruediger Pluem wrote:
>>
>>
>> On 9/1/20 10:18 AM, Joe Orton wrote:
>>> On Mon, Aug 31, 2020 at 08:37:55AM +0200, Ruediger Pluem wrote:
>>>> On 8/30/20 7:22 AM, Christophe JAILLET wrote:
>>>>> Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
>>>>>
>>>>> #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> #1  <signal handler called>
>>>>> #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
>>>>> val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
>>>>> #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
>>>>> #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
>>>>> #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
>>>>>
>>>>
>>>> My only guess is some sort of pool corruption. The key is a literal 
>>>> and as such its memory address shouldn't be in the scope of any memory 
>>>> managed by a pool.
>>>
>>> I agree.  This is triggered by running TLSv1.2 tests with memcache 
>>> configured as the session cache (a relatively new test).  The memcache 
>>> socache provider does *not* use AP_SOCACHE_FLAG_NOTMPSAFE, so it is used without
>>> locks held across multiple threads; that might be a good place to
>>> investigate.
>>>
>>
>> A pool is used for the retrieval of sessions, but it is the pool of the associated connection (conn_rec), which seems uncritical, as
>> we should not have two threads working on the same connection at the same time.
>>
>> OTOH the pconf pool is used to init the session cache, and the memcache APR code creates subpools of it and uses these for allocations.
>>
>> I guess ms_find_conn in the APR code should be inspected. It is called by apr_memcache_getp, and it uses these subpools (tp in this
>> case) for the following:
>>
>>     balloc = apr_bucket_alloc_create((*conn)->tp);
>>     (*conn)->bb = apr_brigade_create((*conn)->tp, balloc);
>>     (*conn)->tb = apr_brigade_create((*conn)->tp, balloc);
>>
>> I think that tp can be used by multiple threads at the same time.
> 
> The subpools here are specific to the apr_memcache_conn_t, access to
> which is managed via apr_reslist_* in ms_find_conn() - so *should* be
> thread-safe, I think?  I think the subpool creation within
> ms_conn_construct() is also safe, but I may be missing the bigger picture.

Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
If yes, I agree that this is the case, that I missed it, and hence my point is moot.

Regards

Rüdiger

Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Joe Orton <jo...@redhat.com>.
On Tue, Sep 01, 2020 at 01:04:14PM +0200, Ruediger Pluem wrote:
> 
> 
> On 9/1/20 10:18 AM, Joe Orton wrote:
> > On Mon, Aug 31, 2020 at 08:37:55AM +0200, Ruediger Pluem wrote:
> >> On 8/30/20 7:22 AM, Christophe JAILLET wrote:
> >>> Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
> >>>
> >>> #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
> >>> #1  <signal handler called>
> >>> #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>> #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> >>> #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
> >>> val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
> >>> #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
> >>> #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
> >>> #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
> >>>
> >>
> >> My only guess is some sort of pool corruption. The key is a literal 
> >> and as such its memory address shouldn't be in the scope of any memory 
> >> managed by a pool.
> > 
> > I agree.  This is triggered by running TLSv1.2 tests with memcache 
> > configured as the session cache (a relatively new test).  The memcache 
> > socache provider does *not* use AP_SOCACHE_FLAG_NOTMPSAFE, so it is used without
> > locks held across multiple threads; that might be a good place to
> > investigate.
> > 
> 
> A pool is used for the retrieval of sessions, but it is the pool of the associated connection (conn_rec), which seems uncritical, as
> we should not have two threads working on the same connection at the same time.
> 
> OTOH the pconf pool is used to init the session cache, and the memcache APR code creates subpools of it and uses these for allocations.
> 
> I guess ms_find_conn in the APR code should be inspected. It is called by apr_memcache_getp, and it uses these subpools (tp in this
> case) for the following:
> 
>     balloc = apr_bucket_alloc_create((*conn)->tp);
>     (*conn)->bb = apr_brigade_create((*conn)->tp, balloc);
>     (*conn)->tb = apr_brigade_create((*conn)->tp, balloc);
> 
> I think that tp can be used by multiple threads at the same time.

The subpools here are specific to the apr_memcache_conn_t, access to
which is managed via apr_reslist_* in ms_find_conn() - so *should* be
thread-safe, I think?  I think the subpool creation within
ms_conn_construct() is also safe, but I may be missing the bigger picture.
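
(Schematically, that access pattern, using the public apr_reslist API; "mc_server->conns" stands in for the reslist member, whose actual name in apr_memcache may differ:)

    apr_memcache_conn_t *conn = NULL;
    apr_status_t rv;

    /* each connection is checked out to exactly one thread at a time */
    rv = apr_reslist_acquire(mc_server->conns, (void **)&conn);
    if (rv == APR_SUCCESS) {
        /* conn - and therefore conn->tp and its brigades - is
         * private to this thread until released */
        /* ... use the connection ... */
        apr_reslist_release(mc_server->conns, conn);
    }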

> BTW: How often does this test fail?

I don't remember seeing it fail before.  This test combination has only 
been running since r1879594 (July 7th).

We could try adding an ab run or something to see if it's easier to
trigger; it'd generally be a good idea for session cache testing anyway.
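
(For example, something along the lines of "ab -n 20000 -c 50 https://localhost:8532/" against the test instance seen in the trace above; the flags and counts here are only a suggestion.)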

Regards, Joe


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 9/1/20 10:18 AM, Joe Orton wrote:
> On Mon, Aug 31, 2020 at 08:37:55AM +0200, Ruediger Pluem wrote:
>> On 8/30/20 7:22 AM, Christophe JAILLET wrote:
>>> Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
>>>
>>> #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  <signal handler called>
>>> #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>> #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
>>> val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
>>> #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
>>> #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
>>> #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
>>>
>>
>> My only guess is some sort of pool corruption. The key is a literal 
>> and as such its memory address shouldn't be in the scope of any memory 
>> managed by a pool.
> 
> I agree.  This is triggered by running TLSv1.2 tests with memcache 
> configured as the session cache (a relatively new test).  The memcache 
> socache provider does *not* use AP_SOCACHE_FLAG_NOTMPSAFE, so it is used without
> locks held across multiple threads; that might be a good place to
> investigate.
> 

A pool is used for the retrieval of sessions, but it is the pool of the associated connection (conn_rec), which seems uncritical, as
we should not have two threads working on the same connection at the same time.

OTOH the pconf pool is used to init the session cache, and the memcache APR code creates subpools of it and uses these for allocations.

I guess ms_find_conn in the APR code should be inspected. It is called by apr_memcache_getp, and it uses these subpools (tp in this
case) for the following:

    balloc = apr_bucket_alloc_create((*conn)->tp);
    (*conn)->bb = apr_brigade_create((*conn)->tp, balloc);
    (*conn)->tb = apr_brigade_create((*conn)->tp, balloc);


I think that tp can be used by multiple threads at the same time.


BTW: How often does this test fail?


Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Joe Orton <jo...@redhat.com>.
On Mon, Aug 31, 2020 at 08:37:55AM +0200, Ruediger Pluem wrote:
> On 8/30/20 7:22 AM, Christophe JAILLET wrote:
> > Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
> > 
> > #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
> > #1  <signal handler called>
> > #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
> > val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
> > #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
> > #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
> > #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
> > 
> 
> My only guess is some sort of pool corruption. The key is a literal 
> and as such its memory address shouldn't be in the scope of any memory 
> managed by a pool.

I agree.  This is triggered by running TLSv1.2 tests with memcache 
configured as the session cache (a relatively new test).  The memcache 
socache provider does *not* use AP_SOCACHE_FLAG_NOTMPSAFE, so it is used without
locks held across multiple threads; that might be a good place to
investigate.
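
(For background: consumers of the socache API only serialize provider calls when the provider asks for it, roughly as below; this is paraphrased from how mod_ssl drives its session cache provider, with invented variable names and an abbreviated retrieve signature:)

    /* a provider that does not set AP_SOCACHE_FLAG_NOTMPSAFE is
     * expected to cope with concurrent calls from many threads */
    if (provider->flags & AP_SOCACHE_FLAG_NOTMPSAFE)
        apr_global_mutex_lock(cache_mutex);

    rv = provider->retrieve(instance, s, id, idlen, dest, &destlen, p);

    if (provider->flags & AP_SOCACHE_FLAG_NOTMPSAFE)
        apr_global_mutex_unlock(cache_mutex);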


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Ruediger Pluem <rp...@apache.org>.

On 8/30/20 7:22 AM, Christophe JAILLET wrote:
> On 29/08/2020 22:43, Travis CI wrote:
>> apache/httpd, branch trunk <https://github.com/apache/httpd/tree/trunk>
>>
>> Build #1091 was broken <https://travis-ci.org/github/apache/httpd/builds/722360185?utm_medium=notification&utm_source=email>
>> Duration: 12 mins and 2 secs
>>
>> Commit: 74d32d7 by Christophe Jaillet <https://github.com/apache/httpd/compare/e602f55d734a...74d32d7bafd3>
>>
>> Fix a few warnings on 64 bits windows compilation
>>
>> git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1881303 13f79535-47bb-0310-9956-ffa450edef68
>>
>> [Travis CI sign-up, documentation, unsubscribe and imprint boilerplate trimmed]
> 
> Hi,
> 
> Spurious failure, unlikely related to the corresponding commit.
> I've never seen that before.
> 
> Any idea?
> 
> CJ
> 
> 
> Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
> 
> #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
> #1  <signal handler called>
> #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
> val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
> #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
> #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
> #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
> 

My only guess is some sort of pool corruption. The key is a literal and as such its memory address shouldn't be in the scope of
any memory managed by a pool.

Regards

Rüdiger



Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Posted by Christophe JAILLET <ch...@wanadoo.fr>.
On 29/08/2020 22:43, Travis CI wrote:
> apache/httpd, branch trunk <https://github.com/apache/httpd/tree/trunk>
>
> Build #1091 was broken <https://travis-ci.org/github/apache/httpd/builds/722360185?utm_medium=notification&utm_source=email>
> Duration: 12 mins and 2 secs
>
> Commit: 74d32d7 by Christophe Jaillet <https://github.com/apache/httpd/compare/e602f55d734a...74d32d7bafd3>
>
> Fix a few warnings on 64 bits windows compilation
>
> git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1881303 13f79535-47bb-0310-9956-ffa450edef68
>
> [Travis CI sign-up, documentation, unsubscribe and imprint boilerplate trimmed]

Hi,

Spurious failure, unlikely related to the corresponding commit.
I've never seen that before.

Any idea?

CJ


Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):

#0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  <signal handler called>
#2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection", val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
#5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
#6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
#7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040