You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2012/11/01 13:04:01 UTC

Re: apr_memcache operation timeouts

I have completed a solution to this problem, which can be a drop-in update
for the existing apr_memcache.c.  It is now checked in for my module as
http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
.

It differs from the solution in
https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:

   - It doesn't require an API change; it but it enforces the 50ms timeout
   that already exists for apr_multgetp for all operations.
   - It works under my load test (which I found is not true of the patch in
   51065).

For my own purposes, I will be shipping my module with apr_memcache2 so I
get the behavior I want regardless of what version of Apache is installed.
 But I'd like to propose my patch for apr_memcache.c.  The patch is
attached, and I've also submitted it as an alternative patch to bug 51065.

If you agree with the strategy I used to solve this problem, then please
let me know if I can help with any changes required to get this into the
main distribution,


On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jm...@google.com> wrote:

> I've had some preliminary success with my own variant of apr_memcache.c
> (creatively called apr_memcache2.c).  Rather than setting the socket
> timeout, I've been mimicing the timeout strategy I saw in
> apr_memcache_multgetp, by adding a new helper method:
>
> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>                                                apr_memcache2_conn_t* conn)
> {
>     apr_pollset_t* pollset;
>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>     if (rv == APR_SUCCESS) {
>         apr_pollfd_t pollfd;
>         pollfd.desc_type = APR_POLL_SOCKET;
>         pollfd.reqevents = APR_POLLIN;
>         pollfd.p = temp_pool;
>         pollfd.desc.s = conn->sock;
>         pollfd.client_data = NULL;
>         apr_pollset_add(pollset, &pollfd);
>         apr_int32_t queries_recvd;
>         const apr_pollfd_t* activefds;
>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
>                               &activefds);
>         if (rv == APR_SUCCESS) {
>             assert(queries_recvd == 1);
>             assert(activefds->desc.s == conn->sock);
>             assert(activefds->client_data == NULL);
>         }
>     }
>     return rv;
> }
>
> And calling that before many of the existing calls to get_server_line as:
>
>     rv = wait_for_server_or_timeout_no_pool(conn);
>     if (rv != APR_SUCCESS) {
>         ms_release_conn(ms, conn);
>         return rv;
>     }
>
> This is just an experiment; I think I can streamline this by
> pre-populating the pollfd structure as part of the apr_memcache_conn_t
> (actually now apr_memcache2_conn_t).
>
> I have two questions about this:
>
> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
> somewhat different from the one that ships with Apache 2.2.  In particular
> the 2.4 version cannot be compiled against the headers that come with a 2.2
> distribution.  Is there any downside to taking my hacked 2.2 apr_memcache.c
> and running it in Apache 2.4?  Or should I maintain two hacks?
>
> 2. This seems wasteful in terms of system calls.  I am making an extra
> call to poll, rather than relying on the socket timeout.  The socket
> timeout didn't work as well as this though.  Does anyone have any theories
> as to why, or what could be done to the patch in
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>
> -Josh
>
>
> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jm...@google.com>wrote:
>
>> Following up: I tried doing what I suggested above: patching that change
>> into my own copy of apr_memcache.c  It was first of all a bad idea to pull
>> in only part of apr_memcache.c because that file changed slightly between
>> 2.2 and 2.4 and our module works in both.
>>
>> I was successful making my own version of apr_memcache (renaming
>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>> socket timeout from -1 to 10 seconds, then the system behaved very poorly
>> under load test (though it worked fine in our unit-tests and system-tests).
>>  In other words, I think the proposed patch that Jeff pointed to above is
>> not really working (as he predicted).  This test was done without
>> SIGSTOPing the memcached; it would timeout under our load anyway and
>> thereafter behave poorly.
>>
>> I'm going to follow up on that bugzilla entry, but for now I'm going to
>> pursue my own complicated mechanism of timing out the calls from my side.
>>
>> -Josh
>>
>>
>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jm...@google.com>wrote:
>>
>>> Thanks Jeff, that is very helpful.  We are considering a course of
>>> action and before doing any work toward this, I'd like to understand the
>>> pitfalls from people that understand Apache better than us.
>>>
>>> Here's our reality: we believe we need to incorporate memcached for
>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>>> large sites & hosting providers.  We are fairly close (we think) to
>>> releasing this functionality as beta.  However, in such large sites, stuff
>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it does we
>>> want to fall back to serving partially unoptimized sites rather than
>>> hanging the servers.
>>>
>>> I understand the realities of backward-compatible APIs.  My expectation
>>> is that this would take years to make it into an APR distribution we could
>>> depend on.  We want to deploy this functionality in weeks.  The workarounds
>>> we have tried backgrounding the apr_memcache calls in a thread and timing
>>> out in mainline are complex and even once they work 100% will be very
>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>>> 'apachectl stop') if this happens more than (say) once a month.
>>>
>>> Our plan is to copy the patched implementation of
>>> apr_memcache_server_connect and the static methods it calls into a new .c
>>> file we will link into our module, naming the new entry-point something
>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>>> CS/SE perspective this is offensive and we admit it, but from a product
>>> quality perspective we believe this beats freezes and complicated/imperfect
>>> workarounds with threads.
>>>
>>> So I have two questions for the Apache community:
>>>
>>>    1. What are the practical problems with this approach?  Note that in
>>>    any case a new APR rev would require editing/ifdefing our code anyway, so I
>>>    think immunity from APR updates such as this patch being applied is not a
>>>    distinguishing drawback.
>>>    2. Is there an example of the correct solution to the technical
>>>    problem Jeff highlighted: "it is apparently missing a call to adjust
>>>    the socket timeout and to discard the connection if the timeout is reached".
>>>     That sounds like a pattern that might be found elsewhere in the Apache
>>>    HTTPD code base.
>>>
>>> Thanks in advance for your help!
>>> -Josh
>>>
>>>
>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <tr...@gmail.com> wrote:
>>>
>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jm...@google.com>
>>>> wrote:
>>>> > Is there a mechanism to time out individual operations?
>>>>
>>>> No, the socket connect timeout is hard-coded at 1 second and the
>>>> socket I/O timeout is disabled.
>>>>
>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>>>> has a patch, though it is apparently missing a call to adjust the
>>>> socket timeout and to discard the connection if the timeout is
>>>> reached.  More importantly, the API can only be changed in future APR
>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>>>> APR-Util 1.6.
>>>>
>>>> >
>>>> > If memcached freezes, then it appears my calls to 'get' will block
>>>> until
>>>> > memcached wakes up.  Is there any way to set a timeout for that call?
>>>> >
>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
>>>> before
>>>> > doing a 'get'?
>>>> >
>>>> > Here are my observations:
>>>> >
>>>> > apr_memcache_multgetp seems to time out in bounded time if I SIGSTOP
>>>> the
>>>> > memcached process. Yes!
>>>> >
>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > That last one really sucks as I am using that to print the status of
>>>> all my
>>>> > cache shards to the log file if I detected a problem :(
>>>> >
>>>> >
>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>>>>  Can I
>>>> > induce the others to have reasonable timeout behavior?
>>>> >
>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache, at
>>>> > least with graceful-stop.
>>>> >
>>>> >
>>>> > On a related note, the apr_memcache
>>>> > documentation<
>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>>>> >is
>>>> > very thin.  I'd be happy to augment it with my observations on its
>>>> > usage
>>>> > and the meaning of some of the arguments if that was desired.  How
>>>> would I
>>>> > go about that?
>>>>
>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>>>> include files, build (make dox) and inspect the results, submit a
>>>> patch to dev@apr.apache.org.
>>>>
>>>> >
>>>> > -Josh
>>>>
>>>>
>>>>
>>>> --
>>>> Born in Roswell... married an alien...
>>>> http://emptyhammock.com/
>>>>
>>>
>>>
>>
>

Re: apr_memcache operation timeouts

Posted by Joshua Marantz <jm...@google.com>.
I think it makes perfect sense to proceed with the patch you have then, and
we can propose one or two more incremental improvements from that.  It's
good that the patch you provided for me helps me gauge a reasonable size of
delta.

So the ball is in my court; I plan to start with a mod_pagespeed snapshot
from the time I made that patch, unpatch my patch, re-patch your patch &
run the tests I had at the time.

-Josh




On Fri, Mar 29, 2013 at 7:53 AM, Jeff Trawick <tr...@gmail.com> wrote:

> (dropping modules-dev)
>
> On Fri, Mar 22, 2013 at 4:04 PM, Joshua Marantz <jm...@google.com>
> wrote:
> > Thanks very much for doing this, Jeff.
> >
> > Unfortunately scanning your diff I think you wound up taking a snapshot
> of
> > a patch I'd uploaded to the bug, but not updated after several subsequent
> > bug fixes.
> >
> > The golden copies are in
> >
> https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil
> :
> >
> >
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h
> >
> > After we released our module we had a few small tweaks and bug-fixes
> based
> > on external feedback for our module.  Those versions have now been stable
> > for 3.5 months, with multiple users reporting success with our memcached
> > integration, so I'm hopeful that they have settled into a stable state.
> >
> > What's the best way to go from here?
>
> The patch I started with, apparently with fewer changes than what
> you're using now, was a bit beyond the number of changes I feel
> comfortable with reviewing/testing/committing in one cycle.
>
> Some possibilities:
>
> * somebody in a Rambo mood decides to commit your current stuff
> largely as-is to trunk (not me; I don't want to commit anything I
> don't understand line by line, and I don't have the time to go through
> even more)
> * you or somebody else split your code into a series of easy to
> understand patches that solve one problem and which can be backported
> back to 1.5.x without breaking anything
> * ???
>
> Before anyone commits, some of the tweaks in my patch need to be
> carried forward, including zapping asserts and the related change to
> the public structure.  (ISTR adding an error check for a pollset
> operation that we have seen fail in httpd-land.)
>
> >
> > -Josh
> >
> >
> > On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <tr...@gmail.com>
> wrote:
> >
> >> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com>
> wrote:
> >> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com>
> >> wrote:
> >> >>
> >> >> ping!
> >> >>
> >> >> Please don't hesitate to push back and tell me if I can supply the
> patch
> >> >> or
> >> >> update in some easier-to-digest form.  In particular, while I have
> >> >> rigorously stress-tested this change using mod_pagespeed's unit test,
> >> >> system-test, and load-test framework, I don't really understand what
> the
> >> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
> >> >> someone points me to a change-list or patch-file that does it
> properly.
> >> >>
> >> >> -Josh
> >> >
> >> >
> >> > I'll try hard to work on this in the next couple of days.  It would be
> >> great
> >> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
> >> week.
> >>
> >> Attached is a first pass at getting your patch to apr trunk.  Can you
> >> confirm that I didn't break your tests?  (It passes the apr tests, but
> >> that may not mean much.)
> >>
> >>
> >>
> >> >
> >> >>
> >> >>
> >> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
> >> >> wrote:
> >> >>
> >> >> > I have completed a solution to this problem, which can be a drop-in
> >> >> > update
> >> >> > for the existing apr_memcache.c.  It is now checked in for my
> module
> >> as
> >> >> >
> >> >> >
> >>
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >> >> > .
> >> >> >
> >> >> > It differs from the solution in
> >> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >> >> >
> >> >> >    - It doesn't require an API change; it but it enforces the 50ms
> >> >> >    timeout that already exists for apr_multgetp for all operations.
> >> >> >    - It works under my load test (which I found is not true of the
> >> patch
> >> >> >    in 51065).
> >> >> >
> >> >> > For my own purposes, I will be shipping my module with
> apr_memcache2
> >> so
> >> >> > I
> >> >> > get the behavior I want regardless of what version of Apache is
> >> >> > installed.
> >> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> >> >> > attached, and I've also submitted it as an alternative patch to bug
> >> >> > 51065.
> >> >> >
> >> >> > If you agree with the strategy I used to solve this problem, then
> >> please
> >> >> > let me know if I can help with any changes required to get this
> into
> >> the
> >> >> > main distribution,
> >> >> >
> >> >> >
> >> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
> >> >> > <jm...@google.com>wrote:
> >> >> >
> >> >> >> I've had some preliminary success with my own variant of
> >> apr_memcache.c
> >> >> >> (creatively called apr_memcache2.c).  Rather than setting the
> socket
> >> >> >> timeout, I've been mimicing the timeout strategy I saw in
> >> >> >> apr_memcache_multgetp, by adding a new helper method:
> >> >> >>
> >> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t*
> temp_pool,
> >> >> >>
>  apr_memcache2_conn_t*
> >> >> >> conn) {
> >> >> >>     apr_pollset_t* pollset;
> >> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool,
> 0);
> >> >> >>     if (rv == APR_SUCCESS) {
> >> >> >>         apr_pollfd_t pollfd;
> >> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >> >> >>         pollfd.reqevents = APR_POLLIN;
> >> >> >>         pollfd.p = temp_pool;
> >> >> >>         pollfd.desc.s = conn->sock;
> >> >> >>         pollfd.client_data = NULL;
> >> >> >>         apr_pollset_add(pollset, &pollfd);
> >> >> >>         apr_int32_t queries_recvd;
> >> >> >>         const apr_pollfd_t* activefds;
> >> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
> >> >> >> &queries_recvd,
> >> >> >>                               &activefds);
> >> >> >>         if (rv == APR_SUCCESS) {
> >> >> >>             assert(queries_recvd == 1);
> >> >> >>             assert(activefds->desc.s == conn->sock);
> >> >> >>             assert(activefds->client_data == NULL);
> >> >> >>         }
> >> >> >>     }
> >> >> >>     return rv;
> >> >> >> }
> >> >> >>
> >> >> >> And calling that before many of the existing calls to
> get_server_line
> >> >> >> as:
> >> >> >>
> >> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >> >> >>     if (rv != APR_SUCCESS) {
> >> >> >>         ms_release_conn(ms, conn);
> >> >> >>         return rv;
> >> >> >>     }
> >> >> >>
> >> >> >> This is just an experiment; I think I can streamline this by
> >> >> >> pre-populating the pollfd structure as part of the
> >> apr_memcache_conn_t
> >> >> >> (actually now apr_memcache2_conn_t).
> >> >> >>
> >> >> >> I have two questions about this:
> >> >> >>
> >> >> >> 1. I noticed the version of apr_memcache.c that ships with Apache
> 2.4
> >> >> >> is
> >> >> >> somewhat different from the one that ships with Apache 2.2.  In
> >> >> >> particular
> >> >> >> the 2.4 version cannot be compiled against the headers that come
> >> with a
> >> >> >> 2.2
> >> >> >> distribution.  Is there any downside to taking my hacked 2.2
> >> >> >> apr_memcache.c
> >> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >> >> >>
> >> >> >> 2. This seems wasteful in terms of system calls.  I am making an
> >> extra
> >> >> >> call to poll, rather than relying on the socket timeout.  The
> socket
> >> >> >> timeout didn't work as well as this though.  Does anyone have any
> >> >> >> theories
> >> >> >> as to why, or what could be done to the patch in
> >> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >> >> >>
> >> >> >> -Josh
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
> >> >> >> <jm...@google.com>wrote:
> >> >> >>
> >> >> >>> Following up: I tried doing what I suggested above: patching that
> >> >> >>> change
> >> >> >>> into my own copy of apr_memcache.c  It was first of all a bad
> idea
> >> to
> >> >> >>> pull
> >> >> >>> in only part of apr_memcache.c because that file changed slightly
> >> >> >>> between
> >> >> >>> 2.2 and 2.4 and our module works in both.
> >> >> >>>
> >> >> >>> I was successful making my own version of apr_memcache (renaming
> >> >> >>> entry-points apr_memcache2*) that I could hack.  But if I changed
> >> the
> >> >> >>> socket timeout from -1 to 10 seconds, then the system behaved
> very
> >> >> >>> poorly
> >> >> >>> under load test (though it worked fine in our unit-tests and
> >> >> >>> system-tests).
> >> >> >>>  In other words, I think the proposed patch that Jeff pointed to
> >> above
> >> >> >>> is
> >> >> >>> not really working (as he predicted).  This test was done without
> >> >> >>> SIGSTOPing the memcached; it would timeout under our load anyway
> and
> >> >> >>> thereafter behave poorly.
> >> >> >>>
> >> >> >>> I'm going to follow up on that bugzilla entry, but for now I'm
> going
> >> >> >>> to
> >> >> >>> pursue my own complicated mechanism of timing out the calls from
> my
> >> >> >>> side.
> >> >> >>>
> >> >> >>> -Josh
> >> >> >>>
> >> >> >>>
> >> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
> >> >> >>> <jm...@google.com>wrote:
> >> >> >>>
> >> >> >>>> Thanks Jeff, that is very helpful.  We are considering a course
> of
> >> >> >>>> action and before doing any work toward this, I'd like to
> >> understand
> >> >> >>>> the
> >> >> >>>> pitfalls from people that understand Apache better than us.
> >> >> >>>>
> >> >> >>>> Here's our reality: we believe we need to incorporate memcached
> for
> >> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively
> for
> >> very
> >> >> >>>> large sites & hosting providers.  We are fairly close (we
> think) to
> >> >> >>>> releasing this functionality as beta.  However, in such large
> >> sites,
> >> >> >>>> stuff
> >> >> >>>> goes wrong: machines crash, power failure, fiber cut, etc.
>  When it
> >> >> >>>> does we
> >> >> >>>> want to fall back to serving partially unoptimized sites rather
> >> than
> >> >> >>>> hanging the servers.
> >> >> >>>>
> >> >> >>>> I understand the realities of backward-compatible APIs.  My
> >> >> >>>> expectation
> >> >> >>>> is that this would take years to make it into an APR
> distribution
> >> we
> >> >> >>>> could
> >> >> >>>> depend on.  We want to deploy this functionality in weeks.  The
> >> >> >>>> workarounds
> >> >> >>>> we have tried backgrounding the apr_memcache calls in a thread
> and
> >> >> >>>> timing
> >> >> >>>> out in mainline are complex and even once they work 100% will be
> >> very
> >> >> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly
> on
> >> >> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >> >> >>>>
> >> >> >>>> Our plan is to copy the patched implementation of
> >> >> >>>> apr_memcache_server_connect and the static methods it calls
> into a
> >> >> >>>> new .c
> >> >> >>>> file we will link into our module, naming the new entry-point
> >> >> >>>> something
> >> >> >>>> else (apr_memcache_server_connect_with_timeout seems good).
>  From a
> >> >> >>>> CS/SE perspective this is offensive and we admit it, but from a
> >> >> >>>> product
> >> >> >>>> quality perspective we believe this beats freezes and
> >> >> >>>> complicated/imperfect
> >> >> >>>> workarounds with threads.
> >> >> >>>>
> >> >> >>>> So I have two questions for the Apache community:
> >> >> >>>>
> >> >> >>>>    1. What are the practical problems with this approach?  Note
> >> that
> >> >> >>>>    in any case a new APR rev would require editing/ifdefing our
> >> code
> >> >> >>>> anyway,
> >> >> >>>>    so I think immunity from APR updates such as this patch being
> >> >> >>>> applied is
> >> >> >>>>    not a distinguishing drawback.
> >> >> >>>>    2. Is there an example of the correct solution to the
> technical
> >> >> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >> >> >>>>    adjust the socket timeout and to discard the connection if
> the
> >> >> >>>>    timeout is reached".  That sounds like a pattern that might
> be
> >> >> >>>>    found elsewhere in the Apache HTTPD code base.
> >> >> >>>>
> >> >> >>>> Thanks in advance for your help!
> >> >> >>>> -Josh
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
> >> >> >>>> <tr...@gmail.com>wrote:
> >> >> >>>>
> >> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
> >> >> >>>>> <jm...@google.com>
> >> >> >>>>> wrote:
> >> >> >>>>> > Is there a mechanism to time out individual operations?
> >> >> >>>>>
> >> >> >>>>> No, the socket connect timeout is hard-coded at 1 second and
> the
> >> >> >>>>> socket I/O timeout is disabled.
> >> >> >>>>>
> >> >> >>>>> Bugzilla bug
> >> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >> >> >>>>> has a patch, though it is apparently missing a call to adjust
> the
> >> >> >>>>> socket timeout and to discard the connection if the timeout is
> >> >> >>>>> reached.  More importantly, the API can only be changed in
> future
> >> >> >>>>> APR
> >> >> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in
> >> future
> >> >> >>>>> APR-Util 1.6.
> >> >> >>>>>
> >> >> >>>>> >
> >> >> >>>>> > If memcached freezes, then it appears my calls to 'get' will
> >> block
> >> >> >>>>> until
> >> >> >>>>> > memcached wakes up.  Is there any way to set a timeout for
> that
> >> >> >>>>> > call?
> >> >> >>>>> >
> >> >> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
> >> >> >>>>> > memcached
> >> >> >>>>> before
> >> >> >>>>> > doing a 'get'?
> >> >> >>>>> >
> >> >> >>>>> > Here are my observations:
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> >> >> >>>>> > SIGSTOP
> >> >> >>>>> the
> >> >> >>>>> > memcached process. Yes!
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP
> the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP
> the
> >> >> >>>>> memcached
> >> >> >>>>> > process.
> >> >> >>>>> >
> >> >> >>>>> > That last one really sucks as I am using that to print the
> >> status
> >> >> >>>>> > of
> >> >> >>>>> all my
> >> >> >>>>> > cache shards to the log file if I detected a problem :(
> >> >> >>>>> >
> >> >> >>>>> >
> >> >> >>>>> > Why does apr_memcache_multgetp do what I want and not the
> >> others?
> >> >> >>>>>  Can I
> >> >> >>>>> > induce the others to have reasonable timeout behavior?
> >> >> >>>>> >
> >> >> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
> >> >> >>>>> > Apache,
> >> >> >>>>> at
> >> >> >>>>> > least with graceful-stop.
> >> >> >>>>> >
> >> >> >>>>> >
> >> >> >>>>> > On a related note, the apr_memcache
> >> >> >>>>> > documentation<
> >> >> >>>>>
> >> >> >>>>>
> >> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >> >> >>>>> >is
> >> >> >>>>> > very thin.  I'd be happy to augment it with my observations
> on
> >> its
> >> >> >>>>> > usage
> >> >> >>>>> > and the meaning of some of the arguments if that was desired.
> >>  How
> >> >> >>>>> would I
> >> >> >>>>> > go about that?
> >> >> >>>>>
> >> >> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in
> >> the
> >> >> >>>>> include files, build (make dox) and inspect the results,
> submit a
> >> >> >>>>> patch to dev@apr.apache.org.
> >> >> >>>>>
> >> >> >>>>> >
> >> >> >>>>> > -Josh
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Born in Roswell... married an alien...
> >> >> >>>>> http://emptyhammock.com/
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Born in Roswell... married an alien...
> >> > http://emptyhammock.com/
> >>
> >>
> >>
> >> --
> >> Born in Roswell... married an alien...
> >> http://emptyhammock.com/
> >>
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>

Re: apr_memcache operation timeouts

Posted by Jeff Trawick <tr...@gmail.com>.
(dropping modules-dev)

On Fri, Mar 22, 2013 at 4:04 PM, Joshua Marantz <jm...@google.com> wrote:
> Thanks very much for doing this, Jeff.
>
> Unfortunately scanning your diff I think you wound up taking a snapshot of
> a patch I'd uploaded to the bug, but not updated after several subsequent
> bug fixes.
>
> The golden copies are in
> https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil:
>
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h
>
> After we released our module we had a few small tweaks and bug-fixes based
> on external feedback for our module.  Those versions have now been stable
> for 3.5 months, with multiple users reporting success with our memcached
> integration, so I'm hopeful that they have settled into a stable state.
>
> What's the best way to go from here?

The patch I started with, apparently with fewer changes than what
you're using now, was a bit beyond the number of changes I feel
comfortable with reviewing/testing/committing in one cycle.

Some possibilities:

* somebody in a Rambo mood decides to commit your current stuff
largely as-is to trunk (not me; I don't want to commit anything I
don't understand line by line, and I don't have the time to go through
even more)
* you or somebody else split your code into a series of easy to
understand patches that solve one problem and which can be backported
back to 1.5.x without breaking anything
* ???

Before anyone commits, some of the tweaks in my patch need to be
carried forward, including zapping asserts and the related change to
the public structure.  (ISTR adding an error check for a pollset
operation that we have seen fail in httpd-land.)

>
> -Josh
>
>
> On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <tr...@gmail.com> wrote:
>
>> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com> wrote:
>> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com>
>> wrote:
>> >>
>> >> ping!
>> >>
>> >> Please don't hesitate to push back and tell me if I can supply the patch
>> >> or
>> >> update in some easier-to-digest form.  In particular, while I have
>> >> rigorously stress-tested this change using mod_pagespeed's unit test,
>> >> system-test, and load-test framework, I don't really understand what the
>> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
>> >> someone points me to a change-list or patch-file that does it properly.
>> >>
>> >> -Josh
>> >
>> >
>> > I'll try hard to work on this in the next couple of days.  It would be
>> great
>> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
>> week.
>>
>> Attached is a first pass at getting your patch to apr trunk.  Can you
>> confirm that I didn't break your tests?  (It passes the apr tests, but
>> that may not mean much.)
>>
>>
>>
>> >
>> >>
>> >>
>> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
>> >> wrote:
>> >>
>> >> > I have completed a solution to this problem, which can be a drop-in
>> >> > update
>> >> > for the existing apr_memcache.c.  It is now checked in for my module
>> as
>> >> >
>> >> >
>> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
>> >> > .
>> >> >
>> >> > It differs from the solution in
>> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>> >> >
>> >> >    - It doesn't require an API change; it but it enforces the 50ms
>> >> >    timeout that already exists for apr_multgetp for all operations.
>> >> >    - It works under my load test (which I found is not true of the
>> patch
>> >> >    in 51065).
>> >> >
>> >> > For my own purposes, I will be shipping my module with apr_memcache2
>> so
>> >> > I
>> >> > get the behavior I want regardless of what version of Apache is
>> >> > installed.
>> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
>> >> > attached, and I've also submitted it as an alternative patch to bug
>> >> > 51065.
>> >> >
>> >> > If you agree with the strategy I used to solve this problem, then
>> please
>> >> > let me know if I can help with any changes required to get this into
>> the
>> >> > main distribution,
>> >> >
>> >> >
>> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
>> >> > <jm...@google.com>wrote:
>> >> >
>> >> >> I've had some preliminary success with my own variant of
>> apr_memcache.c
>> >> >> (creatively called apr_memcache2.c).  Rather than setting the socket
>> >> >> timeout, I've been mimicing the timeout strategy I saw in
>> >> >> apr_memcache_multgetp, by adding a new helper method:
>> >> >>
>> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>> >> >>                                                apr_memcache2_conn_t*
>> >> >> conn) {
>> >> >>     apr_pollset_t* pollset;
>> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>> >> >>     if (rv == APR_SUCCESS) {
>> >> >>         apr_pollfd_t pollfd;
>> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
>> >> >>         pollfd.reqevents = APR_POLLIN;
>> >> >>         pollfd.p = temp_pool;
>> >> >>         pollfd.desc.s = conn->sock;
>> >> >>         pollfd.client_data = NULL;
>> >> >>         apr_pollset_add(pollset, &pollfd);
>> >> >>         apr_int32_t queries_recvd;
>> >> >>         const apr_pollfd_t* activefds;
>> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
>> >> >> &queries_recvd,
>> >> >>                               &activefds);
>> >> >>         if (rv == APR_SUCCESS) {
>> >> >>             assert(queries_recvd == 1);
>> >> >>             assert(activefds->desc.s == conn->sock);
>> >> >>             assert(activefds->client_data == NULL);
>> >> >>         }
>> >> >>     }
>> >> >>     return rv;
>> >> >> }
>> >> >>
>> >> >> And calling that before many of the existing calls to get_server_line
>> >> >> as:
>> >> >>
>> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
>> >> >>     if (rv != APR_SUCCESS) {
>> >> >>         ms_release_conn(ms, conn);
>> >> >>         return rv;
>> >> >>     }
>> >> >>
>> >> >> This is just an experiment; I think I can streamline this by
>> >> >> pre-populating the pollfd structure as part of the
>> apr_memcache_conn_t
>> >> >> (actually now apr_memcache2_conn_t).
>> >> >>
>> >> >> I have two questions about this:
>> >> >>
>> >> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
>> >> >> is
>> >> >> somewhat different from the one that ships with Apache 2.2.  In
>> >> >> particular
>> >> >> the 2.4 version cannot be compiled against the headers that come
>> with a
>> >> >> 2.2
>> >> >> distribution.  Is there any downside to taking my hacked 2.2
>> >> >> apr_memcache.c
>> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
>> >> >>
>> >> >> 2. This seems wasteful in terms of system calls.  I am making an
>> extra
>> >> >> call to poll, rather than relying on the socket timeout.  The socket
>> >> >> timeout didn't work as well as this though.  Does anyone have any
>> >> >> theories
>> >> >> as to why, or what could be done to the patch in
>> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>> >> >>
>> >> >> -Josh
>> >> >>
>> >> >>
>> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
>> >> >> <jm...@google.com>wrote:
>> >> >>
>> >> >>> Following up: I tried doing what I suggested above: patching that
>> >> >>> change
>> >> >>> into my own copy of apr_memcache.c  It was first of all a bad idea
>> to
>> >> >>> pull
>> >> >>> in only part of apr_memcache.c because that file changed slightly
>> >> >>> between
>> >> >>> 2.2 and 2.4 and our module works in both.
>> >> >>>
>> >> >>> I was successful making my own version of apr_memcache (renaming
>> >> >>> entry-points apr_memcache2*) that I could hack.  But if I changed
>> the
>> >> >>> socket timeout from -1 to 10 seconds, then the system behaved very
>> >> >>> poorly
>> >> >>> under load test (though it worked fine in our unit-tests and
>> >> >>> system-tests).
>> >> >>>  In other words, I think the proposed patch that Jeff pointed to
>> above
>> >> >>> is
>> >> >>> not really working (as he predicted).  This test was done without
>> >> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
>> >> >>> thereafter behave poorly.
>> >> >>>
>> >> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
>> >> >>> to
>> >> >>> pursue my own complicated mechanism of timing out the calls from my
>> >> >>> side.
>> >> >>>
>> >> >>> -Josh
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
>> >> >>> <jm...@google.com>wrote:
>> >> >>>
>> >> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
>> >> >>>> action and before doing any work toward this, I'd like to
>> understand
>> >> >>>> the
>> >> >>>> pitfalls from people that understand Apache better than us.
>> >> >>>>
>> >> >>>> Here's our reality: we believe we need to incorporate memcached for
>> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for
>> very
>> >> >>>> large sites & hosting providers.  We are fairly close (we think) to
>> >> >>>> releasing this functionality as beta.  However, in such large
>> sites,
>> >> >>>> stuff
>> >> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
>> >> >>>> does we
>> >> >>>> want to fall back to serving partially unoptimized sites rather
>> than
>> >> >>>> hanging the servers.
>> >> >>>>
>> >> >>>> I understand the realities of backward-compatible APIs.  My
>> >> >>>> expectation
>> >> >>>> is that this would take years to make it into an APR distribution
>> we
>> >> >>>> could
>> >> >>>> depend on.  We want to deploy this functionality in weeks.  The
>> >> >>>> workarounds
>> >> >>>> we have tried backgrounding the apr_memcache calls in a thread and
>> >> >>>> timing
>> >> >>>> out in mainline are complex and even once they work 100% will be
>> very
>> >> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>> >> >>>> 'apachectl stop') if this happens more than (say) once a month.
>> >> >>>>
>> >> >>>> Our plan is to copy the patched implementation of
>> >> >>>> apr_memcache_server_connect and the static methods it calls into a
>> >> >>>> new .c
>> >> >>>> file we will link into our module, naming the new entry-point
>> >> >>>> something
>> >> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>> >> >>>> CS/SE perspective this is offensive and we admit it, but from a
>> >> >>>> product
>> >> >>>> quality perspective we believe this beats freezes and
>> >> >>>> complicated/imperfect
>> >> >>>> workarounds with threads.
>> >> >>>>
>> >> >>>> So I have two questions for the Apache community:
>> >> >>>>
>> >> >>>>    1. What are the practical problems with this approach?  Note
>> that
>> >> >>>>    in any case a new APR rev would require editing/ifdefing our
>> code
>> >> >>>> anyway,
>> >> >>>>    so I think immunity from APR updates such as this patch being
>> >> >>>> applied is
>> >> >>>>    not a distinguishing drawback.
>> >> >>>>    2. Is there an example of the correct solution to the technical
>> >> >>>>    problem Jeff highlighted: "it is apparently missing a call to
>> >> >>>>    adjust the socket timeout and to discard the connection if the
>> >> >>>>    timeout is reached".  That sounds like a pattern that might be
>> >> >>>>    found elsewhere in the Apache HTTPD code base.
>> >> >>>>
>> >> >>>> Thanks in advance for your help!
>> >> >>>> -Josh
>> >> >>>>
>> >> >>>>
>> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
>> >> >>>> <tr...@gmail.com>wrote:
>> >> >>>>
>> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
>> >> >>>>> <jm...@google.com>
>> >> >>>>> wrote:
>> >> >>>>> > Is there a mechanism to time out individual operations?
>> >> >>>>>
>> >> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
>> >> >>>>> socket I/O timeout is disabled.
>> >> >>>>>
>> >> >>>>> Bugzilla bug
>> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>> >> >>>>> has a patch, though it is apparently missing a call to adjust the
>> >> >>>>> socket timeout and to discard the connection if the timeout is
>> >> >>>>> reached.  More importantly, the API can only be changed in future
>> >> >>>>> APR
>> >> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in
>> future
>> >> >>>>> APR-Util 1.6.
>> >> >>>>>
>> >> >>>>> >
>> >> >>>>> > If memcached freezes, then it appears my calls to 'get' will
>> block
>> >> >>>>> until
>> >> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
>> >> >>>>> > call?
>> >> >>>>> >
>> >> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
>> >> >>>>> > memcached
>> >> >>>>> before
>> >> >>>>> > doing a 'get'?
>> >> >>>>> >
>> >> >>>>> > Here are my observations:
>> >> >>>>> >
>> >> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
>> >> >>>>> > SIGSTOP
>> >> >>>>> the
>> >> >>>>> > memcached process. Yes!
>> >> >>>>> >
>> >> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>> >> >>>>> memcached
>> >> >>>>> > process.
>> >> >>>>> >
>> >> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>> >> >>>>> memcached
>> >> >>>>> > process.
>> >> >>>>> >
>> >> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>> >> >>>>> memcached
>> >> >>>>> > process.
>> >> >>>>> >
>> >> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>> >> >>>>> memcached
>> >> >>>>> > process.
>> >> >>>>> >
>> >> >>>>> > That last one really sucks as I am using that to print the
>> status
>> >> >>>>> > of
>> >> >>>>> all my
>> >> >>>>> > cache shards to the log file if I detected a problem :(
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > Why does apr_memcache_multgetp do what I want and not the
>> others?
>> >> >>>>>  Can I
>> >> >>>>> > induce the others to have reasonable timeout behavior?
>> >> >>>>> >
>> >> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
>> >> >>>>> > Apache,
>> >> >>>>> at
>> >> >>>>> > least with graceful-stop.
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > On a related note, the apr_memcache
>> >> >>>>> > documentation<
>> >> >>>>>
>> >> >>>>>
>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>> >> >>>>> >is
>> >> >>>>> > very thin.  I'd be happy to augment it with my observations on
>> its
>> >> >>>>> > usage
>> >> >>>>> > and the meaning of some of the arguments if that was desired.
>>  How
>> >> >>>>> would I
>> >> >>>>> > go about that?
>> >> >>>>>
>> >> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in
>> the
>> >> >>>>> include files, build (make dox) and inspect the results, submit a
>> >> >>>>> patch to dev@apr.apache.org.
>> >> >>>>>
>> >> >>>>> >
>> >> >>>>> > -Josh
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Born in Roswell... married an alien...
>> >> >>>>> http://emptyhammock.com/
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Born in Roswell... married an alien...
>> > http://emptyhammock.com/
>>
>>
>>
>> --
>> Born in Roswell... married an alien...
>> http://emptyhammock.com/
>>



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: apr_memcache operation timeouts

Posted by Joshua Marantz <jm...@google.com>.
Thanks very much for doing this, Jeff.

Unfortunately scanning your diff I think you wound up taking a snapshot of
a patch I'd uploaded to the bug, but not updated after several subsequent
bug fixes.

The golden copies are in
https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil:

https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h

After we released our module we had a few small tweaks and bug-fixes based
on external feedback for our module.  Those versions have now been stable
for 3.5 months, with multiple users reporting success with our memcached
integration, so I'm hopeful that they have settled into a stable state.

What's the best way to go from here?

-Josh


On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <tr...@gmail.com> wrote:

> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com> wrote:
> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com>
> wrote:
> >>
> >> ping!
> >>
> >> Please don't hesitate to push back and tell me if I can supply the patch
> >> or
> >> update in some easier-to-digest form.  In particular, while I have
> >> rigorously stress-tested this change using mod_pagespeed's unit test,
> >> system-test, and load-test framework, I don't really understand what the
> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
> >> someone points me to a change-list or patch-file that does it properly.
> >>
> >> -Josh
> >
> >
> > I'll try hard to work on this in the next couple of days.  It would be
> great
> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
> week.
>
> Attached is a first pass at getting your patch to apr trunk.  Can you
> confirm that I didn't break your tests?  (It passes the apr tests, but
> that may not mean much.)
>
>
>
> >
> >>
> >>
> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
> >> wrote:
> >>
> >> > I have completed a solution to this problem, which can be a drop-in
> >> > update
> >> > for the existing apr_memcache.c.  It is now checked in for my module
> as
> >> >
> >> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >> > .
> >> >
> >> > It differs from the solution in
> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >> >
> >> >    - It doesn't require an API change; it but it enforces the 50ms
> >> >    timeout that already exists for apr_multgetp for all operations.
> >> >    - It works under my load test (which I found is not true of the
> patch
> >> >    in 51065).
> >> >
> >> > For my own purposes, I will be shipping my module with apr_memcache2
> so
> >> > I
> >> > get the behavior I want regardless of what version of Apache is
> >> > installed.
> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> >> > attached, and I've also submitted it as an alternative patch to bug
> >> > 51065.
> >> >
> >> > If you agree with the strategy I used to solve this problem, then
> please
> >> > let me know if I can help with any changes required to get this into
> the
> >> > main distribution,
> >> >
> >> >
> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
> >> > <jm...@google.com>wrote:
> >> >
> >> >> I've had some preliminary success with my own variant of
> apr_memcache.c
> >> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> >> timeout, I've been mimicing the timeout strategy I saw in
> >> >> apr_memcache_multgetp, by adding a new helper method:
> >> >>
> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >> >>                                                apr_memcache2_conn_t*
> >> >> conn) {
> >> >>     apr_pollset_t* pollset;
> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
> >> >>     if (rv == APR_SUCCESS) {
> >> >>         apr_pollfd_t pollfd;
> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >> >>         pollfd.reqevents = APR_POLLIN;
> >> >>         pollfd.p = temp_pool;
> >> >>         pollfd.desc.s = conn->sock;
> >> >>         pollfd.client_data = NULL;
> >> >>         apr_pollset_add(pollset, &pollfd);
> >> >>         apr_int32_t queries_recvd;
> >> >>         const apr_pollfd_t* activefds;
> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
> >> >> &queries_recvd,
> >> >>                               &activefds);
> >> >>         if (rv == APR_SUCCESS) {
> >> >>             assert(queries_recvd == 1);
> >> >>             assert(activefds->desc.s == conn->sock);
> >> >>             assert(activefds->client_data == NULL);
> >> >>         }
> >> >>     }
> >> >>     return rv;
> >> >> }
> >> >>
> >> >> And calling that before many of the existing calls to get_server_line
> >> >> as:
> >> >>
> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >> >>     if (rv != APR_SUCCESS) {
> >> >>         ms_release_conn(ms, conn);
> >> >>         return rv;
> >> >>     }
> >> >>
> >> >> This is just an experiment; I think I can streamline this by
> >> >> pre-populating the pollfd structure as part of the
> apr_memcache_conn_t
> >> >> (actually now apr_memcache2_conn_t).
> >> >>
> >> >> I have two questions about this:
> >> >>
> >> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
> >> >> is
> >> >> somewhat different from the one that ships with Apache 2.2.  In
> >> >> particular
> >> >> the 2.4 version cannot be compiled against the headers that come
> with a
> >> >> 2.2
> >> >> distribution.  Is there any downside to taking my hacked 2.2
> >> >> apr_memcache.c
> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >> >>
> >> >> 2. This seems wasteful in terms of system calls.  I am making an
> extra
> >> >> call to poll, rather than relying on the socket timeout.  The socket
> >> >> timeout didn't work as well as this though.  Does anyone have any
> >> >> theories
> >> >> as to why, or what could be done to the patch in
> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >> >>
> >> >> -Josh
> >> >>
> >> >>
> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
> >> >> <jm...@google.com>wrote:
> >> >>
> >> >>> Following up: I tried doing what I suggested above: patching that
> >> >>> change
> >> >>> into my own copy of apr_memcache.c  It was first of all a bad idea
> to
> >> >>> pull
> >> >>> in only part of apr_memcache.c because that file changed slightly
> >> >>> between
> >> >>> 2.2 and 2.4 and our module works in both.
> >> >>>
> >> >>> I was successful making my own version of apr_memcache (renaming
> >> >>> entry-points apr_memcache2*) that I could hack.  But if I changed
> the
> >> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> >> >>> poorly
> >> >>> under load test (though it worked fine in our unit-tests and
> >> >>> system-tests).
> >> >>>  In other words, I think the proposed patch that Jeff pointed to
> above
> >> >>> is
> >> >>> not really working (as he predicted).  This test was done without
> >> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
> >> >>> thereafter behave poorly.
> >> >>>
> >> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
> >> >>> to
> >> >>> pursue my own complicated mechanism of timing out the calls from my
> >> >>> side.
> >> >>>
> >> >>> -Josh
> >> >>>
> >> >>>
> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
> >> >>> <jm...@google.com>wrote:
> >> >>>
> >> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
> >> >>>> action and before doing any work toward this, I'd like to
> understand
> >> >>>> the
> >> >>>> pitfalls from people that understand Apache better than us.
> >> >>>>
> >> >>>> Here's our reality: we believe we need to incorporate memcached for
> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for
> very
> >> >>>> large sites & hosting providers.  We are fairly close (we think) to
> >> >>>> releasing this functionality as beta.  However, in such large
> sites,
> >> >>>> stuff
> >> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
> >> >>>> does we
> >> >>>> want to fall back to serving partially unoptimized sites rather
> than
> >> >>>> hanging the servers.
> >> >>>>
> >> >>>> I understand the realities of backward-compatible APIs.  My
> >> >>>> expectation
> >> >>>> is that this would take years to make it into an APR distribution
> we
> >> >>>> could
> >> >>>> depend on.  We want to deploy this functionality in weeks.  The
> >> >>>> workarounds
> >> >>>> we have tried backgrounding the apr_memcache calls in a thread and
> >> >>>> timing
> >> >>>> out in mainline are complex and even once they work 100% will be
> very
> >> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
> >> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >> >>>>
> >> >>>> Our plan is to copy the patched implementation of
> >> >>>> apr_memcache_server_connect and the static methods it calls into a
> >> >>>> new .c
> >> >>>> file we will link into our module, naming the new entry-point
> >> >>>> something
> >> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
> >> >>>> CS/SE perspective this is offensive and we admit it, but from a
> >> >>>> product
> >> >>>> quality perspective we believe this beats freezes and
> >> >>>> complicated/imperfect
> >> >>>> workarounds with threads.
> >> >>>>
> >> >>>> So I have two questions for the Apache community:
> >> >>>>
> >> >>>>    1. What are the practical problems with this approach?  Note
> that
> >> >>>>    in any case a new APR rev would require editing/ifdefing our
> code
> >> >>>> anyway,
> >> >>>>    so I think immunity from APR updates such as this patch being
> >> >>>> applied is
> >> >>>>    not a distinguishing drawback.
> >> >>>>    2. Is there an example of the correct solution to the technical
> >> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >> >>>>    adjust the socket timeout and to discard the connection if the
> >> >>>>    timeout is reached".  That sounds like a pattern that might be
> >> >>>>    found elsewhere in the Apache HTTPD code base.
> >> >>>>
> >> >>>> Thanks in advance for your help!
> >> >>>> -Josh
> >> >>>>
> >> >>>>
> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
> >> >>>> <tr...@gmail.com>wrote:
> >> >>>>
> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
> >> >>>>> <jm...@google.com>
> >> >>>>> wrote:
> >> >>>>> > Is there a mechanism to time out individual operations?
> >> >>>>>
> >> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
> >> >>>>> socket I/O timeout is disabled.
> >> >>>>>
> >> >>>>> Bugzilla bug
> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >> >>>>> has a patch, though it is apparently missing a call to adjust the
> >> >>>>> socket timeout and to discard the connection if the timeout is
> >> >>>>> reached.  More importantly, the API can only be changed in future
> >> >>>>> APR
> >> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in
> future
> >> >>>>> APR-Util 1.6.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > If memcached freezes, then it appears my calls to 'get' will
> block
> >> >>>>> until
> >> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
> >> >>>>> > call?
> >> >>>>> >
> >> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
> >> >>>>> > memcached
> >> >>>>> before
> >> >>>>> > doing a 'get'?
> >> >>>>> >
> >> >>>>> > Here are my observations:
> >> >>>>> >
> >> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> >> >>>>> > SIGSTOP
> >> >>>>> the
> >> >>>>> > memcached process. Yes!
> >> >>>>> >
> >> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > That last one really sucks as I am using that to print the
> status
> >> >>>>> > of
> >> >>>>> all my
> >> >>>>> > cache shards to the log file if I detected a problem :(
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > Why does apr_memcache_multgetp do what I want and not the
> others?
> >> >>>>>  Can I
> >> >>>>> > induce the others to have reasonable timeout behavior?
> >> >>>>> >
> >> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
> >> >>>>> > Apache,
> >> >>>>> at
> >> >>>>> > least with graceful-stop.
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > On a related note, the apr_memcache
> >> >>>>> > documentation<
> >> >>>>>
> >> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >> >>>>> >is
> >> >>>>> > very thin.  I'd be happy to augment it with my observations on
> its
> >> >>>>> > usage
> >> >>>>> > and the meaning of some of the arguments if that was desired.
>  How
> >> >>>>> would I
> >> >>>>> > go about that?
> >> >>>>>
> >> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in
> the
> >> >>>>> include files, build (make dox) and inspect the results, submit a
> >> >>>>> patch to dev@apr.apache.org.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > -Josh
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Born in Roswell... married an alien...
> >> >>>>> http://emptyhammock.com/
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >
> >
> >
> >
> > --
> > Born in Roswell... married an alien...
> > http://emptyhammock.com/
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>

Re: apr_memcache operation timeouts

Posted by Joshua Marantz <jm...@google.com>.
Thanks very much for doing this, Jeff.

Unfortunately scanning your diff I think you wound up taking a snapshot of
a patch I'd uploaded to the bug, but not updated after several subsequent
bug fixes.

The golden copies are in
https://code.google.com/p/modpagespeed/source/browse/#svn%2Ftrunk%2Fsrc%2Fthird_party%2Faprutil:

https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
https://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.h

After we released our module we had a few small tweaks and bug-fixes based
on external feedback for our module.  Those versions have now been stable
for 3.5 months, with multiple users reporting success with our memcached
integration, so I'm hopeful that they have settled into a stable state.

What's the best way to go from here?

-Josh


On Thu, Mar 14, 2013 at 10:45 AM, Jeff Trawick <tr...@gmail.com> wrote:

> On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com> wrote:
> > On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com>
> wrote:
> >>
> >> ping!
> >>
> >> Please don't hesitate to push back and tell me if I can supply the patch
> >> or
> >> update in some easier-to-digest form.  In particular, while I have
> >> rigorously stress-tested this change using mod_pagespeed's unit test,
> >> system-test, and load-test framework, I don't really understand what the
> >> testing flow is for APR.  I'd be happy to add unit-tests for that if
> >> someone points me to a change-list or patch-file that does it properly.
> >>
> >> -Josh
> >
> >
> > I'll try hard to work on this in the next couple of days.  It would be
> great
> > to have fixes in APR-Util 1.5.x, which we hope to work on later this
> week.
>
> Attached is a first pass at getting your patch to apr trunk.  Can you
> confirm that I didn't break your tests?  (It passes the apr tests, but
> that may not mean much.)
>
>
>
> >
> >>
> >>
> >> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
> >> wrote:
> >>
> >> > I have completed a solution to this problem, which can be a drop-in
> >> > update
> >> > for the existing apr_memcache.c.  It is now checked in for my module
> as
> >> >
> >> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> >> > .
> >> >
> >> > It differs from the solution in
> >> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >> >
> >> >    - It doesn't require an API change; it but it enforces the 50ms
> >> >    timeout that already exists for apr_multgetp for all operations.
> >> >    - It works under my load test (which I found is not true of the
> patch
> >> >    in 51065).
> >> >
> >> > For my own purposes, I will be shipping my module with apr_memcache2
> so
> >> > I
> >> > get the behavior I want regardless of what version of Apache is
> >> > installed.
> >> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> >> > attached, and I've also submitted it as an alternative patch to bug
> >> > 51065.
> >> >
> >> > If you agree with the strategy I used to solve this problem, then
> please
> >> > let me know if I can help with any changes required to get this into
> the
> >> > main distribution,
> >> >
> >> >
> >> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
> >> > <jm...@google.com>wrote:
> >> >
> >> >> I've had some preliminary success with my own variant of
> apr_memcache.c
> >> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> >> timeout, I've been mimicing the timeout strategy I saw in
> >> >> apr_memcache_multgetp, by adding a new helper method:
> >> >>
> >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >> >>                                                apr_memcache2_conn_t*
> >> >> conn) {
> >> >>     apr_pollset_t* pollset;
> >> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
> >> >>     if (rv == APR_SUCCESS) {
> >> >>         apr_pollfd_t pollfd;
> >> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >> >>         pollfd.reqevents = APR_POLLIN;
> >> >>         pollfd.p = temp_pool;
> >> >>         pollfd.desc.s = conn->sock;
> >> >>         pollfd.client_data = NULL;
> >> >>         apr_pollset_add(pollset, &pollfd);
> >> >>         apr_int32_t queries_recvd;
> >> >>         const apr_pollfd_t* activefds;
> >> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
> >> >> &queries_recvd,
> >> >>                               &activefds);
> >> >>         if (rv == APR_SUCCESS) {
> >> >>             assert(queries_recvd == 1);
> >> >>             assert(activefds->desc.s == conn->sock);
> >> >>             assert(activefds->client_data == NULL);
> >> >>         }
> >> >>     }
> >> >>     return rv;
> >> >> }
> >> >>
> >> >> And calling that before many of the existing calls to get_server_line
> >> >> as:
> >> >>
> >> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >> >>     if (rv != APR_SUCCESS) {
> >> >>         ms_release_conn(ms, conn);
> >> >>         return rv;
> >> >>     }
> >> >>
> >> >> This is just an experiment; I think I can streamline this by
> >> >> pre-populating the pollfd structure as part of the
> apr_memcache_conn_t
> >> >> (actually now apr_memcache2_conn_t).
> >> >>
> >> >> I have two questions about this:
> >> >>
> >> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
> >> >> is
> >> >> somewhat different from the one that ships with Apache 2.2.  In
> >> >> particular
> >> >> the 2.4 version cannot be compiled against the headers that come
> with a
> >> >> 2.2
> >> >> distribution.  Is there any downside to taking my hacked 2.2
> >> >> apr_memcache.c
> >> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >> >>
> >> >> 2. This seems wasteful in terms of system calls.  I am making an
> extra
> >> >> call to poll, rather than relying on the socket timeout.  The socket
> >> >> timeout didn't work as well as this though.  Does anyone have any
> >> >> theories
> >> >> as to why, or what could be done to the patch in
> >> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >> >>
> >> >> -Josh
> >> >>
> >> >>
> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
> >> >> <jm...@google.com>wrote:
> >> >>
> >> >>> Following up: I tried doing what I suggested above: patching that
> >> >>> change
> >> >>> into my own copy of apr_memcache.c  It was first of all a bad idea
> to
> >> >>> pull
> >> >>> in only part of apr_memcache.c because that file changed slightly
> >> >>> between
> >> >>> 2.2 and 2.4 and our module works in both.
> >> >>>
> >> >>> I was successful making my own version of apr_memcache (renaming
> >> >>> entry-points apr_memcache2*) that I could hack.  But if I changed
> the
> >> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> >> >>> poorly
> >> >>> under load test (though it worked fine in our unit-tests and
> >> >>> system-tests).
> >> >>>  In other words, I think the proposed patch that Jeff pointed to
> above
> >> >>> is
> >> >>> not really working (as he predicted).  This test was done without
> >> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
> >> >>> thereafter behave poorly.
> >> >>>
> >> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
> >> >>> to
> >> >>> pursue my own complicated mechanism of timing out the calls from my
> >> >>> side.
> >> >>>
> >> >>> -Josh
> >> >>>
> >> >>>
> >> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
> >> >>> <jm...@google.com>wrote:
> >> >>>
> >> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
> >> >>>> action and before doing any work toward this, I'd like to
> understand
> >> >>>> the
> >> >>>> pitfalls from people that understand Apache better than us.
> >> >>>>
> >> >>>> Here's our reality: we believe we need to incorporate memcached for
> >> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for
> very
> >> >>>> large sites & hosting providers.  We are fairly close (we think) to
> >> >>>> releasing this functionality as beta.  However, in such large
> sites,
> >> >>>> stuff
> >> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
> >> >>>> does we
> >> >>>> want to fall back to serving partially unoptimized sites rather
> than
> >> >>>> hanging the servers.
> >> >>>>
> >> >>>> I understand the realities of backward-compatible APIs.  My
> >> >>>> expectation
> >> >>>> is that this would take years to make it into an APR distribution
> we
> >> >>>> could
> >> >>>> depend on.  We want to deploy this functionality in weeks.  The
> >> >>>> workarounds
> >> >>>> we have tried backgrounding the apr_memcache calls in a thread and
> >> >>>> timing
> >> >>>> out in mainline are complex and even once they work 100% will be
> very
> >> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
> >> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >> >>>>
> >> >>>> Our plan is to copy the patched implementation of
> >> >>>> apr_memcache_server_connect and the static methods it calls into a
> >> >>>> new .c
> >> >>>> file we will link into our module, naming the new entry-point
> >> >>>> something
> >> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
> >> >>>> CS/SE perspective this is offensive and we admit it, but from a
> >> >>>> product
> >> >>>> quality perspective we believe this beats freezes and
> >> >>>> complicated/imperfect
> >> >>>> workarounds with threads.
> >> >>>>
> >> >>>> So I have two questions for the Apache community:
> >> >>>>
> >> >>>>    1. What are the practical problems with this approach?  Note
> that
> >> >>>>    in any case a new APR rev would require editing/ifdefing our
> code
> >> >>>> anyway,
> >> >>>>    so I think immunity from APR updates such as this patch being
> >> >>>> applied is
> >> >>>>    not a distinguishing drawback.
> >> >>>>    2. Is there an example of the correct solution to the technical
> >> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >> >>>>    adjust the socket timeout and to discard the connection if the
> >> >>>>    timeout is reached".  That sounds like a pattern that might be
> >> >>>>    found elsewhere in the Apache HTTPD code base.
> >> >>>>
> >> >>>> Thanks in advance for your help!
> >> >>>> -Josh
> >> >>>>
> >> >>>>
> >> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
> >> >>>> <tr...@gmail.com>wrote:
> >> >>>>
> >> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
> >> >>>>> <jm...@google.com>
> >> >>>>> wrote:
> >> >>>>> > Is there a mechanism to time out individual operations?
> >> >>>>>
> >> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
> >> >>>>> socket I/O timeout is disabled.
> >> >>>>>
> >> >>>>> Bugzilla bug
> >> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >> >>>>> has a patch, though it is apparently missing a call to adjust the
> >> >>>>> socket timeout and to discard the connection if the timeout is
> >> >>>>> reached.  More importantly, the API can only be changed in future
> >> >>>>> APR
> >> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in
> future
> >> >>>>> APR-Util 1.6.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > If memcached freezes, then it appears my calls to 'get' will
> block
> >> >>>>> until
> >> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
> >> >>>>> > call?
> >> >>>>> >
> >> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
> >> >>>>> > memcached
> >> >>>>> before
> >> >>>>> > doing a 'get'?
> >> >>>>> >
> >> >>>>> > Here are my observations:
> >> >>>>> >
> >> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> >> >>>>> > SIGSTOP
> >> >>>>> the
> >> >>>>> > memcached process. Yes!
> >> >>>>> >
> >> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
> >> >>>>> memcached
> >> >>>>> > process.
> >> >>>>> >
> >> >>>>> > That last one really sucks as I am using that to print the
> status
> >> >>>>> > of
> >> >>>>> all my
> >> >>>>> > cache shards to the log file if I detected a problem :(
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > Why does apr_memcache_multgetp do what I want and not the
> others?
> >> >>>>>  Can I
> >> >>>>> > induce the others to have reasonable timeout behavior?
> >> >>>>> >
> >> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
> >> >>>>> > Apache,
> >> >>>>> at
> >> >>>>> > least with graceful-stop.
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > On a related note, the apr_memcache
> >> >>>>> > documentation<
> >> >>>>>
> >> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >> >>>>> >is
> >> >>>>> > very thin.  I'd be happy to augment it with my observations on
> its
> >> >>>>> > usage
> >> >>>>> > and the meaning of some of the arguments if that was desired.
>  How
> >> >>>>> would I
> >> >>>>> > go about that?
> >> >>>>>
> >> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in
> the
> >> >>>>> include files, build (make dox) and inspect the results, submit a
> >> >>>>> patch to dev@apr.apache.org.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > -Josh
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Born in Roswell... married an alien...
> >> >>>>> http://emptyhammock.com/
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >
> >
> >
> >
> > --
> > Born in Roswell... married an alien...
> > http://emptyhammock.com/
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/
>

Re: apr_memcache operation timeouts

Posted by Jeff Trawick <tr...@gmail.com>.
On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com> wrote:
>>
>> ping!
>>
>> Please don't hesitate to push back and tell me if I can supply the patch
>> or
>> update in some easier-to-digest form.  In particular, while I have
>> rigorously stress-tested this change using mod_pagespeed's unit test,
>> system-test, and load-test framework, I don't really understand what the
>> testing flow is for APR.  I'd be happy to add unit-tests for that if
>> someone points me to a change-list or patch-file that does it properly.
>>
>> -Josh
>
>
> I'll try hard to work on this in the next couple of days.  It would be great
> to have fixes in APR-Util 1.5.x, which we hope to work on later this week.

Attached is a first pass at getting your patch to apr trunk.  Can you
confirm that I didn't break your tests?  (It passes the apr tests, but
that may not mean much.)



>
>>
>>
>> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
>> wrote:
>>
>> > I have completed a solution to this problem, which can be a drop-in
>> > update
>> > for the existing apr_memcache.c.  It is now checked in for my module as
>> >
>> > http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
>> > .
>> >
>> > It differs from the solution in
>> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>> >
>> >    - It doesn't require an API change; it but it enforces the 50ms
>> >    timeout that already exists for apr_multgetp for all operations.
>> >    - It works under my load test (which I found is not true of the patch
>> >    in 51065).
>> >
>> > For my own purposes, I will be shipping my module with apr_memcache2 so
>> > I
>> > get the behavior I want regardless of what version of Apache is
>> > installed.
>> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
>> > attached, and I've also submitted it as an alternative patch to bug
>> > 51065.
>> >
>> > If you agree with the strategy I used to solve this problem, then please
>> > let me know if I can help with any changes required to get this into the
>> > main distribution,
>> >
>> >
>> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
>> > <jm...@google.com>wrote:
>> >
>> >> I've had some preliminary success with my own variant of apr_memcache.c
>> >> (creatively called apr_memcache2.c).  Rather than setting the socket
>> >> timeout, I've been mimicing the timeout strategy I saw in
>> >> apr_memcache_multgetp, by adding a new helper method:
>> >>
>> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>> >>                                                apr_memcache2_conn_t*
>> >> conn) {
>> >>     apr_pollset_t* pollset;
>> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>> >>     if (rv == APR_SUCCESS) {
>> >>         apr_pollfd_t pollfd;
>> >>         pollfd.desc_type = APR_POLL_SOCKET;
>> >>         pollfd.reqevents = APR_POLLIN;
>> >>         pollfd.p = temp_pool;
>> >>         pollfd.desc.s = conn->sock;
>> >>         pollfd.client_data = NULL;
>> >>         apr_pollset_add(pollset, &pollfd);
>> >>         apr_int32_t queries_recvd;
>> >>         const apr_pollfd_t* activefds;
>> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
>> >> &queries_recvd,
>> >>                               &activefds);
>> >>         if (rv == APR_SUCCESS) {
>> >>             assert(queries_recvd == 1);
>> >>             assert(activefds->desc.s == conn->sock);
>> >>             assert(activefds->client_data == NULL);
>> >>         }
>> >>     }
>> >>     return rv;
>> >> }
>> >>
>> >> And calling that before many of the existing calls to get_server_line
>> >> as:
>> >>
>> >>     rv = wait_for_server_or_timeout_no_pool(conn);
>> >>     if (rv != APR_SUCCESS) {
>> >>         ms_release_conn(ms, conn);
>> >>         return rv;
>> >>     }
>> >>
>> >> This is just an experiment; I think I can streamline this by
>> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
>> >> (actually now apr_memcache2_conn_t).
>> >>
>> >> I have two questions about this:
>> >>
>> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
>> >> is
>> >> somewhat different from the one that ships with Apache 2.2.  In
>> >> particular
>> >> the 2.4 version cannot be compiled against the headers that come with a
>> >> 2.2
>> >> distribution.  Is there any downside to taking my hacked 2.2
>> >> apr_memcache.c
>> >> and running it in Apache 2.4?  Or should I maintain two hacks?
>> >>
>> >> 2. This seems wasteful in terms of system calls.  I am making an extra
>> >> call to poll, rather than relying on the socket timeout.  The socket
>> >> timeout didn't work as well as this though.  Does anyone have any
>> >> theories
>> >> as to why, or what could be done to the patch in
>> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>> >>
>> >> -Josh
>> >>
>> >>
>> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
>> >> <jm...@google.com>wrote:
>> >>
>> >>> Following up: I tried doing what I suggested above: patching that
>> >>> change
>> >>> into my own copy of apr_memcache.c  It was first of all a bad idea to
>> >>> pull
>> >>> in only part of apr_memcache.c because that file changed slightly
>> >>> between
>> >>> 2.2 and 2.4 and our module works in both.
>> >>>
>> >>> I was successful making my own version of apr_memcache (renaming
>> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>> >>> socket timeout from -1 to 10 seconds, then the system behaved very
>> >>> poorly
>> >>> under load test (though it worked fine in our unit-tests and
>> >>> system-tests).
>> >>>  In other words, I think the proposed patch that Jeff pointed to above
>> >>> is
>> >>> not really working (as he predicted).  This test was done without
>> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
>> >>> thereafter behave poorly.
>> >>>
>> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
>> >>> to
>> >>> pursue my own complicated mechanism of timing out the calls from my
>> >>> side.
>> >>>
>> >>> -Josh
>> >>>
>> >>>
>> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
>> >>> <jm...@google.com>wrote:
>> >>>
>> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
>> >>>> action and before doing any work toward this, I'd like to understand
>> >>>> the
>> >>>> pitfalls from people that understand Apache better than us.
>> >>>>
>> >>>> Here's our reality: we believe we need to incorporate memcached for
>> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>> >>>> large sites & hosting providers.  We are fairly close (we think) to
>> >>>> releasing this functionality as beta.  However, in such large sites,
>> >>>> stuff
>> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
>> >>>> does we
>> >>>> want to fall back to serving partially unoptimized sites rather than
>> >>>> hanging the servers.
>> >>>>
>> >>>> I understand the realities of backward-compatible APIs.  My
>> >>>> expectation
>> >>>> is that this would take years to make it into an APR distribution we
>> >>>> could
>> >>>> depend on.  We want to deploy this functionality in weeks.  The
>> >>>> workarounds
>> >>>> we have tried backgrounding the apr_memcache calls in a thread and
>> >>>> timing
>> >>>> out in mainline are complex and even once they work 100% will be very
>> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>> >>>> 'apachectl stop') if this happens more than (say) once a month.
>> >>>>
>> >>>> Our plan is to copy the patched implementation of
>> >>>> apr_memcache_server_connect and the static methods it calls into a
>> >>>> new .c
>> >>>> file we will link into our module, naming the new entry-point
>> >>>> something
>> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>> >>>> CS/SE perspective this is offensive and we admit it, but from a
>> >>>> product
>> >>>> quality perspective we believe this beats freezes and
>> >>>> complicated/imperfect
>> >>>> workarounds with threads.
>> >>>>
>> >>>> So I have two questions for the Apache community:
>> >>>>
>> >>>>    1. What are the practical problems with this approach?  Note that
>> >>>>    in any case a new APR rev would require editing/ifdefing our code
>> >>>> anyway,
>> >>>>    so I think immunity from APR updates such as this patch being
>> >>>> applied is
>> >>>>    not a distinguishing drawback.
>> >>>>    2. Is there an example of the correct solution to the technical
>> >>>>    problem Jeff highlighted: "it is apparently missing a call to
>> >>>>    adjust the socket timeout and to discard the connection if the
>> >>>>    timeout is reached".  That sounds like a pattern that might be
>> >>>>    found elsewhere in the Apache HTTPD code base.
>> >>>>
>> >>>> Thanks in advance for your help!
>> >>>> -Josh
>> >>>>
>> >>>>
>> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
>> >>>> <tr...@gmail.com>wrote:
>> >>>>
>> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
>> >>>>> <jm...@google.com>
>> >>>>> wrote:
>> >>>>> > Is there a mechanism to time out individual operations?
>> >>>>>
>> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
>> >>>>> socket I/O timeout is disabled.
>> >>>>>
>> >>>>> Bugzilla bug
>> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>> >>>>> has a patch, though it is apparently missing a call to adjust the
>> >>>>> socket timeout and to discard the connection if the timeout is
>> >>>>> reached.  More importantly, the API can only be changed in future
>> >>>>> APR
>> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>> >>>>> APR-Util 1.6.
>> >>>>>
>> >>>>> >
>> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
>> >>>>> until
>> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
>> >>>>> > call?
>> >>>>> >
>> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
>> >>>>> > memcached
>> >>>>> before
>> >>>>> > doing a 'get'?
>> >>>>> >
>> >>>>> > Here are my observations:
>> >>>>> >
>> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
>> >>>>> > SIGSTOP
>> >>>>> the
>> >>>>> > memcached process. Yes!
>> >>>>> >
>> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > That last one really sucks as I am using that to print the status
>> >>>>> > of
>> >>>>> all my
>> >>>>> > cache shards to the log file if I detected a problem :(
>> >>>>> >
>> >>>>> >
>> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>> >>>>>  Can I
>> >>>>> > induce the others to have reasonable timeout behavior?
>> >>>>> >
>> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
>> >>>>> > Apache,
>> >>>>> at
>> >>>>> > least with graceful-stop.
>> >>>>> >
>> >>>>> >
>> >>>>> > On a related note, the apr_memcache
>> >>>>> > documentation<
>> >>>>>
>> >>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>> >>>>> >is
>> >>>>> > very thin.  I'd be happy to augment it with my observations on its
>> >>>>> > usage
>> >>>>> > and the meaning of some of the arguments if that was desired.  How
>> >>>>> would I
>> >>>>> > go about that?
>> >>>>>
>> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>> >>>>> include files, build (make dox) and inspect the results, submit a
>> >>>>> patch to dev@apr.apache.org.
>> >>>>>
>> >>>>> >
>> >>>>> > -Josh
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Born in Roswell... married an alien...
>> >>>>> http://emptyhammock.com/
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: apr_memcache operation timeouts

Posted by Jeff Trawick <tr...@gmail.com>.
On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <tr...@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com> wrote:
>>
>> ping!
>>
>> Please don't hesitate to push back and tell me if I can supply the patch
>> or
>> update in some easier-to-digest form.  In particular, while I have
>> rigorously stress-tested this change using mod_pagespeed's unit test,
>> system-test, and load-test framework, I don't really understand what the
>> testing flow is for APR.  I'd be happy to add unit-tests for that if
>> someone points me to a change-list or patch-file that does it properly.
>>
>> -Josh
>
>
> I'll try hard to work on this in the next couple of days.  It would be great
> to have fixes in APR-Util 1.5.x, which we hope to work on later this week.

Attached is a first pass at getting your patch to apr trunk.  Can you
confirm that I didn't break your tests?  (It passes the apr tests, but
that may not mean much.)



>
>>
>>
>> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
>> wrote:
>>
>> > I have completed a solution to this problem, which can be a drop-in
>> > update
>> > for the existing apr_memcache.c.  It is now checked in for my module as
>> >
>> > http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
>> > .
>> >
>> > It differs from the solution in
>> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>> >
>> >    - It doesn't require an API change; it but it enforces the 50ms
>> >    timeout that already exists for apr_multgetp for all operations.
>> >    - It works under my load test (which I found is not true of the patch
>> >    in 51065).
>> >
>> > For my own purposes, I will be shipping my module with apr_memcache2 so
>> > I
>> > get the behavior I want regardless of what version of Apache is
>> > installed.
>> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
>> > attached, and I've also submitted it as an alternative patch to bug
>> > 51065.
>> >
>> > If you agree with the strategy I used to solve this problem, then please
>> > let me know if I can help with any changes required to get this into the
>> > main distribution,
>> >
>> >
>> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz
>> > <jm...@google.com>wrote:
>> >
>> >> I've had some preliminary success with my own variant of apr_memcache.c
>> >> (creatively called apr_memcache2.c).  Rather than setting the socket
>> >> timeout, I've been mimicing the timeout strategy I saw in
>> >> apr_memcache_multgetp, by adding a new helper method:
>> >>
>> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>> >>                                                apr_memcache2_conn_t*
>> >> conn) {
>> >>     apr_pollset_t* pollset;
>> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>> >>     if (rv == APR_SUCCESS) {
>> >>         apr_pollfd_t pollfd;
>> >>         pollfd.desc_type = APR_POLL_SOCKET;
>> >>         pollfd.reqevents = APR_POLLIN;
>> >>         pollfd.p = temp_pool;
>> >>         pollfd.desc.s = conn->sock;
>> >>         pollfd.client_data = NULL;
>> >>         apr_pollset_add(pollset, &pollfd);
>> >>         apr_int32_t queries_recvd;
>> >>         const apr_pollfd_t* activefds;
>> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT,
>> >> &queries_recvd,
>> >>                               &activefds);
>> >>         if (rv == APR_SUCCESS) {
>> >>             assert(queries_recvd == 1);
>> >>             assert(activefds->desc.s == conn->sock);
>> >>             assert(activefds->client_data == NULL);
>> >>         }
>> >>     }
>> >>     return rv;
>> >> }
>> >>
>> >> And calling that before many of the existing calls to get_server_line
>> >> as:
>> >>
>> >>     rv = wait_for_server_or_timeout_no_pool(conn);
>> >>     if (rv != APR_SUCCESS) {
>> >>         ms_release_conn(ms, conn);
>> >>         return rv;
>> >>     }
>> >>
>> >> This is just an experiment; I think I can streamline this by
>> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
>> >> (actually now apr_memcache2_conn_t).
>> >>
>> >> I have two questions about this:
>> >>
>> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
>> >> is
>> >> somewhat different from the one that ships with Apache 2.2.  In
>> >> particular
>> >> the 2.4 version cannot be compiled against the headers that come with a
>> >> 2.2
>> >> distribution.  Is there any downside to taking my hacked 2.2
>> >> apr_memcache.c
>> >> and running it in Apache 2.4?  Or should I maintain two hacks?
>> >>
>> >> 2. This seems wasteful in terms of system calls.  I am making an extra
>> >> call to poll, rather than relying on the socket timeout.  The socket
>> >> timeout didn't work as well as this though.  Does anyone have any
>> >> theories
>> >> as to why, or what could be done to the patch in
>> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>> >>
>> >> -Josh
>> >>
>> >>
>> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz
>> >> <jm...@google.com>wrote:
>> >>
>> >>> Following up: I tried doing what I suggested above: patching that
>> >>> change
>> >>> into my own copy of apr_memcache.c  It was first of all a bad idea to
>> >>> pull
>> >>> in only part of apr_memcache.c because that file changed slightly
>> >>> between
>> >>> 2.2 and 2.4 and our module works in both.
>> >>>
>> >>> I was successful making my own version of apr_memcache (renaming
>> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>> >>> socket timeout from -1 to 10 seconds, then the system behaved very
>> >>> poorly
>> >>> under load test (though it worked fine in our unit-tests and
>> >>> system-tests).
>> >>>  In other words, I think the proposed patch that Jeff pointed to above
>> >>> is
>> >>> not really working (as he predicted).  This test was done without
>> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
>> >>> thereafter behave poorly.
>> >>>
>> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
>> >>> to
>> >>> pursue my own complicated mechanism of timing out the calls from my
>> >>> side.
>> >>>
>> >>> -Josh
>> >>>
>> >>>
>> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz
>> >>> <jm...@google.com>wrote:
>> >>>
>> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
>> >>>> action and before doing any work toward this, I'd like to understand
>> >>>> the
>> >>>> pitfalls from people that understand Apache better than us.
>> >>>>
>> >>>> Here's our reality: we believe we need to incorporate memcached for
>> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>> >>>> large sites & hosting providers.  We are fairly close (we think) to
>> >>>> releasing this functionality as beta.  However, in such large sites,
>> >>>> stuff
>> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
>> >>>> does we
>> >>>> want to fall back to serving partially unoptimized sites rather than
>> >>>> hanging the servers.
>> >>>>
>> >>>> I understand the realities of backward-compatible APIs.  My
>> >>>> expectation
>> >>>> is that this would take years to make it into an APR distribution we
>> >>>> could
>> >>>> depend on.  We want to deploy this functionality in weeks.  The
>> >>>> workarounds
>> >>>> we have tried backgrounding the apr_memcache calls in a thread and
>> >>>> timing
>> >>>> out in mainline are complex and even once they work 100% will be very
>> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>> >>>> 'apachectl stop') if this happens more than (say) once a month.
>> >>>>
>> >>>> Our plan is to copy the patched implementation of
>> >>>> apr_memcache_server_connect and the static methods it calls into a
>> >>>> new .c
>> >>>> file we will link into our module, naming the new entry-point
>> >>>> something
>> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>> >>>> CS/SE perspective this is offensive and we admit it, but from a
>> >>>> product
>> >>>> quality perspective we believe this beats freezes and
>> >>>> complicated/imperfect
>> >>>> workarounds with threads.
>> >>>>
>> >>>> So I have two questions for the Apache community:
>> >>>>
>> >>>>    1. What are the practical problems with this approach?  Note that
>> >>>>    in any case a new APR rev would require editing/ifdefing our code
>> >>>> anyway,
>> >>>>    so I think immunity from APR updates such as this patch being
>> >>>> applied is
>> >>>>    not a distinguishing drawback.
>> >>>>    2. Is there an example of the correct solution to the technical
>> >>>>    problem Jeff highlighted: "it is apparently missing a call to
>> >>>>    adjust the socket timeout and to discard the connection if the
>> >>>>    timeout is reached".  That sounds like a pattern that might be
>> >>>>    found elsewhere in the Apache HTTPD code base.
>> >>>>
>> >>>> Thanks in advance for your help!
>> >>>> -Josh
>> >>>>
>> >>>>
>> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick
>> >>>> <tr...@gmail.com>wrote:
>> >>>>
>> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz
>> >>>>> <jm...@google.com>
>> >>>>> wrote:
>> >>>>> > Is there a mechanism to time out individual operations?
>> >>>>>
>> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
>> >>>>> socket I/O timeout is disabled.
>> >>>>>
>> >>>>> Bugzilla bug
>> >>>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>> >>>>> has a patch, though it is apparently missing a call to adjust the
>> >>>>> socket timeout and to discard the connection if the timeout is
>> >>>>> reached.  More importantly, the API can only be changed in future
>> >>>>> APR
>> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>> >>>>> APR-Util 1.6.
>> >>>>>
>> >>>>> >
>> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
>> >>>>> until
>> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
>> >>>>> > call?
>> >>>>> >
>> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to
>> >>>>> > memcached
>> >>>>> before
>> >>>>> > doing a 'get'?
>> >>>>> >
>> >>>>> > Here are my observations:
>> >>>>> >
>> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
>> >>>>> > SIGSTOP
>> >>>>> the
>> >>>>> > memcached process. Yes!
>> >>>>> >
>> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>> >>>>> memcached
>> >>>>> > process.
>> >>>>> >
>> >>>>> > That last one really sucks as I am using that to print the status
>> >>>>> > of
>> >>>>> all my
>> >>>>> > cache shards to the log file if I detected a problem :(
>> >>>>> >
>> >>>>> >
>> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>> >>>>>  Can I
>> >>>>> > induce the others to have reasonable timeout behavior?
>> >>>>> >
>> >>>>> > When I SIGSTOP memcached this makes it hard to even restart
>> >>>>> > Apache,
>> >>>>> at
>> >>>>> > least with graceful-stop.
>> >>>>> >
>> >>>>> >
>> >>>>> > On a related note, the apr_memcache
>> >>>>> > documentation<
>> >>>>>
>> >>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>> >>>>> >is
>> >>>>> > very thin.  I'd be happy to augment it with my observations on its
>> >>>>> > usage
>> >>>>> > and the meaning of some of the arguments if that was desired.  How
>> >>>>> would I
>> >>>>> > go about that?
>> >>>>>
>> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>> >>>>> include files, build (make dox) and inspect the results, submit a
>> >>>>> patch to dev@apr.apache.org.
>> >>>>>
>> >>>>> >
>> >>>>> > -Josh
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Born in Roswell... married an alien...
>> >>>>> http://emptyhammock.com/
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: apr_memcache operation timeouts

Posted by Jeff Trawick <tr...@gmail.com>.
On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com> wrote:

> ping!
>
> Please don't hesitate to push back and tell me if I can supply the patch or
> update in some easier-to-digest form.  In particular, while I have
> rigorously stress-tested this change using mod_pagespeed's unit test,
> system-test, and load-test framework, I don't really understand what the
> testing flow is for APR.  I'd be happy to add unit-tests for that if
> someone points me to a change-list or patch-file that does it properly.
>
> -Josh
>

I'll try hard to work on this in the next couple of days.  It would be
great to have fixes in APR-Util 1.5.x, which we hope to work on later this
week.


>
> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
> wrote:
>
> > I have completed a solution to this problem, which can be a drop-in
> update
> > for the existing apr_memcache.c.  It is now checked in for my module as
> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> > .
> >
> > It differs from the solution in
> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >
> >    - It doesn't require an API change; it but it enforces the 50ms
> >    timeout that already exists for apr_multgetp for all operations.
> >    - It works under my load test (which I found is not true of the patch
> >    in 51065).
> >
> > For my own purposes, I will be shipping my module with apr_memcache2 so I
> > get the behavior I want regardless of what version of Apache is
> installed.
> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> > attached, and I've also submitted it as an alternative patch to bug
> 51065.
> >
> > If you agree with the strategy I used to solve this problem, then please
> > let me know if I can help with any changes required to get this into the
> > main distribution,
> >
> >
> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >
> >> I've had some preliminary success with my own variant of apr_memcache.c
> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> timeout, I've been mimicing the timeout strategy I saw in
> >> apr_memcache_multgetp, by adding a new helper method:
> >>
> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >>                                                apr_memcache2_conn_t*
> >> conn) {
> >>     apr_pollset_t* pollset;
> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
> >>     if (rv == APR_SUCCESS) {
> >>         apr_pollfd_t pollfd;
> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >>         pollfd.reqevents = APR_POLLIN;
> >>         pollfd.p = temp_pool;
> >>         pollfd.desc.s = conn->sock;
> >>         pollfd.client_data = NULL;
> >>         apr_pollset_add(pollset, &pollfd);
> >>         apr_int32_t queries_recvd;
> >>         const apr_pollfd_t* activefds;
> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
> >>                               &activefds);
> >>         if (rv == APR_SUCCESS) {
> >>             assert(queries_recvd == 1);
> >>             assert(activefds->desc.s == conn->sock);
> >>             assert(activefds->client_data == NULL);
> >>         }
> >>     }
> >>     return rv;
> >> }
> >>
> >> And calling that before many of the existing calls to get_server_line
> as:
> >>
> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >>     if (rv != APR_SUCCESS) {
> >>         ms_release_conn(ms, conn);
> >>         return rv;
> >>     }
> >>
> >> This is just an experiment; I think I can streamline this by
> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
> >> (actually now apr_memcache2_conn_t).
> >>
> >> I have two questions about this:
> >>
> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
> >> somewhat different from the one that ships with Apache 2.2.  In
> particular
> >> the 2.4 version cannot be compiled against the headers that come with a
> 2.2
> >> distribution.  Is there any downside to taking my hacked 2.2
> apr_memcache.c
> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >>
> >> 2. This seems wasteful in terms of system calls.  I am making an extra
> >> call to poll, rather than relying on the socket timeout.  The socket
> >> timeout didn't work as well as this though.  Does anyone have any
> theories
> >> as to why, or what could be done to the patch in
> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >>
> >> -Josh
> >>
> >>
> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >>
> >>> Following up: I tried doing what I suggested above: patching that
> change
> >>> into my own copy of apr_memcache.c  It was first of all a bad idea to
> pull
> >>> in only part of apr_memcache.c because that file changed slightly
> between
> >>> 2.2 and 2.4 and our module works in both.
> >>>
> >>> I was successful making my own version of apr_memcache (renaming
> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> poorly
> >>> under load test (though it worked fine in our unit-tests and
> system-tests).
> >>>  In other words, I think the proposed patch that Jeff pointed to above
> is
> >>> not really working (as he predicted).  This test was done without
> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
> >>> thereafter behave poorly.
> >>>
> >>> I'm going to follow up on that bugzilla entry, but for now I'm going to
> >>> pursue my own complicated mechanism of timing out the calls from my
> side.
> >>>
> >>> -Josh
> >>>
> >>>
> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >>>
> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
> >>>> action and before doing any work toward this, I'd like to understand
> the
> >>>> pitfalls from people that understand Apache better than us.
> >>>>
> >>>> Here's our reality: we believe we need to incorporate memcached for
> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
> >>>> large sites & hosting providers.  We are fairly close (we think) to
> >>>> releasing this functionality as beta.  However, in such large sites,
> stuff
> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
> does we
> >>>> want to fall back to serving partially unoptimized sites rather than
> >>>> hanging the servers.
> >>>>
> >>>> I understand the realities of backward-compatible APIs.  My
> expectation
> >>>> is that this would take years to make it into an APR distribution we
> could
> >>>> depend on.  We want to deploy this functionality in weeks.  The
> workarounds
> >>>> we have tried backgrounding the apr_memcache calls in a thread and
> timing
> >>>> out in mainline are complex and even once they work 100% will be very
> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >>>>
> >>>> Our plan is to copy the patched implementation of
> >>>> apr_memcache_server_connect and the static methods it calls into a
> new .c
> >>>> file we will link into our module, naming the new entry-point
> something
> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
> >>>> CS/SE perspective this is offensive and we admit it, but from a
> product
> >>>> quality perspective we believe this beats freezes and
> complicated/imperfect
> >>>> workarounds with threads.
> >>>>
> >>>> So I have two questions for the Apache community:
> >>>>
> >>>>    1. What are the practical problems with this approach?  Note that
> >>>>    in any case a new APR rev would require editing/ifdefing our code
> anyway,
> >>>>    so I think immunity from APR updates such as this patch being
> applied is
> >>>>    not a distinguishing drawback.
> >>>>    2. Is there an example of the correct solution to the technical
> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >>>>    adjust the socket timeout and to discard the connection if the
> >>>>    timeout is reached".  That sounds like a pattern that might be
> >>>>    found elsewhere in the Apache HTTPD code base.
> >>>>
> >>>> Thanks in advance for your help!
> >>>> -Josh
> >>>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <trawick@gmail.com
> >wrote:
> >>>>
> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmarantz@google.com
> >
> >>>>> wrote:
> >>>>> > Is there a mechanism to time out individual operations?
> >>>>>
> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
> >>>>> socket I/O timeout is disabled.
> >>>>>
> >>>>> Bugzilla bug
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >>>>> has a patch, though it is apparently missing a call to adjust the
> >>>>> socket timeout and to discard the connection if the timeout is
> >>>>> reached.  More importantly, the API can only be changed in future APR
> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
> >>>>> APR-Util 1.6.
> >>>>>
> >>>>> >
> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
> >>>>> until
> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
> call?
> >>>>> >
> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
> >>>>> before
> >>>>> > doing a 'get'?
> >>>>> >
> >>>>> > Here are my observations:
> >>>>> >
> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> SIGSTOP
> >>>>> the
> >>>>> > memcached process. Yes!
> >>>>> >
> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > That last one really sucks as I am using that to print the status
> of
> >>>>> all my
> >>>>> > cache shards to the log file if I detected a problem :(
> >>>>> >
> >>>>> >
> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
> >>>>>  Can I
> >>>>> > induce the others to have reasonable timeout behavior?
> >>>>> >
> >>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache,
> >>>>> at
> >>>>> > least with graceful-stop.
> >>>>> >
> >>>>> >
> >>>>> > On a related note, the apr_memcache
> >>>>> > documentation<
> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >>>>> >is
> >>>>> > very thin.  I'd be happy to augment it with my observations on its
> >>>>> > usage
> >>>>> > and the meaning of some of the arguments if that was desired.  How
> >>>>> would I
> >>>>> > go about that?
> >>>>>
> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
> >>>>> include files, build (make dox) and inspect the results, submit a
> >>>>> patch to dev@apr.apache.org.
> >>>>>
> >>>>> >
> >>>>> > -Josh
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Born in Roswell... married an alien...
> >>>>> http://emptyhammock.com/
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: apr_memcache operation timeouts

Posted by Jeff Trawick <tr...@gmail.com>.
On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jm...@google.com> wrote:

> ping!
>
> Please don't hesitate to push back and tell me if I can supply the patch or
> update in some easier-to-digest form.  In particular, while I have
> rigorously stress-tested this change using mod_pagespeed's unit test,
> system-test, and load-test framework, I don't really understand what the
> testing flow is for APR.  I'd be happy to add unit-tests for that if
> someone points me to a change-list or patch-file that does it properly.
>
> -Josh
>

I'll try hard to work on this in the next couple of days.  It would be
great to have fixes in APR-Util 1.5.x, which we hope to work on later this
week.


>
> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com>
> wrote:
>
> > I have completed a solution to this problem, which can be a drop-in
> update
> > for the existing apr_memcache.c.  It is now checked in for my module as
> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> > .
> >
> > It differs from the solution in
> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >
> >    - It doesn't require an API change; it but it enforces the 50ms
> >    timeout that already exists for apr_multgetp for all operations.
> >    - It works under my load test (which I found is not true of the patch
> >    in 51065).
> >
> > For my own purposes, I will be shipping my module with apr_memcache2 so I
> > get the behavior I want regardless of what version of Apache is
> installed.
> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> > attached, and I've also submitted it as an alternative patch to bug
> 51065.
> >
> > If you agree with the strategy I used to solve this problem, then please
> > let me know if I can help with any changes required to get this into the
> > main distribution,
> >
> >
> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >
> >> I've had some preliminary success with my own variant of apr_memcache.c
> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> timeout, I've been mimicing the timeout strategy I saw in
> >> apr_memcache_multgetp, by adding a new helper method:
> >>
> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >>                                                apr_memcache2_conn_t*
> >> conn) {
> >>     apr_pollset_t* pollset;
> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
> >>     if (rv == APR_SUCCESS) {
> >>         apr_pollfd_t pollfd;
> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >>         pollfd.reqevents = APR_POLLIN;
> >>         pollfd.p = temp_pool;
> >>         pollfd.desc.s = conn->sock;
> >>         pollfd.client_data = NULL;
> >>         apr_pollset_add(pollset, &pollfd);
> >>         apr_int32_t queries_recvd;
> >>         const apr_pollfd_t* activefds;
> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
> >>                               &activefds);
> >>         if (rv == APR_SUCCESS) {
> >>             assert(queries_recvd == 1);
> >>             assert(activefds->desc.s == conn->sock);
> >>             assert(activefds->client_data == NULL);
> >>         }
> >>     }
> >>     return rv;
> >> }
> >>
> >> And calling that before many of the existing calls to get_server_line
> as:
> >>
> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >>     if (rv != APR_SUCCESS) {
> >>         ms_release_conn(ms, conn);
> >>         return rv;
> >>     }
> >>
> >> This is just an experiment; I think I can streamline this by
> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
> >> (actually now apr_memcache2_conn_t).
> >>
> >> I have two questions about this:
> >>
> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
> >> somewhat different from the one that ships with Apache 2.2.  In
> particular
> >> the 2.4 version cannot be compiled against the headers that come with a
> 2.2
> >> distribution.  Is there any downside to taking my hacked 2.2
> apr_memcache.c
> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >>
> >> 2. This seems wasteful in terms of system calls.  I am making an extra
> >> call to poll, rather than relying on the socket timeout.  The socket
> >> timeout didn't work as well as this though.  Does anyone have any
> theories
> >> as to why, or what could be done to the patch in
> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >>
> >> -Josh
> >>
> >>
> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >>
> >>> Following up: I tried doing what I suggested above: patching that
> change
> >>> into my own copy of apr_memcache.c  It was first of all a bad idea to
> pull
> >>> in only part of apr_memcache.c because that file changed slightly
> between
> >>> 2.2 and 2.4 and our module works in both.
> >>>
> >>> I was successful making my own version of apr_memcache (renaming
> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> poorly
> >>> under load test (though it worked fine in our unit-tests and
> system-tests).
> >>>  In other words, I think the proposed patch that Jeff pointed to above
> is
> >>> not really working (as he predicted).  This test was done without
> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
> >>> thereafter behave poorly.
> >>>
> >>> I'm going to follow up on that bugzilla entry, but for now I'm going to
> >>> pursue my own complicated mechanism of timing out the calls from my
> side.
> >>>
> >>> -Josh
> >>>
> >>>
> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmarantz@google.com
> >wrote:
> >>>
> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
> >>>> action and before doing any work toward this, I'd like to understand
> the
> >>>> pitfalls from people that understand Apache better than us.
> >>>>
> >>>> Here's our reality: we believe we need to incorporate memcached for
> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
> >>>> large sites & hosting providers.  We are fairly close (we think) to
> >>>> releasing this functionality as beta.  However, in such large sites,
> stuff
> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
> does we
> >>>> want to fall back to serving partially unoptimized sites rather than
> >>>> hanging the servers.
> >>>>
> >>>> I understand the realities of backward-compatible APIs.  My
> expectation
> >>>> is that this would take years to make it into an APR distribution we
> could
> >>>> depend on.  We want to deploy this functionality in weeks.  The
> workarounds
> >>>> we have tried backgrounding the apr_memcache calls in a thread and
> timing
> >>>> out in mainline are complex and even once they work 100% will be very
> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >>>>
> >>>> Our plan is to copy the patched implementation of
> >>>> apr_memcache_server_connect and the static methods it calls into a
> new .c
> >>>> file we will link into our module, naming the new entry-point
> something
> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
> >>>> CS/SE perspective this is offensive and we admit it, but from a
> product
> >>>> quality perspective we believe this beats freezes and
> complicated/imperfect
> >>>> workarounds with threads.
> >>>>
> >>>> So I have two questions for the Apache community:
> >>>>
> >>>>    1. What are the practical problems with this approach?  Note that
> >>>>    in any case a new APR rev would require editing/ifdefing our code
> anyway,
> >>>>    so I think immunity from APR updates such as this patch being
> applied is
> >>>>    not a distinguishing drawback.
> >>>>    2. Is there an example of the correct solution to the technical
> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >>>>    adjust the socket timeout and to discard the connection if the
> >>>>    timeout is reached".  That sounds like a pattern that might be
> >>>>    found elsewhere in the Apache HTTPD code base.
> >>>>
> >>>> Thanks in advance for your help!
> >>>> -Josh
> >>>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <trawick@gmail.com
> >wrote:
> >>>>
> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmarantz@google.com
> >
> >>>>> wrote:
> >>>>> > Is there a mechanism to time out individual operations?
> >>>>>
> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
> >>>>> socket I/O timeout is disabled.
> >>>>>
> >>>>> Bugzilla bug
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >>>>> has a patch, though it is apparently missing a call to adjust the
> >>>>> socket timeout and to discard the connection if the timeout is
> >>>>> reached.  More importantly, the API can only be changed in future APR
> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
> >>>>> APR-Util 1.6.
> >>>>>
> >>>>> >
> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
> >>>>> until
> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
> call?
> >>>>> >
> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
> >>>>> before
> >>>>> > doing a 'get'?
> >>>>> >
> >>>>> > Here are my observations:
> >>>>> >
> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> SIGSTOP
> >>>>> the
> >>>>> > memcached process. Yes!
> >>>>> >
> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > That last one really sucks as I am using that to print the status
> of
> >>>>> all my
> >>>>> > cache shards to the log file if I detected a problem :(
> >>>>> >
> >>>>> >
> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
> >>>>>  Can I
> >>>>> > induce the others to have reasonable timeout behavior?
> >>>>> >
> >>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache,
> >>>>> at
> >>>>> > least with graceful-stop.
> >>>>> >
> >>>>> >
> >>>>> > On a related note, the apr_memcache
> >>>>> > documentation<
> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >>>>> >is
> >>>>> > very thin.  I'd be happy to augment it with my observations on its
> >>>>> > usage
> >>>>> > and the meaning of some of the arguments if that was desired.  How
> >>>>> would I
> >>>>> > go about that?
> >>>>>
> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
> >>>>> include files, build (make dox) and inspect the results, submit a
> >>>>> patch to dev@apr.apache.org.
> >>>>>
> >>>>> >
> >>>>> > -Josh
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Born in Roswell... married an alien...
> >>>>> http://emptyhammock.com/
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: apr_memcache operation timeouts

Posted by Joshua Marantz <jm...@google.com>.
ping!

Please don't hesitate to push back and tell me if I can supply the patch or
update in some easier-to-digest form.  In particular, while I have
rigorously stress-tested this change using mod_pagespeed's unit test,
system-test, and load-test framework, I don't really understand what the
testing flow is for APR.  I'd be happy to add unit-tests for that if
someone points me to a change-list or patch-file that does it properly.

-Josh


On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com> wrote:

> I have completed a solution to this problem, which can be a drop-in update
> for the existing apr_memcache.c.  It is now checked in for my module as
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> .
>
> It differs from the solution in
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>
>    - It doesn't require an API change; it but it enforces the 50ms
>    timeout that already exists for apr_multgetp for all operations.
>    - It works under my load test (which I found is not true of the patch
>    in 51065).
>
> For my own purposes, I will be shipping my module with apr_memcache2 so I
> get the behavior I want regardless of what version of Apache is installed.
>  But I'd like to propose my patch for apr_memcache.c.  The patch is
> attached, and I've also submitted it as an alternative patch to bug 51065.
>
> If you agree with the strategy I used to solve this problem, then please
> let me know if I can help with any changes required to get this into the
> main distribution,
>
>
> On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jm...@google.com>wrote:
>
>> I've had some preliminary success with my own variant of apr_memcache.c
>> (creatively called apr_memcache2.c).  Rather than setting the socket
>> timeout, I've been mimicing the timeout strategy I saw in
>> apr_memcache_multgetp, by adding a new helper method:
>>
>> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>>                                                apr_memcache2_conn_t*
>> conn) {
>>     apr_pollset_t* pollset;
>>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>>     if (rv == APR_SUCCESS) {
>>         apr_pollfd_t pollfd;
>>         pollfd.desc_type = APR_POLL_SOCKET;
>>         pollfd.reqevents = APR_POLLIN;
>>         pollfd.p = temp_pool;
>>         pollfd.desc.s = conn->sock;
>>         pollfd.client_data = NULL;
>>         apr_pollset_add(pollset, &pollfd);
>>         apr_int32_t queries_recvd;
>>         const apr_pollfd_t* activefds;
>>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
>>                               &activefds);
>>         if (rv == APR_SUCCESS) {
>>             assert(queries_recvd == 1);
>>             assert(activefds->desc.s == conn->sock);
>>             assert(activefds->client_data == NULL);
>>         }
>>     }
>>     return rv;
>> }
>>
>> And calling that before many of the existing calls to get_server_line as:
>>
>>     rv = wait_for_server_or_timeout_no_pool(conn);
>>     if (rv != APR_SUCCESS) {
>>         ms_release_conn(ms, conn);
>>         return rv;
>>     }
>>
>> This is just an experiment; I think I can streamline this by
>> pre-populating the pollfd structure as part of the apr_memcache_conn_t
>> (actually now apr_memcache2_conn_t).
>>
>> I have two questions about this:
>>
>> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
>> somewhat different from the one that ships with Apache 2.2.  In particular
>> the 2.4 version cannot be compiled against the headers that come with a 2.2
>> distribution.  Is there any downside to taking my hacked 2.2 apr_memcache.c
>> and running it in Apache 2.4?  Or should I maintain two hacks?
>>
>> 2. This seems wasteful in terms of system calls.  I am making an extra
>> call to poll, rather than relying on the socket timeout.  The socket
>> timeout didn't work as well as this though.  Does anyone have any theories
>> as to why, or what could be done to the patch in
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>>
>> -Josh
>>
>>
>> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jm...@google.com>wrote:
>>
>>> Following up: I tried doing what I suggested above: patching that change
>>> into my own copy of apr_memcache.c  It was first of all a bad idea to pull
>>> in only part of apr_memcache.c because that file changed slightly between
>>> 2.2 and 2.4 and our module works in both.
>>>
>>> I was successful making my own version of apr_memcache (renaming
>>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>>> socket timeout from -1 to 10 seconds, then the system behaved very poorly
>>> under load test (though it worked fine in our unit-tests and system-tests).
>>>  In other words, I think the proposed patch that Jeff pointed to above is
>>> not really working (as he predicted).  This test was done without
>>> SIGSTOPing the memcached; it would timeout under our load anyway and
>>> thereafter behave poorly.
>>>
>>> I'm going to follow up on that bugzilla entry, but for now I'm going to
>>> pursue my own complicated mechanism of timing out the calls from my side.
>>>
>>> -Josh
>>>
>>>
>>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jm...@google.com>wrote:
>>>
>>>> Thanks Jeff, that is very helpful.  We are considering a course of
>>>> action and before doing any work toward this, I'd like to understand the
>>>> pitfalls from people that understand Apache better than us.
>>>>
>>>> Here's our reality: we believe we need to incorporate memcached for
>>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>>>> large sites & hosting providers.  We are fairly close (we think) to
>>>> releasing this functionality as beta.  However, in such large sites, stuff
>>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it does we
>>>> want to fall back to serving partially unoptimized sites rather than
>>>> hanging the servers.
>>>>
>>>> I understand the realities of backward-compatible APIs.  My expectation
>>>> is that this would take years to make it into an APR distribution we could
>>>> depend on.  We want to deploy this functionality in weeks.  The workarounds
>>>> we have tried backgrounding the apr_memcache calls in a thread and timing
>>>> out in mainline are complex and even once they work 100% will be very
>>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>>>> 'apachectl stop') if this happens more than (say) once a month.
>>>>
>>>> Our plan is to copy the patched implementation of
>>>> apr_memcache_server_connect and the static methods it calls into a new .c
>>>> file we will link into our module, naming the new entry-point something
>>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>>>> CS/SE perspective this is offensive and we admit it, but from a product
>>>> quality perspective we believe this beats freezes and complicated/imperfect
>>>> workarounds with threads.
>>>>
>>>> So I have two questions for the Apache community:
>>>>
>>>>    1. What are the practical problems with this approach?  Note that
>>>>    in any case a new APR rev would require editing/ifdefing our code anyway,
>>>>    so I think immunity from APR updates such as this patch being applied is
>>>>    not a distinguishing drawback.
>>>>    2. Is there an example of the correct solution to the technical
>>>>    problem Jeff highlighted: "it is apparently missing a call to
>>>>    adjust the socket timeout and to discard the connection if the
>>>>    timeout is reached".  That sounds like a pattern that might be
>>>>    found elsewhere in the Apache HTTPD code base.
>>>>
>>>> Thanks in advance for your help!
>>>> -Josh
>>>>
>>>>
>>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <tr...@gmail.com>wrote:
>>>>
>>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jm...@google.com>
>>>>> wrote:
>>>>> > Is there a mechanism to time out individual operations?
>>>>>
>>>>> No, the socket connect timeout is hard-coded at 1 second and the
>>>>> socket I/O timeout is disabled.
>>>>>
>>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>>>>> has a patch, though it is apparently missing a call to adjust the
>>>>> socket timeout and to discard the connection if the timeout is
>>>>> reached.  More importantly, the API can only be changed in future APR
>>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>>>>> APR-Util 1.6.
>>>>>
>>>>> >
>>>>> > If memcached freezes, then it appears my calls to 'get' will block
>>>>> until
>>>>> > memcached wakes up.  Is there any way to set a timeout for that call?
>>>>> >
>>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
>>>>> before
>>>>> > doing a 'get'?
>>>>> >
>>>>> > Here are my observations:
>>>>> >
>>>>> > apr_memcache_multgetp seems to time out in bounded time if I SIGSTOP
>>>>> the
>>>>> > memcached process. Yes!
>>>>> >
>>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > That last one really sucks as I am using that to print the status of
>>>>> all my
>>>>> > cache shards to the log file if I detected a problem :(
>>>>> >
>>>>> >
>>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>>>>>  Can I
>>>>> > induce the others to have reasonable timeout behavior?
>>>>> >
>>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache,
>>>>> at
>>>>> > least with graceful-stop.
>>>>> >
>>>>> >
>>>>> > On a related note, the apr_memcache
>>>>> > documentation<
>>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>>>>> >is
>>>>> > very thin.  I'd be happy to augment it with my observations on its
>>>>> > usage
>>>>> > and the meaning of some of the arguments if that was desired.  How
>>>>> would I
>>>>> > go about that?
>>>>>
>>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>>>>> include files, build (make dox) and inspect the results, submit a
>>>>> patch to dev@apr.apache.org.
>>>>>
>>>>> >
>>>>> > -Josh
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Born in Roswell... married an alien...
>>>>> http://emptyhammock.com/
>>>>>
>>>>
>>>>
>>>
>>
>

Re: apr_memcache operation timeouts

Posted by Joshua Marantz <jm...@google.com>.
ping!

Please don't hesitate to push back and tell me if I can supply the patch or
update in some easier-to-digest form.  In particular, while I have
rigorously stress-tested this change using mod_pagespeed's unit test,
system-test, and load-test framework, I don't really understand what the
testing flow is for APR.  I'd be happy to add unit-tests for that if
someone points me to a change-list or patch-file that does it properly.

-Josh


On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jm...@google.com> wrote:

> I have completed a solution to this problem, which can be a drop-in update
> for the existing apr_memcache.c.  It is now checked in for my module as
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> .
>
> It differs from the solution in
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>
>    - It doesn't require an API change; it but it enforces the 50ms
>    timeout that already exists for apr_multgetp for all operations.
>    - It works under my load test (which I found is not true of the patch
>    in 51065).
>
> For my own purposes, I will be shipping my module with apr_memcache2 so I
> get the behavior I want regardless of what version of Apache is installed.
>  But I'd like to propose my patch for apr_memcache.c.  The patch is
> attached, and I've also submitted it as an alternative patch to bug 51065.
>
> If you agree with the strategy I used to solve this problem, then please
> let me know if I can help with any changes required to get this into the
> main distribution,
>
>
> On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jm...@google.com>wrote:
>
>> I've had some preliminary success with my own variant of apr_memcache.c
>> (creatively called apr_memcache2.c).  Rather than setting the socket
>> timeout, I've been mimicing the timeout strategy I saw in
>> apr_memcache_multgetp, by adding a new helper method:
>>
>> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>>                                                apr_memcache2_conn_t*
>> conn) {
>>     apr_pollset_t* pollset;
>>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>>     if (rv == APR_SUCCESS) {
>>         apr_pollfd_t pollfd;
>>         pollfd.desc_type = APR_POLL_SOCKET;
>>         pollfd.reqevents = APR_POLLIN;
>>         pollfd.p = temp_pool;
>>         pollfd.desc.s = conn->sock;
>>         pollfd.client_data = NULL;
>>         apr_pollset_add(pollset, &pollfd);
>>         apr_int32_t queries_recvd;
>>         const apr_pollfd_t* activefds;
>>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
>>                               &activefds);
>>         if (rv == APR_SUCCESS) {
>>             assert(queries_recvd == 1);
>>             assert(activefds->desc.s == conn->sock);
>>             assert(activefds->client_data == NULL);
>>         }
>>     }
>>     return rv;
>> }
>>
>> And calling that before many of the existing calls to get_server_line as:
>>
>>     rv = wait_for_server_or_timeout_no_pool(conn);
>>     if (rv != APR_SUCCESS) {
>>         ms_release_conn(ms, conn);
>>         return rv;
>>     }
>>
>> This is just an experiment; I think I can streamline this by
>> pre-populating the pollfd structure as part of the apr_memcache_conn_t
>> (actually now apr_memcache2_conn_t).
>>
>> I have two questions about this:
>>
>> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
>> somewhat different from the one that ships with Apache 2.2.  In particular
>> the 2.4 version cannot be compiled against the headers that come with a 2.2
>> distribution.  Is there any downside to taking my hacked 2.2 apr_memcache.c
>> and running it in Apache 2.4?  Or should I maintain two hacks?
>>
>> 2. This seems wasteful in terms of system calls.  I am making an extra
>> call to poll, rather than relying on the socket timeout.  The socket
>> timeout didn't work as well as this though.  Does anyone have any theories
>> as to why, or what could be done to the patch in
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>>
>> -Josh
>>
>>
>> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jm...@google.com>wrote:
>>
>>> Following up: I tried doing what I suggested above: patching that change
>>> into my own copy of apr_memcache.c  It was first of all a bad idea to pull
>>> in only part of apr_memcache.c because that file changed slightly between
>>> 2.2 and 2.4 and our module works in both.
>>>
>>> I was successful making my own version of apr_memcache (renaming
>>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>>> socket timeout from -1 to 10 seconds, then the system behaved very poorly
>>> under load test (though it worked fine in our unit-tests and system-tests).
>>>  In other words, I think the proposed patch that Jeff pointed to above is
>>> not really working (as he predicted).  This test was done without
>>> SIGSTOPing the memcached; it would timeout under our load anyway and
>>> thereafter behave poorly.
>>>
>>> I'm going to follow up on that bugzilla entry, but for now I'm going to
>>> pursue my own complicated mechanism of timing out the calls from my side.
>>>
>>> -Josh
>>>
>>>
>>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jm...@google.com>wrote:
>>>
>>>> Thanks Jeff, that is very helpful.  We are considering a course of
>>>> action and before doing any work toward this, I'd like to understand the
>>>> pitfalls from people that understand Apache better than us.
>>>>
>>>> Here's our reality: we believe we need to incorporate memcached for
>>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>>>> large sites & hosting providers.  We are fairly close (we think) to
>>>> releasing this functionality as beta.  However, in such large sites, stuff
>>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it does we
>>>> want to fall back to serving partially unoptimized sites rather than
>>>> hanging the servers.
>>>>
>>>> I understand the realities of backward-compatible APIs.  My expectation
>>>> is that this would take years to make it into an APR distribution we could
>>>> depend on.  We want to deploy this functionality in weeks.  The workarounds
>>>> we have tried backgrounding the apr_memcache calls in a thread and timing
>>>> out in mainline are complex and even once they work 100% will be very
>>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>>>> 'apachectl stop') if this happens more than (say) once a month.
>>>>
>>>> Our plan is to copy the patched implementation of
>>>> apr_memcache_server_connect and the static methods it calls into a new .c
>>>> file we will link into our module, naming the new entry-point something
>>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>>>> CS/SE perspective this is offensive and we admit it, but from a product
>>>> quality perspective we believe this beats freezes and complicated/imperfect
>>>> workarounds with threads.
>>>>
>>>> So I have two questions for the Apache community:
>>>>
>>>>    1. What are the practical problems with this approach?  Note that
>>>>    in any case a new APR rev would require editing/ifdefing our code anyway,
>>>>    so I think immunity from APR updates such as this patch being applied is
>>>>    not a distinguishing drawback.
>>>>    2. Is there an example of the correct solution to the technical
>>>>    problem Jeff highlighted: "it is apparently missing a call to
>>>>    adjust the socket timeout and to discard the connection if the
>>>>    timeout is reached".  That sounds like a pattern that might be
>>>>    found elsewhere in the Apache HTTPD code base.
>>>>
>>>> Thanks in advance for your help!
>>>> -Josh
>>>>
>>>>
>>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <tr...@gmail.com>wrote:
>>>>
>>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jm...@google.com>
>>>>> wrote:
>>>>> > Is there a mechanism to time out individual operations?
>>>>>
>>>>> No, the socket connect timeout is hard-coded at 1 second and the
>>>>> socket I/O timeout is disabled.
>>>>>
>>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>>>>> has a patch, though it is apparently missing a call to adjust the
>>>>> socket timeout and to discard the connection if the timeout is
>>>>> reached.  More importantly, the API can only be changed in future APR
>>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>>>>> APR-Util 1.6.
>>>>>
>>>>> >
>>>>> > If memcached freezes, then it appears my calls to 'get' will block
>>>>> until
>>>>> > memcached wakes up.  Is there any way to set a timeout for that call?
>>>>> >
>>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
>>>>> before
>>>>> > doing a 'get'?
>>>>> >
>>>>> > Here are my observations:
>>>>> >
>>>>> > apr_memcache_multgetp seems to time out in bounded time if I SIGSTOP
>>>>> the
>>>>> > memcached process. Yes!
>>>>> >
>>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>>>>> memcached
>>>>> > process.
>>>>> >
>>>>> > That last one really sucks as I am using that to print the status of
>>>>> all my
>>>>> > cache shards to the log file if I detected a problem :(
>>>>> >
>>>>> >
>>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>>>>>  Can I
>>>>> > induce the others to have reasonable timeout behavior?
>>>>> >
>>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache,
>>>>> at
>>>>> > least with graceful-stop.
>>>>> >
>>>>> >
>>>>> > On a related note, the apr_memcache
>>>>> > documentation<
>>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>>>>> >is
>>>>> > very thin.  I'd be happy to augment it with my observations on its
>>>>> > usage
>>>>> > and the meaning of some of the arguments if that was desired.  How
>>>>> would I
>>>>> > go about that?
>>>>>
>>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>>>>> include files, build (make dox) and inspect the results, submit a
>>>>> patch to dev@apr.apache.org.
>>>>>
>>>>> >
>>>>> > -Josh
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Born in Roswell... married an alien...
>>>>> http://emptyhammock.com/
>>>>>
>>>>
>>>>
>>>
>>
>