Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2008/06/14 22:42:03 UTC

showstopper to 1.3.1?

Guys, if anyone is looking at this, I'll hold off from tagging a bit longer,
as I'd rather have apr-1.3.1 address all the platform quirks we identified
in preparing 2.2.9 for release.  But if I hear nothing, I'll have to just 
move ahead :)

Bill

Paul Querna wrote:
> 
> On aurora.apache.org, shortly after installing the new version, we hit a 
> problem with apr_pollset_poll:
> 
> [Thu Jun 12 05:36:51 2008] [error] (70007)The timeout specified has 
> expired: apr_pollset_poll: (listen)
> [Thu Jun 12 05:36:52 2008] [notice] caught SIGTERM, shutting down
> 
> If you look in worker.c, around line 687, you can see that we do a 
> graceful shutdown if we get an unexpected error from apr_pollset_poll.
> 
> This appears to be a regression caused by r641661:
> https://svn.apache.org/viewvc?view=rev&revision=641661
> 
> Which was a fix for PR 42580: 
> https://issues.apache.org/bugzilla/show_bug.cgi?id=42580
> 
> This appears to be a relative edge case on Solaris 10 -- it hasn't 
> happened again, and it is a regression in APR, but relatively small, so 
> I am still +1 for httpd-2.2.9 shipping.


Re: showstopper to 1.3.1?

Posted by Plüm, Rüdiger, VF-Group <ru...@vodafone.com>.
 

> -----Original Message-----
> From: Joe Orton
> Sent: Wednesday, 18 June 2008 17:20
> To: Ruediger Pluem
> Cc: APR Developer List; dev@httpd.apache.org
> Subject: Re: showstopper to 1.3.1?
> 
> On Sat, Jun 14, 2008 at 11:24:43PM +0200, Ruediger Pluem wrote:
> > So the code before said that if port_getn returns -1 (== fails) we return APR_TIMEUP
> > if the error is ETIME or EINTR, and APR_EGENERAL otherwise.
> > So IMHO the error message (in this case IMHO the same one) would have been shown with
> > the old code.
> > What is more strange to me is that we get a timeout error ((70007)The timeout specified has
> > expired: apr_pollset_poll:) even though we called apr_pollset_poll with -1 as timeout, which
> > means wait indefinitely or no timeout. The implementation of apr_pollset_poll seems to be
> > correct as it ensures that we supply NULL in this case to port_getn. But OTOH the man page
> > for port_get / port_getn documents timeout behaviour only for port_get (setting the timeout
> > parameter to NULL means no timeout), not for port_getn. So couldn't this be a Solaris bug?
> 
> It may be working as designed - this code path gets triggered when all the 
> fds in the pollset get closed asynchronously; see PR 42829 and the 
> thread concerning that bug on this list.

So do I understand correctly that port_getn, if called with only closed
fds associated with this port, returns immediately with errno set to
ETIME even though its timeout is set to infinite?

> 
> If it is working as designed it is arguably more useful than the epoll 
> behaviour in the same case; at least, it's more useful for httpd.  
> AFAICT the only effect of this on prefork is to generate some error_log 
> spam; could be ignored easily enough.
> 
> The APR change r641661 certainly looks correct anyway.

As I understand it an error would be returned by apr_pollset_poll
with and without r641661 in place.

Regards

Rüdiger

Re: showstopper to 1.3.1?

Posted by Joe Orton <jo...@redhat.com>.
On Sat, Jun 14, 2008 at 11:24:43PM +0200, Ruediger Pluem wrote:
> So the code before said that if port_getn returns -1 (== fails) we return APR_TIMEUP
> if the error is ETIME or EINTR, and APR_EGENERAL otherwise.
> So IMHO the error message (in this case IMHO the same one) would have been shown with
> the old code.
> What is more strange to me is that we get a timeout error ((70007)The timeout specified has
> expired: apr_pollset_poll:) even though we called apr_pollset_poll with -1 as timeout, which
> means wait indefinitely or no timeout. The implementation of apr_pollset_poll seems to be
> correct as it ensures that we supply NULL in this case to port_getn. But OTOH the man page
> for port_get / port_getn documents timeout behaviour only for port_get (setting the timeout
> parameter to NULL means no timeout), not for port_getn. So couldn't this be a Solaris bug?

It may be working as designed - this code path gets triggered when all the 
fds in the pollset get closed asynchronously; see PR 42829 and the 
thread concerning that bug on this list.

If it is working as designed it is arguably more useful than the epoll 
behaviour in the same case; at least, it's more useful for httpd.  
AFAICT the only effect of this on prefork is to generate some error_log 
spam; could be ignored easily enough.

The APR change r641661 certainly looks correct anyway.

joe

Re: showstopper to 1.3.1?

Posted by Ruediger Pluem <rp...@apache.org>.

On 06/14/2008 10:42 PM, William A. Rowe, Jr. wrote:
> Guys, if anyone is looking at this, I'll hold off from tagging a bit 
> longer,
> as I'd rather have apr-1.3.1 address all the platform quirks we identified
> in preparing 2.2.9 for release.  But if I hear nothing, I'll have to 
> just move ahead :)
> 
> Bill
> 
> Paul Querna wrote:
>>
>> On aurora.apache.org, shortly after installing the new version, we hit 
>> a problem with apr_pollset_poll:
>>
>> [Thu Jun 12 05:36:51 2008] [error] (70007)The timeout specified has 
>> expired: apr_pollset_poll: (listen)
>> [Thu Jun 12 05:36:52 2008] [notice] caught SIGTERM, shutting down
>>
>> If you look in worker.c, around line 687, you can see that we do a 
>> graceful shutdown if we get an unexpected error from apr_pollset_poll.
>>
>> This appears to be a regression caused by r641661:
>> https://svn.apache.org/viewvc?view=rev&revision=641661
>>
>> Which was a fix for PR 42580: 
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=42580
>>
>> This appears to be a relative edge case on Solaris 10 -- it hasn't 
>> happened again, and it is a regression in APR, but relatively small, 
>> so I am still +1 for httpd-2.2.9 shipping.

Is this really a regression in APR or were we just as lucky before as we
were after?

Code from httpd

                 rv = apr_pollset_poll(pollset, -1, &numdesc, &pdesc);
                 if (rv != APR_SUCCESS) {
                     if (APR_STATUS_IS_EINTR(rv)) {
                         continue;
                     }

                     /* apr_pollset_poll() will only return errors in catastrophic
                      * circumstances. Let's try exiting gracefully, for now. */
                     ap_log_error(APLOG_MARK, APLOG_ERR, rv,
                                  (const server_rec *) ap_server_conf,
                                  "apr_pollset_poll: (listen)");


So we get the error message logged if apr_pollset_poll returns anything other than
APR_SUCCESS or APR_EINTR.
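
A minimal sketch of that check, for illustration only. It is not the worker.c code: the enum, the function names and the SKETCH_ constants are made up here, 70007 is simply the number from the log line quoted above, and it assumes a Unix build where APR_STATUS_IS_EINTR(rv) amounts to comparing rv with EINTR.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in values, assumed for this sketch (not pulled from APR headers). */
#define SKETCH_APR_SUCCESS 0
#define SKETCH_APR_TIMEUP  70007   /* the status number seen in the error_log */

/* Hypothetical names for what the listener does with a poll status. */
enum listener_action {
    HANDLE_EVENTS,      /* poll succeeded, go on and accept connections */
    POLL_AGAIN,         /* interrupted by a signal: the "continue" branch */
    LOG_AND_SHUT_DOWN   /* anything else: log the error, exit gracefully */
};

/* Mirrors the shape of the quoted check: success, EINTR, everything else. */
static enum listener_action classify_poll_status(int rv)
{
    if (rv == SKETCH_APR_SUCCESS) {
        return HANDLE_EVENTS;
    }
    if (rv == EINTR) {
        return POLL_AGAIN;
    }
    return LOG_AND_SHUT_DOWN;
}

/* A few worked cases; a timeout status lands in the "log" branch. */
static void check_examples(void)
{
    assert(classify_poll_status(SKETCH_APR_SUCCESS) == HANDLE_EVENTS);
    assert(classify_poll_status(EINTR) == POLL_AGAIN);
    assert(classify_poll_status(SKETCH_APR_TIMEUP) == LOG_AND_SHUT_DOWN);
}
```

Under these assumptions a timeout status is treated like any other unexpected error, which matches the two log lines from aurora.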

So let's have a look at r641661:

--- apr/apr/trunk/poll/unix/port.c	2008/03/27 00:31:21	641660
+++ apr/apr/trunk/poll/unix/port.c	2008/03/27 00:46:05	641661
@@ -295,12 +295,7 @@

      if (ret == -1) {
          (*num) = 0;
-        if (errno == ETIME || errno == EINTR) {
-            rv = APR_TIMEUP;
-        }
-        else {
-            rv = APR_EGENERAL;
-        }
+        rv = apr_get_netos_error();
      }
      else if (nget == 0) {
          rv = APR_TIMEUP;

So the code before said that if port_getn returns -1 (== fails) we return APR_TIMEUP
if the error is ETIME or EINTR, and APR_EGENERAL otherwise.
So IMHO the error message (in this case IMHO the same one) would have been shown with
the old code.
What is more strange to me is that we get a timeout error ((70007)The timeout specified has
expired: apr_pollset_poll:) even though we called apr_pollset_poll with -1 as timeout, which
means wait indefinitely or no timeout. The implementation of apr_pollset_poll seems to be
correct as it ensures that we supply NULL in this case to port_getn. But OTOH the man page
for port_get / port_getn documents timeout behaviour only for port_get (setting the timeout
parameter to NULL means no timeout), not for port_getn. So couldn't this be a Solaris bug?
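
To make the before/after comparison concrete, here is a rough sketch of the two mappings from the diff. It is only a model: the function names are invented, the SKETCH_ values are stand-ins that I believe match APR's numbers, and it assumes that on Unix apr_get_netos_error() is simply errno.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in status values, assumed for this sketch (not from APR headers). */
#define SKETCH_APR_TIMEUP   70007
#define SKETCH_APR_EGENERAL 20014

/* Before r641661: a failed port_getn() (ret == -1) was folded into one
 * of two APR statuses depending on errno. */
static int status_before_r641661(int os_errno)
{
    if (os_errno == ETIME || os_errno == EINTR) {
        return SKETCH_APR_TIMEUP;
    }
    return SKETCH_APR_EGENERAL;
}

/* After r641661: rv = apr_get_netos_error(); modelled here as passing
 * errno through unchanged (the Unix assumption named above). */
static int status_after_r641661(int os_errno)
{
    return os_errno;
}

/* Worked cases for the errno values discussed in this thread. */
static void check_examples(void)
{
    assert(status_before_r641661(ETIME) == SKETCH_APR_TIMEUP);
    assert(status_before_r641661(EINTR) == SKETCH_APR_TIMEUP);
    assert(status_before_r641661(EBADF) == SKETCH_APR_EGENERAL);
    assert(status_after_r641661(ETIME) == ETIME);
    assert(status_after_r641661(EINTR) == EINTR);
}
```

For ETIME both versions hand back a status that is neither APR_SUCCESS nor EINTR, so the httpd check quoted earlier in this message would log in either case; only the number differs (70007 before, the raw errno after). For EINTR the old mapping hid the interrupt behind APR_TIMEUP, while the new one lets APR_STATUS_IS_EINTR see it.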

Regards

Rüdiger



Re: showstopper to 1.3.1?

Posted by Oden Eriksson <oe...@mandriva.com>.
On Monday 16 June 2008 15:23:34, Oden Eriksson wrote:
> On Saturday 14 June 2008 22:42:03, William A. Rowe, Jr. wrote:
> > Guys, if anyone is looking at this, I'll hold off from tagging a bit
> > longer, as I'd rather have apr-1.3.1 address all the platform quirks we
> > identified in preparing 2.2.9 for release.  But if I hear nothing, I'll
> > have to just move ahead :)
> >
> > Bill
>
> I think this was forgotten in apr-util-1.3.0:
>
> --- Makefile.in 2008-05-22 18:37:47.000000000 -0400
> +++ Makefile.in.oden    2008-06-16 07:40:20.000000000 -0400
> @@ -36,6 +36,7 @@
>  LDADD_dbd_sqlite3 = @LDADD_dbd_sqlite3@
>  LDADD_dbd_mysql = @LDADD_dbd_mysql@
>  LDADD_ldap = @LDADD_ldap@
> +LDADD_dbd_freetds = @LDADD_dbd_freetds@
>
>  TARGETS = $(TARGET_LIB) aprutil.exp apu-config.out $(APU_MODULES)
>
>
>
> Though it doesn't help much for me on cooker, don't know why it won't link
> correctly. I'm using the latest freetds (0.82).

The freetds stuff from HEAD works.


Re: showstopper to 1.3.1?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Oden Eriksson wrote:
> On Monday 16 June 2008 18:15:24, William A. Rowe, Jr. wrote:
>> Unclear what was up with your change;
>>
>> --- apr-util-1.3.0/dbd/apr_dbd_freetds.c	2008-05-30 16:45:51.000000000 -0400
>> +++ apr-util-1.3.0.oden/dbd/apr_dbd_freetds.c	2008-06-16 10:08:05.000000000 -0400
>>
>> which isn't on trunk, and since there's no explanation I'm ignoring it as
>> I move forward to deal with any 1.3.2, but you might like to take that up
>> to the apr, or remind us if you had earlier submitted a patch.  About the
>> other files touched in your patch, those were all in the 1.3.1 candidate
>> tarball.
> 
> I am sorry but I do not touch the development branch very often, the last time 
> was about three years ago (like).

Oh no - no hassle, just wondering where the patch came from if not on
development, looking forward to your comments on dev@apr.  Thanks again.


Re: showstopper to 1.3.1?

Posted by Oden Eriksson <oe...@mandriva.com>.
On Monday 16 June 2008 18:15:24, William A. Rowe, Jr. wrote:
> Oden Eriksson wrote:
> > For completeness, this is the patch I had to use for freetds:
> >
> > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/apr-util/current/SOURCES/apr-util-freetds_fix.diff?revision=219531&view=markup
> > Well... Not so important anyway.
>
> Unclear what was up with your change;
>
> --- apr-util-1.3.0/dbd/apr_dbd_freetds.c	2008-05-30 16:45:51.000000000 -0400
> +++ apr-util-1.3.0.oden/dbd/apr_dbd_freetds.c	2008-06-16 10:08:05.000000000 -0400
>
> which isn't on trunk, and since there's no explanation I'm ignoring it as
> I move forward to deal with any 1.3.2, but you might like to take that up
> to the apr, or remind us if you had earlier submitted a patch.  About the
> other files touched in your patch, those were all in the 1.3.1 candidate
> tarball.

I am sorry but I do not touch the development branch very often, the last time 
was about three years ago (like).

>  > Additionally I tried to add the odbc stuff as well from HEAD but failed.
>
> FYI There is a list for apr development at dev@apr.apache.org where such
> posts might get more relevant attention, and there are a number of threads
> about a 1.3.1 potential release (soon to be 1.3.2 potential release).  You
> might be interested and would be welcomed to join the dialog there.

OK. Thanks. I will try to address these things there instead.

> What are you calling HEAD?  repos/asf/apr/apr-util/trunk/?  From when?
> Reports were that it should now be ok, you might retry, or better yet point
> out what failed?  I know the devs are excited to ship odbc support, and
> would like it to be golden in the first apr-util-1 release.

Yes, I meant repos/asf/apr/apr-util/trunk/; it was taken today, but as you said 
earlier I have to address the apr development list.

Cheers.


Re: showstopper to 1.3.1?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Oden Eriksson wrote:
> 
> For completeness, this is the patch I had to use for freetds:
> 
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/apr-util/current/SOURCES/apr-util-freetds_fix.diff?revision=219531&view=markup
> Well... Not so important anyway.

Unclear what was up with your change;

--- apr-util-1.3.0/dbd/apr_dbd_freetds.c	2008-05-30 16:45:51.000000000 -0400
+++ apr-util-1.3.0.oden/dbd/apr_dbd_freetds.c	2008-06-16 10:08:05.000000000 -0400

which isn't on trunk, and since there's no explanation I'm ignoring it as
I move forward to deal with any 1.3.2, but you might like to take that up
to the apr, or remind us if you had earlier submitted a patch.  About the
other files touched in your patch, those were all in the 1.3.1 candidate
tarball.

 > Additionally I tried to add the odbc stuff as well from HEAD but failed.

FYI There is a list for apr development at dev@apr.apache.org where such
posts might get more relevant attention, and there are a number of threads
about a 1.3.1 potential release (soon to be 1.3.2 potential release).  You
might be interested and would be welcomed to join the dialog there.

What are you calling HEAD?  repos/asf/apr/apr-util/trunk/?  From when?
Reports were that it should now be ok, you might retry, or better yet point
out what failed?  I know the devs are excited to ship odbc support, and
would like it to be golden in the first apr-util-1 release.


Re: showstopper to 1.3.1?

Posted by Oden Eriksson <oe...@mandriva.com>.
On Monday 16 June 2008 17:14:18, William A. Rowe, Jr. wrote:
> Oden Eriksson wrote:
> > I think this was forgotten in apr-util-1.3.0:
> >
> > --- Makefile.in 2008-05-22 18:37:47.000000000 -0400
> > +++ Makefile.in.oden    2008-06-16 07:40:20.000000000 -0400
> > +LDADD_dbd_freetds = @LDADD_dbd_freetds@
>
> Already leading off this list of
>
>    http://apr.apache.org/dev/dist/CHANGES-APR-UTIL-1.3
>
>  >Changes with APR-util 1.3.1
>  >
>  >  *) Add ODBC DBD Driver.  [Tom Donovan]
>  >
>  >  *) Fix build of the FreeTDS and MySQL drivers.  [Bojan Smojver]
>
> looks like 1.3.1 util may be scuttled for 1.3.2, but it will be available
> sometime rsn.

For completeness, this is the patch I had to use for freetds:

http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/apr-util/current/SOURCES/apr-util-freetds_fix.diff?revision=219531&view=markup

Additionally I tried to add the odbc stuff as well from HEAD but failed. 
Well... Not so important anyway.



Re: showstopper to 1.3.1?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Oden Eriksson wrote:
> 
> I think this was forgotten in apr-util-1.3.0:
> 
> --- Makefile.in 2008-05-22 18:37:47.000000000 -0400
> +++ Makefile.in.oden    2008-06-16 07:40:20.000000000 -0400
> +LDADD_dbd_freetds = @LDADD_dbd_freetds@

Already leading off this list of

   http://apr.apache.org/dev/dist/CHANGES-APR-UTIL-1.3

 >Changes with APR-util 1.3.1
 >
 >  *) Add ODBC DBD Driver.  [Tom Donovan]
 >
 >  *) Fix build of the FreeTDS and MySQL drivers.  [Bojan Smojver]

looks like 1.3.1 util may be scuttled for 1.3.2, but it will be available
sometime rsn.

Re: showstopper to 1.3.1?

Posted by Oden Eriksson <oe...@mandriva.com>.
On Saturday 14 June 2008 22:42:03, William A. Rowe, Jr. wrote:
> Guys, if anyone is looking at this, I'll hold off from tagging a bit
> longer, as I'd rather have apr-1.3.1 address all the platform quirks we
> identified in preparing 2.2.9 for release.  But if I hear nothing, I'll
> have to just move ahead :)
>
> Bill

I think this was forgotten in apr-util-1.3.0:

--- Makefile.in 2008-05-22 18:37:47.000000000 -0400
+++ Makefile.in.oden    2008-06-16 07:40:20.000000000 -0400
@@ -36,6 +36,7 @@
 LDADD_dbd_sqlite3 = @LDADD_dbd_sqlite3@
 LDADD_dbd_mysql = @LDADD_dbd_mysql@
 LDADD_ldap = @LDADD_ldap@
+LDADD_dbd_freetds = @LDADD_dbd_freetds@

 TARGETS = $(TARGET_LIB) aprutil.exp apu-config.out $(APU_MODULES)



Though it doesn't help much for me on cooker, don't know why it won't link 
correctly. I'm using the latest freetds (0.82).