You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apr.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/10/09 16:35:15 UTC

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

On 9 Oct 2001 jerenkrantz@apache.org wrote:

>   --- apr_buckets_socket.c	2001/09/29 07:06:19	1.31
>   +++ apr_buckets_socket.c	2001/10/09 05:17:18	1.32
>   @@ -76,6 +76,12 @@
>
>        if (block == APR_NONBLOCK_READ) {
>            apr_setsocketopt(p, APR_SO_TIMEOUT, timeout);
>   +        /* There was nothing to read right now, so treat it as okay and
>   +         * return a 0-length brigade (see below). */
>   +        if (APR_STATUS_IS_EAGAIN(rv)) {
>   +            *len = 0;
>   +            rv = APR_SUCCESS;
>   +        }
>        }
>

-1.  By doing this, any time you do a nonblocking read from a socket
bucket and get EAGAIN, the whole bucket vaporizes and you never get the
chance to read any more data.  It was correct before: if you ask for a
nonblocking read, you have to be ready to handle an EAGAIN case.  So we
must return EAGAIN and leave the brigade as it was... the caller can just
sit there repeatedly reading from that one bucket if they like, or they
can do some other work and come back to that bucket.  As long as the
caller doesn't treat it as fatal (which it isn't for nonblocking) and the
caller doesn't move on to the next bucket prematurely, we're fine.

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

Posted by Ryan Bloom <rb...@covalent.net>.

On Wednesday 10 October 2001 07:55 am, Justin Erenkrantz wrote:
> On Wed, Oct 10, 2001 at 06:59:32AM -0700, Ryan Bloom wrote:
> > 	do
> > 	    do stuff so that we don't wait until there is data before
> > 	    we process what we have now.
> > 	while (apr_bucket_read == APR_EAGAIN)
> >
> > to:
> > 	do
> > 	    do stuff just like above
> > 	whild (apr_bucket_read == APR_SUCCESS &&
> > 		APR_BUCKET_FIRST()->length == 0)
> >
> > Libraries should return errors, because they are valid information that
> > the caller can use.  They also do not break the abstraction, because we
> > have essentially said that all bucket _can_ return EAGAIN, although
> > anybody who looks at the code will realize quickly that only socket and
> > pipe bucket ever _will_ return EAGAIN.
>
> See, I think that's my complaint.  Now, we have to always handle
> EAGAIN in every case where we read the buckets to be correct to the
> abstraction.  Something just seems wrong about having to add this for
> a special case where we could reasonably dictate how to handle it
> in the special case.

If you want to be 100% correct, yeah we do.  But we have to handle it
anyway, because pipe buckets can return EAGAIN as well.  In fact, any
bucket with a non-determinant length can return EAGAIN.

> I have a feeling that rather than fixing this, we're just going to
> ditch the socket bucket entirely (what you and Greg have suggested).
> And, I'd rather not do that because it means that the socket bucket
> isn't doing what we want.  And, if it isn't doing what we want, I
> think we need to stop and ask why.

Either way, you have to handle the EAGAIN case.  Either you have to
handle it because the socket returned it to you, or the bucket did.  There
is no difference in the code you have to write.  The reality is that if you
put in the code in this patch, then the special case still exists, but it is
hidden, making it harder to find, but just as important to know about.
The special case moves from "EAGAIN returned" to "SUCCES returned,
and first bucket has 0 length, if the second bucket has an indeterminant
length".  That is painful to check.

We should move away from the socket bucket because it isn't necessary
anymore.  We don't need a socket bucket, because the socket is being
handled completely inside of the core input filter.

> I think in the example of CORE_IN, it would just completely ignore
> EAGAIN and return a 0-length brigade.  In my mind, that just seems
> the only rational decision to make there (perhaps not in the general
> case).  The core filters should not be doing anything to slow down
> processing.

Yeah, most filters will want to just keep going, but this patch makes the
shortcut very hard to see.  If we can just check EAGAIN, then we can
easily branch to the spot where we return the 0 length brigade.  With
this patch, we must check the length of the first and second buckets
to know that we are in the EAGAIN case.

Imagine an output filter that wants to process the data coming from a 
CGI script.  If it gets EAGAIN, then it just passes what it has, and when
the function returns, it calls apr_bucket_read again.  That logic is this
simple:

	if (apr_bucket_read == APR_EAGAIN) {
	    apr_pass_brigade()
	    continue;
	}
Assuming that logic is in a while loop, this just works.  With this patch,
it becomes

	if (apr_bucket_read == APR_EAGAIN &&
	    APR_BUCKET_FIRST->lenght == 0 &&
	    APR_BUCKET_NEXT(APR_BUCKET_FIRST)->length == -1) {
	    apr_pass_brigade()
	    continue;
	}

> But, I'm not sure how any reader of the buckets will know to do a
> select/poll/etc *without* violating the knowledge that the bucket
> is a socket-based bucket.  Then, it'd also have to retrieve the
> socket from the bucket and then do a select/poll/etc on it.  And,
> you'd have to handle the pipe case differently as well.  I guess
> I see the violation occurring when the caller tries to "resolve"
> the EAGAIN not with the EAGAIN itself.  The only way the caller
> can do that (from my understanding) is to know what the storage
> type is.  And, that's verboten, IMHO.

9 times out ot 10, we shouldn't be polling to get the data.  The bucket
code does that for us if you call it with a blocking read, and that
will either pop when the timeout pops or as soon as there is one
byte of data.  There is no reason for the caller to ever do the poll itself.

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

Posted by Cliff Woolley <cl...@yahoo.com>.

[I started writing this hours ago and got sidetracked... I think this
issue is already settled since then but I'll finish my thought anyway.]

On Wed, 10 Oct 2001, Justin Erenkrantz wrote:

> > Libraries should return errors, because they are valid information that the
> > caller can use.  They also do not break the abstraction, because we have
> > essentially said that all bucket _can_ return EAGAIN, although anybody who
> > looks at the code will realize quickly that only socket and pipe bucket ever
> > _will_ return EAGAIN.
>
> See, I think that's my complaint.  Now, we have to always handle
> EAGAIN in every case where we read the buckets to be correct to the
> abstraction.

Yes.  That's correct.  EAGAIN could at any time be returned from a read of
any bucket type _if you do a nonblocking read_.  EAGAIN should never be
returned if you do a blocking read.  Basically, SUCCESS means "I got some
data for you" or "the bucket is empty".  In either case, SUCCESS means
"move on to the next bucket, nothing else to see here."

For nonblocking reads, it is understood that there is a special case: you
might be able to read more later but just don't have it available right
now.  That's a separate return code, the standard for which is EAGAIN.
It tells the caller the bucket is empty _right now_, but not to move on to
the next bucket because we'll have something for them later.

Now, that's not to say that ALL possible return values from socket reads
and the like should be returned directly by the buckets.  APR_EOF is a
good example.  That's something that just doesn't make sense from the
point of view of reading a bucket--it's information that the buckets code
can use internally but that it should hide from the caller.

I thought we had already documented that EAGAIN or SUCCESS could be
returned for nonblocking reads, but if we haven't, then yes, it definitely
needs to be done.

Thanks for the patches to revert this and fix the core, btw.

--Cliff

---------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

Posted by Justin Erenkrantz <je...@ebuilt.com>.

On Wed, Oct 10, 2001 at 06:59:32AM -0700, Ryan Bloom wrote:
> 	do
> 	    do stuff so that we don't wait until there is data before
> 	    we process what we have now.
> 	while (apr_bucket_read == APR_EAGAIN)
> 
> to:
> 	do
> 	    do stuff just like above
> 	whild (apr_bucket_read == APR_SUCCESS &&
> 		APR_BUCKET_FIRST()->length == 0)
> 
> Libraries should return errors, because they are valid information that the
> caller can use.  They also do not break the abstraction, because we have
> essentially said that all bucket _can_ return EAGAIN, although anybody who
> looks at the code will realize quickly that only socket and pipe bucket ever
> _will_ return EAGAIN.

See, I think that's my complaint.  Now, we have to always handle
EAGAIN in every case where we read the buckets to be correct to the
abstraction.  Something just seems wrong about having to add this for
a special case where we could reasonably dictate how to handle it
in the special case.

I have a feeling that rather than fixing this, we're just going to
ditch the socket bucket entirely (what you and Greg have suggested).
And, I'd rather not do that because it means that the socket bucket
isn't doing what we want.  And, if it isn't doing what we want, I
think we need to stop and ask why.

I think in the example of CORE_IN, it would just completely ignore 
EAGAIN and return a 0-length brigade.  In my mind, that just seems 
the only rational decision to make there (perhaps not in the general
case).  The core filters should not be doing anything to slow down
processing.  

But, I'm not sure how any reader of the buckets will know to do a
select/poll/etc *without* violating the knowledge that the bucket
is a socket-based bucket.  Then, it'd also have to retrieve the
socket from the bucket and then do a select/poll/etc on it.  And,
you'd have to handle the pipe case differently as well.  I guess
I see the violation occurring when the caller tries to "resolve"
the EAGAIN not with the EAGAIN itself.  The only way the caller 
can do that (from my understanding) is to know what the storage 
type is.  And, that's verboten, IMHO.

> Don't call a function from within a macro.  You have no way of knowing how
> often that function will be called.  Yes, I know it is done other places in the
> code, those are incorrect too.

Yeah, well, I just copied that code from a few lines below, but 
yeah.  =)  -- justin

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

Posted by Ryan Bloom <rb...@covalent.net>.

On Tuesday 09 October 2001 10:44 pm, Justin Erenkrantz wrote:
> On Tue, Oct 09, 2001 at 10:35:15AM -0400, Cliff Woolley wrote:
> > -1.  By doing this, any time you do a nonblocking read from a socket
> > bucket and get EAGAIN, the whole bucket vaporizes and you never get the
> > chance to read any more data.  It was correct before: if you ask for a
> > nonblocking read, you have to be ready to handle an EAGAIN case.  So we
>
> Yes, I didn't think of that case - you are right - it probably
> explains some of the problems I was seeing with mod_ssl.
>
> I'm not sure that the bucket read code should be returning an error
> on anything but fatal errors.  And, receiving EAGAIN isn't a fatal
> error.  My take is that forcing the caller of bucket_read to handle
> EAGAIN isn't very clean.  I think this breaks the abstraction between
> sockets and buckets - I'm not sure that the caller of the bucket_read
> needs to handle EAGAIN explicitly.  Isn't part of the idea of buckets
> is to abstract out certain implementation details - the caller shouldn't
> know that it has a socket bucket underneath it?

Buckets should not be trying to handle error conditions at all.  The purpose
of buckets is to abstract out the storage mechanism, and that is what they do, 
it is not to impose our view of how code should handle errors.  By not passing
APR_EAGAIN back to the program, you are removing some very valuable
information, and making it hard to determine what is going on.  The source
code in the caller has to go from:

	do
	    do stuff so that we don't wait until there is data before
	    we process what we have now.
	while (apr_bucket_read == APR_EAGAIN)

to:
	do
	    do stuff just like above
	whild (apr_bucket_read == APR_SUCCESS &&
		APR_BUCKET_FIRST()->length == 0)

Libraries should return errors, because they are valid information that the
caller can use.  They also do not break the abstraction, because we have
essentially said that all bucket _can_ return EAGAIN, although anybody who
looks at the code will realize quickly that only socket and pipe bucket ever
_will_ return EAGAIN.

> Considering the above comments, would this patch satisfy your
> veto?  -- justin

No.   :-)

>
> Index: buckets/apr_buckets_socket.c
> ===================================================================
> RCS file: /home/cvs/apr-util/buckets/apr_buckets_socket.c,v
> retrieving revision 1.32
> diff -u -r1.32 apr_buckets_socket.c
> --- buckets/apr_buckets_socket.c	2001/10/09 05:17:18	1.32
> +++ buckets/apr_buckets_socket.c	2001/10/10 05:26:29
> @@ -77,10 +77,14 @@
>      if (block == APR_NONBLOCK_READ) {
>          apr_setsocketopt(p, APR_SO_TIMEOUT, timeout);
>          /* There was nothing to read right now, so treat it as okay and
> -         * return a 0-length brigade (see below). */
> +         * return a 0-length bucket. */
>          if (APR_STATUS_IS_EAGAIN(rv)) {
> +            free(buf);
>              *len = 0;
> -            rv = APR_SUCCESS;
> +            a = apr_bucket_immortal_make(a, "", 0);
> +            *str = a->data;
> +            APR_BUCKET_INSERT_AFTER(a, apr_bucket_socket_create(p));
> +            return APR_SUCCESS;
>          }
>      }

Don't call a function from within a macro.  You have no way of knowing how
often that function will be called.  Yes, I know it is done other places in the
code, those are incorrect too.

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: cvs commit: apr-util/buckets apr_buckets_socket.c

Posted by Justin Erenkrantz <je...@ebuilt.com>.

On Tue, Oct 09, 2001 at 10:35:15AM -0400, Cliff Woolley wrote:
> -1.  By doing this, any time you do a nonblocking read from a socket
> bucket and get EAGAIN, the whole bucket vaporizes and you never get the
> chance to read any more data.  It was correct before: if you ask for a
> nonblocking read, you have to be ready to handle an EAGAIN case.  So we

Yes, I didn't think of that case - you are right - it probably
explains some of the problems I was seeing with mod_ssl.

I'm not sure that the bucket read code should be returning an error 
on anything but fatal errors.  And, receiving EAGAIN isn't a fatal 
error.  My take is that forcing the caller of bucket_read to handle
EAGAIN isn't very clean.  I think this breaks the abstraction between 
sockets and buckets - I'm not sure that the caller of the bucket_read
needs to handle EAGAIN explicitly.  Isn't part of the idea of buckets
is to abstract out certain implementation details - the caller shouldn't
know that it has a socket bucket underneath it?

Has this been decided/discussed before?  If so, then I'll just shut 
up.  =)

Considering the above comments, would this patch satisfy your 
veto?  -- justin

Index: buckets/apr_buckets_socket.c
===================================================================
RCS file: /home/cvs/apr-util/buckets/apr_buckets_socket.c,v
retrieving revision 1.32
diff -u -r1.32 apr_buckets_socket.c
--- buckets/apr_buckets_socket.c	2001/10/09 05:17:18	1.32
+++ buckets/apr_buckets_socket.c	2001/10/10 05:26:29
@@ -77,10 +77,14 @@
     if (block == APR_NONBLOCK_READ) {
         apr_setsocketopt(p, APR_SO_TIMEOUT, timeout);
         /* There was nothing to read right now, so treat it as okay and 
-         * return a 0-length brigade (see below). */
+         * return a 0-length bucket. */
         if (APR_STATUS_IS_EAGAIN(rv)) {
+            free(buf);
             *len = 0;
-            rv = APR_SUCCESS;
+            a = apr_bucket_immortal_make(a, "", 0);
+            *str = a->data;
+            APR_BUCKET_INSERT_AFTER(a, apr_bucket_socket_create(p));
+            return APR_SUCCESS;
         }
     }