Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2001/07/13 03:08:14 UTC

lengths - brigades v.s. buckets.

Ok, I'm back to fixing all the 64 bit off_t discrepancies in APR/Apache.

Can we basically agree that a "Bucket" can never be bigger than apr_ssize_t?
I've no problems with using apr_off_t for the length of a full brigade itself.
That means we can split a brigade on any apr_off_t, but would only need to
split a bucket on an apr_ssize_t.  It implies a 'Pipe' bucket can't generate
more than 2^31 bytes without breaking the code.

This means a huge file would need to be split by the caller into multiple file
buckets, no longer than ssize_t.  Is this reasonable?
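Here is a minimal sketch of what splitting by the caller could look like. It walks the file in windows no larger than the per-bucket maximum. It assumes a 64-bit `file_off_t` as a stand-in for apr_off_t and plain `size_t` for the per-bucket limit. The type, the struct, and `split_file` are hypothetical names, not APR API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for apr_off_t: a signed 64-bit file offset. */
typedef int64_t file_off_t;

/* Hypothetical descriptor for one file bucket: a window into an open file. */
struct file_chunk {
    file_off_t start;   /* offset of the window within the file */
    size_t     length;  /* bytes in the window, never more than max_len */
};

/* Split [0, file_len) into windows of at most max_len bytes.  Fills up to
 * cap entries of out and returns the number of windows the whole file
 * needs (which may be larger than cap).  Returns 0 if max_len is 0. */
size_t split_file(file_off_t file_len, size_t max_len,
                  struct file_chunk *out, size_t cap)
{
    size_t n = 0;
    file_off_t pos = 0;

    if (max_len == 0)
        return 0;

    while (pos < file_len) {
        file_off_t remaining = file_len - pos;
        /* Compare as unsigned so a max_len of SIZE_MAX cannot wrap. */
        size_t len = ((uint64_t)remaining > (uint64_t)max_len)
                         ? max_len
                         : (size_t)remaining;
        if (n < cap) {
            out[n].start = pos;
            out[n].length = len;
        }
        n++;
        pos += (file_off_t)len;
    }
    return n;
}
```

The caller would then create one file bucket per window, each referencing the same open file. For example, a 5 GiB file split with a 1 GiB cap needs five buckets.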

Bill


Re: lengths - brigades v.s. buckets.

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Ok, I'm back to fixing all the 64 bit off_t discrepancies in APR/Apache.
>
> Can we basically agree that a "Bucket" can never be bigger than apr_ssize_t?
Is the bucket backed by RAM?  If so, then I agree.  File buckets that can be sent down the chain for
use by sendfile should not have this restriction. If you need, for whatever reason, to MMAP or read
in the file, then sure apr_ssize_t is a reasonable upper limit (we'll set the actual limit much
lower in practice).

> I've no problems with using apr_off_t for the length of a full brigade itself.
> That means we can split a brigade on any apr_off_t, but would only need to
> split a bucket on an apr_ssize_t.  It implies a 'Pipe' bucket can't generate
> more than 2^31 bytes without breaking the code.

I don't follow the comment about a pipe bucket.  Sure, if you attempt to buffer the entire pipe,
there is a limit and 2^31 is not an unreasonable limit. In practice, we would never attempt to
buffer this much.

>
> This means a huge file would need to be split by the caller into multiple file
> buckets, no longer than ssize_t.  Is this reasonable?
>
Yes, provided this in no way implies that you cannot have a file_bucket that references an open fd
to a file of arbitrary size.

Bill



Re: lengths - brigades v.s. buckets.

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "Bill Stoddard" <bi...@wstoddard.com>
Sent: Friday, July 13, 2001 7:10 AM


> > Ok, I'm back to fixing all the 64 bit off_t discrepancies in APR/Apache.
> >
> > Can we basically agree that a "Bucket" can never be bigger than apr_ssize_t?
>
> Is the bucket backed by RAM?  If so, then I agree.  File buckets that can be sent down the chain for
> use by sendfile should not have this restriction. If you need, for whatever reason, to MMAP or read
> in the file, then sure apr_ssize_t is a reasonable upper limit (we'll set the actual limit much
> lower in practice).

The bigger issue is converting buckets from one type to another.  A brigade operation can 
always insert extra buckets, if it's necessary.  A bucket is a singleton, so _if_ a bucket 
must be convertible to another type of bucket, it can't have disparate sizes.
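Suppose a bucket's length is tracked as an apr_off_t and the bucket is converted into one that must fit in memory. The conversion then depends on a lossless narrowing check. The sketch below shows that check. `file_off_t` and `length_fits_bucket` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for apr_off_t. */
typedef int64_t file_off_t;

/* Narrow a 64-bit length to size_t only when nothing is lost.  Returns
 * false, leaving *out untouched, for negative lengths (such as a -1
 * "length unknown" marker) and for lengths larger than SIZE_MAX. */
bool length_fits_bucket(file_off_t len, size_t *out)
{
    if (len < 0)
        return false;
    if ((uint64_t)len > (uint64_t)SIZE_MAX)
        return false;
    *out = (size_t)len;
    return true;
}
```

On a 32-bit build, a 5 GiB length fails this check, while on a 64-bit build it passes. That is the same 32/64-bit discrepancy this thread is about.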

> > I've no problems with using apr_off_t for the length of a full brigade itself.
> > That means we can split a brigade on any apr_off_t, but would only need to
> > split a bucket on an apr_ssize_t.  It implies a 'Pipe' bucket can't generate
> > more than 2^31 bytes without breaking the code.
> 
> I don't follow the comment about a pipe bucket.  Sure, if you attempt to buffer the entire pipe,
> there is a limit and 2^31 is not an unreasonable limit. In practice, we would never attempt to
> buffer this much.

Ack.  It goes to the size argument.  If you are doing a _brigade_ read, then the size remains
undefined.  If you convert it to a bucket, it's trapped by the 2^31 restriction.

> > This means a huge file would need to be split by the caller into multiple file
> > buckets, no longer than ssize_t.  Is this reasonable?
>
> Yes, provided this in no way implies that you cannot have a file_bucket that references an open fd
> to a file of arbitrary size.

Well, if you leave the size undefined (-1), then you are fine.  If you attempt to convert it or
determine its length, then we are messed up.


From: "Roy T. Fielding" <fi...@ebuilt.com>
Sent: Thursday, July 12, 2001 11:54 PM


> > This means a huge file would need to be split by the caller into multiple file
> > buckets, no longer than ssize_t.  Is this reasonable?
> 
> Wouldn't that make it difficult to call sendfile on a file bucket that
> points to a huge file?


I question whether sendfile() called on a file of that size would even succeed, rather than
crash, on most largefile/sendfile-compatible systems :-)





[PATCH]: lengths - brigades v.s. buckets.

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
The upshot of this patch:  Brigades deal with apr_off_t bytes, but a single bucket never
deals with more than apr_size_t bytes :)


From: "Roy T. Fielding" <fi...@ebuilt.com>
Sent: Thursday, July 12, 2001 11:54 PM

> Bill wrote some time ago;
>
> > This means a huge file would need to be split by the caller into multiple file
> > buckets, no longer than ssize_t [actually, apr_size_t].  Is this reasonable?
> 
> Wouldn't that make it difficult to call sendfile on a file bucket that
> points to a huge file?

Good question, let's see, from FreeBSD...

int sendfile(int fd, int s, off_t offset, size_t nbytes,
             struct sf_hdtr *hdtr, off_t *sbytes, int flags)

the offset is an off_t (that's healthy), but the number of bytes to send is a size_t.  Hmmm,
so we can't request more than a size_t's worth of bytes, unless we pass 0, which means
"send until end of file."  We are told how much we sent through *sbytes, an off_t, a healthy
big number.

So it reports bytes sent as an off_t.  Interesting mishmash.
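The FreeBSD convention can be captured in a small helper that picks the nbytes argument. A length that fits in size_t is passed as-is. An oversized length can only be requested as 0, and only when the range runs to end of file. This is a sketch: the helper name and `file_off_t` are hypothetical, and the code makes no actual sendfile() call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t file_off_t;  /* hypothetical stand-in for off_t/apr_off_t */

/* Choose nbytes for a FreeBSD-style sendfile(), where nbytes == 0 means
 * "send until end of file".  Returns false when the range cannot be
 * requested in a single call (or is empty, since 0 would mean "all"). */
bool choose_nbytes(file_off_t offset, file_off_t len,
                   file_off_t file_size, size_t *nbytes)
{
    if (len <= 0)
        return false;                 /* 0 would be misread as "to EOF" */
    if ((uint64_t)len <= (uint64_t)SIZE_MAX) {
        *nbytes = (size_t)len;        /* fits: ask for exactly len bytes */
        return true;
    }
    if (offset + len == file_size) {
        *nbytes = 0;                  /* too big, but runs to EOF */
        return true;
    }
    return false;                     /* caller must split the range */
}
```

On a 64-bit build every positive off_t length fits in size_t, so the 0 branch only matters on 32-bit builds with largefile support.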

From Linux glibc2...

int sendfile(int out_fd, int in_fd, off_t *offset, size_t count)

Ok, same story here.  Rather broken, since we are only told of sending int bytes,
which is most definitely a discrepancy (between 32 and 64 bit builds as well).
Nothing in the man page about using count == 0 to send the remainder of the file.

I'd argue that sendfile against a huge file bucket is inherently non-portable.  Lo and
behold, we defined apr_sendfile in terms of a size_t.
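apr_sendfile takes the requested length as an apr_size_t, while the offset is an apr_off_t. So a caller sending a huge file bucket can loop: each call asks for at most one size_t-sized chunk, and the 64-bit offset advances between calls. The sketch below assumes a hypothetical `send_fn` callback modeled on that in/out length convention: on entry it holds the bytes requested, on return the bytes actually sent. It is not the real apr_sendfile signature. `stub_send` is a demo stand-in for a socket that accepts only a limited number of bytes per call.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t file_off_t;  /* hypothetical stand-in for apr_off_t */

/* Hypothetical sender: on entry *len is the number of bytes requested,
 * on return the number actually sent.  Returns 0 on success. */
typedef int (*send_fn)(void *ctx, file_off_t offset, size_t *len);

/* Send total bytes starting at offset, never requesting more than
 * max_chunk in one call.  *sent holds the 64-bit running total. */
int send_range(send_fn send, void *ctx, file_off_t offset,
               file_off_t total, size_t max_chunk, file_off_t *sent)
{
    *sent = 0;
    while (*sent < total) {
        file_off_t remaining = total - *sent;
        size_t len = ((uint64_t)remaining > (uint64_t)max_chunk)
                         ? max_chunk
                         : (size_t)remaining;
        int rv = send(ctx, offset + *sent, &len);
        if (rv != 0)
            return rv;
        if (len == 0)
            return -1;          /* no progress; avoid spinning forever */
        *sent += (file_off_t)len;
    }
    return 0;
}

/* Demo stub standing in for a socket: accepts at most per_call bytes
 * per call (simulating partial writes) and counts the calls. */
struct stub_sink {
    size_t   per_call;
    unsigned calls;
};

static int stub_send(void *ctx, file_off_t offset, size_t *len)
{
    struct stub_sink *s = ctx;
    (void)offset;
    if (*len > s->per_call)
        *len = s->per_call;
    s->calls++;
    return 0;
}
```

With a 4-byte chunk limit and a sink that takes 3 bytes per call, sending 10 bytes takes four calls. A 5 GiB range with 1 GiB chunks takes five calls.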

At the same time, apr_brigade_consume and apr_brigade_length need to use apr_off_t,
since the aggregated size of all buckets (file+memory) can certainly exceed the address space!!!
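Summing bucket lengths into a 64-bit total is the brigade-level side of this split. In the sketch below, each bucket length is a size_t and the brigade total is a hypothetical `file_off_t`, standing in for apr_off_t. `brigade_total` is not an APR function.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t file_off_t;  /* hypothetical stand-in for apr_off_t */

/* Add up n bucket lengths into a 64-bit brigade total.  Returns -1 if
 * the total would not fit in a signed 64-bit value. */
file_off_t brigade_total(const size_t *lens, size_t n)
{
    file_off_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if ((uint64_t)lens[i] > (uint64_t)(INT64_MAX - total))
            return -1;
        total += (file_off_t)lens[i];
    }
    return total;
}
```

Three 2 GiB buckets total 6 GiB, which no 32-bit size_t can hold, yet each bucket on its own fits.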

Do we push these discrepancies onto the APR implementor, or the user?  In today's patch,
I propose pushing this back onto the user.  I'd redefine 'size indeterminate' as a start of
-1 (from the header, "If length == -1, start == -1") together with a length of
(apr_size_t)(-1), so that a bucket can hold a full apr_size_t of data.  Passing 0xffffffff
for a read isn't necessarily bad, since the actual length read should be returned.
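The proposed convention works as follows. An indeterminate bucket is marked by start == -1 together with an all-ones size_t length. A read passes the largest acceptable length in and gets the actual length back. The struct, macro, and function names below are hypothetical illustrations, not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t file_off_t;  /* hypothetical stand-in for apr_off_t */

/* Hypothetical "size indeterminate" marker: the all-ones size_t value,
 * leaving every other size_t value usable as a real length. */
#define LENGTH_UNKNOWN ((size_t)-1)

/* Hypothetical bucket header. */
struct bucket_hdr {
    file_off_t start;
    size_t     length;
};

/* Indeterminate buckets carry start == -1 and length == LENGTH_UNKNOWN. */
bool bucket_is_indeterminate(const struct bucket_hdr *b)
{
    return b->start == -1 && b->length == LENGTH_UNKNOWN;
}

/* Hypothetical read: on entry *len is the most the caller will accept,
 * on return the bytes actually produced.  Passing LENGTH_UNKNOWN just
 * means "whatever is available". */
void bucket_read(size_t available, size_t *len)
{
    if (*len > available)
        *len = available;
}
```

Because the read reports the actual length, requesting LENGTH_UNKNOWN from a bucket holding 4096 bytes simply yields 4096.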

If we want to pull these off the user, then we need to first modify apr_sendfile(),
and this patch can be used as the basis for changing the entire behavior.

Bill



Re: lengths - brigades v.s. buckets.

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.
> This means a huge file would need to be split by the caller into multiple file
> buckets, no longer than ssize_t.  Is this reasonable?

Wouldn't that make it difficult to call sendfile on a file bucket that
points to a huge file?

....Roy