Posted to dev@httpd.apache.org by Marc Slemko <ma...@znep.com> on 1998/10/21 07:45:05 UTC

sendfile API

Does anyone have any comments, positive or negative, on the following API
for a sendfile() implementation:

int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr *hdtr,
        off_t *sbytes, int flags);

   fd is the file descriptor, s is the socket descriptor, offset is the
position in the file at which to start sending, and nbytes is the number
of bytes to send (0 means send until EOF).
   If hdtr is non-NULL, headers and/or trailers will be sent. sf_hdtr has the
following structure:
/*
 * sendfile(2) header/trailer struct
 */
struct sf_hdtr {
        struct iovec *headers;  /* pointer to an array of header struct iovecs */
        int hdr_cnt;            /* number of header iovecs */
        struct iovec *trailers; /* pointer to an array of trailer struct iovecs */
        int trl_cnt;            /* number of trailer iovecs */
};
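As a sketch of how a caller might fill in this structure, assuming HTTP-style headers, the C below points a two-element header array at a status line and a header block, with no trailers. The helper `http_hdtr_init()` is hypothetical (not part of the proposal); `iov_total()` just sums how many header bytes would go out ahead of the file data.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* The header/trailer struct as proposed above. */
struct sf_hdtr {
    struct iovec *headers;  /* pointer to an array of header iovecs */
    int hdr_cnt;            /* number of header iovecs */
    struct iovec *trailers; /* pointer to an array of trailer iovecs */
    int trl_cnt;            /* number of trailer iovecs */
};

/* Sum the lengths of an iovec array. */
static size_t iov_total(const struct iovec *iov, int cnt)
{
    size_t total = 0;
    for (int i = 0; i < cnt; i++)
        total += iov[i].iov_len;
    return total;
}

/* Hypothetical helper: two header iovecs (status line, header fields) and
 * no trailers.  The strings must stay valid until sendfile() returns. */
static void http_hdtr_init(struct sf_hdtr *h, struct iovec iov[2],
                           const char *status, const char *fields)
{
    iov[0].iov_base = (void *)status;
    iov[0].iov_len  = strlen(status);
    iov[1].iov_base = (void *)fields;
    iov[1].iov_len  = strlen(fields);
    h->headers  = iov;
    h->hdr_cnt  = 2;
    h->trailers = NULL;
    h->trl_cnt  = 0;
}
```

The caller owns both the iovec array and the strings it points at; the struct only borrows them for the duration of the call.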

   *sbytes is an optional pointer for returning the number of bytes actually
sent on the socket. flags is currently unused, but may be used for future
auto-disconnect and un-bind() flags.
   sendfile(2) returns 0 for success. It returns -1 if an error occurs, with
errno set to the error and *sbytes set to the number of bytes that were sent
prior to the error.
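To make the proposed semantics concrete, here is a user-space emulation built from pread() and write(). `sendfile_emul()` and its helpers are hypothetical names; the sketch does none of the zero-copy work a kernel implementation would do, and it assumes *sbytes counts header and trailer bytes as well as file data. It sends headers, then the file range (until EOF when nbytes is 0), then trailers, and on failure returns -1 with errno set and *sbytes holding what went out first.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct sf_hdtr {
    struct iovec *headers;
    int hdr_cnt;
    struct iovec *trailers;
    int trl_cnt;
};

/* Write all of buf to s, adding each successful piece to *sent. */
static int write_full(int s, const char *buf, size_t len, off_t *sent)
{
    while (len > 0) {
        ssize_t n = write(s, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        *sent += n;
    }
    return 0;
}

/* Send each iovec in turn (simpler than resuming a partial writev()). */
static int write_iovs(int s, const struct iovec *iov, int cnt, off_t *sent)
{
    for (int i = 0; i < cnt; i++)
        if (write_full(s, iov[i].iov_base, iov[i].iov_len, sent) < 0)
            return -1;
    return 0;
}

/* Hypothetical emulation of the proposed sendfile() semantics. */
int sendfile_emul(int fd, int s, off_t offset, size_t nbytes,
                  struct sf_hdtr *hdtr, off_t *sbytes, int flags)
{
    off_t sent = 0;
    char buf[8192];
    int rc = -1;

    (void)flags; /* currently unused, as in the proposal */
    if (hdtr && write_iovs(s, hdtr->headers, hdtr->hdr_cnt, &sent) < 0)
        goto out;
    for (size_t left = nbytes; nbytes == 0 || left > 0; ) {
        size_t want = sizeof buf;
        if (nbytes != 0 && left < want)
            want = left;
        ssize_t n = pread(fd, buf, want, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        if (n == 0)
            break; /* EOF */
        if (write_full(s, buf, (size_t)n, &sent) < 0)
            goto out;
        offset += n;
        if (nbytes != 0)
            left -= (size_t)n;
    }
    if (hdtr && write_iovs(s, hdtr->trailers, hdtr->trl_cnt, &sent) < 0)
        goto out;
    rc = 0;
out:
    if (sbytes)
        *sbytes = sent; /* set on success and on error */
    return rc;
}
```

On a non-blocking socket a full send buffer simply surfaces as -1 with errno EAGAIN, with *sbytes recording the bytes that had already been accepted.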
   The only limitation this API appears to have is that nbytes is a size_t,
which is 32 bits on 32-bit platforms. Thus if you want to send less than the
whole file, but more than 4GB, you must do it in chunks of 4GB or less via
multiple calls. I don't think this will be a serious problem, especially
since most usage of sendfile(2) will likely be with nbytes=0 (send until EOF).
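The multiple-call workaround amounts to a loop. In this sketch the callback type `send_chunk_fn` is a hypothetical stand-in for one sendfile() call that returns how many bytes it actually sent (for example the *sbytes value), so a partial send just advances the offset. `log_chunk()` only records the calls, so the loop can be tried without a socket.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for one sendfile() call on [offset, offset+nbytes):
 * returns the number of bytes actually sent, or -1 on error. */
typedef off_t (*send_chunk_fn)(off_t offset, size_t nbytes, void *ctx);

/* Send length bytes starting at offset using calls of at most max_chunk
 * bytes each (e.g. SIZE_MAX where size_t is 32 bits).  Returns 0 or -1. */
static int send_range_chunked(off_t offset, off_t length, size_t max_chunk,
                              send_chunk_fn send_chunk, void *ctx)
{
    if (max_chunk == 0)
        return -1; /* nbytes == 0 would mean "until EOF", not "nothing" */
    while (length > 0) {
        size_t n = max_chunk;
        if ((uintmax_t)length < (uintmax_t)max_chunk)
            n = (size_t)length;
        off_t sent = send_chunk(offset, n, ctx);
        if (sent <= 0)
            return -1; /* error, or no progress: stop rather than spin */
        offset += sent;
        length -= sent;
    }
    return 0;
}

/* Example callback: records each call instead of touching a socket,
 * and reports the whole chunk as sent. */
struct chunk_log {
    int calls;
    off_t offsets[16];
    size_t sizes[16];
};

static off_t log_chunk(off_t offset, size_t nbytes, void *ctx)
{
    struct chunk_log *lg = ctx;
    if (lg->calls < 16) {
        lg->offsets[lg->calls] = offset;
        lg->sizes[lg->calls] = nbytes;
    }
    lg->calls++;
    return (off_t)nbytes;
}
```

With max_chunk set to SIZE_MAX on a 32-bit platform, a 6GB range takes two calls: one of 4294967295 bytes and one for the remainder.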


Re: sendfile API

Posted by Marc Slemko <ma...@znep.com>.
On Wed, 21 Oct 1998, Manoj Kasichainula wrote:

> On Tue, Oct 20, 1998 at 10:45:05PM -0700, Marc Slemko wrote:
> >    The only limitation this API appears to have is that nbytes is a size_t,
> > which is 32 bits on 32-bit platforms. Thus if you want to send less than the
> > whole file, but more than 4GB, you must do it in chunks of 4GB or less via
> > multiple calls. I don't think this will be a serious problem, especially
> > since most usage of sendfile(2) will likely be with nbytes=0 (send until EOF).
> 
> No big deal here, but what does an OS with > 4GB files use to indicate
> sizes? Can we use that type instead?

It gets kinda complicated on OSes that are on 32-bit hardware and support
64-bit file sizes.  Not overly worth worrying about.

> 
> What situation are we talking about here specifically? A particular OS
> or a wrapper function? That might affect comment on the API call.

A particular OS.


Re: sendfile API

Posted by Manoj Kasichainula <ma...@io.com>.
On Tue, Oct 20, 1998 at 10:45:05PM -0700, Marc Slemko wrote:
>    The only limitation this API appears to have is that nbytes is a size_t,
> which is 32 bits on 32-bit platforms. Thus if you want to send less than the
> whole file, but more than 4GB, you must do it in chunks of 4GB or less via
> multiple calls. I don't think this will be a serious problem, especially
> since most usage of sendfile(2) will likely be with nbytes=0 (send until EOF).

No big deal here, but what does an OS with > 4GB files use to indicate
sizes? Can we use that type instead?

What situation are we talking about here specifically? A particular OS
or a wrapper function? That might affect comment on the API call.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Only reason I can't survive is if I'm dead or something" -- Mike Tyson

Re: sendfile API

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 21 Oct 1998, Ben Hyde wrote:

> 
> Somebody said:
> > ... A 0-length write() or
> > writev() element would do well as a flush,
> 
> bleck

and then I followed that with "or an ioctl()"  ;)  'cause yeah the
0-length thing is bleck, unless there's a socketopt that turns it on.

Dean


Re: sendfile API

Posted by Ben Hyde <bh...@pobox.com>.
Somebody said:
> ... A 0-length write() or
> writev() element would do well as a flush,

bleck

Re: sendfile API

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 20 Oct 1998, Marc Slemko wrote:

> Does anyone have any comments, positive or negative, on the following API
> for a sendfile() implementation:
> [...]
>    The only limitation this API appears to have is that nbytes is a size_t,
> which is 32 bits on 32-bit platforms. Thus if you want to send less than the
> whole file, but more than 4GB, you must do it in chunks of 4GB or less via
> multiple calls. I don't think this will be a serious problem, especially
> since most usage of sendfile(2) will likely be with nbytes=0 (send until EOF).

Apache wouldn't use it with nbytes == 0.  Timeouts are reset whenever
progress is made, rather than being an absolute number that bounds how long
a client can take...
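One way to read the timeout point: the timer restarts whenever bytes go out, so there is no absolute bound on the whole transfer, and each send has to be small enough that the server regains control to restart it. A minimal sketch of that style, assuming poll() for the wait and writing from a memory buffer rather than a file; `write_idle_timeout()` and its `chunk` parameter are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to s in pieces of at most chunk bytes.  Fails with
 * ETIMEDOUT only if the socket stays unwritable for idle_ms milliseconds;
 * the wait starts over after every successful write. */
static int write_idle_timeout(int s, const char *buf, size_t len,
                              size_t chunk, int idle_ms)
{
    while (len > 0) {
        struct pollfd p = { .fd = s, .events = POLLOUT };
        int r = poll(&p, 1, idle_ms);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0) {
            errno = ETIMEDOUT; /* no progress within idle_ms */
            return -1;
        }
        size_t want = len < chunk ? len : chunk;
        ssize_t n = write(s, buf, want);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A single sendfile() call with nbytes == 0 on a blocking socket would cover the whole file in one step, leaving no point between pieces at which to restart the timer.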

I personally find these "combine a zillion syscalls into one" syscalls
very distasteful.  The real reason that you want headers and trailers in
this call is because Nagle is dumb, and because the BSD socket API is
dumb.  Your choices are:  write() causes a network packet (i.e. no Nagle),
or write() causes a delay before a network packet (i.e. Nagle).  Neither
is what Apache (and pretty much all other servers) wants.  Apache wants:
write() causes any number of full-sized (MSS) packets to be sent, the last
to be held until an explicit "flush" operation sends it (a timeout is fine
too). That way you can do a series of write()s and writev()s and
sendfile()s and whatever you want and the kernel never stupidly inserts a
packet boundary, and never stupidly delays sending a packet.  A 0-length write() or
writev() element would do well as a flush, as would a special-case
ioctl().  (This is the way the Linux folks want to go with this...)
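Linux's TCP_CORK socket option is close to the hold-then-flush model described here: while it is set the kernel holds back partial frames, and clearing it sends whatever is queued (with a kernel-imposed ceiling on how long data can be held, much like the "timeout is fine too" case). A Linux-only sketch, assuming Linux's own sendfile(2), whose signature `sendfile(out_fd, in_fd, &offset, count)` differs from the proposal above; `send_corked()` is a hypothetical name.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn TCP_CORK on or off (Linux-specific socket option). */
static int set_cork(int s, int on)
{
    return setsockopt(s, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Hypothetical helper: write a header, then count bytes of fd starting at
 * offset, with the socket corked so no packet boundary falls between them.
 * Clearing the cork at the end acts as the explicit "flush". */
static int send_corked(int s, const char *hdr, size_t hdr_len,
                       int fd, off_t offset, size_t count)
{
    int saved;

    if (set_cork(s, 1) < 0)
        return -1;
    while (hdr_len > 0) {
        ssize_t n = write(s, hdr, hdr_len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        hdr += n;
        hdr_len -= (size_t)n;
    }
    while (count > 0) {
        /* Linux sendfile() advances offset by the bytes it sent. */
        ssize_t n = sendfile(s, fd, &offset, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (n == 0)
            break; /* file ended before count bytes */
        count -= (size_t)n;
    }
    return set_cork(s, 0);

fail:
    saved = errno;
    (void)set_cork(s, 0); /* don't leave data stuck behind the cork */
    errno = saved;
    return -1;
}
```

Here the flush is an explicit setsockopt() rather than a 0-length write(), which sidesteps the objection to overloading write().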

There's no way to distinguish an error on fd from an error on s in your
interface -- there's only one errno... this is a fundamental problem with
sendfile() style stuff... dunno what to do about it.

Dean
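The fd-versus-s ambiguity raised above is easy to see in user space. The sketch below does the file read and the socket write as separate calls and returns a hypothetical `enum sf_side` saying which descriptor failed. Passing an invalid descriptor on either side produces the same errno, EBADF, so only the extra return value tells the two cases apart. This illustrates the problem; it is not a proposed interface.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes: which descriptor an error came from. */
enum sf_side { SF_OK = 0, SF_ERR_FILE, SF_ERR_SOCK };

/* Copy up to len bytes (capped at one 4 KB buffer) from fd at offset to s.
 * On failure errno holds the error, and the return value records which
 * descriptor produced it, information a single errno cannot carry. */
static enum sf_side copy_chunk(int fd, int s, off_t offset, size_t len,
                               size_t *copied)
{
    char buf[4096];
    ssize_t n;

    *copied = 0;
    if (len > sizeof buf)
        len = sizeof buf;
    do {
        n = pread(fd, buf, len, offset);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return SF_ERR_FILE;
    while (*copied < (size_t)n) {
        ssize_t w = write(s, buf + *copied, (size_t)n - *copied);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return SF_ERR_SOCK;
        }
        *copied += (size_t)w;
    }
    return SF_OK;
}
```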