Posted to apreq-dev@httpd.apache.org by Joe Schaefer <jo...@sunstarsys.com> on 2003/01/19 17:46:30 UTC

apreq-2 uploads as bucket brigades?

I'd like to drop the req->upload linked list of temp files,
and use bucket brigades in their place.  I'm planning to
use a common apreq_param_t struct which extends apreq_value_t 
like so:

struct apreq_param_t {
    enum { ASCII, UTF_8, UTF_16, ISO_LATIN_1 } charset;
    char                *language;
    apreq_table_t       *info;  /* mime headers */

    apr_bucket_brigade  *bb;    /* represents file contents */

    apreq_value_t        v;
};


The main advantage of this approach would be that we can use
the full bucket api for managing the upload data.  The brigade
could use heap-allocated buckets for smaller uploads, and then
switch over to mmapped or file buckets after a certain size.
The bucket API makes this look pretty easy.
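A minimal sketch of the "small uploads in memory, large ones on disk" idea, written against plain stdio rather than the real bucket API.  Everything here (spool_t, SPOOL_THRESHOLD, the spool_* functions) is a hypothetical stand-in: the fixed buffer plays the role of heap buckets and the tmpfile() handle plays the role of a file bucket.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration only: not apreq or APR code. */
#define SPOOL_THRESHOLD 16      /* bytes kept in memory before spilling */

typedef struct {
    char   mem[SPOOL_THRESHOLD]; /* stands in for heap buckets */
    size_t mem_len;              /* bytes currently held in mem */
    FILE  *file;                 /* stands in for a file bucket; NULL until needed */
    size_t total;                /* total bytes accepted so far */
} spool_t;

void spool_init(spool_t *s)
{
    assert(s != NULL);
    s->mem_len = 0;
    s->file = NULL;
    s->total = 0;
}

/* Append len bytes.  Returns 0 on success, -1 on I/O error. */
int spool_write(spool_t *s, const char *data, size_t len)
{
    assert(s != NULL);
    if (s->file == NULL && s->mem_len + len <= SPOOL_THRESHOLD) {
        /* Still small enough: keep everything in memory. */
        memcpy(s->mem + s->mem_len, data, len);
        s->mem_len += len;
        s->total += len;
        return 0;
    }
    if (s->file == NULL) {
        /* Threshold crossed: move the buffered bytes into a temp file. */
        s->file = tmpfile();
        if (s->file == NULL)
            return -1;
        if (s->mem_len != 0 &&
            fwrite(s->mem, 1, s->mem_len, s->file) != s->mem_len)
            return -1;
        s->mem_len = 0;
    }
    if (len != 0 && fwrite(data, 1, len, s->file) != len)
        return -1;
    s->total += len;
    return 0;
}

/* Nonzero while the upload still lives entirely in memory. */
int spool_in_memory(const spool_t *s)
{
    return s->file == NULL;
}

void spool_destroy(spool_t *s)
{
    if (s->file != NULL)
        fclose(s->file);
    s->file = NULL;
}
```

With a real brigade the switch would presumably happen by appending a different bucket type rather than by copying, but the threshold logic would look much the same.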

The main disadvantage of this approach would be that we'd need 
to support the full bucket api for managing the upload.
With actual files, seek(), dup(), link(), and buffered read()
are natural operations.  I think we'd have to provide a 
compatibility layer for some of these, perhaps using a 
meta-bucket at the front of the brigade?

-- 
Joe Schaefer

Re: apreq-2 uploads as bucket brigades?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
>>
>>>I'd like to drop the req->upload linked list of temp files,
>>>and use bucket brigades in their place.  I'm planning to
>>>use a common apreq_param_t struct which extends apreq_value_t 
>>>like so:
>>>
>>>struct apreq_param_t {
>>>    enum { ASCII, UTF_8, UTF_16, ISO_LATIN_1 } charset;
>>>    char                *language;
>>>    apreq_table_t       *info;  /* mime headers */
>>>
>>>    apr_bucket_brigade  *bb;    /* represents file contents */
>>>
>>>    apreq_value_t        v;
>>>};
>>>
>>>
>>>The main advantage of this approach would be that we can use
>>>the full bucket api for managing the upload data.  The brigade
>>>could use heap-allocated buckets for smaller uploads, and then
>>>switch over to mmapped or file buckets after a certain size.
>>>The bucket API makes this look pretty easy.
>>
>>And you can even write upload hook filters with that ;)
>>
>>I agree that using a polished bb API will make apreq's code more
>>robust, and maybe shorter?
> 
> 
> Right.  Hopefully it'll make it easier to use apreq-2 within an
> asynchronous IO environment as well.
> 
> By the way, is there a TIEHANDLE API for bucket brigades in mod_perl
> 2?  The core issue I'm concerned about is: what happens when an
> apreq-2 application wants to read the whole brigade, i.e.
> 
>   my $fh = $upload->fh; 
>   print while <$fh>;
>   
> ? The file-buckets within the brigade will start generating 
> heap-buckets.  It *must* be our job to manage those additional 
> heap-buckets, not the application programmer's job.  Perl's
> refcounts should work well for this, but that might mean we'd 
> end up pushing the "FILE API" into the Perl glue and out 
> of the apreq-2 core.

Look at the streaming filters API:
http://perl.apache.org/docs/2.0/user/handlers/filters.html#Stream_oriented_Output_Filter
These can be adapted to work with upload objects as well.

A complete TIEHANDLE is planned for filters (implemented only partially), 
and should be easy to re-use for file uploads as well.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: apreq-2 uploads as bucket brigades?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:
> > I'd like to drop the req->upload linked list of temp files,
> > and use bucket brigades in their place.  I'm planning to
> > use a common apreq_param_t struct which extends apreq_value_t 
> > like so:
> > 
> > struct apreq_param_t {
> >     enum { ASCII, UTF_8, UTF_16, ISO_LATIN_1 } charset;
> >     char                *language;
> >     apreq_table_t       *info;  /* mime headers */
> > 
> >     apr_bucket_brigade  *bb;    /* represents file contents */
> > 
> >     apreq_value_t        v;
> > };
> > 
> > 
> > The main advantage of this approach would be that we can use
> > the full bucket api for managing the upload data.  The brigade
> > could use heap-allocated buckets for smaller uploads, and then
> > switch over to mmapped or file buckets after a certain size.
> > The bucket API makes this look pretty easy.
> 
> And you can even write upload hook filters with that ;)
> 
> I agree that using a polished bb API will make apreq's code more
> robust, and maybe shorter?

Right.  Hopefully it'll make it easier to use apreq-2 within an
asynchronous IO environment as well.

By the way, is there a TIEHANDLE API for bucket brigades in mod_perl
2?  The core issue I'm concerned about is: what happens when an
apreq-2 application wants to read the whole brigade, i.e.

  my $fh = $upload->fh; 
  print while <$fh>;
  
? The file-buckets within the brigade will start generating 
heap-buckets.  It *must* be our job to manage those additional 
heap-buckets, not the application programmer's job.  Perl's
refcounts should work well for this, but that might mean we'd 
end up pushing the "FILE API" into the Perl glue and out 
of the apreq-2 core.

-- 
Joe Schaefer

Re: apreq-2 uploads as bucket brigades?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:
> I'd like to drop the req->upload linked list of temp files,
> and use bucket brigades in their place.  I'm planning to
> use a common apreq_param_t struct which extends apreq_value_t 
> like so:
> 
> struct apreq_param_t {
>     enum { ASCII, UTF_8, UTF_16, ISO_LATIN_1 } charset;
>     char                *language;
>     apreq_table_t       *info;  /* mime headers */
> 
>     apr_bucket_brigade  *bb;    /* represents file contents */
> 
>     apreq_value_t        v;
> };
> 
> 
> The main advantage of this approach would be that we can use
> the full bucket api for managing the upload data.  The brigade
> could use heap-allocated buckets for smaller uploads, and then
> switch over to mmapped or file buckets after a certain size.
> The bucket API makes this look pretty easy.

And you can even write upload hook filters with that ;)

I agree that using a polished bb API will make apreq's code more robust,
and maybe shorter?

> The main disadvantage of this approach would be that we'd need 
> to support the full bucket api for managing the upload.
> With actual files, seek(), dup(), link(), and buffered read()
> are natural operations.  I think we'd have to provide a 
> compatibility layer for some of these, perhaps using a 
> meta-bucket at the front of the brigade?

Why would you really want to support all of them? Just croak for the
unimplemented ones; if they are wanted, the implementation can come later.
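On the C side, "croak for the unimplemented ones" could be sketched as an operations table whose missing entries report a not-implemented status.  This is only an illustration under assumed names: upload_t, upload_ops_t, UPLOAD_ENOTIMPL and the upload_* wrappers are hypothetical, not part of apreq.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
#define UPLOAD_OK         0
#define UPLOAD_ENOTIMPL (-1)

typedef struct upload upload_t;

/* File-like operations; a NULL slot means "not implemented yet". */
typedef struct {
    int (*read)(upload_t *u, char *buf, size_t *len);
    int (*seek)(upload_t *u, long offset);
    int (*link)(upload_t *u, const char *path);
} upload_ops_t;

struct upload {
    const upload_ops_t *ops;
    const char         *data;   /* in-memory contents */
    size_t              len;
    size_t              pos;    /* current read position */
};

/* The only operation this backend implements: sequential read. */
static int mem_read(upload_t *u, char *buf, size_t *len)
{
    size_t avail = u->len - u->pos;
    if (*len > avail)
        *len = avail;
    memcpy(buf, u->data + u->pos, *len);
    u->pos += *len;
    return UPLOAD_OK;
}

static const upload_ops_t mem_ops = { mem_read, NULL, NULL };

/* Public wrappers: dispatch if present, otherwise report ENOTIMPL. */
int upload_read(upload_t *u, char *buf, size_t *len)
{
    assert(u != NULL && u->ops != NULL);
    return u->ops->read ? u->ops->read(u, buf, len) : UPLOAD_ENOTIMPL;
}

int upload_seek(upload_t *u, long offset)
{
    assert(u != NULL && u->ops != NULL);
    return u->ops->seek ? u->ops->seek(u, offset) : UPLOAD_ENOTIMPL;
}

int upload_link(upload_t *u, const char *path)
{
    assert(u != NULL && u->ops != NULL);
    return u->ops->link ? u->ops->link(u, path) : UPLOAD_ENOTIMPL;
}
```

A language binding could turn UPLOAD_ENOTIMPL into an exception (a croak in Perl), and filling in a slot later would not change any caller.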



Re: apreq-2 uploads as bucket brigades?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 06:24 PM 1/21/2003, Stas Bekman wrote:
>Joe Schaefer wrote:
>
>>>FILE*'s can be passed to Tcl to create new file handles, and this is
>>>something that I feel is useful, because it gives you a really simple
>>>API for dealing with smaller files, even if it's not terribly
>>>efficient.
>>
>>It is useful, natural, and convenient.  I'd just like to avoid making it mandatory like we did with apreq-1.  XForms will present a new set of needs for our mfd parser, and the extra flexibility of brigades over
>>a vanilla FILE* pointer could be a really big winner there.
>
>Though it's not crossplatform. That's why apr is using apr_file_t and provides the method to extract the native implementation (FILE/HANDLE/...). I believe we should be using apr_file_t and not FILE*.

I would agree (as an apr hack.)

Can I suggest you also consider the benefit of a custom bucket type?
It might be possible to support the seek/read/write model against the
set-aside file while still supporting the brigade read.  This *could* be
a very happy compromise.

I can't go into it tonight (trying to get the Win9X users on 2.0.44
unbroken) but I'd be happy to share some more thoughts on this
later in the week.

Bill



Re: apreq-2 uploads as bucket brigades?

Posted by "David N. Welton" <da...@dedasys.com>.
Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:

> >>FILE*'s can be passed to Tcl to create new file handles, and this
> >>is something that I feel is useful, because it gives you a really
> >>simple API for dealing with smaller files, even if it's not
> >>terribly efficient.

> > It is useful, natural, and convenient.  I'd just like to avoid
> > making it mandatory like we did with apreq-1.  XForms will present
> > a new set of needs for our mfd parser, and the extra flexibility
> > of brigades over a vanilla FILE* pointer could be a really big
> > winner there.

> Though it's not crossplatform. That's why apr is using apr_file_t
> and provides the method to extract the native implementation
> (FILE/HANDLE/...). I believe we should be using apr_file_t and not
> FILE*.

+1 to that.  Tcl uses the same thing (the native implementation),
and it would be better to have a HANDLE instead of a FILE on Windows.

-- 
David N. Welton
   Consulting: http://www.dedasys.com/
     Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
   Apache Tcl: http://tcl.apache.org/

Re: apreq-2 uploads as bucket brigades?

Posted by Stas Bekman <st...@stason.org>.
Joe Schaefer wrote:

>>FILE*'s can be passed to Tcl to create new file handles, and this is
>>something that I feel is useful, because it gives you a really simple
>>API for dealing with smaller files, even if it's not terribly
>>efficient.
> 
> 
> It is useful, natural, and convenient.  I'd just like to avoid making 
> it mandatory like we did with apreq-1.  XForms will present a new set 
> of needs for our mfd parser, and the extra flexibility of brigades over
> a vanilla FILE* pointer could be a really big winner there.

Though it's not crossplatform. That's why apr is using apr_file_t and provides 
the method to extract the native implementation (FILE/HANDLE/...). I believe 
we should be using apr_file_t and not FILE*.




Re: apreq-2 uploads as bucket brigades?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
davidw@dedasys.com (David N. Welton) writes:

> Joe Schaefer <jo...@sunstarsys.com> writes:
> 
> > Performance is one reason.  If we don't *need* apreq to use the OS's
> > filesystem for spooling, we shouldn't.  We can leave that up to the
> > application, but we could make it easy for them to pretend uploads
> > are always files by providing things like
> 
> >   apreq_upload_link(apreq_upload_t *upload)
> >   apreq_upload_seek(apreq_upload_t *upload)
> 
> > instead of just handing them a FILE* pointer to play with.
> 
> Forgive me if I haven't been following as closely as I should, but
> would it still be possible to get ahold of a FILE* somehow?  

I think so.  But I'd like the FILE* API to be lazy.
In other words, apreq shouldn't generate a FILE* internally unless

  1) the upload is too large to store in memory, or
  2) an apreq user (not a "hook author") wants to deal 
     with the upload data via FILE*, in which case apreq 
     can generate the FILE* handle if it hasn't already done so.

What I'm looking for is an internal brigade design that manages
such behavior, which is why I suggested using a meta bucket.
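The lazy behaviour described above might be sketched like this, using plain stdio in place of a brigade.  lazy_upload_t and both functions are hypothetical names for illustration; the point is only that the FILE* does not exist until somebody asks for it, and is created at most once.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical illustration only: not apreq code. */
typedef struct {
    const char *data;   /* upload contents held in memory */
    size_t      len;
    FILE       *fp;     /* NULL until a caller requests a FILE* */
} lazy_upload_t;

/*
 * Return a FILE* positioned at the start of the upload data.
 * The temp file is created on the first call and cached, so
 * callers that never ask for a FILE* never touch the filesystem.
 * Returns NULL if the temp file cannot be created or written.
 */
FILE *lazy_upload_file(lazy_upload_t *u)
{
    assert(u != NULL);
    if (u->fp == NULL) {
        FILE *fp = tmpfile();
        if (fp == NULL)
            return NULL;
        if (u->len != 0 && fwrite(u->data, 1, u->len, fp) != u->len) {
            fclose(fp);
            return NULL;
        }
        rewind(fp);
        u->fp = fp;
    }
    return u->fp;
}

/* Release the temp file, if one was ever created. */
void lazy_upload_close(lazy_upload_t *u)
{
    assert(u != NULL);
    if (u->fp != NULL) {
        fclose(u->fp);
        u->fp = NULL;
    }
}
```

Case 1 in the list (upload too large for memory) would set fp eagerly during parsing; case 2 is the on-demand path shown here.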

It would be really cool (hint) if someone took a crack at this while
I'm working on the rest of apreq-2's core.  On a related note, the 
header docs for apreq_tables.h and apreq_cookie.h are stable in the
current httpd-apreq-2 cvs, but certainly not complete.  People looking 
for things to do might want to have a look at fixing those, or 
contributing a documentation system (any favorites?), or writing unit 
tests for tables and cookies.  I expect to have a working core library 
within the next week, as well as one stand-alone environment available 
for testing.

> FILE*'s can be passed to Tcl to create new file handles, and this is
> something that I feel is useful, because it gives you a really simple
> API for dealing with smaller files, even if it's not terribly
> efficient.

It is useful, natural, and convenient.  I'd just like to avoid making 
it mandatory like we did with apreq-1.  XForms will present a new set 
of needs for our mfd parser, and the extra flexibility of brigades over
a vanilla FILE* pointer could be a really big winner there.

-- 
Joe Schaefer

Re: apreq-2 uploads as bucket brigades?

Posted by "David N. Welton" <da...@dedasys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:

> Performance is one reason.  If we don't *need* apreq to use the OS's
> filesystem for spooling, we shouldn't.  We can leave that up to the
> application, but we could make it easy for them to pretend uploads
> are always files by providing things like

>   apreq_upload_link(apreq_upload_t *upload)
>   apreq_upload_seek(apreq_upload_t *upload)

> instead of just handing them a FILE* pointer to play with.

Forgive me if I haven't been following as closely as I should, but
would it still be possible to get ahold of a FILE* somehow?  FILE*'s
can be passed to Tcl to create new file handles, and this is something
that I feel is useful, because it gives you a really simple API for
dealing with smaller files, even if it's not terribly efficient.

-- 
David N. Welton

Re: apreq-2 uploads as bucket brigades?

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Issac Goldstand" <ma...@beamartyr.net> writes:

> OK, but...
> Why not do both?  Eg, provide access via bucket brigades 
> and also keep the spooled files for more natural seek()ing 
> (etc.)?

Performance is one reason.  If we don't *need* apreq to use
the OS's filesystem for spooling, we shouldn't.  We can leave 
that up to the application, but we could make it easy for them 
to pretend uploads are always files by providing things like

  apreq_upload_link(apreq_upload_t *upload)
  apreq_upload_seek(apreq_upload_t *upload)
  ...

instead of just handing them a FILE* pointer to play with.

IOW, generate the upload structs from the underlying 
brigade, and let that struct be responsible for 
emulating whatever additional file-type APIs we need.
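As a rough sketch of such emulation, here is seek and read implemented over a linked chain of in-memory chunks, which stands in for a brigade of buckets.  chunk_t, upload_view_t and the upload_view_* functions are hypothetical names, and the real apreq_upload_seek() signature is not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bucket: one contiguous piece of data. */
typedef struct chunk {
    const char   *data;
    size_t        len;
    struct chunk *next;
} chunk_t;

/* File-like view over the chain: remembers a current position. */
typedef struct {
    chunk_t *head;     /* first chunk in the chain */
    chunk_t *cur;      /* chunk holding the current position, NULL at EOF */
    size_t   cur_off;  /* offset of the position inside cur */
} upload_view_t;

/* Seek to an absolute offset.  Returns 0, or -1 if offset is past EOF. */
int upload_view_seek(upload_view_t *v, size_t offset)
{
    chunk_t *c;
    assert(v != NULL);
    c = v->head;
    while (c != NULL && offset >= c->len) {
        offset -= c->len;       /* skip whole chunks */
        c = c->next;
    }
    if (c == NULL && offset > 0)
        return -1;              /* beyond the end; position unchanged */
    v->cur = c;
    v->cur_off = offset;
    return 0;
}

/* Read up to want bytes from the current position; returns bytes read. */
size_t upload_view_read(upload_view_t *v, char *buf, size_t want)
{
    size_t got = 0;
    assert(v != NULL);
    while (got < want && v->cur != NULL) {
        size_t avail = v->cur->len - v->cur_off;
        size_t n = (want - got < avail) ? want - got : avail;
        memcpy(buf + got, v->cur->data + v->cur_off, n);
        got += n;
        v->cur_off += n;
        if (v->cur_off == v->cur->len) {  /* chunk exhausted: move on */
            v->cur = v->cur->next;
            v->cur_off = 0;
        }
    }
    return got;
}
```

Because the view only tracks a position, several views could share one chain, which is roughly what dup() would need.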

Of course the upload_hook API will also need to change, 
since the hooks would be responsible for generating the 
bucket brigade.  upload_hooks should be using the 
bucket brigade api, *not* our apreq_upload api.  The 
hooks are really parser extensions, so they should be 
stream-oriented anyways.
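To make "stream-oriented" concrete, a hook could be a callback that sees each piece of data as the parser produces it, without any file API.  The sketch below is an assumption about shape only: upload_hook_fn, feed_chunks, limit_ctx_t and limit_hook are hypothetical, and NUL-terminated strings stand in for buckets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hook signature: called once per piece of upload data.
 * A nonzero return aborts the upload.
 */
typedef int (*upload_hook_fn)(void *ctx, const char *data, size_t len);

/*
 * Stand-in for the parser: pushes each chunk through the hook in order
 * and stops at the first nonzero return, which it passes back.
 */
int feed_chunks(const char *const *chunks, size_t nchunks,
                upload_hook_fn hook, void *ctx)
{
    size_t i;
    assert(hook != NULL);
    for (i = 0; i < nchunks; i++) {
        int rc = hook(ctx, chunks[i], strlen(chunks[i]));
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example hook state: reject uploads larger than limit bytes. */
typedef struct {
    size_t seen;    /* bytes observed so far */
    size_t limit;   /* maximum bytes allowed */
} limit_ctx_t;

int limit_hook(void *ctx, const char *data, size_t len)
{
    limit_ctx_t *lc = ctx;
    (void)data;                 /* this hook only counts bytes */
    lc->seen += len;
    return lc->seen > lc->limit ? -1 : 0;
}
```

The hook never seeks or rewinds, so it works the same whether the data ends up in memory or on disk.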

-- 
Joe Schaefer

Re: apreq-2 uploads as bucket brigades?

Posted by Issac Goldstand <ma...@beamartyr.net>.
OK, but...
Why not do both?  Eg, provide access via bucket brigades and also keep the
spooled files for more natural seek()ing (etc.)?

  Issac

----- Original Message -----
From: "Joe Schaefer"
Subject: apreq-2 uploads as bucket brigades?


>
> I'd like to drop the req->upload linked list of temp files,
> and use bucket brigades in their place.  I'm planning to
> use a common apreq_param_t struct which extends apreq_value_t
> like so:
>
> struct apreq_param_t {
>     enum { ASCII, UTF_8, UTF_16, ISO_LATIN_1 } charset;
>     char                *language;
>     apreq_table_t       *info;  /* mime headers */
>
>     apr_bucket_brigade  *bb;    /* represents file contents */
>
>     apreq_value_t        v;
> };
>
>
> The main advantage of this approach would be that we can use
> the full bucket api for managing the upload data.  The brigade
> could use heap-allocated buckets for smaller uploads, and then
> switch over to mmapped or file buckets after a certain size.
> The bucket API makes this look pretty easy.
>
> The main disadvantage of this approach would be that we'd need
> to support the full bucket api for managing the upload.
> With actual files, seek(), dup(), link(), and buffered read()
> are natural operations.  I think we'd have to provide a
> compatibility layer for some of these, perhaps using a
> meta-bucket at the front of the brigade?
>
> --
> Joe Schaefer
>