
Posted to apreq-dev@httpd.apache.org by Joe Schaefer <jo...@sunstarsys.com> on 2004/04/01 16:30:53 UTC

Re: [PROPOSAL] xml-related changes to apreq2

Stas Bekman <st...@stason.org> writes:

> Joe Schaefer wrote:

[...]

> > In so doing I've replaced the req->body table with new hooks in the
> > parser.  The hooks will provide a simple iterator api together with
> > a means of converting the iterated-over elts into apreq_param_t's.
> > This additional level of indirection will allow us to continue support
> > for the current apreq_param(s)-related APIs by using the iterator
> > api internally (instead of searching the old req->body table). 
> 
> 
> What's the performance hit added by this transition? I'd hate to see
> the library slow down, just because it suddenly wants to parse xml.

I can't say for certain, since I haven't implemented it yet.
However, I expect it to be something like this:

  1) existing parsers will not be negatively impacted, since they will 
     just relocate the req->body table to ctx->body. Future (xml)
     parsers will see a large benefit though, because they can use 
     whatever data type they want for storing their parsed data.

  2) The current accessor apis (apreq_param, apreq_params, apreq_upload,
     apreq_uploads) will take a performance hit due to the additional 
     indirection: they'll need to call req->parser->it_init() to get an 
     iterator, and then call req->parser->it_next() to get the result.  
     This is additional overhead when compared to a direct call to 
     apr_table_get(req->body, key), but I don't know how significant
     the extra overhead will be.
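
The indirection described in (2) can be sketched roughly as follows. This is
only an illustration under stated assumptions: param_t, parser_t, array_ctx_t
and lookup_param() are hypothetical stand-ins, not apreq2 types, and only the
hook names it_init and it_next come from the text above. The lookup uses a
case-sensitive strcmp() for brevity, whereas apr_table_get() compares keys
case-insensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for apreq_param_t: just a name/value pair. */
typedef struct {
    const char *name;
    const char *value;
} param_t;

/* Hypothetical parser carrying the two iterator hooks named in the mail.
 * Each parser keeps its parsed data behind ctx in whatever form it likes. */
typedef struct parser parser_t;
struct parser {
    void *ctx;
    void *(*it_init)(const parser_t *p);                    /* reset, return iterator */
    const param_t *(*it_next)(const parser_t *p, void *it); /* NULL when exhausted */
};

/* Example backend: a flat array of params with a single cursor. */
typedef struct {
    const param_t *elts;
    size_t nelts;
    size_t pos;
} array_ctx_t;

static void *array_it_init(const parser_t *p)
{
    array_ctx_t *c = p->ctx;
    c->pos = 0;
    return &c->pos;
}

static const param_t *array_it_next(const parser_t *p, void *it)
{
    const array_ctx_t *c = p->ctx;
    size_t *pos = it;
    return (*pos < c->nelts) ? &c->elts[(*pos)++] : NULL;
}

/* Accessor built on the hooks: value of the first param named key, or NULL.
 * This costs two indirect calls plus a linear scan instead of one direct
 * table lookup. */
static const char *lookup_param(const parser_t *p, const char *key)
{
    void *it;
    const param_t *prm;

    assert(p != NULL && key != NULL);
    it = p->it_init(p);
    while ((prm = p->it_next(p, it)) != NULL) {
        if (strcmp(prm->name, key) == 0)
            return prm->value;
    }
    return NULL;
}
```

A caller would fill a parser_t with array_it_init and array_it_next (or the
hooks of any other backend) and then call lookup_param(&parser, "key").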

However, it has always been an express goal of apreq2 to provide support
for xml parsers, especially wrt XForms.  Without such a change, I don't 
think that XForms will happen in apreq2, since apr_tables are simply the 
wrong data structure for representing xml docs.

-- 
Joe Schaefer

Re: [PROPOSAL] xml-related changes to apreq2

Posted by Stas Bekman <st...@stason.org>.

Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> [...]
> 
> 
>>So once it's modified you will never know what the impact was. Perhaps we
>>need to benchmark the two implementations (once you have it changed)
>>and see how bad it is and whether it's worth it.
> 
> 
> -0.  Either we make an effort to support xml, or we don't.  This
> is a design decision about apreq2's basic functionality. I'm certain 
> we can come up with an implementation that's sufficiently speedy
> (ie folks using Apache::Request would never know the difference),
> but if you're concerned about it now, perhaps you/someone should
> run some benchmarks on Apache::Request in current cvs, comparing
> it against CGI.pm as a baseline reference.

That would be comparing apples to oranges, IMHO. Last time I benchmarked the 
mp1 version, CGI.pm was twice as slow on small inputs and 4 times as slow on 
slightly bigger inputs, and the gap keeps growing with the input size.
http://perl.apache.org/docs/1.0/guide/performance.html#Apache__args_vs__Apache__Request__param_vs__CGI__param

So I think if you want to benchmark the difference, you need to benchmark 
apreq itself before and after the change.

> Also bear in mind that apr_tables aren't particularly fast on
> large data sets, so we shouldn't try to pretend that tables are 
> the cat's meow.  For example, the old apreq_tables implementation 
> (in CVS) does lookups much faster and with far better scalability. 
> If we dumped req->body entirely, we no longer are confronted with 
> apreq2 choosing "the best data structure" for parsers to store their 
> data in.  Each parser would be able to choose what's best for itself.

I suppose we need to look at the 80/20 rule. If average form parsing 
performs significantly better with the current code than with the future one, 
we should at least consider a compile-time optimization that switches to 
faster code (similar to the current one). I'm not saying it has to be done 
immediately, but this is something that could be done in the future.

So for example you could say:

We need to move to the new "architecture" because of this and because of that, 
and after moving we have seen a performance hit for the average case of this 
particular input (e.g. HTML forms), therefore we may consider having a 
compile-time option to choose one branch over the other.

i.e. if we can tell that the difference is significant and the user will only 
ever want to process HTML forms, they may want to compile apreq with a special 
option which takes a shortcut and uses a different table structure.
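
A compile-time switch of that kind might look roughly like the sketch below.
Everything in it is hypothetical: APREQ_FORMS_ONLY is an invented macro, and
pair_t, body_t and body_get() are illustrative stand-ins rather than apreq2
names. Both branches return the same result; only the code path differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *value;
} pair_t;

typedef struct {
    const pair_t *elts;
    size_t nelts;
} body_t;

#ifdef APREQ_FORMS_ONLY   /* hypothetical build option, e.g. -DAPREQ_FORMS_ONLY */

/* Shortcut branch: search the storage directly. */
static const char *body_get(const body_t *b, const char *key)
{
    size_t i;

    assert(b != NULL && key != NULL);
    for (i = 0; i < b->nelts; i++) {
        if (strcmp(b->elts[i].name, key) == 0)
            return b->elts[i].value;
    }
    return NULL;
}

#else

/* Generic branch: reach the data only through an iterator callback. */
typedef const pair_t *(*next_fn)(const body_t *b, size_t *cursor);

static const pair_t *body_next(const body_t *b, size_t *cursor)
{
    return (*cursor < b->nelts) ? &b->elts[(*cursor)++] : NULL;
}

static const char *body_get(const body_t *b, const char *key)
{
    next_fn next = body_next;   /* a real parser would supply this hook */
    size_t cursor = 0;
    const pair_t *p;

    assert(b != NULL && key != NULL);
    while ((p = next(b, &cursor)) != NULL) {
        if (strcmp(p->name, key) == 0)
            return p->value;
    }
    return NULL;
}

#endif
```

Callers use body_get() the same way in either build, so the choice of branch
stays invisible to them.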

Again, this all means nothing without a benchmark that tells us whether there 
is a significant impact at all.

> [...]
> 
> 
>>Why is apreq2 supposed to support xml, again? You never really
>>explained the reason for that goal. I wasn't aware there was such a
>>goal. 
> 
> 
> XForms etc. has been in the STATUS file since forever.  It's billed
> as the replacement for HTML forms, and specifies some interoperability
> between multipart/form-data, application/x-www-form-urlencoded and
> application/xml.  Currently we're lacking a parser api suitable for 
> the xml component of XForms (you really can't stuff an xml document 
> into an apr_table).

Thanks, now I understand why. It has indeed been in the STATUS file forever, 
but it never explained why.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

Re: [PROPOSAL] xml-related changes to apreq2

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Stas Bekman <st...@stason.org> writes:

[...]

> So once it's modified you will never know what the impact was. Perhaps we
> need to benchmark the two implementations (once you have it changed)
> and see how bad it is and whether it's worth it.

-0.  Either we make an effort to support xml, or we don't.  This
is a design decision about apreq2's basic functionality. I'm certain 
we can come up with an implementation that's sufficiently speedy
(ie folks using Apache::Request would never know the difference),
but if you're concerned about it now, perhaps you/someone should
run some benchmarks on Apache::Request in current cvs, comparing
it against CGI.pm as a baseline reference.

Also bear in mind that apr_tables aren't particularly fast on
large data sets, so we shouldn't try to pretend that tables are 
the cat's meow.  For example, the old apreq_tables implementation 
(in CVS) does lookups much faster and with far better scalability. 
If we dumped req->body entirely, we no longer are confronted with 
apreq2 choosing "the best data structure" for parsers to store their 
data in.  Each parser would be able to choose what's best for itself.

[...]

> Why is apreq2 supposed to support xml, again? You never really
> explained the reason for that goal. I wasn't aware there was such a
> goal. 

XForms etc. has been in the STATUS file since forever.  It's billed
as the replacement for HTML forms, and specifies some interoperability
between multipart/form-data, application/x-www-form-urlencoded and
application/xml.  Currently we're lacking a parser api suitable for 
the xml component of XForms (you really can't stuff an xml document 
into an apr_table).
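
To illustrate why a nested document does not fit a flat key/value table, and
how an iterator could still expose it as name/value params, here is a minimal
sketch. node_t, param_t, node_next() and tree_it_next() are hypothetical names
invented for this example; they are not part of apreq2 or APR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element node of a parsed xml document. */
typedef struct node node_t;
struct node {
    const char *name;       /* element name */
    const char *text;       /* text content, or NULL for pure containers */
    node_t *parent;
    node_t *first_child;
    node_t *next_sibling;
};

/* Hypothetical stand-in for apreq_param_t. */
typedef struct {
    const char *name;
    const char *value;
} param_t;

/* Preorder successor of n inside the tree rooted at root, or NULL at the end.
 * Assumes every node other than root has a valid parent pointer. */
static node_t *node_next(const node_t *root, node_t *n)
{
    assert(root != NULL && n != NULL);
    if (n->first_child != NULL)
        return n->first_child;
    while (n != root) {
        if (n->next_sibling != NULL)
            return n->next_sibling;
        n = n->parent;
    }
    return NULL;
}

/* Iterator step: advance *cursor to the next node carrying text and copy it
 * into *out. Start with *cursor == NULL. Returns 1 on success, 0 when the
 * tree is exhausted (do not call again after a 0 without resetting). */
static int tree_it_next(node_t *root, node_t **cursor, param_t *out)
{
    node_t *n = (*cursor == NULL) ? root : node_next(root, *cursor);

    while (n != NULL && n->text == NULL)
        n = node_next(root, n);
    *cursor = n;
    if (n == NULL)
        return 0;
    out->name = n->name;
    out->value = n->text;
    return 1;
}
```

The tree keeps the nesting that a table would lose, while the iterator
flattens it on demand for accessors that only understand params.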

-- 
Joe Schaefer

Re: [PROPOSAL] xml-related changes to apreq2

Posted by Stas Bekman <st...@stason.org>.

Joe Schaefer wrote:
> Stas Bekman <st...@stason.org> writes:
> 
> 
>>Joe Schaefer wrote:
> 
> 
> [...]
> 
> 
>>>In so doing I've replaced the req->body table with new hooks in the
>>>parser.  The hooks will provide a simple iterator api together with
>>>a means of converting the iterated-over elts into apreq_param_t's.
>>>This additional level of indirection will allow us to continue support
>>>for the current apreq_param(s)-related APIs by using the iterator
>>>api internally (instead of searching the old req->body table). 
>>
>>
>>What's the performance hit added by this transition? I'd hate to see
>>the library slow down, just because it suddenly wants to parse xml.
> 
> 
> I can't say for certain, since I haven't implemented it yet.
> However, I expect it to be something like this:
> 
>   1) existing parsers will not be negatively impacted, since they will 
>      just relocate the req->body table to ctx->body. Future (xml)
>      parsers will see a large benefit though, because they can use 
>      whatever data type they want for storing their parsed data.
> 
>   2) The current accessor apis (apreq_param, apreq_params, apreq_upload,
>      apreq_uploads) will take a performance hit due to the additional 
>      indirection: they'll need to call req->parser->it_init() to get an 
>      iterator, and then call req->parser->it_next() to get the result.  
>      This is additional overhead when compared to a direct call to 
>      apr_table_get(req->body, key), but I don't know how significant
>      the extra overhead will be.

So once it's modified you will never know what the impact was. Perhaps we need 
to benchmark the two implementations (once you have it changed) and see how 
bad it is and whether it's worth it.

> However, it has always been an express goal of apreq2 to provide support
> for xml parsers, especially wrt XForms.  Without such a change, I don't 
> think that XForms will happen in apreq2, since apr_tables are simply the 
> wrong data structure for representing xml docs.

Why is apreq2 supposed to support xml, again? You never really explained the 
reason for that goal. I wasn't aware there was such a goal.
