Posted to apreq-dev@httpd.apache.org by Andy Grundman <an...@hybridized.org> on 2007/03/31 04:39:55 UTC

Parsing error when parsing the second time

I'm trying to use APR::Request outside of Apache to parse
application/x-www-form-urlencoded POST data.  I'm using libapreq2-2.08.

I have some working code mostly taken from the libapreq2 tests, but  
I'm running into an issue where the second time I try to parse  
something, nothing is parsed.  The code I'm using is below along with  
the output.  Is there something I'm forgetting to reset before  
calling the function a second time?

#!/usr/bin/perl

use strict;
use warnings;

use APR::Pool;
use APR::Brigade;
use APR::Bucket;
use APR::BucketAlloc;
use APR::Request;
use APR::Request::Parser;

urlencoded();
urlencoded();

sub urlencoded {
     my $pool = APR::Pool->new;
     my $ba   = APR::BucketAlloc->new($pool);
     my $bb   = APR::Brigade->new($pool, $ba);

     my $buf = 'a=1&b=2';
     $bb->insert_tail(
         APR::Bucket->new( $ba, $buf )
     );

     warn "Parsing $buf\n";

     my $parser = APR::Request::Parser->urlencoded(
         $pool,
         $ba,
         'application/x-www-form-urlencoded',
     );

     my $req = APR::Request::Custom->handle(
         $pool,
         '',
         '',
         $parser,
         7,
         $bb,
     );

     my $table = $req->param;
     $table->do( sub {
         my ( $key, $value ) = @_;
         warn "$key => $value\n";
     } );
}

----

Output:

Parsing a=1&b=2
a => 1
b => 2
Parsing a=1&b=2

--
Andy Grundman
andy@hybridized.org

Re: Parsing error when parsing the second time

Posted by Andy Grundman <an...@hybridized.org>.

On Apr 30, 2007, at 6:57 PM, Joe Schaefer wrote:

> Joe Schaefer <jo...@sunstarsys.com> writes:
>
>> Andy Grundman <an...@hybridized.org> writes:
>>
>>> I'm working on trying to improve the performance of Catalyst's body
>>> parsing.  We're currently using the all-Perl HTTP::Body, and it
>>> actually beats APR::Request for urlencoded data.  The regexes are
>>> pretty simple, so this isn't too surprising.
>>
>> I just ran a few microbenchmarks comparing apreq's urldecoding parser
>> to HTTP::Body, and apreq came out about 10x faster.  How are you
>> getting your results?
>
> The test script I used is here (requires svn trunk):
>
>    http://people.apache.org/~joes/testing_apreq2_vs_http_body.pl
>
> when I run it with an arg of 10000, here's what it produced:
>
> Benchmark: timing 10000 iterations of apreq_args, apreq_body, http_body...
> apreq_args:  2 wallclock secs ( 1.89 usr +  0.00 sys =  1.89 CPU) @ 5291.01/s (n=10000)
> apreq_body:  4 wallclock secs ( 3.81 usr +  0.00 sys =  3.81 CPU) @ 2624.67/s (n=10000)
>  http_body: 70 wallclock secs (69.84 usr +  0.00 sys = 69.84 CPU) @ 143.18/s (n=10000)
>
> I'm guessing you benchmarked by throwing lots of requests at a
> webserver, in which case you probably hit a bottleneck somewhere
> unrelated to the actual parsing.

Thanks Joe, I was using a simple benchmark script but may have been  
using the wrong apreq method or something.  I'll give your script a try.

-Andy

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Joe Schaefer <jo...@sunstarsys.com> writes:

> Andy Grundman <an...@hybridized.org> writes:
>
>> I'm working on trying to improve the performance of Catalyst's body parsing.
>> We're currently using the all-Perl HTTP::Body, and it actually beats
>> APR::Request for urlencoded data.  The regexes are pretty simple, so this
>> isn't too surprising.
>
> I just ran a few microbenchmarks comparing apreq's urldecoding parser to
> HTTP::Body, and apreq came out about 10x faster.  How are you getting
> your results?

The test script I used is here (requires svn trunk):

   http://people.apache.org/~joes/testing_apreq2_vs_http_body.pl

when I run it with an arg of 10000, here's what it produced:

Benchmark: timing 10000 iterations of apreq_args, apreq_body, http_body...
apreq_args:  2 wallclock secs ( 1.89 usr +  0.00 sys =  1.89 CPU) @ 5291.01/s (n=10000)
apreq_body:  4 wallclock secs ( 3.81 usr +  0.00 sys =  3.81 CPU) @ 2624.67/s (n=10000)
 http_body: 70 wallclock secs (69.84 usr +  0.00 sys = 69.84 CPU) @ 143.18/s (n=10000)

I'm guessing you benchmarked by throwing lots of requests at a
webserver, in which case you probably hit a bottleneck somewhere
unrelated to the actual parsing.
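
The timing lines above are in the format printed by Perl's core Benchmark module. For anyone wanting to run a similar isolated measurement without a web server in the loop, here is a minimal sketch. It is not the script linked above: parse_urlencoded is a hypothetical stand-in written for this example (it is neither HTTP::Body's nor apreq's code), and the sample body string is made up.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(timethese);

# Hypothetical stand-in parser, for illustration only: split the body on
# '&' and '=', then undo '+' and %XX escapes in keys and values.
sub parse_urlencoded {
    my ($data) = @_;
    my %param;
    for my $pair ( split /&/, $data ) {
        next unless length $pair;
        my ( $key, $value ) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ( $key, $value ) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        $param{$key} = $value;
    }
    return \%param;
}

# The iteration count comes from the command line, defaulting to 10000.
my $count = shift || 10000;
my $body  = 'a=1&b=2&name=Andy+Grundman&path=%2Ftmp';

# Time only the parsing call, with no server or network involved.
timethese( $count, {
    regex_parser => sub { parse_urlencoded($body) },
} );
```

To compare implementations, add one entry per parser to the hash passed to timethese, each wrapping a single parse of the same input.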

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

> I'm working on trying to improve the performance of Catalyst's body parsing.
> We're currently using the all-Perl HTTP::Body, and it actually beats
> APR::Request for urlencoded data.  The regexes are pretty simple, so this
> isn't too surprising.

I just ran a few microbenchmarks comparing apreq's urldecoding parser to
HTTP::Body, and apreq came out about 10x faster.  How are you getting
your results?

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

> On Mar 31, 2007, at 12:42 AM, Joe Schaefer wrote:
>
>>> I'm working on trying to improve the performance of Catalyst's body
>>> parsing. We're currently using the all-Perl HTTP::Body, and it
>>> actually beats APR::Request for urlencoded data.  The regexes are
>>> pretty simple, so this isn't too surprising.
>>
>> Still pretty impressive that you can do it faster than apreq2 can.
>> Have you tried comparing it with apreq2's query string parser?
>> That one's not stream oriented, so it should be a bit faster than
>> our body parser.
>
> I think I avoided this because I wasn't sure if it took into account
> any kind of query string length limit.  But I'll check the code and
> give it a try. 

There is no limit on query string length in libapreq2.

>
>> One of our grand plans is to divorce APR from mod-perl itself,
>> so it can ship as a standalone set of wrappers for libapr and
>> libaprutil.  Another is to fold apreq2's stuff into the
>> various projects it interacts with.
>>
>> That would significantly lower the barrier to entry for apreq,
>> because it wouldn't even exist as a separate distribution anymore.
>> It would just be part of all the other stuff.
>
> Cool.  I guess there's a lot of handy memory-handling code in libapr
> that would have to be replaced in order to have a completely
> standalone module. 
>

The idea would be to bundle our stuff with libapr(util) and httpd and APR,
so we'd keep the dependency without forcing users to deal with it.
Part of the problem we face is acceptance amongst those communities
to take responsibility for our code.  If the code isn't accompanied
by a few developers capable of maintaining it, then it will never get
accepted elsewhere.

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Andy Grundman <an...@hybridized.org>.

On Mar 31, 2007, at 12:42 AM, Joe Schaefer wrote:

>> I'm working on trying to improve the performance of Catalyst's body
>> parsing. We're currently using the all-Perl HTTP::Body, and it
>> actually beats APR::Request for urlencoded data.  The regexes are
>> pretty simple, so this isn't too surprising.
>
> Still pretty impressive that you can do it faster than apreq2 can.
> Have you tried comparing it with apreq2's query string parser?
> That one's not stream oriented, so it should be a bit faster than
> our body parser.

I think I avoided this because I wasn't sure if it took into account  
any kind of query string length limit.  But I'll check the code and  
give it a try.

> One of our grand plans is to divorce APR from mod-perl itself,
> so it can ship as a standalone set of wrappers for libapr and
> libaprutil.  Another is to fold apreq2's stuff into the
> various projects it interacts with.
>
> That would significantly lower the barrier to entry for apreq,
> because it wouldn't even exist as a separate distribution anymore.
> It would just be part of all the other stuff.

Cool.  I guess there's a lot of handy memory-handling code in libapr  
that would have to be replaced in order to have a completely  
standalone module.

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

> On Mar 31, 2007, at 12:02 AM, Joe Schaefer wrote:
>
>> You're wandering into largely uncharted territory
>> by using APR:: outside of mod-perl, so I would try
>> to be conservative if possible, and follow the examples
>> in the test code. APR::Pool has a good set of tests,
>> so I'm really not sure the problem is there or somewhere
>> in apreq's xs.  It just stood out that you were using
>> a new pool for each subroutine call, which probably
>> isn't the best way to do things.
>
> OK, thanks for the advice.  I'm actually a bit surprised this is
> uncharted territory.

Well I use it, but I don't know how many other people do.

> I'm working on trying to improve the performance of Catalyst's body
> parsing. We're currently using the all-Perl HTTP::Body, and it
> actually beats APR::Request for urlencoded data.  The regexes are
> pretty simple, so this isn't too surprising.

Still pretty impressive that you can do it faster than apreq2 can.
Have you tried comparing it with apreq2's query string parser?
That one's not stream oriented, so it should be a bit faster than
our body parser.

> Where I hope it will help a lot more is multipart/form-data parsing, but I
> don't have this working yet to see.  I also want to use it to replace some
> uses of URI::Escape which is fairly slow compared to apreq.
>
> The reason I want it to work outside of Apache is so that people running
> Catalyst in FastCGI mode

Yeah. I've been hoping for a fastcgi module for apreq2 as well.

> , or using one of the standalone Perl servers, can benefit from the
> improved performance.  I have a grand plan of building a standalone XS
> module that does a lot of what apreq does, but without all the Apache
> requirements.  URI escaping/unescaping, header parsing, cookies,
> query strings, body data, etc.  A lot of the apreq code could be used
> in something like this I think.  The barrier to entry with apreq is
> pretty high at the moment, when compared to a standard CPAN module.

One of our grand plans is to divorce APR from mod-perl itself,
so it can ship as a standalone set of wrappers for libapr and
libaprutil.  Another is to fold apreq2's stuff into the
various projects it interacts with.

That would significantly lower the barrier to entry for apreq,
because it wouldn't even exist as a separate distribution anymore.
It would just be part of all the other stuff.

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Andy Grundman <an...@hybridized.org>.

On Mar 31, 2007, at 12:02 AM, Joe Schaefer wrote:

> You're wandering into largely uncharted territory
> by using APR:: outside of mod-perl, so I would try
> to be conservative if possible, and follow the examples
> in the test code. APR::Pool has a good set of tests,
> so I'm really not sure the problem is there or somewhere
> in apreq's xs.  It just stood out that you were using
> a new pool for each subroutine call, which probably
> isn't the best way to do things.

OK, thanks for the advice.  I'm actually a bit surprised this is  
uncharted territory.

I'm working on trying to improve the performance of Catalyst's body  
parsing.  We're currently using the all-Perl HTTP::Body, and it  
actually beats APR::Request for urlencoded data.  The regexes are  
pretty simple, so this isn't too surprising.

Where I hope it will help a lot more is multipart/form-data parsing,  
but I don't have this working yet to see.  I also want to use it to  
replace some uses of URI::Escape which is fairly slow compared to apreq.

The reason I want it to work outside of Apache is so that people  
running Catalyst in FastCGI mode, or using one of the standalone Perl
servers, can benefit from the improved performance.  I have a grand  
plan of building a standalone XS module that does a lot of what apreq  
does, but without all the Apache requirements.  URI
escaping/unescaping, header parsing, cookies, query strings, body data, etc.
A lot of the apreq code could be used in something like this I  
think.  The barrier to entry with apreq is pretty high at the moment,  
when compared to a standard CPAN module.

-Andy

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

> The APR::Pool docs warn about memory leakage if using a global pool,
> do you think I should be calling $pool->clear after every time I use
> it?

No.  I don't know how you're planning to make use of
APR::Request in a standalone application, but I would
recommend creating a global pool and letting perl
clean it up at program exit.  If you run short of
memory, calling $pool->clear somewhere in the middle
of your program will hose all your APR::Request structs,
so try to avoid that.

You're wandering into largely uncharted territory
by using APR:: outside of mod-perl, so I would try
to be conservative if possible, and follow the examples 
in the test code. APR::Pool has a good set of tests,
so I'm really not sure the problem is there or somewhere
in apreq's xs.  It just stood out that you were using
a new pool for each subroutine call, which probably
isn't the best way to do things.

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Andy Grundman <an...@hybridized.org>.

On Mar 30, 2007, at 11:39 PM, Joe Schaefer wrote:

> Andy Grundman <an...@hybridized.org> writes:
>
>>
>> urlencoded();
>> urlencoded();
>>
>> sub urlencoded {
>>     my $pool = APR::Pool->new;
>
> The pool cleanup code is pretty wicked in the APR::Pool xs code.
> If you can get away with using a single global pool, I think
> you'll be better off.

Hey what do you know, moving the pool out of the sub does indeed fix  
the problem!

I also tried $pool->clear and $pool->destroy at the end of the sub,  
but it didn't help.
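
For reference, a minimal sketch of the rearranged script, assuming the only change is that the pool is created once at file scope instead of inside the sub. Every APR call is taken from the original post. Passing the input as an argument, collecting the pairs into a hash, and using length($buf) in place of the literal 7 are choices made for this example. It needs the mod_perl APR modules and libapreq2 installed.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use APR::Pool;
use APR::Brigade;
use APR::Bucket;
use APR::BucketAlloc;
use APR::Request;
use APR::Request::Parser;

# Created once and shared by every call, instead of once per call.
my $pool = APR::Pool->new;

sub urlencoded {
    my ($buf) = @_;

    my $ba = APR::BucketAlloc->new($pool);
    my $bb = APR::Brigade->new( $pool, $ba );
    $bb->insert_tail( APR::Bucket->new( $ba, $buf ) );

    my $parser = APR::Request::Parser->urlencoded(
        $pool,
        $ba,
        'application/x-www-form-urlencoded',
    );

    # Same argument order as the original script; the fifth argument was
    # the literal 7 there, which is the length of 'a=1&b=2'.
    my $req = APR::Request::Custom->handle(
        $pool,
        '',
        '',
        $parser,
        length($buf),
        $bb,
    );

    # Collect the parsed pairs so the caller can inspect them.
    my %seen;
    $req->param->do( sub {
        my ( $key, $value ) = @_;
        $seen{$key} = $value;
        return 1;    # a true return value keeps the iteration going
    } );

    return \%seen;
}

for my $round ( 1, 2 ) {
    my $seen = urlencoded('a=1&b=2');
    warn "round $round: $_ => $seen->{$_}\n" for sort keys %$seen;
}
```

Everything allocated from the shared pool stays allocated until the pool is cleared or destroyed, which is the memory trade-off raised in the next paragraph.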

The APR::Pool docs warn about memory leakage if using a global pool,  
do you think I should be calling $pool->clear after every time I use it?

Thanks,
-Andy

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

>
> urlencoded();
> urlencoded();
>
> sub urlencoded {
>     my $pool = APR::Pool->new;

The pool cleanup code is pretty wicked in the APR::Pool xs code.
If you can get away with using a single global pool, I think
you'll be better off.

-- 
Joe Schaefer

Re: Parsing error when parsing the second time

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Andy Grundman <an...@hybridized.org> writes:

> I'm trying to use APR::Request outside of Apache to parse
> application/x-www-form-urlencoded POST data.  I'm using libapreq2-2.08.
>
> I have some working code mostly taken from the libapreq2 tests, but
> I'm running into an issue where the second time I try to parse
> something, nothing is parsed.  The code I'm using is below along with
> the output.  Is there something I'm forgetting to reset before
> calling the function a second time?

The underlying apreq_handle_custom() function was missing an initializer
for bytes_read.  The pool cleanup was just enough to jigger around the
allocator to fill those bits with something non-zero, which triggered
the bug.  It should be fixed in apreq's trunk now.

-- 
Joe Schaefer