Posted to modperl@perl.apache.org by Chris Datfung <ch...@gmail.com> on 2011/03/30 12:17:25 UTC

Apache2::Filter Intermittently Missing Injected String

I have a script that uses Apache2::Filter to filter the server response
output and inject a string into the HTML body. The script normally works
fine, except that intermittently the output is missing the injected string.
This happens around 10% of the time. I verified that there is enough memory
and CPU available and tried playing with the buffer size, but to no avail.

The server is running:

Apache 2.2.17-2
Modperl 2.0.4-7

Any explanation for why the script fails 10% of the time?

Thanks
Chris

Re: Apache2::Filter Intermittently Missing Injected String

Posted by Adam Prime <ad...@utoronto.ca>.
I wrote a module based on a talk Geoff Young gave a bazillion years ago 
to abstract this problem away (sort of).  You can check it out here:

http://search.cpan.org/~aprime/Apache2-Filter-TagAware-0.02/lib/Apache2/Filter/TagAware.pm

Adam

On 3/31/2011 12:30 AM, Chris Datfung wrote:
> On Wed, Mar 30, 2011 at 12:36 PM, Hendrik Schumacher
> <hs@activeframe.de> wrote:
>
>     On Wed, 30.03.2011, 12:17, Chris Datfung wrote:
>
>     I had a similar problem with a http proxy that injected a string
>     into the HTML body. If the response is passed to the filter in
>     multiple parts there is a certain probability that the response is
>     split on the string position you are looking for (for example part 2
>     ends with "</bo" and part 3 starts with "dy>"). I had to buffer the
>     last bytes of each response part and take them into account
>
>
> Hi Hendrik,
>
> That is exactly the problem. How did you buffer the last bytes of each
> response? Don't you just set the BUFF_LEN and that's the number of
> characters you get?
>
> Chris
>


Re: Apache2::Filter Intermittently Missing Injected String

Posted by Torsten Förtsch <to...@gmx.net>.
On Thursday, March 31, 2011 12:08:03 Chris Datfung wrote:
> My string is always within the first 5000 bytes, but setting BUFF_LEN to
> 8000 did not help as the buffer still sometimes gets cut after ~2500 bytes
> or so. Do you know of any way to force the bucket to be a certain length?

To my knowledge there is no such device.

But you can accumulate the content of the buckets in $f->ctx until there is 
enough of it. If the current brigade does not have enough data, simply do not 
pass it on to the next filter.

You have to watch out for flush and eos buckets, though.

If the string you are looking for is always within the first 5000 bytes, that 
should not cause problems. Avoid accumulating the whole response, however.

Something like:

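use Apache2::Filter ();
use Apache2::RequestRec ();
use Apache2::Connection ();
use APR::Brigade ();
use APR::Bucket ();
use Apache2::Const -compile => qw(OK);
use APR::Const -compile => qw(SUCCESS);

sub filter {
  my ($f, $bb)=@_;
  # brigade kept in the filter context across invocations
  my $mybb=$f->ctx;
  $f->ctx($mybb=APR::Brigade->new($f->r->pool, $f->c->bucket_alloc))
    unless $mybb;
  $mybb->concat($bb);                 # move the incoming buckets into it
  if( $mybb->length>=5000 ) {
    $mybb->flatten(my $buf);          # copy the accumulated data into $buf
    $buf=~s/.../.../;
    $mybb->cleanup;                   # drop the old buckets
    $mybb->insert_tail(APR::Bucket->new($mybb->bucket_alloc, $buf));
    my $rc=$f->next->pass_brigade($mybb);
    $mybb->destroy;
    $f->remove;                       # done, step out of the filter chain
    $rc==APR::Const::SUCCESS or return $rc;
  }
  return Apache2::Const::OK;
}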

You still have to add code to check for flush and eos buckets.

Torsten Förtsch


Re: Apache2::Filter Intermittently Missing Injected String

Posted by Hendrik Schumacher <hs...@activeframe.de>.
Hi Chris,

My example implementation doesn't assume a string cut-off at a certain
place. If your search string has a length of 7 bytes, the "worst case" is
that one buffer contains the first 6 bytes and the next buffer the last
one. If the string is cut at another place, you just carry over a little
bit too much (but it doesn't hurt as long as you make sure that the
replacement takes place only once).
I don't think that you can force the bucket to be a certain length. I don't
know how it is handled exactly, but I would assume that you get passed any
content that is flushed by the previous handler/filter. The only thing you
could possibly control is the threshold at which Apache does an automatic
flushing of the output buffer.
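
The worst-case argument above can be checked outside Apache. What follows is a minimal sketch in plain Perl, not mod_perl code: the helper name inject_with_carry, the sample page and the marker string are all made up for illustration. It holds back the last 6 bytes of every chunk (length of "</body>" minus 1) and splits a page in two at every possible position to see whether the injection ever gets lost.

```perl
use strict;
use warnings;

# Hypothetical helper: inject before the first "</body>", holding back the
# last (length of search string - 1) bytes of each chunk so that a tag split
# across two chunks is still seen whole.
sub inject_with_carry {
    my ($injection, @chunks) = @_;
    my $search = '</body>';
    my $keep   = length($search) - 1;   # 6 bytes for a 7-byte search string
    my ($out, $carry, $done) = ('', '', 0);

    for my $chunk (@chunks) {
        my $buffer = $carry . $chunk;
        $carry = '';
        if (!$done) {
            if ($buffer =~ s/\Q$search\E/$injection$search/) {
                $done = 1;              # replace only once
            }
            else {
                # Keep at most $keep trailing bytes for the next round.
                my $cut = length($buffer) > $keep ? length($buffer) - $keep : 0;
                $carry  = substr($buffer, $cut);
                $buffer = substr($buffer, 0, $cut);
            }
        }
        $out .= $buffer;
    }
    return $out . $carry;   # flush whatever is still held back at the end
}

my $marker = '<!-- injected -->';
my $page   = '<html><body>hello</body></html>';
my $want   = '<html><body>hello<!-- injected --></body></html>';

# Split the page in two at every possible position, including inside the tag.
my $failures = 0;
for my $pos (0 .. length($page)) {
    my @parts = (substr($page, 0, $pos), substr($page, $pos));
    $failures++ if inject_with_carry($marker, @parts) ne $want;
}
print "failures: $failures\n";
```

Carrying over more than strictly necessary is harmless here because the $done flag guarantees that the substitution happens only once.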

Hendrik


Re: Apache2::Filter Intermittently Missing Injected String

Posted by Chris Datfung <ch...@gmail.com>.
Hi Hendrik,

That seems like a good workaround, assuming the string gets cut off at the
same place each time. Thanks for that. In my case, I'm not certain that it
does. I thought the BUFF_LEN constant defines how many bytes should be read.
My string is always within the first 5000 bytes, but setting BUFF_LEN to
8000 did not help, as the buffer still sometimes gets cut after ~2500 bytes
or so. Do you know of any way to force the bucket to be a certain length?

Thanks
Chris

Re: Apache2::Filter Intermittently Missing Injected String

Posted by Hendrik Schumacher <hs...@activeframe.de>.
On Thu, 31.03.2011, 06:30, Chris Datfung wrote:
> On Wed, Mar 30, 2011 at 12:36 PM, Hendrik Schumacher
> <hs...@activeframe.de> wrote:
>
>> On Wed, 30.03.2011, 12:17, Chris Datfung wrote:
>>
>> I had a similar problem with a http proxy that injected a string into
>> the HTML body. If the response is passed to the filter in multiple parts
>> there is a certain probability that the response is split on the string
>> position you are looking for (for example part 2 ends with "</bo" and
>> part 3 starts with "dy>"). I had to buffer the last bytes of each
>> response part and take them into account
>
>
> Hi Hendrik,
>
> That is exactly the problem. How did you buffer the last bytes of each
> response? Don't you just set the BUFF_LEN and that's the number of
> characters you get?
>
> Chris
>

You have to handle the "last bytes buffer" yourself. If you use the
$f->read approach of Apache2::Filter, you could use the following (untested
and probably not very efficient):

my $lastbytes = undef;
my $done = undef;
while ($filter->read(my $buffer, $wanted))
{
  if (defined $lastbytes)
  {
    # prepend the bytes held back from the previous read
    $buffer = $lastbytes.$buffer;
    $lastbytes = undef;
  }
  if (not $done)
  {
    if ($buffer =~ s/<\/body>/$injection<\/body>/)
    {
      $done = 1;
    }
    else
    {
      # hold back the tail in case "</body>" is split across two reads
      $lastbytes = substr ($buffer, -6); # length of string to search - 1
      $buffer = substr ($buffer, 0, -6);
    }
  }
  $filter->print($buffer);
}
if ($filter->seen_eos && defined $lastbytes) {
  $filter->print($lastbytes);
}

If you are using the callback approach, you would have to store $lastbytes
somewhere (e.g. in $filter->ctx) and make sure to flush $lastbytes on eos.
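
To show what "store $lastbytes somewhere" could look like, here is a rough sketch in plain Perl rather than a real filter. Everything in it is an assumption made for illustration: filter_invocation stands in for one call of the filter callback, the %ctx hash stands in for whatever would be kept in $filter->ctx, and the $eos flag stands in for seen_eos. It only demonstrates that state kept outside the sub survives from one call to the next.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one filter invocation. %$state plays the role of
# the data stored in $filter->ctx; $eos plays the role of seen_eos.
# Returns the text that this invocation would print.
sub filter_invocation {
    my ($state, $chunk, $eos, $injection) = @_;
    my $held   = defined $state->{lastbytes} ? $state->{lastbytes} : '';
    my $buffer = $held . $chunk;
    $state->{lastbytes} = undef;

    if (!$state->{done}) {
        if ($buffer =~ s{</body>}{$injection</body>}) {
            $state->{done} = 1;
        }
        elsif (!$eos) {
            # Hold back up to 6 bytes (length of "</body>" - 1) for next time.
            my $cut = length($buffer) > 6 ? length($buffer) - 6 : 0;
            $state->{lastbytes} = substr($buffer, $cut);
            $buffer = substr($buffer, 0, $cut);
        }
    }
    return $buffer;   # on eos nothing is held back, so nothing is lost
}

my %ctx;   # survives between invocations, unlike a lexical inside the sub
my $out = '';
$out .= filter_invocation(\%ctx, '<body>text</bo', 0, '<!-- injected -->');
$out .= filter_invocation(\%ctx, 'dy></html>',     1, '<!-- injected -->');
print "$out\n";
```

In a real filter the two calls would be separate invocations by Apache, and the hash would have to be fetched from and written back to the filter context each time.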

Hendrik



Re: Apache2::Filter Intermittently Missing Injected String

Posted by Chris Datfung <ch...@gmail.com>.
On Wed, Mar 30, 2011 at 12:36 PM, Hendrik Schumacher <hs...@activeframe.de>wrote:

> On Wed, 30.03.2011, 12:17, Chris Datfung wrote:
>
> I had a similar problem with a http proxy that injected a string into the
> HTML body. If the response is passed to the filter in multiple parts there
> is a certain probability that the response is split on the string position
> you are looking for (for example part 2 ends with "</bo" and part 3 starts
> with "dy>"). I had to buffer the last bytes of each response part and take
> them into account


Hi Hendrik,

That is exactly the problem. How did you buffer the last bytes of each
response? Don't you just set the BUFF_LEN and that's the number of characters
you get?

Chris

Re: Apache2::Filter Intermittently Missing Injected String

Posted by Hendrik Schumacher <hs...@activeframe.de>.
On Wed, 30.03.2011, 12:17, Chris Datfung wrote:
> I have a script that uses Apache2::Filter to filter the server response
> output and inject a string into the HTML body. The script normally works
> fine, except that intermittently the output is missing the injected string.
> This happens around 10% of the time. I verified that there is enough memory
> and CPU available and tried playing with the buffer size, but to no avail.
>
> The server is running:
>
> Apache 2.2.17-2
> Modperl 2.0.4-7
>
> Any explanation for why the script fails 10% of the time?
>
> Thanks
> Chris
>

I had a similar problem with an HTTP proxy that injected a string into the
HTML body. If the response is passed to the filter in multiple parts, there
is a certain probability that the response is split at the position of the
string you are looking for (for example, part 2 ends with "</bo" and part 3
starts with "dy>"). I had to buffer the last bytes of each response part and
take them into account when looking for the search string in the next part.
I don't know, though, whether this is possible in Apache2::Filter or whether
this is your problem at all.
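
The split described above is easy to reproduce without Apache. The following is a small sketch in plain Perl under made-up assumptions: inject_naive is a hypothetical helper that applies the substitution to each chunk on its own, the way a filter does when it only ever looks at the current buffer, and the chunks and marker are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution to each chunk independently,
# without remembering anything from the previous chunk.
sub inject_naive {
    my ($injection, @chunks) = @_;
    my $out  = '';
    my $done = 0;
    for my $chunk (@chunks) {
        # Replace only the first match across the whole response.
        $done = 1 if !$done && $chunk =~ s{</body>}{$injection</body>};
        $out .= $chunk;
    }
    return $out;
}

my $marker = '<!-- injected -->';

# Tag arrives whole: the injection is applied.
my $whole = inject_naive($marker, '<html><body>hi', '</body></html>');

# Tag is split as "</bo" + "dy>": neither chunk matches, nothing is injected.
my $split = inject_naive($marker, '<html><body>hi</bo', 'dy></html>');

print index($whole, $marker) >= 0 ? "whole: injected\n" : "whole: missing\n";
print index($split, $marker) >= 0 ? "split: injected\n" : "split: missing\n";
```

Whether a given response hits the second case depends on where the chunk boundaries happen to fall, which would fit a failure that shows up only some of the time.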

Hendrik