Posted to modules-dev@httpd.apache.org by Drew Bertola <dr...@jupiterhosting.com> on 2006/12/31 03:55:04 UTC

splitting a string...

Happy New Year, everyone.

I need some quick help with handling my content in an output filter.

I'd like to know how I can get the portion of client content before the
<head> tag (if it exists), and the portion after it.  That way, I can
insert something just after the <head> tag (adult content warnings, etc).

I seem to get stuck right after finding the position of the head tag.  I
can send on the tail end, but don't know how to capture the beginning
portion:


...
      apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

      if ( ( position = strcasestr(str, search_tag) ) == NULL )
        {
          /*
           * If we didn't find the <head> tag, just pass along
           * everything to the next filter and we're done.
           */
          ap_fputs(f->next, ctx->bb, str);
        }
      else
        {
          /*
           * so, we have a <head> tag.  So, lets find and process it
           * and insert our notice.
           */

          tail = position + 6;

          head = ???  <<<<<<<<< Here's where I'm stuck...
         
          ap_fputs(f->next, ctx->bb, "<!-- start test -->\n");

          /*
            ap_fputs(f->next, ctx->bb, head);
          */
          ap_fputs(f->next, ctx->bb, "<html>\n<head>\n<meta />\n");
          ap_fputs(f->next, ctx->bb, "<!-- after head, before tail -->\n");
          ap_fputs(f->next, ctx->bb, tail);
          ap_fputs(f->next, ctx->bb, "<!-- end test -->\n");
        }
    }

  return APR_SUCCESS;
}   


There's much more to code to make this thing work right, but this is my
first hurdle.
Please point out any other major idiotic mistakes I'm making.

Thanks,
--
Drew

Re: splitting a string...

Posted by Nick Kew <ni...@webthing.com>.
On Sun, 31 Dec 2006 12:18:17 -0800
Drew Bertola <dr...@jupiterhosting.com> wrote:

[ in a thread I seem to be missing the start of ]

> > If str was \0 terminated, you would not need a length. In the above
> > case, you better move the bucket you just read.
> >   
> 
> How can I ensure it's null terminated?

(depends where it's coming from, but probably moot)

> [chop]
> 
> This prints everything up to my head tag, then my required warning
> meta tag is inserted, then everything after my head tag.

If it's markup you're parsing, use a markup parser.  Bearing in mind
that it's a nontrivial wheel to reinvent (your tag may have whitespace
including linebreaks in it, may span more than one bucket or even
brigade, and might also come from a 16-bit character set), you'd be
better advised to use an existing parser.

mod_publisher may be what you need.  Alternatively, see
mod_proxy_html for a simpler HTML-aware output filter.
Or mod_line_edit for non-markup-aware (sed-like) parsing
of arbitrary text (they're all at apache.webthing.com).
Or follow my .sig to where there's a tutorial on filtering
by direct manipulation of buckets.


-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Tuesday, 2 January 2007, at 13:15 -0800, Drew Bertola wrote:
> I made a mental note to revisit apr_strmatch and friends.  I have to
> find an example of how to allocate the pool used in the precompile.

I doubt that this is worth the effort, but a child init handler that
initialises a static would be the place to do this.

See http://www.google.com/codesearch?q=ap_hook_child_init for examples.

Sincerely,
Joachim



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Joe Lewis wrote:
> Make SURE you are not using things like strcasestr - it is not
> platform independent (requires GNU source definitions).  Besides, you
> may want to change that to a series of apr_strmatch_precompile and
> apr_strmatch commands to do the searching - those take an additional
> parameter for the length of the bucket, and give you another added
> precaution against the dreaded NULL-termination issues across
> platforms and other modules.  Joe Orton helped me a great deal when I
> started finding problems that I thought were PHP bugs, but really
> allowed me to adapt for any lazy module that gets written.
Thanks, Joe.

I made a mental note to revisit apr_strmatch and friends.  I have to
find an example of how to allocate the pool used in the precompile.

It would be nice if I could attach it to the server pool so that it
would only be precompiled once, no?

Any help there?

--
Drew


Re: splitting a string...

Posted by Joe Lewis <jo...@joe-lewis.com>.
Drew Bertola wrote:
>>   
>>     
>>>          while ( i < len )
>>>            {
>>>              ap_fputc(f->next, ctx->bb, str[i++]);
>>>            }
>>>     
>>>       
>> This is a performance hog.
>>   
>>     
>
> With this, I don't have the segfaults anymore.
>
> html is ok too.
>
> --
> Drew
>   
I would expect it to be a performance hog.  You are playing with 
creating a good number of individual buckets.

Make SURE you are not using things like strcasestr - it is not platform 
independent (requires GNU source definitions).  Besides, you may want to 
change that to a series of apr_strmatch_precompile and apr_strmatch 
commands to do the searching - those take an additional parameter for 
the length of the bucket, and give you another added precaution against 
the dreaded NULL-termination issues across platforms and other modules.  
Joe Orton helped me a great deal when I started finding problems that I 
thought were PHP bugs, but really allowed me to adapt for any lazy 
module that gets written.

If this information is not enough to help, let me know, and I will rip a 
lot of the code out of my template wrapper and shoot that to you (and 
hope it works in that state - I won't be doing the "checks" on it).

Joe
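
A minimal sketch of the length-aware search described above, in plain C
so that it builds without APR. It is not apr_strmatch; find_nocase is a
hypothetical helper that only illustrates the idea of passing the bucket
length instead of relying on a terminating \0. In a real module the
apr_strmatch_precompile / apr_strmatch pair would play this role.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of APR): find the first case-insensitive
 * occurrence of the NUL-terminated pattern `pat` in `buf`, which holds
 * exactly `len` bytes and need NOT be NUL-terminated.
 * Returns the byte offset of the match, or -1 if there is none.
 */
long find_nocase(const char *buf, size_t len, const char *pat)
{
    size_t plen = 0;
    size_t i, j;

    assert(buf != NULL && pat != NULL);

    while (pat[plen] != '\0')
        plen++;

    if (plen == 0)
        return 0;           /* empty pattern matches at the start */
    if (plen > len)
        return -1;          /* pattern cannot fit in the buffer   */

    /* Never read past buf[len - 1]: i + plen stays within len. */
    for (i = 0; i + plen <= len; i++) {
        for (j = 0; j < plen; j++) {
            if (tolower((unsigned char)buf[i + j]) !=
                tolower((unsigned char)pat[j]))
                break;
        }
        if (j == plen)
            return (long)i;
    }
    return -1;
}
```

Called with the str and len that apr_bucket_read() returns, a search like
this cannot run off the end of the bucket data, which strcasestr() can do
when the data is not NUL-terminated.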

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Monday, 1 January 2007, at 06:18 -0800, Drew Bertola wrote:
> Joachim Zobel wrote:
> > On Monday, 1 January 2007, at 01:48 -0800, Drew Bertola wrote:
> >   
> >>           APR_BUCKET_REMOVE(e);
> >>           APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
> >>     
> 
> This generates segfaults when processing php and doesn't help with my
> problem.   Is the problem due to filter order?

No. The problem is due to not removing the buckets you copy.

Sincerely,
Joachim



Re: splitting a string...

Posted by Joe Lewis <jo...@joe-lewis.com>.
Drew Bertola wrote:
> Again, makes me wish for a null output filter example.
>   
Try this link - it was written about 6 years ago, but still applies.  
It's a Ryan Bloom article (he was one of the APR developers).  The link :

http://www.serverwatch.com/news/article.php/1127731

> My understanding is that it would need this:
>
> - create context if it doesn't already exist.
>   
Only if you are going to use it.
> - loop through buckets (from FIRST to SENTINEL) in brigade passed to
> filter appending each bucket to my context's brigade.
>   
Yes.
> - pass my brigade to next filter.
>   
Yes.  (If they are not passed, you will not get that data).

[snip]

> - do I need to look out for APR_BUCKET_IS_EOS or APR_BUCKET_IS_FLUSH or
> are they implicit before APR_BRIGADE_IS_SENTINEL?
>   
Watch for them..
> - do I need to use ap_pass_brigade()? I've used it here before returning.
>   
Yes.  When done, always pass a brigade, whether it is a new one or the 
original brigade (modified or not).


Joe

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Tuesday, 2 January 2007, at 13:29 -0800, Drew Bertola wrote:
> >   for ( e = APR_BRIGADE_FIRST(bb) ;
> >         e != APR_BRIGADE_SENTINEL(bb) ;
> >         e = APR_BUCKET_NEXT(e) ) {


> - create context if it doesn't already exist.
> - loop through buckets (from FIRST to SENTINEL) in brigade passed to
> filter appending each bucket to my context's brigade.

The moment you append e, the value of APR_BUCKET_NEXT(e) changes to your
context brigade's sentinel.

> - pass my brigade to next filter.
> 
> So, shouldn't this work? ...

> - do I need to look out for APR_BUCKET_IS_EOS or APR_BUCKET_IS_FLUSH or
> are they implicit before APR_BRIGADE_IS_SENTINEL?

Yes and no. You can have multiple brigades and only the last one holds
_the_ EOS bucket.

> - do I need to use ap_pass_brigade()? I've used it here before returning.

That depends, but usually you will use it. ap_pass_brigade is a misnomer:
it passes the buckets held by the parameter brigade to the next filter's
brigade. Alternatively, you could probably append to the next filter's
brigade.

See also
http://www.heute-morgen.de/modules/doc/well_behaved_filters.html

Sincerely,
Joachim



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Joachim Zobel wrote:
> On Tuesday, 2 January 2007, at 01:14 -0800, Drew Bertola wrote:
>   
>> line 91 looks like this: 
>>
>>       apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);
>>
>> Also, it only happens if I use
>>
>>           APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
>>     
>
> Ah, understood.
>
> You don't mention this, but you probably have a
>
>   for ( e = APR_BRIGADE_FIRST(bb) ;
>         e != APR_BRIGADE_SENTINEL(bb) ;
>         e = APR_BUCKET_NEXT(e) ) {
>
> for your loop. Right?
>
> So if you _move_ e to another brigade, e will never equal
> APR_BRIGADE_SENTINEL(bb), and APR_BUCKET_NEXT(e) will step through the
> wrong brigade and will treat the sentinel as a bucket. This causes the
> observed segfault.

Are you sure about that?

Again, makes me wish for a null output filter example.

My understanding is that it would need this:

- create context if it doesn't already exist.
- loop through buckets (from FIRST to SENTINEL) in brigade passed to
filter appending each bucket to my context's brigade.
- pass my brigade to next filter.

So, shouldn't this work? ...


static int null_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
  null_filter_struct *ctx = f->ctx;
  apr_bucket *e;

  /*
   * if we don't have a context for this filter, let's create one and
   * create its bucket brigade.
   */
  if ( ! ctx )
    {
      f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
      ctx->bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    }

  /*
   * let's loop through the buckets passed to us.
   */
  for( e  = APR_BRIGADE_FIRST(bb);
       e != APR_BRIGADE_SENTINEL(bb);
       e  = APR_BUCKET_NEXT(e) )
    {
      /*
       * if the bucket is an end of stream bucket or a flush bucket,
       * we can pass on what we have so far and be done with this brigade.
       */
      if ( APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH(e) )
        {
          APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
          APR_BUCKET_REMOVE(e);

          ap_pass_brigade(f->next, ctx->bb);

          return APR_SUCCESS;
        }

      APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
      APR_BUCKET_REMOVE(e);
    }

  ap_pass_brigade(f->next, ctx->bb);

  return APR_SUCCESS;   
}


I'm not sure about these things:

- do I need to look out for APR_BUCKET_IS_EOS or APR_BUCKET_IS_FLUSH or
are they implicit before APR_BRIGADE_IS_SENTINEL?

- do I need to use ap_pass_brigade()? I've used it here before returning.

--
Drew

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Tuesday, 2 January 2007, at 01:14 -0800, Drew Bertola wrote:
> line 91 looks like this: 
> 
>       apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);
> 
> Also, it only happens if I use
> 
>           APR_BRIGADE_INSERT_TAIL(ctx->bb, e);

Ah, understood.

You don't mention this, but you probably have a

  for ( e = APR_BRIGADE_FIRST(bb) ;
        e != APR_BRIGADE_SENTINEL(bb) ;
        e = APR_BUCKET_NEXT(e) ) {

for your loop. Right?

So if you _move_ e to another brigade, e will never equal
APR_BRIGADE_SENTINEL(bb), and APR_BUCKET_NEXT(e) will step through the
wrong brigade and will treat the sentinel as a bucket. This causes the
observed segfault.

You could instead copy the bucket using

/**
 * Copy a bucket.
 * @param e The bucket to copy
 * @param c Returns a pointer to the new bucket
 */
#define apr_bucket_copy(e,c) (e)->type->copy(e, c)

And then delete the bucket at the end of the loop using 

/**
 * Delete a bucket by removing it from its brigade (if any) and then
 * destroying it.
 * @remark This mainly acts as an aid in avoiding code verbosity.  It is
 * the preferred exact equivalent to:
 * <pre>
 *      APR_BUCKET_REMOVE(e);
 *      apr_bucket_destroy(e);
 * </pre>
 * @param e The bucket to delete
 */
#define apr_bucket_delete(e) do {					\
        APR_BUCKET_REMOVE(e);						\
        apr_bucket_destroy(e);						\
    } while (0)

Since buckets use reference counting, this does not copy the data, so
it is relatively cheap. There are other solutions, which work equally
well.

Sincerely,
Joachim
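
The loop problem described above can be reproduced without APR. The
sketch below uses a toy ring (node, ring and the ring_* functions are
hypothetical stand-ins, not the APR macros) to show one of the "other
solutions": read the successor before moving the current element, and
always unlink before inserting elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an APR bucket: a node in a circular doubly linked ring. */
typedef struct node {
    struct node *next, *prev;
    int id;
} node;

/* Toy stand-in for a brigade: the sentinel is a dummy node closing the ring. */
typedef struct ring {
    node sentinel;
} ring;

void ring_init(ring *r)
{
    r->sentinel.next = r->sentinel.prev = &r->sentinel;
    r->sentinel.id = -1;
}

/* Unlink n from whatever ring it is in (compare APR_BUCKET_REMOVE). */
void node_remove(node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Append an already unlinked n to r (compare APR_BRIGADE_INSERT_TAIL). */
void ring_insert_tail(ring *r, node *n)
{
    n->prev = r->sentinel.prev;
    n->next = &r->sentinel;
    r->sentinel.prev->next = n;
    r->sentinel.prev = n;
}

/* Move every node from src to dst; returns the number of nodes moved. */
int ring_move_all(ring *src, ring *dst)
{
    node *n = src->sentinel.next;
    int moved = 0;

    assert(src != dst);

    while (n != &src->sentinel) {
        node *next = n->next;   /* save the successor BEFORE moving n */
        node_remove(n);         /* unlink first ...                   */
        ring_insert_tail(dst, n); /* ... then link into the new ring  */
        n = next;               /* continue in the ORIGINAL ring      */
        moved++;
    }
    return moved;
}
```

With real brigades the same shape applies: take APR_BUCKET_NEXT(e) into a
local variable before APR_BUCKET_REMOVE(e), or keep taking
APR_BRIGADE_FIRST(bb) until APR_BRIGADE_EMPTY(bb) is true.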



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Joachim Zobel wrote:
> On Monday, 1 January 2007, at 06:18 -0800, Drew Bertola wrote:
>   
>> This generates segfaults when processing php and doesn't help with my
>> problem.   Is the problem due to filter order?
>>     
>
> To find out more about the segf's do 
[snip - thanks]


I'm confused about the cause of the segfaults.  Here's what gdb is
showing me (again, only for php files)...

tail end of gdb /usr/sbin/httpd core.xxxxx:

Loaded symbols for /usr/lib/libart_lgpl_2.so.2
Failed to read a valid object file image from memory.
Core was generated by `/usr/sbin/httpd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000058 in ?? ()


and the full backtrace shows:

#0  0x00000058 in ?? ()
No symbol table info available.
#1  0x001347bf in lt_insert_filter (f=0xa0cbd30, bb=0xa0cc168)
    at lt_insert.c:91
        len = 4104
        str = 0xa0d3cb4 "000108 nothing just filling up space,\n000109
nothing just filling up space,\n000110 nothing just filling up
space,\n000111 nothing just filling up space,\n000112 nothing just
filling up space,\n000113 not"...
        position = <value optimized out>
        i = 4123
        ctx = (lt_insert_struct *) 0xa0cc188
        e = (apr_bucket *) 0xa0cc194
#2  0x007b8f60 in ap_pass_brigade () from /usr/sbin/httpd
No symbol table info available.


line 91 looks like this: 

      apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

Also, it only happens if I use

          APR_BRIGADE_INSERT_TAIL(ctx->bb, e);

rather than

          i = 0;
          while ( i < len )
            {
              ap_fputc(f->next, ctx->bb, str[i++]);
            }


which works well, but, as you note, is a bit expensive.

I'd love to get my hands on a well written "null_output_filter_module".

Thanks,
--
Drew

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Monday, 1 January 2007, at 06:18 -0800, Drew Bertola wrote:
> This generates segfaults when processing php and doesn't help with my
> problem.   Is the problem due to filter order?

To find out more about the segf's do 

# in the config
CoreDumpDirectory /tmp

# on the command line
ulimit -c unlimited

and restart apache.

Then use gdb to analyze the coredump.

Sincerely,
Joachim 



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Joachim Zobel wrote:
> On Monday, 1 January 2007, at 01:48 -0800, Drew Bertola wrote:
>   
>>           APR_BUCKET_REMOVE(e);
>>           APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
>>     

This generates segfaults when processing php and doesn't help with my
problem.   Is the problem due to filter order?

> This is what you should do with every bucket you don't handle.
>
>   
>>          while ( i < len )
>>            {
>>              ap_fputc(f->next, ctx->bb, str[i++]);
>>            }
>>     
>
> This is a performance hog.
>   

With this, I don't have the segfaults anymore.

html is ok too.

--
Drew

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Monday, 1 January 2007, at 01:48 -0800, Drew Bertola wrote:
>           APR_BUCKET_REMOVE(e);
>           APR_BRIGADE_INSERT_TAIL(ctx->bb, e);

This is what you should do with every bucket you don't handle.

>          while ( i < len )
>            {
>              ap_fputc(f->next, ctx->bb, str[i++]);
>            }

This is a performance hog.

Your problem comes from copying the bucket's contents with ap_fputc while
leaving the original bucket in.

Sincerely,
Joachim



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Thanks for all the ideas.   They've come in very handy.  I haven't done
extensive testing, and the module is still too stupid to read
configuration data, but I've gotten much farther.

Right now, static html works fine, but php pages (need both per customer
requirements) spit out too much at the tail end.  I'll post my code
below for review, as well as a snippet of the php page code as
delivered.  The problem is much worse when the pages are very large, so
the example php code uses a loop to print a comment 1000 times.  I
suspect I'm filling the brigade or a bucket too full, or not doing
something to tell it where the end is.

Please send me any and all feedback that may help (stupid mistakes or
items I've overlooked most of all).

Here's my php test code:

<?php

echo "<html>\n";
echo "<head>\n";

echo "<!-- ";

for ( $i = 0; $i < 1000; $i++ )
  {
    echo "nothing just filling up space,\n";
  }

echo "-->\n";
echo "<title>PHP Test Page</title>\n";
echo "</head>\n";
echo "<body>\n";
echo "Now, I say goodbye at " . date("m/d/Y") . "\n";
echo "</body>\n";
echo "</html>\n";

Here's what php code looks like when delivered:

<html>
<head>
<meta yadayada sadssssssssssssssssssss />

<!-- nothing just filling up space,
nothing just filling up space,

[snip ~1000 lines the same]

nothing just filling up space,
-->
<title>PHP Test Page</title>
</head>
<body>
Now, I say goodbye at 01/01/2007
</body>
</html>                      <---- looks good to here, but then...
ng just filling up space,
nothing just filling up space,

[snip ~1000 lines the same]

nothing just filling up space,
-->
<title>PHP Test Page</title>
</head>
<body>

Now, I say goodbye at 01/01/2007
</body>
</html>




And here's the module:

#define DEBUG 1

#include "httpd.h"
#include "http_config.h"
#include "http_log.h"
#include "http_request.h"
#include "apr_general.h"
#include "apr_strings.h"
#include "apr_buckets.h"
#include "util_filter.h"
#include <string.h>

module AP_MODULE_DECLARE_DATA lt_insert_module;

typedef struct lt_insert_t {
  apr_bucket_brigade *bb;
} lt_insert_struct;

static int lt_insert_filter(ap_filter_t *f,
                            apr_bucket_brigade *bb)
{
  lt_insert_struct *ctx = f->ctx;
  apr_bucket *e;
 
  if ( ! ctx )
    {
      f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
      ctx->bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    }

  for( e = APR_BRIGADE_FIRST(bb);
       e != APR_BRIGADE_SENTINEL(bb);
       e = APR_BUCKET_NEXT(e) )
    {
      if ( APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH(e) )
        {
          APR_BUCKET_REMOVE(e);
          APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
          ap_pass_brigade(f->next, ctx->bb);
          return APR_SUCCESS;
        }

      apr_size_t len;
      const char *str;
      const char *search_tag = "<head>";
      const char *insert_line = "<meta yadayada sadssssssssssssssssssss />\n";
      char *position = NULL;
      char *tail = NULL;
      int i = 0;
      int insert_done = 0;

      apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

      if ( ( ( position = strcasestr(str, search_tag) ) == NULL ) ||
           ( insert_done ) )
        {
          /*
           * If we didn't find the <head> tag, just pass along
           * everything to the next filter and we're done.
           */
          i = 0;

          while ( i < len )
            {
              ap_fputc(f->next, ctx->bb, str[i++]);
            }
        }
      else
        {
          /*
           * so, we have a <head> tag.  So, lets find and process it
           * and insert our notice.
           */
          tail = position + strlen(search_tag);

          i = 0;

          while ( (str + i) < ( position + strlen(search_tag) ) )
            {
              ap_fputc(f->next, ctx->bb, str[i++]);
            }

          ap_fputs(f->next, ctx->bb, "\n");
          ap_fputs(f->next, ctx->bb, insert_line);

          while ( i < len )
            {
              ap_fputc(f->next, ctx->bb, str[i++]);
            }
         
          insert_done = 1;
        }
    }

  return APR_SUCCESS;   
}

static void register_hooks(apr_pool_t *p)
{
  ap_register_output_filter("LT_INSERT",
                            lt_insert_filter,
                            NULL,
                            AP_FTYPE_CONTENT_SET);
}

module AP_MODULE_DECLARE_DATA lt_insert_module =
  {
    STANDARD20_MODULE_STUFF,
    NULL,                    /* create per-directory config structure */
    NULL,                    /* merge per-directory config structures */
    NULL,                    /* create per-server config structure    */
    NULL,                    /* merge per-server config structures    */
    NULL,                    /* command apr_table_t                   */
    register_hooks           /* register hooks                        */
  };

----------------------------------------------------------------

--
Drew



Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Sunday, 31 December 2006, at 19:02 -0700, Joe Lewis wrote:
> ... split bucket brigades ...

You mean splitting buckets, not brigades, don't you? To avoid a
misunderstanding here are the relevant API functions.

From apr_buckets.h
------------------------------------------------------------------------
/**
 * Split one bucket in two.
 * @param e The bucket to split
 * @param point The offset to split the bucket at
 */
#define apr_bucket_split(e,point) (e)->type->split(e, point)

/**
 * Split one bucket in two at the specified position by duplicating
 *  the bucket structure (not the data) and modifying any necessary
 *  start/end/offset information.  If it's not possible to do this
 *  for the bucket type (perhaps the length of the data is indeterminate,
 *  as with pipe and socket buckets), then APR_ENOTIMPL is returned.
 * @param e The bucket to split
 * @param point The offset of the first byte in the new bucket
 */
apr_status_t (*split)(apr_bucket *e, apr_size_t point);



Re: splitting a string...

Posted by Joe Lewis <jo...@joe-lewis.com>.
Joachim Zobel wrote:
> Hi.
>
> Rethinking your problem I found that it is probably much easier to
> split the bucket in three, namely before_head, head, and after_head
> using apr_bucket_split and replace the head bucket.
>   
That is exactly the technique I use in my template wrapper - I look 
through the buckets on output, and when I see the <title> tag, I have to 
make a copy of that.  Once I have it, I grab the template file, slap the 
title into a "variable" in the page, paste in the rest of the head tag 
if the page had one, and parse just a few other components.  The BEST 
way to do that is to split bucket brigades, and then just insert copies 
of the data in a bucket form, and delete the "variable" results.  I 
think Ryan Bloom had a good "how to" for a simpler "template" 
wrapping module on a web page write up somewhere.  But, I second 
Joachim's advice - bucket brigades (that is what they were designed for, 
too).

Joe
> Did you read this:
> http://www.cs.virginia.edu/~jcw5q/talks/apache/bucketbrigades.ac2002.pdf
>
> Sincerely,
> Joachim
>
>
>   


Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
Hi.

Rethinking your problem I found that it is probably much easier to
split the bucket in three, namely before_head, head, and after_head
using apr_bucket_split and replace the head bucket.

Did you read this:
http://www.cs.virginia.edu/~jcw5q/talks/apache/bucketbrigades.ac2002.pdf

Sincerely,
Joachim
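
The three-way split can be written down as byte ranges. This is only a
sketch in plain C: span, three_way and split_in_three are hypothetical
names, and the tag offset is assumed to come from a search that was
already done on the bucket's data.

```c
#include <assert.h>
#include <stddef.h>

/* A byte range inside one bucket's data: [off, off + len). */
typedef struct span {
    size_t off;
    size_t len;
} span;

typedef struct three_way {
    span before_head;   /* everything in front of the tag */
    span head;          /* the tag itself                 */
    span after_head;    /* everything behind the tag      */
} three_way;

/*
 * Hypothetical helper: given a bucket of `total` bytes with a tag of
 * `tag_len` bytes starting at `tag_off`, compute the three ranges.
 */
three_way split_in_three(size_t total, size_t tag_off, size_t tag_len)
{
    three_way t;

    assert(tag_off <= total);
    assert(tag_len <= total - tag_off);

    t.before_head.off = 0;
    t.before_head.len = tag_off;
    t.head.off = tag_off;
    t.head.len = tag_len;
    t.after_head.off = tag_off + tag_len;
    t.after_head.len = total - t.after_head.off;
    return t;
}
```

With apr_bucket_split the offsets are relative to the bucket being split,
so the first split would be at before_head.len on the original bucket and
the second at head.len on the new bucket that follows it.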



Re: splitting a string...

Posted by Drew Bertola <dr...@jupiterhosting.com>.
Joachim Zobel wrote:
> Hi.
>
> One recommendation beforehand: If you can live with the memory
> footprint, use mod_perl. It gives you full access to the apache API and
> it is much easier to handle. 
>   

Thanks for the response.  I considered mod_perl, but we're trying to
keep this very light weight.

> If str was \0 terminated, you would not need a length. In the above
> case, you better move the bucket you just read.
>   

How can I ensure it's null terminated?

> You need to punch a \0 into str, then you can use it as head. 
>   

Yes, I worked this out, but was getting a lot of segfaults.  I'm now doing:


          tail = position + strlen(search_tag);

          while ( (str + i) < ( position + strlen(search_tag) ) )
            {
              ap_fputc(f->next, ctx->bb, str[i++]);
            }

          ap_fputs(f->next, ctx->bb, "\n");
          ap_fputs(f->next, ctx->bb, insert_line);
          ap_fputs(f->next, ctx->bb, tail);

This prints everything up to my head tag, then my required warning meta
tag is inserted, then everything after my head tag.

The problem is that my short html test page works fine, but my short php
test page has garbage at the end of it.  I believe this is left over in
the bucket brigade.  Any clues?

>         
> You also need to handle the case that <head> is broken into 2 parts in 2
> buckets.
>   

OK, I'll look at that, too. 

Thanks,
--
Drew
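
For the case where <head> is broken across two buckets, one possible
approach is a matcher that remembers how much of the tag it has seen when
a chunk ends. The sketch below is plain C and purely illustrative:
matcher, matcher_init and matcher_feed are hypothetical names, and in a
filter the state would have to live in the filter context so that it
survives between buckets and between brigades.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MATCHER_MAX_PAT 64

typedef struct matcher {
    char   pat[MATCHER_MAX_PAT];  /* lower-cased pattern                */
    size_t fail[MATCHER_MAX_PAT]; /* KMP failure table                  */
    size_t plen;                  /* pattern length                     */
    size_t state;                 /* pattern bytes matched so far       */
} matcher;

/* Returns 0 on success, -1 if the pattern is empty or too long. */
int matcher_init(matcher *m, const char *pat)
{
    size_t i, k = 0;

    assert(m != NULL && pat != NULL);
    m->plen = 0;
    m->state = 0;
    while (pat[m->plen] != '\0') {
        if (m->plen == MATCHER_MAX_PAT)
            return -1;
        m->pat[m->plen] = (char)tolower((unsigned char)pat[m->plen]);
        m->plen++;
    }
    if (m->plen == 0)
        return -1;

    /* Standard KMP prefix function over the lower-cased pattern. */
    m->fail[0] = 0;
    for (i = 1; i < m->plen; i++) {
        while (k > 0 && m->pat[i] != m->pat[k])
            k = m->fail[k - 1];
        if (m->pat[i] == m->pat[k])
            k++;
        m->fail[i] = k;
    }
    return 0;
}

/*
 * Feed one chunk of `len` bytes (no NUL terminator needed).
 * Returns the offset in THIS chunk just past the end of the match,
 * or -1 if the pattern has not been completed yet. Partial progress
 * is kept in m->state for the next call.
 */
long matcher_feed(matcher *m, const char *buf, size_t len)
{
    size_t i;

    assert(m != NULL);
    for (i = 0; i < len; i++) {
        char c = (char)tolower((unsigned char)buf[i]);

        while (m->state > 0 && c != m->pat[m->state])
            m->state = m->fail[m->state - 1];
        if (c == m->pat[m->state])
            m->state++;
        if (m->state == m->plen) {
            m->state = 0;
            return (long)(i + 1);
        }
    }
    return -1;
}
```

The returned offset is where the inserted line would go in the second
chunk. Note that this only handles the literal tag; attributes or
whitespace inside the tag still need a real parser.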

Re: splitting a string...

Posted by Joachim Zobel <jz...@heute-morgen.de>.
Hi.

One recommendation beforehand: If you can live with the memory
footprint, use mod_perl. It gives you full access to the apache API and
it is much easier to handle. 

On Saturday, 30 December 2006, at 18:55 -0800, Drew Bertola wrote:
> I'd like to know how I can get the portion of client content before the
> <head> tag (if it exists), and the portion after it.  That way, I can
> insert something just after the <head> tag (adult content warnings, etc).
> 
> I seem to get stuck right after finding the position of the head tag.  I
> can send on the tail end, but don't know how to capture the beginning
> portion:
> 
> 
> ...
>       apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);
> 
>       if ( ( position = strcasestr(str, search_tag) ) == NULL )
>         {
>           /*
>            * If we didn't find the <head> tag, just pass along
>            * everything to the next filter and we're done.
>            */
>           ap_fputs(f->next, ctx->bb, str);

If str was \0 terminated, you would not need a length. In the above
case, you better move the bucket you just read.

>         }
>       else
>         {
>           /*
>            * so, we have a <head> tag.  So, lets find and process it
>            * and insert our notice.
>            */
> 
>           tail = position + 6;
> 
>           head = ???  <<<<<<<<< Here's where I'm stuck...

You need to punch a \0 into str, then you can use it as head. 
        
You also need to handle the case that <head> is broken into 2 parts in 2
buckets.

Sincerely,
Joachim
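
An alternative to punching a \0 into str, which is a const buffer here,
is to treat head and tail as (pointer, length) pairs. The sketch below is
plain C and not filter code: splice_at is a hypothetical helper, and the
insertion offset is assumed to be position + strlen(search_tag) - str
from the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy buf[0..len) into out, inserting the
 * NUL-terminated string `ins` at byte offset `at`. Neither buf nor out
 * is treated as NUL-terminated. Returns the number of bytes written,
 * or 0 if out (capacity outcap) is too small.
 */
size_t splice_at(const char *buf, size_t len, size_t at,
                 const char *ins, char *out, size_t outcap)
{
    size_t ilen = strlen(ins);

    assert(at <= len);
    if (len + ilen > outcap)
        return 0;

    memcpy(out, buf, at);                        /* head: bytes before `at` */
    memcpy(out + at, ins, ilen);                 /* the inserted text       */
    memcpy(out + at + ilen, buf + at, len - at); /* tail: the rest          */
    return len + ilen;
}
```

In the filter itself no copy into a second buffer should be needed: the
head is simply the first `at` bytes of str and the tail the remaining
len - at bytes, each written with its own length (for example with
ap_fwrite) rather than with ap_fputs.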