Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/11/02 18:34:43 UTC

Re: Mod_include design

On Fri, Oct 27, 2000 at 12:10:11PM -0700, rbb@covalent.net wrote:
> Paul Reder wrote:
> > > Plus, this really should be implemented as a real parser, not the current
> > > hack we have.
> >  
> > I'm assuming you aren't looking for a full YACC/LEX implemented parser for
> > this simple setup, just something cleaner and more formal than the kludged
> > string matcher that is currently there.
> 
> Yes.

Paul -- yes, a clean parser is much more desirable. From your initial note,
I think you are definitely on the right track: a state machine to record
where you are in the parsing algorithm (and lexing if you separate that
out), and storing context into the filter context (at one point, Ryan said
something about a "static variable" -- I'll smack him for that next time I
see him (he broke headers because he held context that way); use the context
field like you intended).

I believe that you can probably update mod_include in several waves. In
fact, I would highly recommend this, so that the individual changes can be
reviewed. This also shows people progress, and keeps us from thinking "damn.
mod_include isn't going anywhere and needs to be fixed now". (I was berating
Ryan at the conference for breaking mod_include, then not fixing it; he
pointed out you were working on it, but that was news since I hadn't seen
anything (yet)).

What would be slick, and would help out your parsing work is to document the
grammar and put it into comments within mod_include. Once a rigorous grammar
is present, then it will be easier for us reviewers to check the work
against that grammar.

It seems that the discussion also mentioned handling of the tag and how it
might be split across buckets. The tag isn't going to be that long... memcpy
the darn thing into a holding buffer so you can do a strcmp(). I don't know
how long an individual directive may be, but it might be advantageous to
see if a maximum exists and copy directives into a holding buffer. NOTE:
this doesn't conflict with the "no copy" desires of buckets. The bucket of a
mod_include file is raw text rather than directives; that raw text won't get
copied (since you'd only copy to the buffer when a directive is seen). Using
a buffer for the tag/directives will simplify the context structure that you
posted: you'd only need a buffer for the directive that you're accumulating.
Any brigade contents from before the directive will have been passed. Any
brigade contents after a directive simply haven't been parsed (yet) by the
mod_include filter and is still in a param/local variable on the execution
stack (cuz you won't return from the filter until you've completed parsing
through it; the only state for a return is "in a directive [and the
directive buffer is being filled]" or "outside a directive [and I'm not
holding anything in the directive buffer or a set-aside brigade]").
[ quick question: what *is* the max length? the grammar will help with that ]
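
A rough sketch of the holding-buffer idea, under loud assumptions: ssi_ctx,
ssi_accumulate and DIRECTIVE_MAX are made-up names, plain char arrays stand in
for bucket data, and the 512-byte bound is a guess until the grammar settles
the real limit. It also ignores a "-->" that might appear inside a quoted
argument. The point is only that the buffer is the sole thing carried between
filter calls.

```c
#include <stddef.h>
#include <string.h>

#define DIRECTIVE_MAX 512   /* assumed bound; the real maximum is still unknown */

/* Hypothetical filter context: the directive bytes accumulated so far. */
typedef struct {
    char   buf[DIRECTIVE_MAX];
    size_t len;
} ssi_ctx;

/*
 * Append directive bytes (everything after the start sequence) to the
 * holding buffer. Returns how many bytes of data were consumed.
 * *done is 1 once the closing "-->" has been copied, -1 if the directive
 * does not fit in the buffer, and 0 if more data is needed.
 */
static size_t ssi_accumulate(ssi_ctx *ctx, const char *data, size_t n, int *done)
{
    size_t i;

    *done = 0;
    for (i = 0; i < n; i++) {
        if (ctx->len == DIRECTIVE_MAX) {
            *done = -1;
            return i;
        }
        ctx->buf[ctx->len++] = data[i];
        /* The end marker may straddle two calls, so test the buffer tail. */
        if (ctx->len >= 3 && memcmp(ctx->buf + ctx->len - 3, "-->", 3) == 0) {
            *done = 1;
            return i + 1;
        }
    }
    return n;
}
```

Anything in data past the returned offset is ordinary text that has not been
looked at yet, which matches the "still on the execution stack" description
above.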

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Nov 02, 2000 at 11:59:47AM -0800, rbb@covalent.net wrote:
> 
> > But: it definitely depends on how bounded the size of a directive can be. I
> > mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> > then we just put that buffer into the ctx and run with it.
> 
> I wouldn't personally put it into the ctx, because this will all be done
> in a single call to the filter,

What do you mean? The directive can arrive over multiple (not single!) calls
to the filter. We can put a set-aside brigade into the context, or we can
alloc the buffer once and put it there (and copy stuff onto the end). It is
more efficient to copy from the brigade into the buffer, than to copy during
a set-aside, then copy again into a buffer.

> but yes if the directive length is
> bounded, we could copy.  Unfortunately, I don't think the length is
> bounded, so I think we need to be able to deal with partial tags in a
> bucket.

Well, we need to see the grammar to determine the bound. At a minimum, the
tags are bounded and I would guess certain arguments/tokens are bounded;
these can/should use a buffer to hold them during parsing.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by "Paul J. Reder" <re...@raleigh.ibm.com>.
Wow, I'm a little overwhelmed here. After a dearth of comments or input I took
my eyes off my mail for 1/2 a day and get flooded. Please allow me to 
apologize up front for the length of this post. Allow me to try to catch
up and comment on the topics brought up...

States:
==========================================================================
The listed states in the first post were simply my initial take on states.
They acted as a place holder in the design for future parsing states. As
for whether tag state is required: My feeling is that it is safer and more
accurate checking a state variable instead of trying to make an inference
from a pointer being NULL/non-NULL. I also feel that keeping state during
the tag parsing is much more efficient. Parsing the tag is more than doing 
5 byte comparisons. Consider the following pathological case:

"<", "!", "-", "-", "#" each in their own brigades.

Ryan suggested that I just squirrel away the buckets into a set aside
brigade in the ctx then restart the tag checking at the beginning each
time mod_include is called.

This means that I re-iterate through not just a couple of byte compares for
each set aside bucket, but also the code required to read from the buckets,
and track compared positions. This isn't a lot of overhead, but it is more
than just 5 byte compares, and is easily avoided.

By the time I am done there are likely to be a number of other states used
during the directive parsing. These additional states will never be visible
across invocations of mod_include because they are only entered once the
full tag has been found. These states could be stored locally, but I feel
keeping all of the state information in one place makes more sense.

In summary: The number of states will grow to include parsing states. Tag state
is useful for code readability (state == X vs. brgd_ptr != NULL) and for 
efficiency. And for crying out loud, the state variables are just an int and
a few pointers.
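
To make the pathological case concrete, here is a minimal sketch of keeping
the match position across invocations. The names (tag_state, find_start,
START_SEQ) are invented for illustration and are not from the mod_include
source; char arrays stand in for buckets.

```c
#include <stddef.h>

static const char START_SEQ[] = "<!--#";
#define START_LEN (sizeof(START_SEQ) - 1)

/* Hypothetical context: how many bytes of START_SEQ have matched so far. */
typedef struct {
    size_t matched;
} tag_state;

/*
 * Scan one chunk of data. Returns the offset just past the start sequence
 * if it completes inside this chunk, or -1 if more data is needed.
 * st->matched survives between calls, so earlier bytes are never rescanned.
 */
static long find_start(tag_state *st, const char *data, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (data[i] == START_SEQ[st->matched]) {
            if (++st->matched == START_LEN) {
                st->matched = 0;
                return (long)(i + 1);
            }
        } else {
            /* '<' occurs only at the front of START_SEQ, so after a
             * mismatch the match can only resume at position 0 or 1. */
            st->matched = (data[i] == '<') ? 1 : 0;
        }
    }
    return -1;
}
```

With the five one-byte chunks above, each call does a single comparison and
the fifth call reports the tag; no set-aside data is re-read.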

Max Buffer Size:
===========================================================================
In 1.3 Apache allowed each token to be a max of HUGE_STRING_LEN (8192 bytes).
The code did not read the whole directive in, it read bytes from what it
viewed as a file stream. I would guess that most tags would be less than 
1KB total length, with some running longer.

My feeling is to have a default buffer of say 512 or 1024, then alloc a larger
one if required (see next comment section about copying). This would be a
VERY local buffer only existing during the life of the send_parsed_content
function. It could be allocated and freed safely within that function, or
allocated from the request pool and cleaned up automagically later. I like
the local alloc and free better personally.

Copying:
===========================================================================
This will probably send Ryan into convulsions ;), but after studying the code
and having conversations with a couple of people and now having received 
unsolicited agreement from Greg, I feel that making a copy once is the only
reasonable way to go.

Granted, Ryan is correct that the bucket handling code is contained in a couple
of functions. The problem is that these functions get called frequently and that
these functions do a copy anyway. And not just a copy but a pool alloc and copy
for the values. So I don't think there can be any debate about copying. It is
currently happening and there is no way to completely remove copying. It would
be impossible, for example, to execute the setting of a variable with a value
that spans buckets.

What I suggest is to make one copy. There is still debate about whether to copy
as you go or copy once you have the whole thing. My feeling is that copying once
you have the whole thing allows you to determine how much space is required so
you can either use the local buffer or allocate a bigger one. This is an important
point since we really have no idea how big the thing can get, and we certainly
have no idea how big it is until we find the end.
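
A sketch of the count-first, copy-once approach, with assumptions stated up
front: segment is a stand-in for one bucket's readable data, gather and
LOCAL_SIZE are hypothetical names, and malloc is used only to keep the
example self-contained (a pool allocation would be the other option).

```c
#include <stdlib.h>
#include <string.h>

#define LOCAL_SIZE 512   /* assumed default buffer size */

/* Stand-in for the readable data of one bucket. */
typedef struct {
    const char *data;
    size_t      len;
} segment;

/*
 * Gather the directive bytes in segs[0..nsegs) into one contiguous,
 * NUL-terminated string. Uses local (LOCAL_SIZE bytes) when the total
 * fits, otherwise a heap block that the caller must free().
 * Returns NULL if the allocation fails.
 */
static char *gather(const segment *segs, size_t nsegs, char *local,
                    size_t *total_out)
{
    size_t total = 0, off = 0, i;
    char *dst;

    for (i = 0; i < nsegs; i++)          /* pass 1: count the bytes */
        total += segs[i].len;

    dst = (total < LOCAL_SIZE) ? local : malloc(total + 1);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < nsegs; i++) {        /* pass 2: the one and only copy */
        memcpy(dst + off, segs[i].data, segs[i].len);
        off += segs[i].len;
    }
    dst[total] = '\0';
    *total_out = total;
    return dst;
}
```

The caller can tell whether to free the result by comparing the returned
pointer with local.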

I also advocate copying only the directive content (not the start and end
delimiters). This provides a minor code simplification. The old code used the
end delimiter to mark the end of the stream for parsing purposes. Since we
already know where the end of the tag is, there is no sense complicating the
parsing code looking for the ending sequence again.

I also advocate not ever copying any part of the directive again. The tolower'ing
can be done in place. The values can be referenced in place. Every token has
a clear delimiter (white space, "=", "\0", etc.). These delimiters can be converted to 
"\0" and pointers set to reference the bytes in place. During handle_set, for example,
the value gets apr_pstrdup'ed during the apr_table_setn call anyway.

So in summary, once the full tag has been obtained (and the directive bytes counted)
a buffer may be allocated and the directive bytes copied from the buckets into the
buffer. The parser will return pointers into this buffer after having marked the
end of the token with a NUL byte (tolower'ing as required). If the value must exist
outside of this buffer (i.e. for a set) then the value can be dup'ed as required.
One optimization would be to check if the whole directive is contained in one bucket
and just use that buffer instead of copying at all.
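
For illustration, in-place tokenizing along these lines might look like the
following. take_name and take_value are hypothetical helpers, not existing
mod_include functions, and the sketch assumes the simple form
name="value" with no whitespace around the "=".

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return the next name token (directive or attribute name) at *cur, lowercased
 * in place, or NULL when the buffer is exhausted. The delimiter that ends
 * the name (whitespace or '=') is overwritten with '\0'.
 */
static char *take_name(char **cur)
{
    char *p = *cur, *start;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cur = p;
        return NULL;
    }
    start = p;
    while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) {
        *p = (char)tolower((unsigned char)*p);
        p++;
    }
    if (*p != '\0')
        *p++ = '\0';
    *cur = p;
    return start;
}

/*
 * Return the value that follows a name ended by '='. A quoted value runs to
 * the matching quote, an unquoted one to the next whitespace. Case is kept.
 */
static char *take_value(char **cur)
{
    char *p = *cur, *start;
    char quote = 0;

    if (*p == '"' || *p == '\'')
        quote = *p++;
    start = p;
    while (*p != '\0' && (quote ? *p != quote : !isspace((unsigned char)*p)))
        p++;
    if (*p != '\0')
        *p++ = '\0';
    *cur = p;
    return start;
}
```

Every pointer returned refers to bytes inside the original buffer, so a
handler that needs the value to outlive the buffer has to dup it itself.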

Grammar:
==============================================================================
I am working on a complete documented grammar for this. I will post it when it
is mostly complete. This post is already long enough so I will spare you all for now.

-- 
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it.  Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein

Re: Mod_include design

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Nov 02, 2000 at 03:46:17PM -0500, Bill Stoddard wrote:
> > > But: it definitely depends on how bounded the size of a directive can be. I
> > > mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> > > then we just put that buffer into the ctx and run with it.
> >
> > I don't see anything in 1.3 indicating that it is *theoretically*
> > bounded to a relatively small number...  For example, consider the cmd
> > attribute on exec.  (How long can a command-line be?  way long...)
> >
> > *Practically* a human isn't going to deal with long (>2048-byte)
> > directives (though machine-generated directives have no such distaste).
> >
> > Overall I think collecting directive bytes into a buffer as we parse is
> > the way to go...
> 
> The uri in "include virtual=uri" constructed dynamically (for example via a
> CGI) could be quite large.  I see no reason we shouldn't impose (and
> enforce) a reasonably large upper limit on tag length.

Well, it isn't hard to allocate a directive buffer of, say, 500 bytes. If
the user gives us a pathological case, then we simply grow the buffer. No
big deal. 99% of the time, we follow the fast-path with the 500 byte buffer
for the directive (tag+args+whatever).
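
Roughly, and only as a sketch: dir_buf and its functions are invented names,
and malloc/realloc are used here to keep the example standalone where a real
module would more likely allocate from a pool.

```c
#include <stdlib.h>
#include <string.h>

#define FAST_SIZE 500

typedef struct {
    char   fixed[FAST_SIZE];  /* fast path: no allocation at all */
    char  *data;              /* points at fixed, or at heap storage */
    size_t len;
    size_t cap;
} dir_buf;

static void dir_buf_init(dir_buf *b)
{
    b->data = b->fixed;
    b->len = 0;
    b->cap = FAST_SIZE;
}

/* Append n bytes, doubling the capacity if the directive outgrows it.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int dir_buf_append(dir_buf *b, const char *src, size_t n)
{
    if (n > b->cap - b->len) {
        size_t newcap = b->cap;
        char *p;

        while (n > newcap - b->len)
            newcap *= 2;
        if (b->data == b->fixed) {
            /* First growth: move off the fixed storage. */
            p = malloc(newcap);
            if (p != NULL)
                memcpy(p, b->fixed, b->len);
        } else {
            p = realloc(b->data, newcap);
        }
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release any heap storage and reset to the fixed buffer. */
static void dir_buf_free(dir_buf *b)
{
    if (b->data != b->fixed)
        free(b->data);
    dir_buf_init(b);
}
```

Because data may point into the struct itself, a dir_buf should not be copied
by value.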

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by Bill Stoddard <st...@raleigh.ibm.com>.
> > But: it definitely depends on how bounded the size of a directive can be. I
> > mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> > then we just put that buffer into the ctx and run with it.
>
> I don't see anything in 1.3 indicating that it is *theoretically*
> bounded to a relatively small number...  For example, consider the cmd
> attribute on exec.  (How long can a command-line be?  way long...)
>
> *Practically* a human isn't going to deal with long (>2048-byte)
> directives (though machine-generated directives have no such distaste).
>
> Overall I think collecting directive bytes into a buffer as we parse is
> the way to go...

The uri in "include virtual=uri" constructed dynamically (for example via a
CGI) could be quite large.  I see no reason we shouldn't impose (and
enforce) a reasonably large upper limit on tag length.

Bill


Re: Mod_include design

Posted by Jeff Trawick <tr...@bellsouth.net>.
Greg Stein <gs...@lyra.org> writes:

> It will be *incredibly* easier to parse a directive if we can turn it into a
> contiguous string. Consider that we are going to have to do some level of
> copying to deal with the tag [for a strcmp], and then copy each argument
> into a new buffer, etc. The nature of this stuff is simply that we will
> eventually copy an entire directive anyhow, so let's just do it up front and
> simplify the parsing step.

yep
 
> If our goal is to rebuild mod_include into a *clean* and *maintainable*
> module, then this will be quite handy. The parsing step will be clear and
> quite serviceable.

yep

> But: it definitely depends on how bounded the size of a directive can be. I
> mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> then we just put that buffer into the ctx and run with it.

I don't see anything in 1.3 indicating that it is *theoretically*
bounded to a relatively small number...  For example, consider the cmd
attribute on exec.  (How long can a command-line be?  way long...)

*Practically* a human isn't going to deal with long (>2048-byte)
directives (though machine-generated directives have no such distaste).

Overall I think collecting directive bytes into a buffer as we parse is
the way to go...

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...

Re: Mod_include design

Posted by Bill Stoddard <st...@raleigh.ibm.com>.
> On Thu, Nov 02, 2000 at 12:44:15PM -0800, rbb@covalent.net wrote:
> >...
> > Greg's comment that we might as well copy as we find the tag, I don't
> > agree with either.  Picture a brigade that looks like:
> >
> > "<" -> "!" -> "-" -> "foobar"
> >
> > This is an invalid tag.  If we copy as we find things, then we will copy
> > things into the buffer that isn't really a tag, and if we use the presence
> > of data in the buffer to signify a tag, then we have a problem.
> >
> > If we copy the tag once we have found the full tag, then we can actually
> > allocate a buffer of the correct size for the tag that we have.
>
> You are going to do one of two things as you look for the tag:
>
> 1) set aside a brigade
> 2) copy into a buffer
>
> Given that (1) might do a copy, and that you will eventually do (2), then
> you may as well do it as you go.
>
> And, oh, gee, heavens-to-betsy, we find that the tag is invalid. Well, gosh
> darnit. We went and copied four bytes into a buffer and now we need to send
> that all down to the next filter.
>
> Yah, right. Like we need to be worried about those four bytes.
>
>
> Copy the darn thing until you find your tag. It keeps it clean and simple,
> and there is next to zero performance degradation. I don't understand the
> rationale for a bunch of complexity to avoid copying four bytes.

Yes, I agree with Greg here. If you see what you think is a tag, assume it IS
a tag and copy it. If you later discover it is not really a tag, then send it
out on the wire. The performance impact is minimal and it is certainly not
worth adding even more complexity to the code to handle this case more
efficiently.

Bill


Re: Mod_include design

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Nov 02, 2000 at 12:44:15PM -0800, rbb@covalent.net wrote:
>...
> Greg's comment that we might as well copy as we find the tag, I don't
> agree with either.  Picture a brigade that looks like:
> 
> "<" -> "!" -> "-" -> "foobar"
> 
> This is an invalid tag.  If we copy as we find things, then we will copy
> things into the buffer that isn't really a tag, and if we use the presence
> of data in the buffer to signify a tag, then we have a problem.
> 
> If we copy the tag once we have found the full tag, then we can actually
> allocate a buffer of the correct size for the tag that we have.

You are going to do one of two things as you look for the tag:

1) set aside a brigade
2) copy into a buffer

Given that (1) might do a copy, and that you will eventually do (2), then
you may as well do it as you go.

And, oh, gee, heavens-to-betsy, we find that the tag is invalid. Well, gosh
darnit. We went and copied four bytes into a buffer and now we need to send
that all down to the next filter.

Yah, right. Like we need to be worried about those four bytes.


Copy the darn thing until you find your tag. It keeps it clean and simple,
and there is next to zero performance degradation. I don't understand the
rationale for a bunch of complexity to avoid copying four bytes.
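
A sketch of what "copy as you go" could look like for the start sequence,
using Ryan's brigade as the test case. Everything named here (scan_ctx, sink,
emit, scan_chunk) is hypothetical; sink is just a flat output buffer standing
in for "send it to the next filter", and the caller is assumed to make it
large enough.

```c
#include <stddef.h>
#include <string.h>

static const char START_SEQ[] = "<!--#";
#define START_LEN (sizeof(START_SEQ) - 1)

/* Hypothetical context: candidate tag bytes copied so far. */
typedef struct {
    char   held[START_LEN];
    size_t nheld;
} scan_ctx;

/* Stand-in for the next filter: appends to a caller-supplied buffer. */
typedef struct {
    char   *out;
    size_t  len;
} sink;

static void emit(sink *s, const char *p, size_t n)
{
    memcpy(s->out + s->len, p, n);
    s->len += n;
}

/*
 * Feed one chunk. Ordinary text goes straight to the sink. Bytes that might
 * begin a tag are copied into ctx->held; if the candidate turns out not to
 * be a tag, the held bytes are flushed to the sink unchanged. Returns the
 * offset just past a complete start sequence, or -1 if none was completed.
 */
static long scan_chunk(scan_ctx *ctx, sink *s, const char *data, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        char c = data[i];

        if (c != START_SEQ[ctx->nheld]) {
            /* Not a tag after all: pass the few held bytes downstream. */
            emit(s, ctx->held, ctx->nheld);
            ctx->nheld = 0;
        }
        if (c == START_SEQ[ctx->nheld]) {
            ctx->held[ctx->nheld++] = c;
            if (ctx->nheld == START_LEN) {
                ctx->nheld = 0;
                return (long)(i + 1);
            }
        } else {
            emit(s, &c, 1);
        }
    }
    return -1;
}
```

Fed "<", "!", "-", "foobar" as four chunks, this holds three bytes, then
flushes them and the sink ends up with "<!-foobar". The cost of guessing
wrong is one small memcpy.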

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by rb...@covalent.net.
On 2 Nov 2000, Jeff Trawick wrote:

> rbb@covalent.net writes:
> 
> > > But: it definitely depends on how bounded the size of a directive can be. I
> > > mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> > > then we just put that buffer into the ctx and run with it.
> > 
> > I wouldn't personally put it into the ctx, because this will all be done
> > in a single call to the filter, 
> 
> but we aren't guaranteed that the entire tag is in the same
> brigade...  isn't your reasoning for not putting it in the ctx based
> on a tag not crossing brigade boundaries?

Please read my previous messages again.  I do guarantee that by the time
we are parsing the actual tag, it is in a single brigade.  What I am
saying, is that we absolutely can not actually process the SSI tag until
we have all of it, so if we already have the full tag, then we don't need
the ctx field.

Greg's comment that we might as well copy as we find the tag, I don't
agree with either.  Picture a brigade that looks like:

"<" -> "!" -> "-" -> "foobar"

This is an invalid tag.  If we copy as we find things, then we will copy
things into the buffer that isn't really a tag, and if we use the presence
of data in the buffer to signify a tag, then we have a problem.

If we copy the tag once we have found the full tag, then we can actually
allocate a buffer of the correct size for the tag that we have.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Mod_include design

Posted by Jeff Trawick <tr...@bellsouth.net>.
rbb@covalent.net writes:

> > But: it definitely depends on how bounded the size of a directive can be. I
> > mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> > then we just put that buffer into the ctx and run with it.
> 
> I wouldn't personally put it into the ctx, because this will all be done
> in a single call to the filter, 

but we aren't guaranteed that the entire tag is in the same
brigade...  isn't your reasoning for not putting it in the ctx based
on a tag not crossing brigade boundaries?

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...

Re: Mod_include design

Posted by rb...@covalent.net.
> But: it definitely depends on how bounded the size of a directive can be. I
> mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
> then we just put that buffer into the ctx and run with it.

I wouldn't personally put it into the ctx, because this will all be done
in a single call to the filter, but yes if the directive length is
bounded, we could copy.  Unfortunately, I don't think the length is
bounded, so I think we need to be able to deal with partial tags in a
bucket.

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Mod_include design

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Nov 02, 2000 at 11:20:59AM -0800, rbb@covalent.net wrote:
> > Ryan, a state machine is how parsers like this work, in our "here is some
> > more text" model. It is required, but you can deny it all you like :-)
> 
> Greg, the problem is where the state machine goes.  I agree that there
> will be _A_ state machine in mod_include.

That was certainly /not/ clear :-)

> I disagree that it should be
> the state machine that Paul initially outlined.  The state machine that he
> outlined was to determine how to split a tag out of the brigade.  This is
> unnecessary.  By just splitting the brigade at any subset of <!-- and
> -->, we can find the tag, and at that point we go into the state machine
> to actually parse the tag.

Sure.

>...
> > [ note that "buffer" could be a set-aside brigade; if you don't memcpy a
> >   directive out of a brigade into a buffer, then the parsing becomes much
> >   more difficult, and more states would be desirable to control and drive
> >   the parsing ]
> 
> I disagree that the copy is necessary, because the code is completely
> localized to one or two functions, but that's fine.

It will be *incredibly* easier to parse a directive if we can turn it into a
contiguous string. Consider that we are going to have to do some level of
copying to deal with the tag [for a strcmp], and then copy each argument
into a new buffer, etc. The nature of this stuff is simply that we will
eventually copy an entire directive anyhow, so let's just do it up front and
simplify the parsing step.

If our goal is to rebuild mod_include into a *clean* and *maintainable*
module, then this will be quite handy. The parsing step will be clear and
quite serviceable.

But: it definitely depends on how bounded the size of a directive can be. I
mean, can a directive be 10k long? If it is bounded, say, to MAX_PATH + 200,
then we just put that buffer into the ctx and run with it.

Performance-wise: copying 80 bytes of directive text is meaningless (despite
the fact that we'd end up copying the tag and args anyways).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by Bill Stoddard <st...@raleigh.ibm.com>.
The changes I had to make to get mod_cgissi.c working were really quite
minimal, the biggest being handling a tag spanning across a buffer. The main
work was to maintain state (2 states, in tag and not in tag), and some futzing
with GET_CHAR to work with a buffer rather than a file and indices into that
buffer. You could probably do something very similar with buckets rather than
buffers. In fact, you could probably extend my work to use buckets, not that
this is the best solution...

Bill
----- Original Message -----
From: <rb...@covalent.net>
To: <ne...@apache.org>
Sent: Thursday, November 02, 2000 2:20 PM
Subject: Re: Mod_include design


>
> > Ryan, a state machine is how parsers like this work, in our "here is some
> > more text" model. It is required, but you can deny it all you like :-)
>
> Greg, the problem is where the state machine goes.  I agree that there
> will be _A_ state machine in mod_include.  I disagree that it should be
> the state machine that Paul initially outlined.  The state machine that he
> outlined was to determine how to split a tag out of the brigade.  This is
> unnecessary.  By just splitting the brigade at any subset of <!-- and
> -->, we can find the tag, and at that point we go into the state machine
> to actually parse the tag.
>
> > 1) not parsing a directive [pass all text up to a directive marker]
> > 2) parsing a directive [accumulate until we have a complete directive]
> >
> > These two states will exist, but it is entirely reasonable/possible that it
> > will not be concretely exposed as a two-state machine. The state could
> > simply be "if anything is in the directive buffer, then we are in state 2;
> > if the buffer is empty, then we are in state 1".
>
> The state machine I am referring to, is the state that tries to find a
> tag.  It is unnecessary to have a full-blown state machine that goes
> across filter invocations, because we can't deal with the tag until we
> have a full tag anyway, so saving the full state in the ctx pointer isn't
> required.
>
> > [ note that "buffer" could be a set-aside brigade; if you don't memcpy a
> >   directive out of a brigade into a buffer, then the parsing becomes much
> >   more difficult, and more states would be desirable to control and drive
> >   the parsing ]
>
> I disagree that the copy is necessary, because the code is completely
> localized to one or two functions, but that's fine.
>
> Ryan
>
> _______________________________________________________________________________
> Ryan Bloom                        rbb@apache.org
> 406 29th St.
> San Francisco, CA 94131
> -------------------------------------------------------------------------------
>


Re: Mod_include design

Posted by rb...@covalent.net.
> Ryan, a state machine is how parsers like this work, in our "here is some
> more text" model. It is required, but you can deny it all you like :-)

Greg, the problem is where the state machine goes.  I agree that there
will be _A_ state machine in mod_include.  I disagree that it should be
the state machine that Paul initially outlined.  The state machine that he
outlined was to determine how to split a tag out of the brigade.  This is
unnecessary.  By just splitting the brigade at any subset of <!-- and
-->, we can find the tag, and at that point we go into the state machine
to actually parse the tag.

> 1) not parsing a directive [pass all text up to a directive marker]
> 2) parsing a directive [accumulate until we have a complete directive]
> 
> These two states will exist, but it is entirely reasonable/possible that it
> will not be concretely exposed as a two-state machine. The state could
> simply be "if anything is in the directive buffer, then we are in state 2;
> if the buffer is empty, then we are in state 1".

The state machine I am referring to, is the state that tries to find a
tag.  It is unnecessary to have a full-blown state machine that goes
across filter invocations, because we can't deal with the tag until we
have a full tag anyway, so saving the full state in the ctx pointer isn't
required.

> [ note that "buffer" could be a set-aside brigade; if you don't memcpy a
>   directive out of a brigade into a buffer, then the parsing becomes much
>   more difficult, and more states would be desirable to control and drive
>   the parsing ]

I disagree that the copy is necessary, because the code is completely
localized to one or two functions, but that's fine.

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Mod_include design

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Nov 02, 2000 at 10:48:04AM -0800, rbb@covalent.net wrote:
> I wrote:
> > Paul -- yes, a clean parser is much more desirable. From your initial note,
> > I think you are definitely on the right track: a state machine to record
> > where you are in the parsing algorithm (and lexing if you separate that
> > out),
>...
> 
> You can smack me all you want, but a state machine is still
> wrong.  :-)
>...

Ryan, a state machine is how parsers like this work, in our "here is some
more text" model. It is required, but you can deny it all you like :-)

The exact form of that state machine is going to depend upon the grammar,
and on the model that Paul chooses for representing/parsing the grammar. At
a minimum you have two states:

1) not parsing a directive [pass all text up to a directive marker]
2) parsing a directive [accumulate until we have a complete directive]

These two states will exist, but it is entirely reasonable/possible that it
will not be concretely exposed as a two-state machine. The state could
simply be "if anything is in the directive buffer, then we are in state 2;
if the buffer is empty, then we are in state 1".

[ note that "buffer" could be a set-aside brigade; if you don't memcpy a
  directive out of a brigade into a buffer, then the parsing becomes much
  more difficult, and more states would be desirable to control and drive
  the parsing ]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Mod_include design

Posted by rb...@covalent.net.
> Paul -- yes, a clean parser is much more desirable. From your initial note,
> I think you are definitely on the right track: a state machine to record
> where you are in the parsing algorithm (and lexing if you separate that
> out), and storing context into the filter context (at one point, Ryan said
> something about a "static variable" -- I'll smack him for that next time I
> see him (he broke headers because he held context that way); use the context
> field like you intended).

You can smack me all you want, but a state machine is still
wrong.  :-)  If I said static variable, it was a mistake.  What I meant
was a local variable.  I didn't have great connectivity in London after
the conference, so I didn't always re-read before I sent the messages.

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------