Posted to dev@httpd.apache.org by Rich Bowen <rb...@rcbowen.com> on 2008/02/11 16:15:03 UTC

Re: mod_substitute docs

On Oct 22, 2007, at 14:07, Jim Jagielski wrote:

>> I'm a little concerned if the module ships /without/ the 'f'latten  
>> flag
>> triggered by default.  I'd rather the module offered the inverse  
>> option.
>>
>> This is truly a 'bug' users won't understand (my text contains  
>> 'jimfoojag',
>> why isn't it translating to 'jimbarjag'?!?)
>>
>
> It would... it's only an issue really if there are subsequent
> substitutions *and* a previous substitution would result in
> a string that would also be affected by a later one.

Sorry to dredge up an ancient thread, but I'm finally using  
mod_substitute in production, and I'm finding this to be much more  
complex than the contrived example.

Are there cases in which it might just happen to work without  
flattening? And, if so, is it possible that it will work sometimes  
and not other times, or is it going to be completely consistent from  
one pass to the next?

I've gotten some answers to these on IRC, so it may be that the light  
is beginning to dawn.

The actual example that I'm using is here: http://apache.pastebin.ca/899919
Slightly more complex than just s/foo/bar/

The background is probably too stupid to go into - using TinyMCE to  
edit content, and IE7 unwilling to honor align="left" in certain  
contexts.

--
Rich Bowen
rbowen@rcbowen.com




Re: mod_substitute docs

Posted by Rich Bowen <rb...@rcbowen.com>.
On Feb 14, 2008, at 07:53, Vincent Bray wrote:

> On 12/02/2008, Jim Jagielski <ji...@jagunet.com> wrote:
>>  So if you have substituted content that you think will be re- 
>> substituted
>>  by another rule, then you should flatten. If they are one-shots, or
>>  self contained, or in "no way" could result in overlaps, then  
>> flattening
>>  isn't required. :)
>
> In your example, which of the two Substitute directives needs to have
> the 'f', or is it both?

It would be the FIRST one that ran. However, when I have a Substitute  
in one <Directory> scope and another in another, I'm having some  
difficulty figuring out which one actually fires first.

--
"There are two kinds of light--the glow that illuminates, and the  
glare that obscures."
James Thurber



Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 9:48 AM, Jim Jagielski wrote:

> In the words of the Blue Raja, "What the fork"
>
> Done in r627764  :)
>

Should we consider this for 2.2 backport?

Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
In the words of the Blue Raja, "What the fork"

Done in r627764  :)

Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 9:09 AM, Rich Bowen wrote:

>
> On Feb 14, 2008, at 09:07, Jim Jagielski wrote:
>
>> It's not even just that "there is a second substitution" but rather
>> that the pattern/regex being looked for could possibly be the
>> result of a previous substitution.
>>
>> If you have two subs like
>>
>>      s/foo/bar/
>>      s/plum/apple/
>>
>> there is no way one could affect the other (or depend
>> on the other) so flattening is not required.
>
> Yes, I understand that, but I figured it would probably take longer  
> to determine, on an arbitrary case, whether there was such an  
> overlap, than to perform the flattening.
>

So we adjust the default to flatten and update the docs to
say something like "if it is known in advance that there is no
overlap or potential overlap between substitutions (that the
result of one substitution cannot match the pattern or regex
for a later one), then substantial speed and memory utilization
improvements can be realized by using the 'quick' method" ??

My concerns are: (1) the default would be slower and more of a
memory hog, which seems opposite of our normal expectations
(2) the change (in default behavior as well as "dropping"
'f' (ignoring it actually) and adding 'q') would happen
between patch-level releases.

Both, of course, could be fixed by good docs, but then again,
so could, maybe, the current situation we're trying to fix...

I'm +0 either way, but if we decide to change, I'll work up the
patch to change the default and add 'q'

Re: mod_substitute docs

Posted by Rich Bowen <rb...@rcbowen.com>.
On Feb 14, 2008, at 09:07, Jim Jagielski wrote:

> It's not even just that "there is a second substitution" but rather
> that the pattern/regex being looked for could possibly be the
> result of a previous substitution.
>
> If you have two subs like
>
>      s/foo/bar/
>      s/plum/apple/
>
> there is no way one could affect the other (or depend
> on the other) so flattening is not required.

Yes, I understand that, but I figured it would probably take longer  
to determine, on an arbitrary case, whether there was such an  
overlap, than to perform the flattening.

--
"She had a pretty gift for quotation, which is a serviceable  
substitute for wit."
W. Somerset Maugham



Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 9:00 AM, Rich Bowen wrote:

>
> On Feb 14, 2008, at 08:50, Eric Covener wrote:
>
>>
>> Anyone else +1 for flatten-as-default and providing an option such  
>> as:
>>
>> 'q'uick: Substitute more efficiently, but further substitutions will
>> not be able to match across the boundaries of this substitution's
>> replacement string.
>
> Yes in concept. Don't know how much this affects performance - I had  
> the impression that it would make things slower. Would it be  
> possible to flatten only in the event that a second substitute is in  
> fact called during the course of the same request?
>

It's not even just that "there is a second substitution" but rather
that the pattern/regex being looked for could possibly be the
result of a previous substitution.

If you have two subs like

      s/foo/bar/
      s/plum/apple/

there is no way one could affect the other (or depend
on the other) so flattening is not required.
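As a rough illustration of the "could one sub affect the other" test, here is a minimal Python sketch. It assumes literal (non-regex) patterns only, and the name may_interact is made up for this example; it is not part of mod_substitute. It is deliberately conservative: it can flag pairs that would in fact work without flattening, and it says nothing about real regexes, which are much harder to reason about.

```python
def may_interact(replacement: str, later_pattern: str) -> bool:
    """Conservatively report whether a later literal pattern could match
    text that includes any part of an earlier rule's replacement."""
    r, p = replacement, later_pattern
    # An empty replacement joins its neighbours, and an empty pattern is
    # degenerate; treat both as "might interact" to stay on the safe side.
    if not r or not p:
        return True
    # One string sits entirely inside the other.
    if p in r or r in p:
        return True
    # Partial overlap: the pattern ends where the replacement begins,
    # or the pattern begins where the replacement ends.
    for k in range(1, min(len(r), len(p)) + 1):
        if p.endswith(r[:k]) or p.startswith(r[-k:]):
            return True
    return False


if __name__ == "__main__":
    # s/foo/bar/ followed by s/plum/apple/: no interaction possible.
    print(may_interact("bar", "plum"))
    # s/pony/puma/ followed by s/a puma/an aardvark/: interaction possible.
    print(may_interact("puma", "a puma"))
```

Only the earlier rule's replacement and the later rule's pattern are compared, since that is the direction in which one substitution can feed another.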

> I don't want to make things slower simply because the docs are  
> confusing. We can make the docs less confusing.
>
> --
> Just because your voice reaches halfway around the world doesn't  
> mean you are wiser than when it reached only to the end of the bar.
> Edward R. Murrow
>


Re: mod_substitute docs

Posted by Rich Bowen <rb...@rcbowen.com>.
On Feb 14, 2008, at 08:50, Eric Covener wrote:

>
> Anyone else +1 for flatten-as-default and providing an option such as:
>
> 'q'uick: Substitute more efficiently, but further substitutions will
> not be able to match across the boundaries of this substitution's
> replacement string.

Yes in concept. Don't know how much this affects performance - I had  
the impression that it would make things slower. Would it be possible  
to flatten only in the event that a second substitute is in fact  
called during the course of the same request?

I don't want to make things slower simply because the docs are  
confusing. We can make the docs less confusing.

--
Just because your voice reaches halfway around the world doesn't mean  
you are wiser than when it reached only to the end of the bar.
Edward R. Murrow


Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 8:50 AM, Eric Covener wrote:

> On Thu, Feb 14, 2008 at 8:05 AM, Jim Jagielski <ji...@jagunet.com>  
> wrote:
>>
>>
>> On Feb 14, 2008, at 7:53 AM, Vincent Bray wrote:
>>
>>> On 12/02/2008, Jim Jagielski <ji...@jagunet.com> wrote:
>>>> So if you have substituted content that you think will be re-
>>>> substituted
>>>> by another rule, then you should flatten. If they are one-shots, or
>>>> self contained, or in "no way" could result in overlaps, then
>>>> flattening
>>>> isn't required. :)
>>>
>>> In your example, which of the two Substitute directives needs to  
>>> have
>>> the 'f', or is it both?
>>>
>>
>> The first, since it needs to ensure that the just-replaced content
>> (and everything before/after it) ends up back in a single bucket.
>>
>>
>
> Anyone else +1 for flatten-as-default and providing an option such as:
>
> 'q'uick: Substitute more efficiently, but further substitutions will
> not be able to match across the boundaries of this substitution's
> replacement string.
>

I had thought we had debated this and figured out that having
the default be slow and a memory hog was likely not a good idea;
that most people would not require it since they would be
doing things like s/text/htm/ and s/168.0.0.1/www.example.com/,
etc (that is, the substitutions wouldn't overlap)... For
those who did, they would need to suffer the performance
hit.


Re: mod_substitute docs

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
William A. Rowe, Jr. wrote:
> Eric Covener wrote:
>>
>> Anyone else +1 for flatten-as-default and providing an option such as:
>>
>> 'q'uick: Substitute more efficiently, but further substitutions will
>> not be able to match across the boundaries of this substitution's
>> replacement string.
> 
> +1, anything requiring this depth-of-understanding is prone to be a 
> problem.
> 
> The optimization would have to know that the smallest pattern-match space
> is some (x) characters long, and unconditionally do a set-aside of (x-1)
> characters of trailing data from each bucket, to match across calls.
> 
> Flatten is so much simpler for the casual user.

Actually, I have a rather sick thought...

...if pcre would accept a callback for "read", where we could compose only
as large a pattern buffer as it desired, or feed char-by-char, it would
be possible to read across buckets.  Something we ought to explore.



Re: mod_substitute docs

Posted by Jeff McAdams <je...@iglou.com>.
Jim Jagielski wrote:
> 
> On Feb 14, 2008, at 9:06 AM, William A. Rowe, Jr. wrote:
> 
>> Eric Covener wrote:
>>> Anyone else +1 for flatten-as-default and providing an option such as:
>>> 'q'uick: Substitute more efficiently, but further substitutions will
>>> not be able to match across the boundaries of this substitution's
>>> replacement string.
>>
>> +1, anything requiring this depth-of-understanding is prone to be a
>> problem.
>>
>> The optimization would have to know that the smallest pattern-match space
>> is some (x) characters long, and unconditionally do a set-aside of (x-1)
>> characters of trailing data from each bucket, to match across calls.
>>
>> Flatten is so much simpler for the casual user.
>>
> 
> And sooo much slower (and memory intensive)... I submit that
> the vast majority of people would NOT require it and thus
> should not be subjected to the overhead.
> 
> But that's just my own 2c :)

Speaking as a "mere" end user.  :)

I would be terribly, extremely, frustrated to add a second substitution
and have it not work, only to find out later on that I needed to set
some magic flag that affects some operation down in the deep bowels of
the code that has no visible change to the operation (not counting
performance...the actual result would be the same, if taking longer and
hitting more system resources).

The principle of least surprise says to make flattening the default and
provide a flag for optimization that might break things...because then
at least the user is reading *some* sort of documentation specifically
about the flattening to be able to know about the flag and hopefully
that documentation will include the caveat that it could break things.

I guess that's my 2 yen (roughly the same value, but from a different place)
-- 
Jeff McAdams
"They that can give up essential liberty to obtain a
little temporary safety deserve neither liberty nor safety."
                                       -- Benjamin Franklin


Re: mod_substitute docs

Posted by Plüm, Rüdiger, VF-Group <ru...@vodafone.com>.
 

> -----Original Message-----
> From: Jim Jagielski
> Sent: Thursday, February 14, 2008 15:12
> To: dev@httpd.apache.org
> Subject: Re: mod_substitute docs
> 
> 
> On Feb 14, 2008, at 9:06 AM, William A. Rowe, Jr. wrote:
> 
> > Eric Covener wrote:
> >> Anyone else +1 for flatten-as-default and providing an option such
> >> as:
> >> 'q'uick: Substitute more efficiently, but further substitutions will
> >> not be able to match across the boundaries of this substitution's
> >> replacement string.
> >
> > +1, anything requiring this depth-of-understanding is prone to be a
> > problem.
> >
> > The optimization would have to know that the smallest pattern-match
> > space is some (x) characters long, and unconditionally do a set-aside
> > of (x-1) characters of trailing data from each bucket, to match
> > across calls.
> >
> > Flatten is so much simpler for the casual user.
> >
> 
> And sooo much slower (and memory intensive)... I submit that
> the vast majority of people would NOT require it and thus
> should not be subjected to the overhead.
> 
> But that's just my own 2c :)

+1 to this. Do not make (f)latten the default for the reasons stated by Jim.

Regards

Rüdiger


Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 9:06 AM, William A. Rowe, Jr. wrote:

> Eric Covener wrote:
>> Anyone else +1 for flatten-as-default and providing an option such  
>> as:
>> 'q'uick: Substitute more efficiently, but further substitutions will
>> not be able to match across the boundaries of this substitution's
>> replacement string.
>
> +1, anything requiring this depth-of-understanding is prone to be a  
> problem.
>
> The optimization would have to know that the smallest pattern-match  
> space
> is some (x) characters long, and unconditionally do a set-aside of  
> (x-1)
> characters of trailing data from each bucket, to match across calls.
>
> Flatten is so much simpler for the casual user.
>

And sooo much slower (and memory intensive)... I submit that
the vast majority of people would NOT require it and thus
should not be subjected to the overhead.

But that's just my own 2c :)


Re: mod_substitute docs

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Eric Covener wrote:
> 
> Anyone else +1 for flatten-as-default and providing an option such as:
> 
> 'q'uick: Substitute more efficiently, but further substitutions will
> not be able to match across the boundaries of this substitution's
> replacement string.

+1, anything requiring this depth-of-understanding is prone to be a problem.

The optimization would have to know that the smallest pattern-match space
is some (x) characters long, and unconditionally do a set-aside of (x-1)
characters of trailing data from each bucket, to match across calls.

Flatten is so much simpler for the casual user.

Bill
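
The set-aside idea above can be sketched in Python for the simplest case. This is only an illustration under stated assumptions: the pattern is a literal string of known length x (not a regex), the "buckets" are plain Python strings, and replace_across_chunks is a hypothetical name, not an httpd or APR function. After each chunk it holds back the last x-1 characters so that a match straddling two chunks is still found.

```python
from typing import Iterable, Iterator


def replace_across_chunks(chunks: Iterable[str], pattern: str,
                          replacement: str) -> Iterator[str]:
    """Replace a literal pattern in a stream of chunks, carrying over
    len(pattern) - 1 trailing characters between chunks."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1  # the (x-1) characters to set aside
    tail = ""
    for chunk in chunks:
        data = tail + chunk
        out = []
        pos = 0
        # Replace every complete match visible in the combined data.
        while True:
            hit = data.find(pattern, pos)
            if hit < 0:
                break
            out.append(data[pos:hit])
            out.append(replacement)
            pos = hit + len(pattern)
        remaining = data[pos:]
        # Emit everything except the last `keep` characters, which might
        # be the start of a match completed by the next chunk.
        cut = max(len(remaining) - keep, 0)
        out.append(remaining[:cut])
        tail = remaining[cut:]
        piece = "".join(out)
        if piece:
            yield piece
    if tail:
        yield tail


if __name__ == "__main__":
    # 'foo' is split across two chunks, yet it is still replaced.
    print("".join(replace_across_chunks(["jimfo", "ojag"], "foo", "bar")))
```

For a regex the smallest and largest possible match lengths are generally not this easy to pin down, which is part of why the thread treats flattening as the simpler option.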

Re: mod_substitute docs

Posted by Eric Covener <co...@gmail.com>.
On Thu, Feb 14, 2008 at 8:05 AM, Jim Jagielski <ji...@jagunet.com> wrote:
>
>
>  On Feb 14, 2008, at 7:53 AM, Vincent Bray wrote:
>
>  > On 12/02/2008, Jim Jagielski <ji...@jagunet.com> wrote:
>  >> So if you have substituted content that you think will be re-
>  >> substituted
>  >> by another rule, then you should flatten. If they are one-shots, or
>  >> self contained, or in "no way" could result in overlaps, then
>  >> flattening
>  >> isn't required. :)
>  >
>  > In your example, which of the two Substitute directives needs to have
>  > the 'f', or is it both?
>  >
>
>  The first, since it needs to ensure that the just-replaced content
>  (and everything before/after it) ends up back in a single bucket.
>
>

Anyone else +1 for flatten-as-default and providing an option such as:

'q'uick: Substitute more efficiently, but further substitutions will
not be able to match across the boundaries of this substitution's
replacement string.

-- 
Eric Covener
covener@gmail.com

Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 14, 2008, at 7:53 AM, Vincent Bray wrote:

> On 12/02/2008, Jim Jagielski <ji...@jagunet.com> wrote:
>> So if you have substituted content that you think will be re- 
>> substituted
>> by another rule, then you should flatten. If they are one-shots, or
>> self contained, or in "no way" could result in overlaps, then  
>> flattening
>> isn't required. :)
>
> In your example, which of the two Substitute directives needs to have
> the 'f', or is it both?
>

The first, since it needs to ensure that the just-replaced content
(and everything before/after it) ends up back in a single bucket.


Re: mod_substitute docs

Posted by Vincent Bray <no...@gmail.com>.
On 12/02/2008, Jim Jagielski <ji...@jagunet.com> wrote:
>  So if you have substituted content that you think will be re-substituted
>  by another rule, then you should flatten. If they are one-shots, or
>  self contained, or in "no way" could result in overlaps, then flattening
>  isn't required. :)

In your example, which of the two Substitute directives needs to have
the 'f', or is it both?

-- 
noodl

Re: mod_substitute docs

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Feb 11, 2008, at 10:15 AM, Rich Bowen wrote:

>
> On Oct 22, 2007, at 14:07, Jim Jagielski wrote:
>
>>> I'm a little concerned if the module ships /without/ the 'f'latten  
>>> flag
>>> triggered by default.  I'd rather the module offered the inverse  
>>> option.
>>>
>>> This is truly a 'bug' users won't understand (my text contains  
>>> 'jimfoojag',
>>> why isn't it translating to 'jimbarjag'?!?)
>>>
>>
>> It would... it's only an issue really if there are subsequent
>> substitutions *and* a previous substitution would result in
>> a string that would also be affected by a later one.
>
> Sorry to dredge up an ancient thread, but I'm finally using  
> mod_substitute in production, and I'm finding this to be much more  
> complex than the contrived example.
>
> Are there cases in which it might just happen to work without  
> flattening? And, if so, is it possible that it will work sometimes  
> and not other times, or is it going to be completely consistent from  
> one pass to the next?
>
> I've gotten some answers to these on IRC, so it may be that the  
> light is beginning to dawn.
>
> The actual example that I'm using is here : http://apache.pastebin.ca/899919
> Slightly more complex than just s/foo/bar/
>
> The background is probably too stupid to go into - using TinyMCE to  
> edit content, and IE7 unwilling to honor align="left" in certain  
> contexts.
>

In general, think of it this way. mod_substitute generally works on
content within a bucket. Say we have a bucket that contains the
text "Rich Bowen wants a pony dude" and we have the directive to
change 'pony' to 'puma'. Without flattening, mod_substitute turns
this single bucket into 3, the 1st containing "Rich Bowen wants a ",
the 2nd containing "puma" and the third containing " dude".

If there are no more substitutions, then we're golden and can move on.
But NOW say that you also have another sub that wants to turn 'a puma'
into 'an aardvark'... Since it's operating on buckets, even though
the *stream* contains 'a puma', it's not in a single bucket, in
which case *this* substitution will never happen.

So if you have substituted content that you think will be re-substituted
by another rule, then you should flatten. If they are one-shots, or
self contained, or in "no way" could result in overlaps, then flattening
isn't required. :)
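
To make the bucket picture concrete, here is a toy Python model of the pony/puma example. It is a sketch under stated assumptions, not how mod_substitute is implemented: buckets are modelled as a plain list of strings, patterns are literal strings, and the substitute function and its flatten argument are invented for this illustration.

```python
from typing import List


def substitute(buckets: List[str], pattern: str, replacement: str,
               flatten: bool = False) -> List[str]:
    """Apply one literal substitution to each bucket on its own.

    Without flatten, a match splits its bucket into before / replacement /
    after. With flatten, those pieces are merged back into one bucket.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    out: List[str] = []
    for bucket in buckets:
        pieces: List[str] = []
        rest = bucket
        while pattern in rest:
            before, rest = rest.split(pattern, 1)
            if before:
                pieces.append(before)
            pieces.append(replacement)
        if rest:
            pieces.append(rest)
        if flatten:
            out.append("".join(pieces))
        else:
            out.extend(pieces)
    return out


if __name__ == "__main__":
    start = ["Rich Bowen wants a pony dude"]

    # No flattening: 'a puma' ends up split across two buckets,
    # so the second rule never sees it.
    split = substitute(start, "pony", "puma")
    print(split)
    print("".join(substitute(split, "a puma", "an aardvark")))

    # Flattening the first rule puts the text back into one bucket,
    # so the second rule can match.
    flat = substitute(start, "pony", "puma", flatten=True)
    print("".join(substitute(flat, "a puma", "an aardvark")))
```

Note that only the first substitution needs to flatten here, which matches the answer given elsewhere in the thread.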