Posted to dev@cocoon.apache.org by Ross Burton <ro...@mail.com> on 2000/05/09 21:49:07 UTC

[Cocoon 2] Regex matchers :-)

Well, I've bitten the bullet and decided to learn regexes properly.  And
what better way than writing mod_rewrite for Cocoon 2!  :-)

At the moment I have knocked together a simple test class which does the
actual work, using the recently released Jakarta Regex library.  I was about
to turn this into a sitemap translator class, but there is no framework and
no simple interfaces to implement...

Pier - what are the design decisions in this area of the sitemap
(LinkMatcher and LinkTranslator), and could they be made pluggable?  I would
look into this more myself, but I have a computer architecture final exam in
10 days time and I still can't draw the inside of a MIPS processor...  :-(

What I plan to implement:
* matching of URLs and rewriting based on backreferences (completed but not
thoroughly tested)
* inserting data in the parameters for this request (?)
* client-side redirects, for "out-dated" URLs.  This should really be used
as a compatibility phase as Cocoon 2 based sites go public, as a public URL
space should not change if it is well designed.
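
For readers of the archive, here is a minimal sketch of what the first item
(matching plus backreference rewriting) might look like. It is not the test
class mentioned above: it uses the JDK's java.util.regex instead of the
Jakarta Regexp library, and the class name, method name and example URLs are
all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cocoon: rewrites a URI when it matches a pattern.
class BackrefRewriter {
    private final Pattern source;
    private final String target;

    BackrefRewriter(String sourceRegex, String target) {
        this.source = Pattern.compile(sourceRegex);
        this.target = target;
    }

    // Returns the rewritten URI, or empty when the whole URI does not match.
    Optional<String> rewrite(String uri) {
        Matcher m = source.matcher(uri);
        if (!m.matches()) {
            return Optional.empty();
        }
        // appendReplacement expands $1, $2, ... from the groups of the match above.
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, target);
        return Optional.of(out.toString());
    }

    public static void main(String[] args) {
        // Example pattern and URLs are made up for illustration.
        BackrefRewriter r =
            new BackrefRewriter("/old/(\\d{4})/(.*)\\.html", "/archive/$1/$2.xml");
        System.out.println(r.rewrite("/old/2000/news.html").orElse("(no match)"));
        System.out.println(r.rewrite("/other/page.html").orElse("(no match)"));
    }
}
```

The sketch requires the whole URI to match, so a rule never rewrites only part
of a path.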

Regards,
Ross Burton


Re: [Cocoon 2] Regex matchers :-)

Posted by Stefano Mazzocchi <st...@apache.org>.
Donald Ball wrote:
> 
> On Fri, 12 May 2000, Stefano Mazzocchi wrote:
> 
> > Now, how can this work using full regexp? say we allow the use of
> > pluggable matching with sort of XPointer syntax
> >
> >  <redirect
> >   source="regexp(/htdocs/foo/[a|b|c]/index.html)"
> >   target="/xdocs/*/index"
> >  />
> >
> > What does the star mean? [a|b|c]? I can't see how this can be easier
> > than using wildcards... less verbose, totally, but reducing verbosity
> > doesn't always increase readability, especially in configurations.
> 
> you would do this:
> 
> <redirect
>  source="regexp(/htdocs/foo/([a|b|c])/index.html)"
>  target="/xdocs/$1/index"
> />
> 
> $1 refers to the first parenthesized match, $2 the second, etc. Have you
> never used perl or general UNIX regexps before? The pattern matching and
> variable referral rules are well understood.

Uh, guess I'm showing my win32 background, aren't I :)

All right, dudes, as I said, I'm -0 on this, which means: if you code it
and integrate it, fine, but I'm not going to write that code :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------



Re: [Cocoon 2] Regex matchers :-)

Posted by Donald Ball <ba...@webslingerZ.com>.
On Fri, 12 May 2000, Stefano Mazzocchi wrote:

> Now, how can this work using full regexp? say we allow the use of
> pluggable matching with sort of XPointer syntax
> 
>  <redirect 
>   source="regexp(/htdocs/foo/[a|b|c]/index.html)" 
>   target="/xdocs/*/index"
>  />
> 
> What does the star mean? [a|b|c]? I can't see how this can be easier
> than using wildcards... less verbose, totally, but reducing verbosity
> doesn't always increase readability, especially in configurations.

you would do this:

<redirect
 source="regexp(/htdocs/foo/([a|b|c])/index.html)"
 target="/xdocs/$1/index"
/>

$1 refers to the first parenthesized match, $2 the second, etc. Have you
never used perl or general UNIX regexps before? The pattern matching and
variable referral rules are well understood.
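
A small sketch of the group capture described above, using the JDK's
java.util.regex (an assumption; the thread is about Jakarta Regexp). It also
shows two details of the pattern as posted: inside a character class the `|`
is a literal character, so `[a|b|c]` accepts "|" as well as a, b and c, and
the unescaped `.` in `index.html` matches any character. The class and method
names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns what $1 would refer to, or null when the URI does not match.
    static String firstGroup(String regex, String uri) {
        Matcher m = Pattern.compile(regex).matcher(uri);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String asPosted = "/htdocs/foo/([a|b|c])/index.html";
        String tighter = "/htdocs/foo/([abc])/index\\.html";

        System.out.println(firstGroup(asPosted, "/htdocs/foo/b/index.html")); // b
        System.out.println(firstGroup(asPosted, "/htdocs/foo/|/index.html")); // |
        System.out.println(firstGroup(tighter, "/htdocs/foo/|/index.html"));  // null
        System.out.println(firstGroup(asPosted, "/htdocs/foo/b/indexXhtml")); // b
    }
}
```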

- donald


Re: [Cocoon 2] Regex matchers :-)

Posted by "Pier P. Fumagalli" <pi...@apache.org>.
Stefano Mazzocchi wrote:
> 
> 1) pluggable matching syntax: +1

agreed...

> 2) redirect capabilities in the sitemap: -0 (maybe useful in non-apache
>    environments, but I don't care for Cocoon 2.0)

it was in the first draft of the sitemap...

> 3) xinclude in the sitemap: +1

... hmmmm .... this means separating configurations from the sitemap or
not? let's give it a try...

	pier

-- 
----------------------------------------------------------------------
pier: stable structure erected over water to allow docking of seacraft
<ma...@betaversion.org>      <http://www.betaversion.org/~pier/>
----------------------------------------------------------------------



Re: [Cocoon 2] Regex matchers :-)

Posted by Stefano Mazzocchi <st...@apache.org>.
Ross Burton wrote:

> > I think you touched a very important point I tried not to touch before:
> > site migration. While the goal is to make the sitemap simple and
> > effective, you are right that normally people have to migrate one site
> > into another, possibly doing URI-space rearchitecting without breaking
> > usage and creating floods of broken links.
> > I'm wide open to suggestions on this.
> 
> I think what is needed is a <redirect> tag alongside the <process> tags that
> matches patterns in a similar manner to the process tags.  These can rewrite
> the user's URL (just as process does) but throw a client-side redirect (301
> Moved Permanently IIRC) so that bookmarks can be updated.
> 
> Something like:
> 
>     <redirect source="/htdocs/foo/**/*.shtml" target="/foo/**/*.xml"/>
> 
> Although this would be possible using the existing matchers, a regex matcher
> might make it easier.  :-)

I cannot understand how.

The "extended wildcard matching" that Pier implemented allows you to use
the stars as variables and move the location abstracted by the wildcard
into the next expression... while, in theory, this doesn't work all the
time, Pier showed it is possible to make it work, so I'm happy (for
now).
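
A rough sketch of the "stars as variables" idea, for illustration only. It is
not Pier's implementation, whose exact rules are not given in this thread. The
assumptions here are that `*` matches within one path segment, that `**` may
span segments, and that the stars in the target are filled in order from the
stars in the source. All names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardRewriter {
    // Finds "**" before "*" so that a double star is never read as two singles.
    private static final Pattern STARS = Pattern.compile("\\*\\*|\\*");

    // Translates a wildcard pattern into a regex with one group per star.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        Matcher m = STARS.matcher(wildcard);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(wildcard.substring(last, m.start())));
            // Assumption: "**" may cross '/' boundaries, "*" may not.
            regex.append(m.group().equals("**") ? "(.*)" : "([^/]*)");
            last = m.end();
        }
        regex.append(Pattern.quote(wildcard.substring(last)));
        return Pattern.compile(regex.toString());
    }

    // Returns the target with its stars filled in, or null when there is no match.
    static String rewrite(String source, String target, String uri) {
        Matcher m = toRegex(source).matcher(uri);
        if (!m.matches()) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        Matcher t = STARS.matcher(target);
        int last = 0;
        int group = 1;
        while (t.find()) {
            out.append(target, last, t.start());
            // Extra stars in the target, beyond those in the source, become empty.
            out.append(group <= m.groupCount() ? m.group(group) : "");
            group++;
            last = t.end();
        }
        out.append(target.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/htdocs/foo/**/*.shtml", "/foo/**/*.xml",
                                   "/htdocs/foo/a/b/page.shtml"));
    }
}
```

Under these assumptions a file directly under /htdocs/foo/ does not match
`/htdocs/foo/**/*.shtml`, because the pattern still requires a slash between
the two stars.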

Now, how can this work using full regexp? say we allow the use of
pluggable matching with sort of XPointer syntax

 <redirect 
  source="regexp(/htdocs/foo/[a|b|c]/index.html)" 
  target="/xdocs/*/index"
 />

What does the star mean? [a|b|c]? I can't see how this can be easier
than using wildcards... less verbose, totally, but reducing verbosity
doesn't always increase readability, especially in configurations.

> Well, that's my idea anyway.  I'm getting a feeling that the <process> etc
> tags might need to be pluggable in the sitemap...

There you go: 1, 2, then N = flexibility syndrome.

I rethought about your issue and I think redirection is _not_ something
that Cocoon should deal with.

Cocoon is, after all, a servlet. A servlet is usually mapped to a
particular location and it is never run alone, but only in a serving
environment which, normally, takes care of its URI space. Cocoon uses
the sitemap to take care of its URI space, which, in general, is a
subset of the serving environment's URI space.

You say that redirection is useful when you have to do tons of it:
usually when doing a URI mapping port. This is _exactly_ the place for
stuff like mod_rewrite and I do not want to include mod_rewriting hacks
into Cocoon2, which was designed specifically to avoid them.
 
> Another thing - any plans on using Xinclude in the sitemap file?  A
> large-ish redirect sitemap fragment could be pretty messy, it would be nice
> to have it in another file.

Even if I don't like <redirect> in the sitemap, I think the use of
sitemap linking is vital, especially when site administration is
fragmented into more than one group.

So my votes are:

1) pluggable matching syntax: +1
2) redirect capabilities in the sitemap: -0 (maybe useful in non-apache
environments, but I don't care for Cocoon 2.0)
3) xinclude in the sitemap: +1



Re: [Cocoon 2] Regex matchers :-)

Posted by "Pier P. Fumagalli" <pi...@apache.org>.
Ross Burton wrote:
> 
> I think what is needed is a <redirect> tag alongside the <process> tags that
> matches patterns in a similar manner to the process tags.  These can rewrite
> the user's URL (just as process does) but throw a client-side redirect (301
> Moved Permanently IIRC) so that bookmarks can be updated.
> 
> Something like:
> 
>     <redirect source="/htdocs/foo/**/*.shtml" target="/foo/**/*.xml"/>
> 
> Although this would be possible using the existing matchers, a regex matcher
> might make it easier.  :-)

agreed...

	pier




Re: [Cocoon 2] Regex matchers :-)

Posted by Ross Burton <ro...@mail.com>.
> > > Let me ask you: why?
> > Because.  ;-)
> you geek :)

Well, I do have a "Foobar" t-shirt to wear when coding...  :-)

> I think you touched a very important point I tried not to touch before:
> site migration. While the goal is to make the sitemap simple and
> effective, you are right that normally people have to migrate one site
> into another, possibly doing URI-space rearchitecting without breaking
> usage and creating floods of broken links.
> I'm wide open to suggestions on this.

I think what is needed is a <redirect> tag alongside the <process> tags that
matches patterns in a similar manner to the process tags.  These can rewrite
the user's URL (just as process does) but throw a client-side redirect (301
Moved Permanently IIRC) so that bookmarks can be updated.

Something like:

    <redirect source="/htdocs/foo/**/*.shtml" target="/foo/**/*.xml"/>

Although this would be possible using the existing matchers, a regex matcher
might make it easier.  :-)
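
To make the proposal concrete, here is a minimal sketch of a redirect table
that answers old URLs with a 301 and a Location header. It is only an
illustration: the class and method names are hypothetical, the rules are given
as java.util.regex patterns and not in the sitemap's wildcard syntax, and a
real servlet would set the status and header on the response object instead of
building a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectTable {
    static final int MOVED_PERMANENTLY = 301;

    // One redirect rule: a source pattern and a target with $n references.
    private static final class Rule {
        final Pattern source;
        final String target;

        Rule(String sourceRegex, String target) {
            this.source = Pattern.compile(sourceRegex);
            this.target = target;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void add(String sourceRegex, String target) {
        rules.add(new Rule(sourceRegex, target));
    }

    // Returns the new location for an old URI, or null if no rule applies.
    // Rules are tried in the order they were added; the first match wins.
    String locationFor(String uri) {
        for (Rule rule : rules) {
            Matcher m = rule.source.matcher(uri);
            if (m.matches()) {
                StringBuffer out = new StringBuffer();
                m.appendReplacement(out, rule.target);
                return out.toString();
            }
        }
        return null;
    }

    // Builds the head of a 301 response, or returns null if no rule applies.
    String responseHead(String uri) {
        String location = locationFor(uri);
        if (location == null) {
            return null;
        }
        return "HTTP/1.1 " + MOVED_PERMANENTLY + " Moved Permanently\r\n"
             + "Location: " + location + "\r\n\r\n";
    }

    public static void main(String[] args) {
        RedirectTable table = new RedirectTable();
        table.add("/htdocs/foo/(.*)/([^/]*)\\.shtml", "/foo/$1/$2.xml");
        System.out.print(table.responseHead("/htdocs/foo/a/b/page.shtml"));
    }
}
```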

Well, that's my idea anyway.  I'm getting a feeling that the <process> etc
tags might need to be pluggable in the sitemap...

Another thing - any plans on using Xinclude in the sitemap file?  A
large-ish redirect sitemap fragment could be pretty messy, it would be nice
to have it in another file.

Just my tuppence,
Ross Burton



Re: [Cocoon 2] Regex matchers :-)

Posted by Donald Ball <ba...@webslingerZ.com>.
On Thu, 11 May 2000, Stefano Mazzocchi wrote:

> Ross Burton wrote:
> > 
> > Take the (hopefully soon) common scenario of a traditional HTML web site
> > which has a rather horrible structure to it.  They kick out the old server
> > and replace all of the content with XML in Cocoon 2 and a structured
> > extendable URI space.  However, many people have links into the old URI
> > space which need to be maintained.  So Cocoon 2 should accept an old URI,
> > and send a redirect to the client, so that they go to the correct new page
> > and are informed of the new URL (the location changes).  I think that this
> > needs to be addressed in the sitemap somehow, maybe with a <redirect> block
> > along with the <process> blocks.
> > 
> > Hmm....
> 
> Hmmmm, I see your point...
> 
> Anyone else about this?
> 
> I think you touched a very important point I tried not to touch before:
> site migration. While the goal is to make the sitemap simple and
> effective, you are right that normally people have to migrate one site
> into another, possibly doing URI-space rearchitecting without breaking
> usage and creating floods of broken links.
> 
> I'm wide open to suggestions on this.

We'll need regexps for this for sure. :) I think that yes, there should be
a redirect block that matches an incoming request and can issue a new
request into the sitemap on behalf of the original. E.g. so that I can do
something like this:

<process uri="/b-ball/(.*)\.html">
 <redirect uri="/basketball/$1"/>
</process>
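
A sketch of the server-side variant described above, where a matched request
is re-issued into the same rule set on behalf of the original and is not
bounced back to the client. The hop limit is an addition of this sketch to
guard against rules that point at each other. The names are hypothetical and
java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InternalRedirector {
    private static final int MAX_HOPS = 10;

    private final List<Pattern> sources = new ArrayList<>();
    private final List<String> targets = new ArrayList<>();

    void add(String sourceRegex, String target) {
        sources.add(Pattern.compile(sourceRegex));
        targets.add(target);
    }

    // Applies the first matching rule once; returns null when none matches.
    private String applyFirstMatch(String uri) {
        for (int i = 0; i < sources.size(); i++) {
            Matcher m = sources.get(i).matcher(uri);
            if (m.matches()) {
                StringBuffer out = new StringBuffer();
                m.appendReplacement(out, targets.get(i));
                return out.toString();
            }
        }
        return null;
    }

    // Re-issues the request until no redirect rule matches any more.
    String resolve(String uri) {
        String current = uri;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            String next = applyFirstMatch(current);
            if (next == null) {
                return current;
            }
            current = next;
        }
        throw new IllegalStateException("redirect loop suspected for " + uri);
    }

    public static void main(String[] args) {
        InternalRedirector r = new InternalRedirector();
        r.add("/b-ball/(.*)\\.html", "/basketball/$1");
        System.out.println(r.resolve("/b-ball/scores.html")); // /basketball/scores
    }
}
```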

- donald


Re: [Cocoon 2] Regex matchers :-)

Posted by Stefano Mazzocchi <st...@apache.org>.
Ross Burton wrote:
> 
> > > Well, I've bitten the bullet and decided to learn regexes properly.  And
> > > what better way than writing mod_rewrite for Cocoon 2!  :-)
> >
> > <horribly-face-devastated-by-fear belongs-to="stefano">
> > AAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!!!!!!!!!!
> > </horribly-face-devastated-by-fear>
> 
> :-)
> 
> I knew you'd like this.  I'm not suggesting that this should be part of
> Cocoon 2, it's just that somebody, somewhere, might need the power.  My
> other excuse is that this is a good test of Cocoon 2 sitemap and regex
> programming skills...

ok
 
> > > * matching of URLs and rewriting based on backreferences (completed but
> > > not thoroughly tested)
> > Let me ask you: why?
> 
> Because.  ;-)

you geek :)
 
> > My humble opinion is: if you need full regexp power to map your
> > resources to your URI space, you have a problem in your URI space.
> > I'd be happy to be proven wrong on this, but so far, nobody proposed me
> > a URI space that is not addressable with *,?,** wildcards.
> > On the other hand, I agree that no matter how hard I try to enforce my
> > vision, having this matching syntax fixed could create forking
> > frictions....
> > So, I'm ok to provide pluggable hooks for syntax matching engines, but
> > about shipping regexp matching engines with Cocoon, well, that's another
> > story.
> 
> I've no problems with not seeing this code ever enter the Cocoon 2 CVS, it's
> just that pluggable matching engines seem like a good idea, even if it
> requires a restart of Cocoon to take effect.

ok, great, just making sure...
 
> > > * inserting data in the parameters for this request (?)
> >
> > no way, this is mod_rewrite and Cocoon doesn't need this kind of hacking
> > on the request since it has a full blown architecture to deal with this.
> > The sitemap does mapping, nothing else.
> 
> The question mark was to indicate that this is a "not sure" item - I'll
> forget it.

good
 
> > > * client-side redirects, for "out-dated" URLs.  This should really be
> > > used as a compatibility phase as Cocoon 2 based sites go public, as a
> > > public URL space should not change if it is well designed.
> > Can you elaborate more on this?
> 
> Take the (hopefully soon) common scenario of a traditional HTML web site
> which has a rather horrible structure to it.  They kick out the old server
> and replace all of the content with XML in Cocoon 2 and a structured
> extendable URI space.  However, many people have links into the old URI
> space which need to be maintained.  So Cocoon 2 should accept an old URI,
> and send a redirect to the client, so that they go to the correct new page
> and are informed of the new URL (the location changes).  I think that this
> needs to be addressed in the sitemap somehow, maybe with a <redirect> block
> along with the <process> blocks.
> 
> Hmm....

Hmmmm, I see your point...

Anyone else about this?

I think you touched a very important point I tried not to touch before:
site migration. While the goal is to make the sitemap simple and
effective, you are right that normally people have to migrate one site
into another, possibly doing URI-space rearchitecting without breaking
usage and creating floods of broken links.

I'm wide open to suggestions on this.




Re: [Cocoon 2] Regex matchers :-)

Posted by Ross Burton <ro...@mail.com>.
> > Well, I've bitten the bullet and decided to learn regexes properly.  And
> > what better way than writing mod_rewrite for Cocoon 2!  :-)
>
> <horribly-face-devastated-by-fear belongs-to="stefano">
> AAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!!!!!!!!!!
> </horribly-face-devastated-by-fear>

:-)

I knew you'd like this.  I'm not suggesting that this should be part of
Cocoon 2, it's just that somebody, somewhere, might need the power.  My
other excuse is that this is a good test of Cocoon 2 sitemap and regex
programming skills...

> > * matching of URLs and rewriting based on backreferences (completed but
> > not thoroughly tested)
> Let me ask you: why?

Because.  ;-)

> My humble opinion is: if you need full regexp power to map your
> resources to your URI space, you have a problem in your URI space.
> I'd be happy to be proven wrong on this, but so far, nobody proposed me
> a URI space that is not addressable with *,?,** wildcards.
> On the other hand, I agree that no matter how hard I try to enforce my
> vision, having this matching syntax fixed could create forking
> frictions....
> So, I'm ok to provide pluggable hooks for syntax matching engines, but
> about shipping regexp matching engines with Cocoon, well, that's another
> story.

I've no problems with not seeing this code ever enter the Cocoon 2 CVS, it's
just that pluggable matching engines seem like a good idea, even if it
requires a restart of Cocoon to take effect.

> > * inserting data in the parameters for this request (?)
>
> no way, this is mod_rewrite and Cocoon doesn't need this kind of hacking
> on the request since it has a full blown architecture to deal with this.
> The sitemap does mapping, nothing else.

The question mark was to indicate that this is a "not sure" item - I'll
forget it.

> > * client-side redirects, for "out-dated" URLs.  This should really be
> > used as a compatibility phase as Cocoon 2 based sites go public, as a
> > public URL space should not change if it is well designed.
> Can you elaborate more on this?

Take the (hopefully soon) common scenario of a traditional HTML web site
which has a rather horrible structure to it.  They kick out the old server
and replace all of the content with XML in Cocoon 2 and a structured
extendable URI space.  However, many people have links into the old URI
space which need to be maintained.  So Cocoon 2 should accept an old URI,
and send a redirect to the client, so that they go to the correct new page
and are informed of the new URL (the location changes).  I think that this
needs to be addressed in the sitemap somehow, maybe with a <redirect> block
along with the <process> blocks.

Hmm....

Regards,
Ross Burton



Re: [Cocoon 2] Regex matchers :-)

Posted by Stefano Mazzocchi <st...@apache.org>.
Ross Burton wrote:
> 
> Well, I've bitten the bullet and decided to learn regexes properly.  And
> what better way than writing mod_rewrite for Cocoon 2!  :-)

<horribly-face-devastated-by-fear belongs-to="stefano">
AAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!!!!!!!!!!
</horribly-face-devastated-by-fear>
 
> At the moment I have knocked together a simple test class which does the
> actual work, using the recently released Jakarta Regex library.  I was about
> to turn this into a sitemap translator class, but there is no framework and
> no simple interfaces to implement...

...
 
> Pier - what are the design decisions in this area of the sitemap
> (LinkMatcher and LinkTranslator), and could they be made pluggable?  

When dealing with the sitemap, Pier and I _knew_ people would have asked
for this: full regexp in the sitemap. I personally find this a disaster.

> I would
> look into this more myself, but I have a computer architecture final exam in
> 10 days time and I still can't draw the inside of a MIPS processor...  :-(

Sounds like a familiar problem to me :(
 
> What I plan to implement:
> * matching of URLs and rewriting based on backreferences (completed but not
> thoroughly tested)

Let me ask you: why?

My humble opinion is: if you need full regexp power to map your
resources to your URI space, you have a problem in your URI space.

I'd be happy to be proven wrong on this, but so far, nobody proposed me
a URI space that is not addressable with *,?,** wildcards.

On the other hand, I agree that no matter how hard I try to enforce my
vision, having this matching syntax fixed could create forking
frictions....

So, I'm ok to provide pluggable hooks for syntax matching engines, but
about shipping regexp matching engines with Cocoon, well, that's another
story.
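
One way such pluggable hooks could look, sketched for illustration. The
interface, the registry and the `name(pattern)` spelling (borrowed from the
`regexp(...)` notation used elsewhere in this thread) are all assumptions, not
an existing Cocoon API. The default engine here is a plain literal comparison
to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherRegistry {
    // Hypothetical hook: decides whether a URI matches a pattern in one syntax.
    interface UriMatcher {
        boolean matches(String pattern, String uri);
    }

    // Recognises sources written as name(pattern), e.g. regexp(/foo/(.*)).
    private static final Pattern SCHEME = Pattern.compile("(\\w+)\\((.*)\\)");

    private final Map<String, UriMatcher> engines = new HashMap<>();
    private final String defaultSyntax;

    MatcherRegistry(String defaultSyntax) {
        this.defaultSyntax = defaultSyntax;
    }

    void register(String syntax, UriMatcher engine) {
        engines.put(syntax, engine);
    }

    // Picks the engine named in the source, falling back to the default syntax.
    boolean matches(String source, String uri) {
        String syntax = defaultSyntax;
        String pattern = source;
        Matcher m = SCHEME.matcher(source);
        if (m.matches() && engines.containsKey(m.group(1))) {
            syntax = m.group(1);
            pattern = m.group(2);
        }
        UriMatcher engine = engines.get(syntax);
        if (engine == null) {
            throw new IllegalArgumentException("no matcher for syntax: " + syntax);
        }
        return engine.matches(pattern, uri);
    }

    public static void main(String[] args) {
        MatcherRegistry registry = new MatcherRegistry("literal");
        registry.register("literal", (p, u) -> p.equals(u));
        registry.register("regexp", (p, u) -> Pattern.matches(p, u));

        System.out.println(registry.matches(
            "regexp(/htdocs/foo/([abc])/index\\.html)", "/htdocs/foo/a/index.html"));
        System.out.println(registry.matches("/plain/path", "/plain/path"));
    }
}
```

With this shape a regexp engine can live outside the core distribution and be
registered by whoever needs it.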

> * inserting data in the parameters for this request (?)

no way, this is mod_rewrite and Cocoon doesn't need this kind of hacking
on the request since it has a full blown architecture to deal with this.
The sitemap does mapping, nothing else.

> * client-side redirects, for "out-dated" URLs.  This should really be used
> as a compatibility phase as Cocoon 2 based sites go public, as a public URL
> space should not change if it is well designed.

Can you elaborate more on this?
