Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2000/06/20 15:52:05 UTC

[C2] Sitemap revised again

You people are going to hate me for this, but I think I solved the
problems in the current sitemap and after careful thinking, there are a
few things to change again.

1) resource loading model

We want sitemaps to be cascaded. This is a fundamental feature for site
management scalability. We also know documents and sitemaps can be
stored in very different locations, ranging from file systems to web
servers, FTP servers, compressed archives, CVS servers and XML databases.

Careful: resource loading is _not_ the equivalent of "Generator"
modularity. A resource is a stream of chars that happens to be
well-formed XML. A generator is an adaptor between something and a stream
of SAX events. A "parser" is a specific generator that does XML parsing, but
using resource loading abstraction, we are able to use the exact same
code to load a document from the file system, a remote (dynamic) URI or
even a CVS server.

Note: in some cases, the generators might be able to skip the parsing
stage, thus requiring a special generation logic to hook to SAX-aware
output events from XML storage repositories... for example Prowler might
be able to generate SAX events directly in response to an XPath query
without the need for XML serialization and parsing.

Anyway, today we follow the namespace pattern, where

 <generator type="parser" src:file="c:\program files\mydocs\file.xml"/>

indicates that the "file" protocol should be used to get the resource.

Now, I propose to use the java.net.URL syntax for this instead:

 <generator type="parser" src="file://c:\program files\mydocs\file.xml"/>
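
To make the idea concrete, here is a minimal sketch of what URL-based
resource loading could look like. The class and method names
(ResourceLoaderSketch, open, readAll) are made up for illustration and are
not part of Cocoon; the only real API used is java.net.URL and its
openStream() method. The UTF-8 decoding is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Cocoon: loads any resource a URL can address.
class ResourceLoaderSketch {

    // Opens the resource named by "src" using whatever protocol handler
    // java.net.URL has registered for its scheme (file, http, jar, ...).
    static InputStream open(String src) throws IOException {
        return new URL(src).openStream();
    }

    // Reads the whole resource as text; UTF-8 is an assumption of this sketch.
    static String readAll(String src) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(open(src), StandardCharsets.UTF_8))) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small document to a temporary file, then load it by URL.
        Path tmp = Files.createTempFile("doc", ".xml");
        Files.write(tmp, "<doc/>".getBytes(StandardCharsets.UTF_8));
        String src = tmp.toUri().toURL().toString(); // a file: URL
        System.out.println(readAll(src));
        Files.delete(tmp);
    }
}
```

The caller never looks at the scheme: the same readAll() call would work
for any protocol the JVM has a handler for, which is the point of the
proposal.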

Why? Mainly to allow resource loading abstraction to be independent of
the attribute/element schema, for example allowing some RDF-equivalent
syntax such as

 <generator type="parser">
  <param name="src" value="file://c:\program files\mydocs\file.xml"/>
 </generator>

which also lets us use resource loading abstraction for components such
as

 <chooser type="auth" src="file:///home/www/choosers/auth.class">
  <param name="permission-file" value="./permissions/default.xml"/>
 </chooser>




2) RDF model for attributes

the <param> element has a special meaning, just like some RDF elements.

 <xxx yyy="zzz" aaa="bbb"/>

will be equivalent to

 <xxx>
  <param name="yyy" value="zzz"/>
  <param name="aaa" value="bbb"/>
 </xxx>

this allows you to use whatever verbosity you like.
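
A rough sketch of how a sitemap reader could treat the two forms as
equivalent, using the standard JAXP DOM parser. The class name
ParamNormalizerSketch is invented for this example, and the rule that a
<param> child overrides an attribute of the same name is my assumption;
the proposal above doesn't say what happens on a clash.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Cocoon.
class ParamNormalizerSketch {

    // Collects the root element's attributes and its direct
    // <param name=".." value=".."/> children into one map, so that both
    // spellings produce the same set of parameters.
    static Map<String, String> parameters(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        Map<String, String> params = new LinkedHashMap<>();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            params.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        // Assumption: a <param> child wins over an attribute with the same name.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "param".equals(n.getNodeName())) {
                Element p = (Element) n;
                params.put(p.getAttribute("name"), p.getAttribute("value"));
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String terse = "<xxx yyy=\"zzz\" aaa=\"bbb\"/>";
        String verbose = "<xxx><param name=\"yyy\" value=\"zzz\"/>"
                + "<param name=\"aaa\" value=\"bbb\"/></xxx>";
        // Map equality ignores ordering, so the two forms compare as equal.
        System.out.println(parameters(terse).equals(parameters(verbose)));
    }
}
```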

[XXX: should we use RDF directly?]




3) Matchers and Choosers

Ok, this is hard and tricky, so stick with me and don't lose yourself
in the declarative forest.


We agreed the sitemap needs a complete boolean conditional model. This
was identified as

 <choose>
  <when test="">
  </when>
  <otherwise>
  </otherwise>
 </choose>

following XSLT's.

The natural problem is: what do we put in the "test" attribute? XSLT
puts XPath there. Always and only XPath.

Note: XPath is not extensible, only XSLT is. (In fact, both XQL and
XPointer can be seen as XPath extensions, but each extension requires a
new specification since XPath doesn't describe an extensible model.)

The use of a special "XPath"-equivalent for sitemaps was proposed by
Donald. While I'm not against it in principle, I find it too weak for
the planned needs.

We both failed to see the sitemap model is already powerful enough to
make both sides happy:

 <choose type="donald's-xpath-chooser">
  <when test="/cookie[user='stefano']">
   ...
  </when>
 </choose>

or

 <choose type="fancy-cookie-chooser">
  <when test="user is stefano">
  </when>
 </choose>

since "chooser" pluggability allows you to do whatever you want with the
"test" attribute.

So, the interface for Chooser becomes

 public interface Chooser extends Component {
   public boolean evaluate(String test, ...);
 }

where "..." identifies all the objects the chooser will need to evaluate
the choice.

Ok, you say, but this is going to be slow! Right, so here we keep going

 public interface CompiledChooser extends Component {
   public boolean evaluate(...);
 }

then

 public interface ChooserFactory {
   public String generateCode(String test);
 }

which allows us to compile choosers into classes that are indexed by the
"test" string hash and executed to avoid runtime parsing of the "test"
string. (Of course, this is required only for very complex operations
like Donald's XPath alternative.)
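
To show the shape of the idea, here is a small sketch of a chooser that
parses each "test" string once and reuses the result. It does not generate
or compile source code as ChooserFactory would; it swaps in a plain
in-memory cache of pre-parsed objects keyed by the test string, which is
the same trick in miniature. All names (CompiledChooserSketch,
CookieChooserSketch, the "<key> is <value>" test syntax) are hypothetical,
and the Map argument stands in for the "..." above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CompiledChooser: the "..." arguments are
// modelled here as a flat map of request state.
interface CompiledChooserSketch {
    boolean evaluate(Map<String, String> context);
}

// Parses tests of the form "<key> is <value>" once and caches the parsed
// form, keyed by the test string.
class CookieChooserSketch {
    private final Map<String, CompiledChooserSketch> cache = new HashMap<>();
    int parseCount = 0; // exposed only so the demo can show the caching

    // The expensive step: turn the test string into an executable object.
    private CompiledChooserSketch compile(String test) {
        parseCount++;
        String[] parts = test.split(" is ", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unsupported test: " + test);
        }
        String key = parts[0].trim();
        String expected = parts[1].trim();
        return context -> expected.equals(context.get(key));
    }

    // Looks up the pre-parsed test, compiling it only on first use.
    boolean evaluate(String test, Map<String, String> context) {
        return cache.computeIfAbsent(test, this::compile).evaluate(context);
    }

    public static void main(String[] args) {
        CookieChooserSketch chooser = new CookieChooserSketch();
        Map<String, String> context = new HashMap<>();
        context.put("user", "stefano");
        System.out.println(chooser.evaluate("user is stefano", context));
        System.out.println(chooser.evaluate("user is stefano", context));
        System.out.println("parsed " + chooser.parseCount + " time(s)");
    }
}
```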


So much for the "Chooser" part.


There were naming discussions between "Choosers" and "Matchers"... I
think there is no need for this: they are _different_ things. Different
models.

Let's see why: in the original sitemap proposal Pier and I wrote we had

 <process uri="/docs/*">

 </process>

this follows a declarative matching model, just like xsl:template does.
But it adds variable percolation of the URI tokens generated by the
wildcard pattern. This has no equivalent in XSLT. (xsl:value-of is
similar but not equal to this and much more verbose and general).

It was suggested that using uri-based declaration may be limiting. At
the same time, a better conditional model was asked for.

I merged the two since it seemed to be the right thing to do.

The problem is that choosers respond with booleans, matchers respond
with maps.

Moreover, the nice thing about the XSLT declarative model (which allows
very nice work parallelization even inside the same sitemap) was lost,
replaced by a more procedural view of nested <choose> elements.

So, I propose to introduce -both- Choosers and Matchers, the first
responsible for deciding at runtime whether a condition is satisfied,
the second for deciding whether the current status "matches" a given
pattern and, if it does, for percolating the information extracted by
the pattern through the pipeline.

 <match pattern="/docs/*">
  <generator src="./docs/*.xml"/>
  <choose type="load">
   <when test="load is high">
    <filter src="./stylesheets/high-load/*.xsl"/>
   </when>
   <otherwise>
    <filter src="./stylesheets/default/*.xsl"/>
   </otherwise>
  </choose>
 </match>

[NOTE: when the "type" of the component is not specified a default
component will be used, the <component> section will allow each category
to define its default value that can be omitted to reduce verbosity]

but also allows weird things like

 <match type="user-agent" pattern="Mozilla */* *">
  <choose type="math">
   <when test="$2 > 5">
    ...
   </when>
  </choose>
 </match>

(not that I suggest doing this, but it proves the concept)

Unlike xsl:templates, matchers can be nested

 <match type="remote-ip" pattern="192.238.*.*">
  <match pattern="/docs/*">
   ...
  </match>
 </match>

the matcher interface is similar to the Chooser one but different

 public interface Matcher extends Component {
   public Map match(String pattern, ...);
 }

and can follow the same compilable model as Choosers

 public interface CompiledMatcher extends Component {
   public Map match(...);
 }

and

 public interface MatcherFactory {
   public String generateCode(String pattern);
 }
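
For illustration, a minimal sketch of a wildcard matcher that returns a
map of tokens. It swaps in java.util.regex for the matching itself. The
class name, the use of null for "no match", and the numbering of tokens
from "1" (guessed from the "$2" in the user-agent example) are all my
assumptions, not part of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cocoon.
class WildcardMatcherSketch {

    // Translates a wildcard pattern into a regex: each '*' becomes a
    // reluctant capturing group, everything else is quoted literally.
    static Pattern compile(String pattern) {
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("(.*?)");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the wildcard tokens keyed "1", "2", ... when the whole input
    // matches the pattern, or null when it does not (an assumption).
    static Map<String, String> match(String pattern, String input) {
        Matcher m = compile(pattern).matcher(input);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> tokens = new LinkedHashMap<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            tokens.put(String.valueOf(i), m.group(i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(match("/docs/*", "/docs/index"));
        System.out.println(match("Mozilla */* *", "Mozilla 4/7 foo"));
        System.out.println(match("/docs/*", "/images/logo"));
    }
}
```

A CompiledMatcher in this sketch would simply hold on to the Pattern
returned by compile() instead of rebuilding it on every request.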




The two models, just as in XSLT, give you enough flexibility to perform
whatever conditional sequence you need. They give you complete
programmability through pluggable conditional components, and they don't
impose a severe performance penalty, given the ability to compile the
individual matching/choosing patterns/tests.


I believe this solves all the problems encountered so far and reuses
many of the good patterns that XSLT proposed, while removing the
limitations and rough edges that XSLT has in some areas.

While incredibly flexible, I don't think this proposal is based on more
flexibility than it requires... but keep in mind this, even if
finalized, will be the sitemap version 1.0 and there will be other
versions in the future.

Anyway, for what I'm able to see now, I think this thing rocks the
party.

Let me know your comments

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------