
Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2000/05/28 02:09:34 UTC

Pipeline conditional model

The sitemap research is coming along pretty well, but many of you
pointed out how poorly thought out the "matching" idea was.

Creating something like a sitemap is a dynamic equilibrium between
usefulness and flexibility syndrome. "Matching components" were
introduced to:

1) remove logic code from within the sitemap
2) allow pipelines to be chosen depending on parameters other than
the request URI

It turns out that the originally proposed model

 <process uri="...">
  <matcher .../>
  ...(pipeline)...
 </process>

is limited because it doesn't allow matching itself to be componentized.
For example, there is no notion of boolean algebra in matchers, but
doing matching based on A AND B, would require the creation of another
matcher C which includes both logics from A and B.

The above may be a good solution for programmers, but it's definitely
not a good solution if we want to be future-compatible with
sitemap-authoring tools.

I think a pipeline conditional model should be componentizable just like
the pipeline itself.

To do this, one possible solution is to introduce boolean elements that
operate on these matching components. For example,

 <process uri="...">
  <AND>
   <matcher type="A"/>
   <matcher type="B"/>
   <OR>
    <matcher type="C"/>
   </OR>
  </AND>
  ...(pipeline)...
 </process>

which is the logical equivalent of (using Java syntax)

  ((A && B) || C)

and is reminiscent of reverse Polish notation.

                   ------------- o --------------

I spent several hours in front of my whiteboard last night and
started questioning the whole idea of "matchers", since it is evident
that forcing the use of marked-up reverse Polish notation for booleans
is not exactly "user friendly" for non-programmers.

Mind you: user-friendliness is not a direct goal of the sitemap,
especially because there is no engineering definition of user-friendly
since, as many mail signatures proclaim, it depends on who you
choose as friends.

But unlike regexps, which are complex but can't be avoided for complex
string matching and searching, such a notation seems to me totally
awkward and useless: more like a hack than a real design.

In my experience, when your design seems like a hack after lots of
thinking, you have probably made a mistake very early in your decisions.
So I took two steps back and analyzed my reasoning again.

                   ------------ o --------------

Ok, it turns out that matching was created to simplify sitemap
administration. How? Well, the idea was to keep the pipelines simple and
reduce their number.

The ideal situation is when the number of <process> tags in your
sitemap grows as a function of

  f(x) := a + b*log(x)

where 'x' is the number of URIs served by your serving environment and
'f(x)' is the number of <process> elements required to manage their
operation.

If this goal is reached, it is very likely (I don't have a proof for
this, but I'm working on it for my thesis) that you have reached the
minimum entropy for your site.

There is lively research into the best way to "measure" the
complexity of a web site, to define its 'metric'. If we take the
sitemap as the 'metric' of our sites, it would be possible (in theory)
to work out absolute principles of optimization based on state theory,
not much different from the thermodynamic principles that govern
energy and entropy.

Well, in theory :)

Anyway, even if this reasoning started the whole matching deal, after
long thought it appeared to me that it is not directly related to the
sitemap schema at all. In fact, there would be no absolute metrics if
they were based on the sitemap schema, just as thermodynamic limits
don't depend on the components of the thermal machine.

So I wiped my whiteboard, wrote a graphical description of a very
complex URI-processing pipeline and tried to write the markup for it.

To create the schema, I analyzed the XSLT conditional model
graphically. XSLT offers two different conditional constructs:

 - xsl:if
 - xsl:choose

the first example

  <xsl:if test="A">
   <1/>
  </xsl:if>

can be viewed as

  -->(A)-----------+--->-
      +----(1)-----+

while

  <xsl:choose>
   <xsl:when test="A">
    <1/>
   </xsl:when>
   <xsl:when test="B">
    <2/>
   </xsl:when>
   <xsl:otherwise>
    <3/>
   </xsl:otherwise>
  </xsl:choose>

can be visualized as

      +(A)---(1)----+
  -->-+(B)---(2)----+--->-
      +(*)---(3)----+

where '*' indicates "everything but A|B".

The first evidence is that the <xsl:if> model is just a simplification
of the <xsl:choose> model. In fact

  <xsl:choose>
   <xsl:when test="A">
    <1/>
   </xsl:when>
   <xsl:otherwise>
    <!-- do nothing -->
   </xsl:otherwise>
  </xsl:choose>

is totally equivalent to our first example (even if much more verbose
and harder to use and read).

This gave me the idea that there is an alternative method based on
if-like conditionals that is equivalent to the xsl:choose model. I also
remembered how incredibly powerful the "else" construct is in
structured languages like C (without it, you are left with gotos or
exit points).

Ok, so I tried with

 <if test="A">
  <1/>
 </if>
 <else>
  <2/>
 </else>

where the possible pipelines are

 (A)  -> 1
 (!A) -> 2

which is different from

 <if test="A">
  <1/>
 </if>
 <2/>

where the possible pipelines are

 (A)  -> 12
 (!A) -> 2

Is this enough? No, we need boolean logic, but we should avoid
anything involving inverse notation or direct boolean elements. Giacomo
inspired me with the idea that element nesting is equivalent to boolean
operations. Let's see if this is true.

  <if test="A">
   <if test="B">
    <1/>
   </if>
   <else>
    <2/>
   </else>
  </if>
  <else>
   <3/>
  </else>

which leads to (writing '+' for AND and '!' for NOT)

 (A + B)  -> 1
 (A + !B) -> 2
 (!A)     -> 3

or, in terms of a truth table

  A B | pipeline
  1 1 |    1
  1 0 |    2
  0 1 |    3
  0 0 |    3
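
The table can be reproduced with a direct Java transcription of the
nested elements. This is only a sketch: the class and method names are
made up for illustration, and the returned number stands for the
pipeline component (<1/>, <2/> or <3/>) that would be selected.

```java
// Hypothetical transcription of the nested <if>/<else> elements into Java.
final class NestedIfDemo {

    // Returns the number of the pipeline component selected for A and B.
    static int select(boolean a, boolean b) {
        if (a) {
            if (b) {
                return 1; // A AND B
            } else {
                return 2; // A AND NOT B
            }
        } else {
            return 3;     // NOT A, whatever B is
        }
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        System.out.println("A B | pipeline");
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.println((a ? 1 : 0) + " " + (b ? 1 : 0) + " |    " + select(a, b));
            }
        }
    }
}
```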

Ok, this shows we can do AND and NOT. But you should know that all
Boolean logic can be built from just NAND gates (or NOR gates),
which is the theory behind two-state digital electronic circuits.

To show that OR can be derived from these two, we can use De Morgan's
laws (writing '*' for OR and '+' for AND):

 A * B = !!(A * B) = !(!A + !B)

  A B | (A * B) | !A !B | (!A + !B) | !(!A + !B)
  1 1 |    1    |  0  0 |     0     |     1
  1 0 |    1    |  0  1 |     0     |     1
  0 1 |    1    |  1  0 |     0     |     1
  0 0 |    0    |  1  1 |     1     |     0

which completes the proof.
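
For those who prefer code to tables, here is a small Java check of the
same identity over all four input combinations. It reads '*' as OR and
'+' as AND, as the table above does; the class and method names are
invented for this sketch.

```java
// Hypothetical check of the identity: A OR B == NOT(NOT A AND NOT B).
final class DeMorganCheck {

    // OR expressed with only AND and NOT.
    static boolean orFromAndNot(boolean a, boolean b) {
        return !(!a && !b);
    }

    // Exhaustively compares the derived form with Java's own OR.
    static boolean holdsForAllInputs() {
        boolean[] values = {true, false};
        for (boolean a : values) {
            for (boolean b : values) {
                if (orFromAndNot(a, b) != (a || b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identity holds: " + holdsForAllInputs());
    }
}
```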

But what does it mean to use NAND only? Well, first of all, it means that
it's more verbose to write OR than AND. In fact, the conditional table

  (A * B) -> 1
 !(A * B) -> 2

is written like
  
 <if test="A">
  <1/>
 </if>
 <else-if test="B">
  <1/>
 </else-if>
 <else>
  <2/>
 </else>

which requires the duplication of <1/>.

There are possible solutions to reduce the impact of this problem:

 a) the use of <resource> placeholders
 b) the addition of boolean operators -inside- the if test string.

At this point, I'm not sure that OR operations are needed that much,
given that conditional pipelines are normally AND oriented. But I might
well be mistaken on this out of shortsightedness.

I'd like to hear your comments on this before making any decision
about the OR operation.

Anyway, is this a complete conditional model? Yes, it is: our two tags
(<if> and <else>) map the boolean space completely.

Are we done? We could be, but take a look at this

  <if test="A">
    <1/>
  </if>
  <else>
   <if test="B">
     <2/>
   </if>
   <else>
    <3/>
   </else>
  </else>

which is the direct equivalent of the xsl:choose example above. I
suggest introducing another element, <else-if>, to reduce verbosity, so
it becomes

  <if test="A">
   <1/>
  </if>
  <else-if test="B">
   <2/>
  </else-if>
  <else>
   <3/>
  </else>

which also keeps all the conditional elements at the same sibling
level, making it more visually appealing and easier to read.

The use of these three elements in a nestable way is a complete
conditional model for pipeline composition, and it's the model I propose
for the sitemap.
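
To make the sibling semantics concrete, here is a rough Java sketch of
how a sitemap interpreter could walk such a chain with the standard
DOM API. Everything in it is an assumption made for illustration:
ConditionalWalker is not an existing class, tests are "evaluated" by
looking them up in a set of strings taken to be true, and the
components are named s1, s2, s3 because XML element names cannot start
with a digit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical walker: collects the components selected by sibling
// chains of <if>, <else-if> and <else> elements.
final class ConditionalWalker {

    // trueTests: the test strings assumed to hold for this request.
    static void walk(Element parent, Set<String> trueTests, List<String> selected) {
        // True once the current chain is settled; starts true so that a
        // stray <else> without a preceding <if> is ignored.
        boolean settled = true;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element element = (Element) node;
            String name = element.getTagName();
            if (name.equals("if")) {
                settled = trueTests.contains(element.getAttribute("test"));
                if (settled) {
                    walk(element, trueTests, selected);
                }
            } else if (name.equals("else-if")) {
                if (!settled && trueTests.contains(element.getAttribute("test"))) {
                    settled = true;
                    walk(element, trueTests, selected);
                }
            } else if (name.equals("else")) {
                if (!settled) {
                    walk(element, trueTests, selected);
                }
                settled = true; // an <else> always closes the chain
            } else {
                selected.add(name); // ordinary pipeline component
                settled = true;     // and it ends any open chain
            }
        }
    }

    // Parses an XML fragment and returns the selected component names.
    static List<String> select(String xml, Set<String> trueTests) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> selected = new ArrayList<>();
            walk(root, trueTests, selected);
            return selected;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse sitemap fragment", e);
        }
    }
}
```

Because the chain state lives in a single flag per nesting level, the
same loop handles nested conditionals by plain recursion, which is the
property the sibling layout is meant to give us.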
      
                        ------------ o ------------

So far so good as far as the schema for the elements is concerned.

It must be noted that the only attribute introduced in the <if> and
<else-if> elements was "test", which represents the condition for the
conditional element.

This directly follows the XSLT model, where the testing syntax is
defined in a separate specification (XPath). While complex, this
separation allows powerful reuse of tree-querying capabilities and
must be appreciated, even if it reduces the validation capabilities
during parsing. On the other hand, it allows the test strings to be more
compact and more readable in the long run.

I previously went against this model and tried to use xml-ized syntax
for the testing string. The first examples were 

 <if type="browser" accepts="wap"/>

which many of you found a little 'exotic' since it made the attribute
names a function of the value of the type attribute. While this is
perfectly legal (XLink itself uses the same pattern in some areas), I
agree there are other solutions that are more XML-friendly and more
reasonable to XML readers. It was suggested to use

 <if type="browser" test="accepts(wap)"/>

which removed the dependency of the attribute names on the value of
the type attribute.

But it was also suggested to create a complete testing syntax, following
the XPath model.

At first, this looked like FS (flexibility syndrome) to me, but after
more thinking (and some whiteboard attempts) I think there must be real
readability value in something like this, if we choose a simple and
visible syntax.

I went on noting that if we treat each condition as atomic, we can
always fragment it into three components

  (subject) (action) (predicate)

which is not different from what RDF indicates. So what we are doing
is basically a sort of reverse RDF, as was already noted on this
list.

This is normally expressed in sentences like

 if (subject) (action) (predicate) then
   do ...
 else
   do ...

for example

 if user-agent is Mozilla/5 then
   filter with XSLT using styles/xul-style.xsl
 else
   filter with XSLT using styles/normal-style.xsl

I know this might seem totally strange to you now that you are used to
XML syntax and think about marking up almost everything, but take a
look at this translation

 <if test="user-agent is Mozilla/5">
  <filter type="xslt" src:local="styles/xul-style.xsl"/>
 </if>
 <else>
  <filter type="xslt" src:local="styles/normal-style.xsl"/>
 </else>

Yes, we leave the [subject|action|predicate] string by itself and we
don't mark it up. We leave it to the sitemap parser to validate it, which
is utterly simple since the sentence must _always_ contain two or three
space-separated tokens.

Ok, let's look at some examples

  user-agent matches *MSIE*
  atomic-time passed 3:00PM
  user belongs-to administrators
  session is-valid
  cookie contains style
  load greater-than 2.5

where the tokens are identified as follows:

1) first token: name of the matching component as defined in the
component section.
2) second token: method name to call in the matching component. This can
be validated by class introspection when the sitemap is loaded.
3) third token: string argument passed to the matching component.
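
As a rough illustration of how little validation work this is, here is
a Java sketch that splits a test string into its two or three tokens.
The TestExpression class and its field names are hypothetical, chosen
only to mirror the three roles listed above.

```java
// Hypothetical holder for a parsed test string; names are illustrative.
final class TestExpression {
    final String subject;   // first token: name of the matching component
    final String action;    // second token: method to call on the matcher
    final String predicate; // third token: string argument, null if absent

    private TestExpression(String subject, String action, String predicate) {
        this.subject = subject;
        this.action = action;
        this.predicate = predicate;
    }

    // Splits on whitespace and rejects anything but two or three tokens.
    static TestExpression parse(String test) {
        String[] tokens = test.trim().split("\\s+");
        if (tokens.length < 2 || tokens.length > 3) {
            throw new IllegalArgumentException(
                    "expected two or three space-separated tokens: '" + test + "'");
        }
        return new TestExpression(tokens[0], tokens[1],
                tokens.length == 3 ? tokens[2] : null);
    }

    public static void main(String[] args) {
        TestExpression e = parse("browser supports image/svg");
        System.out.println(e.subject + " / " + e.action + " / " + e.predicate);
    }
}
```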

So, for example

 <matcher type="browser" src:class="BrowserMatcher"/>
 ...
 <if test="browser supports image/svg">

the BrowserMatcher class must be something like

 public class BrowserMatcher extends AbstractMatcher {
    public boolean supports(String parameter, ???) {
       ...
    }
 }

where ??? indicates parameters that are yet to be defined but are always
passed to every method (stuff like ServletRequest, ServletResponse,
ServletContext and such)
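
The introspection step could look roughly like the Java sketch below.
It rests on several assumptions that are not part of the proposal: the
extra ??? parameters are left out, so the method takes only the string
argument; hyphenated action names such as "belongs-to" are mapped to
camelCase method names; and MatcherInvoker and DemoBrowserMatcher are
made-up classes, not Cocoon ones.

```java
import java.lang.reflect.Method;

// Hypothetical helper that resolves and calls matcher methods by name.
final class MatcherInvoker {

    // Assumption: "belongs-to" in a test string means method belongsTo.
    static String toMethodName(String action) {
        StringBuilder name = new StringBuilder();
        boolean upperNext = false;
        for (char ch : action.toCharArray()) {
            if (ch == '-') {
                upperNext = true;
            } else {
                name.append(upperNext ? Character.toUpperCase(ch) : ch);
                upperNext = false;
            }
        }
        return name.toString();
    }

    // Can be called when the sitemap is loaded, so that a misspelled
    // action fails early instead of at request time.
    static Method resolve(Class<?> matcherClass, String action) {
        try {
            Method method = matcherClass.getMethod(toMethodName(action), String.class);
            if (method.getReturnType() != boolean.class) {
                throw new IllegalArgumentException("action must return boolean: " + action);
            }
            return method;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such action: " + action, e);
        }
    }

    // Evaluates "<matcher> <action> <argument>" against a matcher instance.
    static boolean invoke(Object matcher, String action, String argument) {
        try {
            return (Boolean) resolve(matcher.getClass(), action).invoke(matcher, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("matcher call failed: " + action, e);
        }
    }
}

// Stand-in matcher used only to exercise the sketch.
final class DemoBrowserMatcher {
    public boolean supports(String mimeType) {
        return "image/svg".equals(mimeType);
    }
}
```

With these two classes, MatcherInvoker.invoke(new DemoBrowserMatcher(),
"supports", "image/svg") plays the role of the test "browser supports
image/svg". The two-token form (such as "session is-valid") would need
a second lookup for a method without the string argument, which this
sketch leaves out.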

If we want to add boolean operators directly to the test syntax,
this could be done like

 <if test="(browser supports image/svg) or (browser supports image/svg-xml)"/>

A complete example is here:

  <process uri="*">
   <if test="user belongs-to allowed-users">
    <generator type="parser" src:local="*"/>
    <if test="browser supports wap">
     <filter type="xslt" src:local="stylesheet/2wml.xsl"/>
     <if test="response bigger-than 1.5Kb">
      <serializer type="splitted-wap"/>
     </if>
     <else>
      <serializer type="xml"/>
     </else>
    </if>
    <else-if test="browser wants pdf">
     <filter type="xslt" src:local="stylesheet/2fo.xsl"/>
     <serializer type="fo2pdf"/>
    </else-if>
    <else>
     <filter type="xslt" src:local="stylesheet/2html.xsl"/>
     <serializer type="html"/>
    </else>
   </if>
   <else>
    <resource name="Error Page"/>
   </else>
  </process>

                      -------- O ------------

Sheesh, that was long :)

In this message I outlined a complete conditional model for pipeline
componentization. It should allow simple sitemaps to be created without
problems but, if required, contains all the syntax needed for any
kind of pipeline complexity.

Easy things should be easy, hard things should be possible :) as Larry
Wall said of Perl. I really hope this doesn't become a huge blob of
different design patterns, so I tried to analyze all possible ways to
simplify the model.

The sitemap is starting to look a lot like the marked-up version of
httpd.conf + mod_rewrite + components + separation of concerns and I
really hope we didn't go too far with the functionality.

Anyway, let's take all this apart and find the holes/strengths so that we
can move forward (I already have two other main concerns about the
future I would like to address directly in the sitemap... but more on
this once we have settled this issue of the conditional model).

Well, time to hit the pillow now :)

Stefano disconnecting...

Re: Pipeline conditional model

Posted by Hans Ulrich Niedermann <ni...@isd.uni-stuttgart.de>.

Hi Stefano,

just a small remark on the beginning of your mail. I haven't had time
to work through the main topics thoroughly yet.

Stefano Mazzocchi <st...@apache.org> writes:

> I think a pipeline conditional model should be componentizable just like
> the pipeline itself.
> 
> To do this, one possible solution is to introduce boolean elements that
> operate on these matching components. For example,
> 
>  <process uri="...">
>   <AND>
>    <matcher type="A"/>
>    <matcher type="B"/>
>    <OR>
>     <matcher type="C"/>
>    </OR>
>   </AND>
>   ...(pipeline)...
>  </process>
> 
> which is the logical equivalent of (using Java syntax)
> 
>   ((A && B) || C)
> 
> and reminds of inverse polish notation.

One could also combine the && and || operators with the surrounding
brackets. This results in a lisp-like expression "(or (and A B) C)"
and could be expressed in XML like

  <process uri="...">
    <OR>
      <AND>
        <matcher type="A"/>
        <matcher type="B"/>
      </AND>
      <matcher type="C"/>
    </OR>
    ...(pipeline)...
  </process>

So you are not forced to use some weird kind of postfix notation at
all. The XML element nesting even nicely reflects the bracket nesting.
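
A quick Java sketch of why the nesting maps so directly: each element
becomes a node of an expression tree, and evaluation is a plain
recursive walk. The Condition, MatcherRef, And and Or types are
invented for this illustration, and the matcher results are simply
looked up in a map instead of being computed from a request.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical expression tree; none of these types exist in Cocoon.
interface Condition {
    boolean test(Map<String, Boolean> matcherResults);
}

// Leaf node: looks up the precomputed result of a named matcher.
final class MatcherRef implements Condition {
    private final String type;
    MatcherRef(String type) { this.type = type; }
    public boolean test(Map<String, Boolean> matcherResults) {
        return Boolean.TRUE.equals(matcherResults.get(type));
    }
}

// <AND> element: true only if every child is true.
final class And implements Condition {
    private final List<Condition> children;
    And(Condition... children) { this.children = Arrays.asList(children); }
    public boolean test(Map<String, Boolean> matcherResults) {
        for (Condition child : children) {
            if (!child.test(matcherResults)) return false;
        }
        return true;
    }
}

// <OR> element: true if at least one child is true.
final class Or implements Condition {
    private final List<Condition> children;
    Or(Condition... children) { this.children = Arrays.asList(children); }
    public boolean test(Map<String, Boolean> matcherResults) {
        for (Condition child : children) {
            if (child.test(matcherResults)) return true;
        }
        return false;
    }
}

final class NestingDemo {
    // Mirrors <OR><AND><matcher A/><matcher B/></AND><matcher C/></OR>.
    static final Condition EXAMPLE =
            new Or(new And(new MatcherRef("A"), new MatcherRef("B")), new MatcherRef("C"));

    static boolean evaluate(boolean a, boolean b, boolean c) {
        Map<String, Boolean> results = new HashMap<>();
        results.put("A", a);
        results.put("B", b);
        results.put("C", c);
        return EXAMPLE.test(results);
    }

    public static void main(String[] args) {
        System.out.println("A=1 B=1 C=0 -> " + evaluate(true, true, false));
        System.out.println("A=1 B=0 C=0 -> " + evaluate(true, false, false));
    }
}
```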

Uli

Re: Sitemap Definition

Posted by Stefano Mazzocchi <st...@apache.org>.

Donald Ball wrote:
> 
> Going through the sitemap discussion from over the weekend made me want to
> sit down and remind myself what the sitemap should accomplish. I threw
> together this assessment:
> 
> Ultimately, we want to present information to the user in a fashion
> appropriate to them as much as possible (HTML to a browser, WML to a wap
> phone, PDF to a browser that requests it specifically, etc.) Internally,
> we want the sitemap to tell us, given a request, what initial data we
> start with and what actions are needed to transform it into the results.
> 
> We want to encode the rules for doing this in a sitemap file. We want the
> rule set to be sufficient to enable a given request to resolve to a data
> source and a set of filters. We want the function that handles the
> resolution to be able to depend on any request-time information (e.g.
> requested URI, request parameters, language preference, HTTP headers,
> etc.). We want the ruleset to be chosen to minimize both the number of
> rules and the average length of the sitemap file. (Really, we want to
> minimize ruleset creation and maintenance time, but that's a difficult
> metric to measure.)

> Does this cover everything or did I forget something?

Scalability.

We want to make this scale with site growth, as well as allow XML
web applications to be easily "plugged in", for example, to allow
projects like JetSpeed-next or the planned Bugoon to be dropped in like
WAR files for Tomcat.

Also, I would add, the sitemap should be future compatible with visual
component-building GUIs.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

Sitemap Definition

Posted by Donald Ball <ba...@webslingerZ.com>.

Going through the sitemap discussion from over the weekend made me want to
sit down and remind myself what the sitemap should accomplish. I threw
together this assessment:

Ultimately, we want to present information to the user in a fashion
appropriate to them as much as possible (HTML to a browser, WML to a wap
phone, PDF to a browser that requests it specifically, etc.) Internally,
we want the sitemap to tell us, given a request, what initial data we
start with and what actions are needed to transform it into the results.

We want to encode the rules for doing this in a sitemap file. We want the
rule set to be sufficient to enable a given request to resolve to a data
source and a set of filters. We want the function that handles the
resolution to be able to depend on any request-time information (e.g.
requested URI, request parameters, language preference, HTTP headers,
etc.). We want the ruleset to be chosen to minimize both the number of
rules and the average length of the sitemap file. (Really, we want to
minimize ruleset creation and maintenance time, but that's a difficult
metric to measure.)

Does this cover everything or did I forget something?

- donald

Re: Pipeline conditional model

Posted by Donald Ball <ba...@webslingerZ.com>.

On 31 May 2000, John Prevost wrote:

> > I don't like either models so I proposed
> 
> > <if>
> > <else-if>
> > <else>
> 
> Um.  Is there really that great a difference between:
> 
> <case>
>  <when ...>
>   ...
>  </when>
>  <when ...>
>   ...
>  </when>
>  <otherwise>
>   ...
>  </otherwise>
> </case>
> 
> and your "if else-if else" model?  As far as I can tell, there is not,
> except that the bounds of the subtree covered by the "if else endif"
> tags is less clear.

I concur with this assessment. The XSLT conditional element set is
sometimes cumbersome, but always crystal clear.

- donald

Re: Pipeline conditional model

Posted by John Prevost <pr...@maya.com>.

> I don't like either models so I proposed

> <if>
> <else-if>
> <else>

Um.  Is there really that great a difference between:

<case>
 <when ...>
  ...
 </when>
 <when ...>
  ...
 </when>
 <otherwise>
  ...
 </otherwise>
</case>

and your "if else-if else" model?  As far as I can tell, there is not,
except that the bounds of the subtree covered by the "if else endif"
tags are less clear.

Maybe we should be attacking the really important question: what level
of test can be done in the sitemap, and what must appeal to an outside
definition.  Or more importantly, what do we expect to use in the
sitemap itself?

Once we know what we want to do, then is the right time to address how
exactly to do it.

John.

Re: Pipeline conditional model

Posted by Giacomo Pati <Gi...@pwr.ch>.

Stefano Mazzocchi wrote:
> 
> Donald Ball wrote:
> >
> > > To do this, one possible solution is to introduce boolean elements that
> > > operate on these matching components. For example,
> > >
> > >  <process uri="...">
> > >   <AND>
> > >    <matcher type="A"/>
> > >    <matcher type="B"/>
> > >    <OR>
> > >     <matcher type="C"/>
> > >    </OR>
> > >   </AND>
> > >   ...(pipeline)...
> > >  </process>
> > >
> > > which is the logical equivalent of (using Java syntax)
> > >
> > >   ((A && B) || C)
> > >
> > > and reminds of inverse polish notation.
> >
> > (hopefully this hasn't been addressed already - i'm still paging through
> > my thousands of messages - y'all are some chatty people). I'm having
> > trouble reconciling the XML representation with the algebraic one. I would
> > write it like this:
> >
> > <OR>
> >  <AND>
> >   <match type="A"/>
> >   <match type="B"/>
> >  </AND>
> >  <match type="C"/>
> > </OR>
> 
> I don't like either models so I proposed
> 
> <if>
> <else-if>
> <else>

+1

Giacomo

> 
> instead.

-- 
PWR GmbH, Organisation & Entwicklung      Tel:   +41 (0)1 856 2202
Giacomo Pati, CTO/CEO                     Fax:   +41 (0)1 856 2201
Hintereichenstrasse 7                     Mailto:Giacomo.Pati@pwr.ch
CH-8166 Niederweningen                    Web:   http://www.pwr.ch

Re: Pipeline conditional model

Posted by Stefano Mazzocchi <st...@apache.org>.

Donald Ball wrote:
> 
> > To do this, one possible solution is to introduce boolean elements that
> > operate on these matching components. For example,
> >
> >  <process uri="...">
> >   <AND>
> >    <matcher type="A"/>
> >    <matcher type="B"/>
> >    <OR>
> >     <matcher type="C"/>
> >    </OR>
> >   </AND>
> >   ...(pipeline)...
> >  </process>
> >
> > which is the logical equivalent of (using Java syntax)
> >
> >   ((A && B) || C)
> >
> > and reminds of inverse polish notation.
> 
> (hopefully this hasn't been addressed already - i'm still paging through
> my thousands of messages - y'all are some chatty people). I'm having
> trouble reconciling the XML representation with the algebraic one. I would
> write it like this:
> 
> <OR>
>  <AND>
>   <match type="A"/>
>   <match type="B"/>
>  </AND>
>  <match type="C"/>
> </OR>

I don't like either model, so I proposed

<if>
<else-if>
<else>

instead.


Re: Pipeline conditional model

Posted by Donald Ball <ba...@webslingerZ.com>.

> To do this, one possible solution is to introduce boolean elements that
> operate on these matching components. For example,
> 
>  <process uri="...">
>   <AND>
>    <matcher type="A"/>
>    <matcher type="B"/>
>    <OR>
>     <matcher type="C"/>
>    </OR>
>   </AND>
>   ...(pipeline)...
>  </process>
> 
> which is the logical equivalent of (using Java syntax)
> 
>   ((A && B) || C)
> 
> and reminds of inverse polish notation.

(hopefully this hasn't been addressed already - i'm still paging through
my thousands of messages - y'all are some chatty people). I'm having
trouble reconciling the XML representation with the algebraic one. I would
write it like this:

<OR>
 <AND>
  <match type="A"/>
  <match type="B"/>
 </AND>
 <match type="C"/>
</OR>

- donald