Posted to user@forrest.apache.org by Ferdinand Soethe <sa...@soethe.net> on 2005/02/16 14:49:31 UTC

Re: Reusing legacy HTML

Hi David,

RG> Yes. Please everyone, it is a big problem when committers (and other
RG> developers) are contacted off-list. One-to-one discussions reduce our
RG> effectiveness and could even lead to burn-out.

My fault. Sorry. I was trying to understand a mechanism and didn't
want to clog the list with a lengthy discussion on my rather low level
of understanding. The goal was to document this and put it
up for discussion as soon as I knew what I was talking about.

Well anyway, Ross - after being very helpful in understanding it -
also pointed this out and asked that I post this early version to
the list for comments.

So here it is. Your comments would be much appreciated before I format it for
inclusion in the Forrest documentation.

I was going to use document.dtd and write a second shorter and more to
the point how-to on the specifics of processing your own legacy html.
OK?

Thanks,
Ferdinand Soethe

-------------------------------

So this tries to explain what happens internally when a client asks Forrest to serve
"mytests/mybad.html", a legacy html-file with lots of junk in
it.

0. Client asks Forrest to serve ".../xdocs/mytests/mybad.html"

1. Forrest looks for a matching pipeline in
   "...\forrest\main\webapp\sitemap.xmap".

2. This pattern would in fact match the request, but it generates no xml since the
   map:parts match no cocoon pipeline.
   
   <map:match pattern="*.html">
                       {0}=mytests/mybad.html
          <map:aggregate element="site">
            <map:part src="cocoon:/skinconf.xml"/>
            <map:part src="cocoon:/build-info"/>
            <map:part src="cocoon:/tab-{0}"/>
                           => cocoon:/tab-mytests/mybad.html
            <map:part src="cocoon:/menu-{0}"/>
                                   => cocoon:/menu-mytests/mybad.html
            <map:part src="cocoon:/body-{0}"/>
                           => cocoon:/body-mytests/mybad.html
          </map:aggregate>
          <map:call resource="skinit">
            <map:parameter name="type" value="site2xhtml"/>
            <map:parameter name="path" value="{0}"/>
                                            => mytests/mybad.html
          </map:call>
        </map:match>
        
3. This pattern also matches the request and is used to continue
   processing
   
        <map:match pattern="**/*.html">
                                                {0}= mytests/mybad.html
                                                {1}= mytests
                                                {2}= mybad
            <map:aggregate element="site">
              <map:part src="cocoon:/skinconf.xml"/> adds skin info
              <map:part src="cocoon:/build-info"/> adds meta data
              <map:part src="cocoon:/{1}/tab-{2}.html"/> creates tabs
                           =>cocoon:/mytests/tab-mybad.html
              <map:part src="cocoon:/{1}/menu-{2}.html"/> creates menus
                          =>cocoon:/mytests/menu-mybad.html
                          
              Below a cocoon pipeline is called to generate the body 
              <map:part src="cocoon:/{1}/body-{2}.html"/>
                          =>cocoon:/mytests/body-mybad.html
                          
             return here for the rest of this pipeline in step 9
              
          
4.  This is the pipeline called in step 3
    Check if there is an ehtml-file (deprecated embedded html)
    
        <map:match pattern="**body-*.html">
                                            {0}= mytests/body-mybad.html
                                                {1}= mytests/
                                                {2}= mybad
        <map:select type="exists">
          <map:when test="{project:content.xdocs}{1}{2}.ehtml">
                        =>.../xdocs/mytests/mybad.ehtml 
            <map:generate src="{project:content.xdocs}{1}{2}.ehtml" />
                             =>.../xdocs/mytests/mybad.ehtml 
            <map:transform src="{forrest:stylesheets}/html2htmlbody.xsl" />
            <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
                                                  =>cocoon:/mytests/linkmap-mybad.html
            <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
            <map:serialize type="xml" />
          </map:when>
        </map:select>
      </map:match>        

      Since the file does not exist, the pipeline generates nothing, so Forrest
      keeps looking for the next matching pipeline ...
        
        
5. ... and finds another pipeline for the same matches

  <!-- Default matches -->
  <!-- (HTML rendered from doc-v11 intermediate format -->
  <map:match pattern="**body-*.html">
                                            {0}= mytests/body-mybad.html
                                                {1}= mytests/
                                                {2}= mybad
  
    In the following step we ask Forrest to call the pipeline for mybad.xml.
    This triggers a new matching attempt starting from the top of the pipeline section.
  
    <map:generate src="cocoon:/{1}{2}.xml"/>
                     =>cocoon:/mytests/mybad.xml
    
    Return here for the rest of this pipeline in step 8
      
  
6. This below is relevant now as it loads the project sitemap and
   inserts it right at this position of the main sitemap. (This project
   sitemap was also loaded before, but was irrelevant since there were no matches
   in the project sitemap)
    

  <!-- 
     This is the user pipeline, that can answer requests instead
     of the Forrest one, or let requests pass through.
     To take over the rendering of a file it must match the file name and path.
     To take over the generation of the intermediate format, it must give
     Forrest the same filename but ending with xml, and a DTD that Forrest
     recognizes.
  -->
  <map:pipeline internal-only="false"> 4t!!!h step patterns above first
       <map:select type="exists">
         <map:when test="{project:sitemap}">
           <map:mount uri-prefix="" 
                      src="{project:sitemap}" 
                      check-reload="yes" 
                      pass-through="true"/>
         </map:when>  
       </map:select>
  </map:pipeline> 

7. In the project sitemap we find this match for our call for an XML-file!

        <map:match pattern="**/mybad.xml">
                                                {0}= mytests/mybad.xml
                                                {1}= mytests
                                                
  

        Load my file with the html-generator. This generator
        internally uses jtidy to clean up the html and make it xhtml.
        
        <map:generate src="{project:content.xdocs}{1}/mybad.html" type="html"/>

        Now we call my special stylesheet to remove all
        elements that I don't want in the forrest page.
        I place it in the same directory as the source document as it
        is very specific.
        
        <map:transform src="{project:content.xdocs}{1}/mybadHTMLfixer.xsl"/>

        Finally call the existing stylesheet to convert html to document1.1
        <map:transform src="{forrest:stylesheets}/html2document.xsl" />
        
        Serialize result as xml (it is now the body of my Forrest page
        and uses document.dtd)
        <map:serialize type="xml"/>
        
      </map:match>
      
  
 8. Return to calling routine in step 5 and execute the rest of the pipeline
    to finalize the body of my Forrest page.
    
                        {0}= mytests/body-mybad.html
                                                {1}= mytests/
                                                {2}= mybad
    
            ???
            <map:transform type="idgen"/>

            ???
            <map:transform type="xinclude"/>

            Adjust links
            <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
                                                  => cocoon:/mytests/linkmap-mybad.html
            <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
            <map:call resource="skinit">
              <map:parameter name="type" value="document2html"/>
              <map:parameter name="path" value="{1}{2}.html"/>
                                              => mytests/mybad.html 
              <map:parameter name="notoc" value="false"/>
            </map:call>
   </map:match>
  
   At the end of the pipeline this is the page body in Html with all
   links adjusted.
   
   
9. Return to the calling routine in step 3 and finish processing

                                {0}= mytests/mybad.html
                                                {1}= mytests
                                                {2}= mybad

                </map:aggregate>
                
                At this point the body (as html) is aggregated with the menus and tabs
                and the next part just adds the final touches to the presentation.
                
            <map:call resource="skinit">
              <map:parameter name="type" value="site2xhtml"/>
              <map:parameter name="path" value="{0}"/>
                                              => mytests/mybad.html
            </map:call>
          </map:match>      
          

         At the end, the result is delivered to the browser. 
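
For illustration, the special stylesheet called in step 7 (mybadHTMLfixer.xsl) could be sketched roughly as below. This is a minimal sketch and not the actual stylesheet: the script elements and the "legacy-nav" id that it removes are hypothetical examples of junk, and matching on local-name() is an assumption made so that the rules apply whether or not the html generator puts elements in a namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical junk: an empty template drops the matched elements
       together with everything inside them. -->
  <xsl:template match="*[local-name()='script'] | *[@id='legacy-nav']"/>

</xsl:stylesheet>
```

Everything that is not explicitly dropped passes through untouched, so the later html2document.xsl step still sees the rest of the page.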





Re: Reusing legacy HTML

Posted by Ferdinand Soethe <sa...@soethe.net>.
I just checked the new howto in as issue
http://issues.cocoondev.org/browse/FOR-446.

Somebody checking it for content and language would be much appreciated.

--
Ferdinand Soethe



Re: Reusing legacy HTML

Posted by Ross Gardler <rg...@apache.org>.
Ferdinand Soethe wrote:

> As far as reuse is concerned, I'm not convinced. A stylesheet to treat
> such legacy html should be written to be reusable whenever possible
> and should then go to the stylesheet directory.
> 
> But it also might be a poorly written or very special transformation
> that is as reusable as this mail. Putting all these files in the stylesheet
> directory in my eyes hinders reuse because eventually you have to find
> reusable pieces within that haystack of other stuff.
> 
> Would it make sense to have a separate directory with
> reusable stylesheet snippets (perhaps as a subdir 'lib' of the
> stylesheet directory) that can be included into any stylesheet?


We do have a similar concept in skins. There is a common skin that has
the reusable parts of all the other skins. This is possible because all
the stylesheets have the same basic purpose, but it's not quite as
simple for stylesheets that do many and varied things, like those in the
resources directory.

However, there is nothing to stop us having a directory structure within
the resources/stylesheets directory. So we could put things like
copyover.xsl in a "common" subdirectory and things like your stylesheet
here into a "legacyhtml" subdir. I'm not sure if this will help or
hinder, perhaps others have an opinion.

>>>In this case there would not be any in legacy html so nothing happens
>>>in this step, right?
> 
> 
> RG> That is right, but of course we strive to make the pipelines as general
> RG> as possible to accommodate as many use cases as possible.
> 
> Sorry if this sounded like being critical, was not meant to be. I
> merely wanted to be clear about what is happening.

I didn't think it was critical, I was just mentioning it because I
thought such clarification would be appropriate in your entry level doc.

Ross


Re[2]: Reusing legacy HTML

Posted by Ferdinand Soethe <sa...@soethe.net>.
Hi Ross,

>> RG> Stylesheets should be placed in the
>> RG> {project:resources.stylesheets} 
>> RG> directory, only displayable content should be in the xdocs directory.
>> 
>> Is this a must, does it cause functional problems. I knew about the
>> stylesheet directory but decided not to fill it with stylesheets that
>> are really only used for one file. Is that a nono?

RG> It is not a must in terms of functionality. But it is a must (in my
RG> opinion, and I believe the vast majority of Forrest Devs) in terms of
RG> design. It really becomes a nightmare to maintain a system when files
RG> can't be found where they are expected to be found. Furthermore, it
RG> makes reuse harder. You say this is a special case, but I'll bet at
RG> least some of the stuff in the XSL could be reused in other similar
RG> "special" cases. Keeping all stylesheets in one directory promotes this
RG> reuse since all developers are going to know where to look.

RG> It is not a requirement of Forrest, the solution you suggest would work.
RG> However, I'm not sure our documentation should encourage such "bad"
RG> behaviour.

OK, I can see the point about having resources in clearly
defined locations. So the advantages in terms of maintaining and
enhancing the system make this a NO-NO and I will stress that
in the documentation.

As far as reuse is concerned, I'm not convinced. A stylesheet to treat
such legacy html should be written to be reusable whenever possible
and should then go to the stylesheet directory.

But it also might be a poorly written or very special transformation
that is as reusable as this mail. Putting all these files in the stylesheet
directory in my eyes hinders reuse because eventually you have to find
reusable pieces within that haystack of other stuff.

Would it make sense to have a separate directory with
reusable stylesheet snippets (perhaps as a subdir 'lib' of the
stylesheet directory) that can be included into any stylesheet?

RG> Forrest must ensure that there are ID's for all key elements so that
RG> things like the Table of Contents will work.

>> Will those names and references get replaced? Because some names cannot
>> be used as IDs?

RG> If a value is provided in the source document then that is used, if not
RG> a new (and valid one) is generated.

Cool stuff!

>> In this case there would not be any in legacy html so nothing happens
>> in this step, right?

RG> That is right, but of course we strive to make the pipelines as general
RG> as possible to accommodate as many use cases as possible.

Sorry if this sounded like being critical, was not meant to be. I
merely wanted to be clear about what is happening.

Regards,
Ferdinand Soethe



Re: Reusing legacy HTML

Posted by Ross Gardler <rg...@apache.org>.
Ferdinand Soethe wrote:

> RG> Stylesheets should be placed in the
> RG> {project:resources.stylesheets} 
> RG> directory, only displayable content should be in the xdocs directory.
> 
> Is this a must, does it cause functional problems. I knew about the
> stylesheet directory but decided not to fill it with stylesheets that
> are really only used for one file. Is that a nono?

It is not a must in terms of functionality. But it is a must (in my 
opinion, and I believe the vast majority of Forrest Devs) in terms of 
design. It really becomes a nightmare to maintain a system when files 
can't be found where they are expected to be found. Furthermore, it 
makes reuse harder. You say this is a special case, but I'll bet at
least some of the stuff in the XSL could be reused in other similar
"special" cases. Keeping all stylesheets in one directory promotes this 
reuse since all developers are going to know where to look.

It is not a requirement of Forrest, the solution you suggest would work. 
However, I'm not sure our documentation should encourage such "bad" 
behaviour.

>>>            ???
>>>            <map:transform type="idgen"/>
> 
> 
> RG> generates ID attributes for elements that are used for internal linking
> RG> (i.e. <a href="thispage.html#thatPosition">Go to thatPosition</a>)
> 
> If I'm coming from an html source, why would I need to do that. If
> there are any anchors within the page they'd already have an name.

Forrest must ensure that there are ID's for all key elements so that 
things like the Table of Contents will work.

> Will those names and references get replaced? Because some names cannot
> be used as IDs?

If a value is provided in the source document then that is used, if not 
a new (and valid one) is generated.
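
The behaviour described here (keep an id that the source provides, otherwise generate a valid one) can be illustrated with a small XSLT sketch. This is not the implementation of Forrest's idgen transformer; it only mimics the idea for section elements, which is an assumed element name, and relies on the standard XSLT function generate-id().

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: elements that already carry an id are copied as they are. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A section without an id receives a generated one.
       generate-id() always returns a string that is a valid XML name. -->
  <xsl:template match="section[not(@id)]">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:value-of select="generate-id()"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the second rule only fires when no id is present, values supplied in the source document are never overwritten.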

>>>            ???
>>>            <map:transform type="xinclude"/>
> 
> 
> RG> XInclude content from other files (see
> RG> http://www.w3.org/TR/xinclude/)
> 
> In this case there would not be any in legacy html so nothing happens
> in this step, right?

That is right, but of course we strive to make the pipelines as general 
as possible to accommodate as many use cases as possible. We can afford 
to do this because of Cocoon's powerful caching, the overhead is minimal 
since most pages will be generated once then cached.

Ross

Re[2]: Reusing legacy HTML

Posted by Ferdinand Soethe <sa...@soethe.net>.
Hi Ross,

thanks for editing this.

RG> That would be brilliant (don't forget we have a HowTo DTD).

I was gonna use the HowTo for the second version. Since the first one
is more of the general documentation type, I'd rather not use it
there.

RG> Actually this does not match because *.html will only match something
RG> like "mybad.html". Here you have a directory structure so you need "**"
RG> in order to match it. "*" means a single part of an URL, "**" means any
RG> number of parts to an URL.

The stars are against me, as usual :-) Will take out that para or change
it.

RG> The number of slashes is significant. "cocoon:/" means look for a
RG> pipeline in the current sitemap, "cocoon://" means look for a match in
RG> any xmap. This is not significant at this point but will be later.

Later when? If it is not relevant to this topic, I'd rather not
mention it here as it really is a side issue.

>> 6. This below is relevant now as it loads the project sitemap and
>>    inserts it right at this position of the main sitemap. (This project
>>    sitemap was also loaded before, but was irrelevant since there were no matches
>>    in the project sitemap)

RG> Probably best, in the documentation to describe what happens *without* a
RG> project sitemap first since this will be the default behaviour. Then
RG> your howto can describe how the project sitemap intercepts this request.

Good idea, I'll go back to the sitemap and try to understand what
would happen ...

>>
>>         Now we call my special stylesheet to remove all
>>         elements that I don't want in the forrest page.
>>         I place it in the same directory as the source document as it
>>         is very specific.
>>         
>>         <map:transform
>> src="{project:content.xdocs}{1}/mybadHTMLfixer.xsl"/>

RG> Stylesheets should be placed in the
RG> {project:resources.stylesheets} 
RG> directory, only displayable content should be in the xdocs directory.

Is this a must, does it cause functional problems. I knew about the
stylesheet directory but decided not to fill it with stylesheets that
are really only used for one file. Is that a nono?

>>             ???
>>             <map:transform type="idgen"/>

RG> generates ID attributes for elements that are used for internal linking
RG> (i.e. <a href="thispage.html#thatPosition">Go to thatPosition</a>)

If I'm coming from an html source, why would I need to do that. If
there are any anchors within the page they'd already have an name.
Will those names and references get replaced? Because some names cannot
be used as IDs?
Or will it just add ids to elements like headers that are needed for
Forrest local navigation (Headers)?

>>             ???
>>             <map:transform type="xinclude"/>

RG> XInclude content from other files (see
RG> http://www.w3.org/TR/xinclude/)

In this case there would not be any in legacy html so nothing happens
in this step, right?


RG> Looks good to me.

Thanks for your help in understanding this.

--
Ferdinand Soethe



Re: Reusing legacy HTML

Posted by Ross Gardler <rg...@apache.org>.
Ferdinand Soethe wrote:
> So here it is. Your comments would be much appreciated before I format it for
> inclusion in the Forrest documentation.

I've moved to the dev list rather than the user list. The devs want to 
see this develop, the users want the final document.

> I was going to use document.dtd and write a second shorter and more to
> the point how-to on the specifics of processing your own legacy html.
> OK?

That would be brilliant (don't forget we have a HowTo DTD).

> So this tries to explain what happens internally when a clients asks Forrest to serve
> "mytests/mybad.html", a legacy html-file with lots of junk in
> it.

junk = things like legacy navigation?

> 
> 0. Clients asks Forrest to serve ".../xdocs/mytests/mybad.html"

"http://some.domain.org/mytests/mybad.html"

> 1. Forrest looks for a matching pipeline in
>    "...\forrest\main\webapp\sitemap.xmap".
> 
> 2. This Pattern  would in fact match the request but generates no xml since the
>    map:parts match no cocoon pipeline and thus no xml is generated.

Actually this does not match because *.html will only match something 
like "mybad.html". Here you have a directory structure so you need "**" 
in order to match it. "*" means a single part of an URL, "**" means any 
number of parts to an URL.
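
The difference between the two wildcards can be shown side by side in a small sitemap fragment. This is a hypothetical fragment for illustration only: the "placeholder/..." source paths are made up, and only the patterns and the resulting {1}/{2} values reflect what is discussed in this thread.

```xml
<?xml version="1.0"?>
<map:pipeline xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- "*" does not cross a "/": this matches "mybad.html"
       but not "mytests/mybad.html". -->
  <map:match pattern="*.html">
    <map:generate src="placeholder/{1}.xml"/>
    <map:serialize type="xml"/>
  </map:match>

  <!-- "**" may span several parts of the URL: for "mytests/mybad.html"
       {1} is "mytests" and {2} is "mybad". -->
  <map:match pattern="**/*.html">
    <map:generate src="placeholder/{1}/{2}.xml"/>
    <map:serialize type="xml"/>
  </map:match>

</map:pipeline>
```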


> 3. This pattern also matches the request and is used to continue
>    processing
>    
>         <map:match pattern="**/*.html">
>                                                 {0}= mytests/mybad.html
>                                                 {1}= mytests
>                                                 {2}= mybad
>             <map:aggregate element="site">
>               <map:part src="cocoon:/skinconf.xml"/> adds skin info
>               <map:part src="cocoon:/build-info"/> adds meta data
>               <map:part src="cocoon:/{1}/tab-{2}.html"/> creates tabs
>                            =>cocoon:/mytests/tab-mybad.html
>               <map:part src="cocoon:/{1}/menu-{2}.html"/> creates menus
>                           =>cocoon:/mytests/menu-mybad.html
>                           
>               Below a cocoon pipeline is called to generate the body 
>               <map:part src="cocoon:/{1}/body-{2}.html"/>
>                           =>cocoon:/mytests/body-mybad.html
>                           
>              return here for the rest of this pipeline in step 9
>               
>           
> 4.  This is the pipeline called in step 3
>     Check if there is an ehtml-file (deprecated embedded html)
>     
>         <map:match pattern="**body-*.html">
>                                             {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>         <map:select type="exists">
>           <map:when test="{project:content.xdocs}{1}{2}.ehtml">
>                         =>.../xdocs/mytests/mybad.ehtml 

The value of {project:content.xdocs} depends on the settings in 
forrest.properties

>             <map:generate src="{project:content.xdocs}{1}{2}.ehtml" />
>                              =>.../xdocs/mytests/mybad.ehtml 
>             <map:transform src="{forrest:stylesheets}/html2htmlbody.xsl" />
>             <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
>                                                   =>cocoon://mytests/linkmap-mybad.html 
>             <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
>             <map:serialize type="xml" />
>           </map:when>
>         </map:select>
>       </map:match>        
> 
>       Since file does not exist, pipeline generates nothing so Forrest
>       keeps looking for next matching pipeline ...

Yes, a pipeline is only considered as having executed when it generates
some content; the test for the *.ehtml file fails, so no processing is done.

> 5. ... and finds another pipeline for the same matches
> 
>   <!-- Default matches -->
>   <!-- (HTML rendered from doc-v11 intermediate format -->
>   <map:match pattern="**body-*.html">
>                                             {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>   
>     In the following step we ask Forrest to call the pipeline for mybad.xml.
>     This triggers a new matching attempt starting from the top of the pipeline section.
>   
>     <map:generate src="cocoon:/{1}{2}.xml"/>
>                      =>cocoon:/mytests/mybad.xml
>     
>     Return here for the rest of this pipeline in step

The number of slashes is significant. "cocoon:/" means look for a 
pipeline in the current sitemap, "cocoon://" means look for a match in 
any xmap. This is not significant at this point but will be later.

> 6. This below is relevant now as it loads the project sitemap and
>    inserts it right at this position of the main sitemap. (This project
>    sitemap was also loaded before, but was irrelevant since there were no matches
>    in the project sitemap)

Probably best, in the documentation to describe what happens *without* a 
project sitemap first since this will be the default behaviour. Then 
your howto can describe how the project sitemap intercepts this request.

>   <!-- 
>      This is the user pipeline, that can answer requests instead
>      of the Forrest one, or let requests pass through.
>      To take over the rendering of a file it must match the file name and path.
>      To take over the generation of the intermediate format, it must give
>      Forrest the same filename but ending with xml, and a DTD that Forrest
>      recognizes.
>   -->
>   <map:pipeline internal-only="false"> 4t!!!h step patterns above first
>        <map:select type="exists">
>          <map:when test="{project:sitemap}">
>            <map:mount uri-prefix="" 
>                       src="{project:sitemap}" 
>                       check-reload="yes" 
>                       pass-through="true"/>
>          </map:when>  
>        </map:select>
>   </map:pipeline> 
> 
> 7. In the project sitemap we find this match for our call for an XML-file!
> 
>         <map:match pattern="**/mybad.xml">
>                                                 {0}= mytests/mybad.xml
>                                                 {1}= mytests
>                                                 
>   
> 
>         Load my file with the html-generator. This generator
>         internally uses jtidy to clean up the html and make it xhtml.
>         
>         <map:generate src="{project:content.xdocs}{1}/mybad.html" type="html"/>
> 
>         Now we call my special stylesheet to remove all
>         elements that I don't want in the forrest page.
>         I place it in the same directory as the source document as it
>         is very specific.
>         
>         <map:transform src="{project:content.xdocs}{1}/mybadHTMLfixer.xsl"/>


Stylesheets should be placed in the {project:resources.stylesheets} 
directory, only displayable content should be in the xdocs directory.
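
Following this advice, the match from step 7 could be written with the stylesheet taken from the stylesheets directory. This is an untested sketch that reuses only names quoted in this thread; it assumes mybadHTMLfixer.xsl has been moved into the {project:resources.stylesheets} directory and that the project sitemap uses the usual map:sitemap / map:pipelines wrapper.

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <map:match pattern="**/mybad.xml">
        <!-- The legacy html stays with the content in xdocs. -->
        <map:generate src="{project:content.xdocs}{1}/mybad.html" type="html"/>
        <!-- The clean-up stylesheet now lives with the other stylesheets. -->
        <map:transform src="{project:resources.stylesheets}/mybadHTMLfixer.xsl"/>
        <map:transform src="{forrest:stylesheets}/html2document.xsl"/>
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```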

>         Finally call the existing stylesheet to convert html to document1.1
>         <map:transform src="{forrest:stylesheets}/html2document.xsl" />
>         
>         Serialize result as xml (it is now the body of my Forrest page
>         and uses in document.dtd)
>         <map:serialize type="xml"/>
>         
>       </map:match>
>       
>   
>  8. Return to calling routine in step 5 and execute the rest of the pipeline
>     to finalize the body of my Forrest page.
>     
>                         {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>     
>             ???
>             <map:transform type="idgen"/>

generates ID attributes for elements that are used for internal linking 
(i.e. <a href="thispage.html#thatPosition">Go to thatPosition</a>)

> 
>             ???
>             <map:transform type="xinclude"/>

XInclude content from other files (see http://www.w3.org/TR/xinclude/)


>             Adjust links
>             <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
>                                                   => cocoon:/mytests/linkmap-mybad.html
>             <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
>             <map:call resource="skinit">
>               <map:parameter name="type" value="document2html"/>
>               <map:parameter name="path" value="{1}{2}.html"/>
>                                               => mytests/mybad.html 
>               <map:parameter name="notoc" value="false"/>
>             </map:call>
>    </map:match>
>   
>    At the end of the pipeline this is the page body in Html with all
>    links adjusted.

Yes, to see it you can type http://localhost:8888/mytests/body-mybad.html
into your browser.

> 9. Return to the calling routine in step 3 and finish processing
> 
>                                 {0}= mytests/mybad.html
>                                                 {1}= mytests
>                                                 {2}= mybad
> 
>                 </map:aggregate>
>                 
>                 At this point the body (as html) is aggregated with the menus and tabs
>                 and the next part just adds the final touches to the presentation.
>                 
>             <map:call resource="skinit">
>               <map:parameter name="type" value="site2xhtml"/>
>               <map:parameter name="path" value="{0}"/>
>                                               => mytests/mybad.html
>             </map:call>
>           </map:match>      
>           
> 
>          At the end, the result is delivered to the browser. 

Looks good to me.

Ross

Re: Reusing legacy HTML

Posted by Ross Gardler <rg...@apache.org>.
Ferdinand Soethe wrote:
> So here it is. Your comments much appreciated before I format it for
> inclusion in the Forrest documention?

I've moved to the dev list rather than the user list. The devs want to 
see this develop, the users want the final document.

> I was going to use document.dtd and write a second shorter and more to
> the point how-to on the specifics of processing your own legacy html.
> OK?

That would be brilliant (don't forget we have a HowTo DTD).

> So this tries to explain what happens internally when a clients asks Forrest to serve
> "mytests/mybad.html", a legacy html-file with lots of junk in
> it.

junk = things like legacy navigation?

> 
> 0. Clients asks Forrest to serve ".../xdocs/mytests/mybad.html"

"http://some.domain.org/mystest/mybad.html"

> 1. Forrest looks for a matching pipeline in
>    "...\forrest\main\webapp\sitemap.xmap".
> 
> 2. This Pattern  would in fact match the request but generates no xml since the
>    map:parts match no cocoon pipeline and thus no xml is generated.

Actually this does not match because *.html will only match something 
like "mybad.html". Here you have a directory structure so you need "**" 
in order to match it. "*" means a single part of an URL, "**" means any 
number of parts to an URL.


> 3. This pattern also matches the request and is used to continue
>    processing
>    
>         <map:match pattern="**/*.html">
>                                                 {0}= mytests/mybad.html
>                                                 {1}= mytests
>                                                 {2}= mybad
>             <map:aggregate element="site">
>               <map:part src="cocoon:/skinconf.xml"/> adds skin info
>               <map:part src="cocoon:/build-info"/> adds meta data
>               <map:part src="cocoon:/{1}/tab-{2}.html"/> creates tabs
>                            =>cocoon:/mytests/tab-mybad.html
>               <map:part src="cocoon:/{1}/menu-{2}.html"/> creates menus
>                           =>cocoon:/mytests/menu-mybad.html
>                           
>               Below a cocoon pipeline is called to generate the body 
>               <map:part src="cocoon:/{1}/body-{2}.html"/>
>                           =>cocoon:/mytests/body-mybad.html
>                           
>              return here for the rest of this pipeline in step 9
>               
>           
> 4.  This is the pipeline called in step 3
>     Check if there is an ehtml-file (deprecated embedded html)
>     
>         <map:match pattern="**body-*.html">
>                                             {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>         <map:select type="exists">
>           <map:when test="{project:content.xdocs}{1}{2}.ehtml">
>                         =>.../xdocs/mytests/mybad.ehtml 

The value of {project:content.xdocs} depends on the settings in 
forrest.properties

>             <map:generate src="{project:content.xdocs}{1}{2}.ehtml" />
>                              =>.../xdocs/mytests/mybad.ehtml 
>             <map:transform src="{forrest:stylesheets}/html2htmlbody.xsl" />
>             <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
>                                                   =>cocoon:/mytests/linkmap-mybad.html 
>             <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
>             <map:serialize type="xml" />
>           </map:when>
>         </map:select>
>       </map:match>        
> 
>       Since file does not exist, pipeline generates nothing so Forrest
>       keeps looking for next matching pipeline ...

Yes, a pipeline is only considered to have executed when it generates 
some content. The test for the *.ehtml file fails, so no processing is done.

> 5. ... and finds another pipeline for the same matches
> 
>   <!-- Default matches -->
>   <!-- (HTML rendered from doc-v11 intermediate format -->
>   <map:match pattern="**body-*.html">
>                                             {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>   
>     In the following step we ask Forrest to call the pipeline for mybad.xml.
>     This triggers a new matching attempt starting from the top of the pipeline section.
>   
>     <map:generate src="cocoon:/{1}{2}.xml"/>
>                      =>cocoon:/mytests/mybad.xml
>     
>     Return here for the rest of this pipeline in step 8

The number of slashes is significant. "cocoon:/" means look for a 
pipeline in the current sitemap, "cocoon://" means start again from the 
root sitemap, so a match in any xmap can be found. This is not significant 
at this point but will be later.
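
To make the difference concrete, here is a small sketch of the two forms, 
using the request from this walkthrough. Only one of the two would appear 
in a real pipeline, and the comments describe my understanding of how 
Cocoon resolves each one:

```xml
<!-- One slash: resolved against the sitemap that contains this line -->
<map:generate src="cocoon:/mytests/mybad.xml"/>

<!-- Two slashes: resolved starting from the root sitemap -->
<map:generate src="cocoon://mytests/mybad.xml"/>
```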

> 6. This below is relevant now as it loads the project sitemap and
>    inserts it right at this position of the main sitemap. (This project
>    sitemap was also loaded before, but was irrelevant since there were no matches
>    in the project sitemap)

It is probably best, in the documentation, to describe what happens 
*without* a project sitemap first, since this will be the default 
behaviour. Then your howto can describe how the project sitemap intercepts 
this request.

>   <!-- 
>      This is the user pipeline, that can answer requests instead
>      of the Forrest one, or let requests pass through.
>      To take over the rendering of a file it must match the file name and path.
>      To take over the generation of the intermediate format, it must give
>      Forrest the same filename but ending with xml, and a DTD that Forrest
>      recognizes.
>   -->
>   <map:pipeline internal-only="false"> 4t!!!h step patterns above first
>        <map:select type="exists">
>          <map:when test="{project:sitemap}">
>            <map:mount uri-prefix="" 
>                       src="{project:sitemap}" 
>                       check-reload="yes" 
>                       pass-through="true"/>
>          </map:when>  
>        </map:select>
>   </map:pipeline> 
> 
> 7. In the project sitemap we find this match for our call for an XML-file!
> 
>         <map:match pattern="**/mybad.xml">
>                                                 {0}= mytests/mybad.xml
>                                                 {1}= mytests
>                                                 
>   
> 
>         Load my file with the html-generator. This generator
>         internally uses jtidy to clean up the html and make it xhtml.
>         
>         <map:generate src="{project:content.xdocs}{1}/mybad.html" type="html"/>
> 
>         Now we call my special stylesheet to remove all
>         elements that I don't want in the forrest page.
>         I place it in the same directory as the source document as it
>         is very specific.
>         
>         <map:transform src="{project:content.xdocs}{1}/mybadHTMLfixer.xsl"/>


Stylesheets should be placed in the {project:resources.stylesheets} 
directory; only displayable content should be in the xdocs directory.
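
Something along these lines should do it. This is a sketch that assumes 
mybadHTMLfixer.xsl has been moved into the project's stylesheets directory 
and that the variable is followed by a slash, as {forrest:stylesheets} is 
elsewhere in the sitemap:

```xml
<!-- Assumes the stylesheet now lives in the project's stylesheets directory -->
<map:transform src="{project:resources.stylesheets}/mybadHTMLfixer.xsl"/>
```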

>         Finally call the existing stylesheet to convert html to document1.1
>         <map:transform src="{forrest:stylesheets}/html2document.xsl" />
>         
>         Serialize result as xml (it is now the body of my Forrest page
>         and conforms to document.dtd)
>         <map:serialize type="xml"/>
>         
>       </map:match>
>       
>   
>  8. Return to calling routine in step 5 and execute the rest of the pipeline
>     to finalize the body of my Forrest page.
>     
>                         {0}= mytests/body-mybad.html
>                                                 {1}= mytests/
>                                                 {2}= mybad
>     
>             ???
>             <map:transform type="idgen"/>

This generates ID attributes for elements that are used for internal 
linking (e.g. <a href="thispage.html#thatPosition">Go to thatPosition</a>).

> 
>             ???
>             <map:transform type="xinclude"/>

This pulls in content from other files via XInclude (see http://www.w3.org/TR/xinclude/).
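
For example, a source document might contain something like the fragment 
below. The file name "shared/notice.xml" and the surrounding section are 
invented for illustration; the transformer would replace the xi:include 
element with the content of the referenced file:

```xml
<!-- Hypothetical fragment; "shared/notice.xml" is an invented file name -->
<section xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Notice</title>
  <xi:include href="shared/notice.xml"/>
</section>
```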


>             Adjust links
>             <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
>                                                   => cocoon:/mytests/linkmap-mybad.html
>             <map:transform src="resources/stylesheets/declare-broken-site-links.xsl" />
>             <map:call resource="skinit">
>               <map:parameter name="type" value="document2html"/>
>               <map:parameter name="path" value="{1}{2}.html"/>
>                                               => mytests/mybad.html 
>               <map:parameter name="notoc" value="false"/>
>             </map:call>
>    </map:match>
>   
>    At the end of the pipeline this is the page body in Html with all
>    links adjusted.

Yes, to see it you can type http://localhost:8888/mytests/body-mybad.html 
into your browser.

> 9. Return to the calling routine in step 3 and finish processing
> 
>                                 {0}= mytests/mybad.html
>                                                 {1}= mytests
>                                                 {2}= mybad
> 
>                 </map:aggregate>
>                 
>                 At this point the body (as html) is aggregated with the menus and tabs
>                 and the next part just adds the final touches to the presentation.
>                 
>             <map:call resource="skinit">
>               <map:parameter name="type" value="site2xhtml"/>
>               <map:parameter name="path" value="{0}"/>
>                                               => mytests/mybad.html
>             </map:call>
>           </map:match>      
>           
> 
>          At the end, the result is delivered to the browser. 

Looks good to me.

Ross