Posted to dev@forrest.apache.org by David Crossley <cr...@apache.org> on 2006/01/17 23:54:54 UTC

Cocoon cli confirm-extensions (Was: xml output plugin and filename extension .xml)

Ross Gardler wrote:
> Thorsten Scherler wrote:
> >David Crossley wrote:
> >>David Crossley wrote:
> >>>Ross Gardler wrote:
> >>>
> >>>>Is anyone familiar with configuration of the Cocoon crawler? We need to 
> >>>>modify it so that it will follow links defined in whatever format the 
> >>>>output document creates rather than just HTML format documents.
> >>>
> >>>In our main/webapp/WEB-INF/cli.xconf
> >>>
> >>>   |    confirm-extensions: check the mime type for the generated page
> >>>   |                        and adjust filename and links extensions
> >>>   |                        to match the mime type
> >>>   |                        (e.g. text/html->.html)
> >>>
> >>>at the moment it is set to false.
> >>>
> >>>I have never understood how to use it.
> >>>
> >>>Are you suggesting that we might be able to get rid of
> >>>the need for relying on filename extensions?
> >>>
> >>>http://cocoon.apache.org/2.1/userdocs/offline/
> >>>http://wiki.apache.org/cocoon/CommandLine
> >>>
> >>>I notice from those docs that the default is
> >>>confirm-extensions=true (opposite to us).
> >>
> >>I tried this today ...
> >>
> >>Edit main/webapp/WEB-INF/cli.xconf and
> >>set "confirm-extensions=true".
> >>
> >>Do 'forrest site' ...
> >>
> >>* [1/0]     [0/0]     5.633s 10.5Kb  linkmap.html
> >>Total time: 0 minutes 7 seconds,  Site size: 10,782 Site pages: 1
> >>
> >>So it processed the first page but did not gather any links
> >>from the page (the third column numbers are empty).

Perhaps internally Cocoon is now appending a
filename extension, which confuses the linkgatherer.
I don't even know if "confirm-extensions" does that.
One should look at the Cocoon code.
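To make the hypothesis concrete: the cli.xconf comment says confirm-extensions checks the mime type of the generated page and adjusts filename and link extensions to match it (e.g. text/html->.html). The sketch below shows one way such a step could behave if it appends the expected extension. It is a hypothetical illustration only, not Cocoon's code. The function name confirm_extension and the small mime table are made up, and whether Cocoon appends or replaces extensions still needs to be checked in the Cocoon source.

```python
import posixpath

# Hypothetical mime-type -> extension table (a few entries for illustration only).
MIME_TO_EXT = {
    "text/html": ".html",
    "text/xml": ".xml",
    "application/pdf": ".pdf",
}


def confirm_extension(uri: str, mime_type: str) -> str:
    """Hypothetical: return uri with an extension that matches mime_type.

    Unknown mime types and already-matching extensions leave the uri
    unchanged; otherwise the expected extension is appended (an assumption).
    """
    expected = MIME_TO_EXT.get(mime_type)
    if expected is None:
        return uri
    _, ext = posixpath.splitext(uri)
    if ext == expected:
        return uri
    return uri + expected


print(confirm_extension("linkmap.html", "text/html"))  # linkmap.html
print(confirm_extension("index", "text/html"))         # index.html
print(confirm_extension("doc.xml", "text/html"))       # doc.xml.html
```

If something like the last case happened, a link recorded as "doc.xml" would no longer match the page written as "doc.xml.html", which is the kind of mismatch that could confuse a link gatherer.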

> >>Unfortunately we cannot see any logs in 'forrest site' mode
> >>due to issue:

Cannot find the Jira issue. It does cause big problems
when trying to debug.

> >Just a shot in the dark, we have/had a similar problem in v2. The
> >crawler expects certain markup such as <a href=""/> AFAIR. 
> 
> According to the CLI docs (if I remember correctly) the crawler should 
> follow links in @href, @src, etc. regardless of the parent element.

I think that is tangential.

My experiment is with our site-author docs. It has
stacks of links that are normally processed.
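For reference, the behaviour Ross describes (following links in @href, @src, etc. regardless of the parent element) can be illustrated with a short sketch using Python's standard xml.etree.ElementTree. This is not Cocoon's link gatherer. The function name gather_links and the attribute list are assumptions for illustration, and namespaced attributes such as xlink:href are not matched by this simple version.

```python
import xml.etree.ElementTree as ET

# Assumed set of link-bearing attributes; the real crawler may check more.
LINK_ATTRIBUTES = ("href", "src")


def gather_links(markup: str) -> list[str]:
    """Collect href/src values from every element, whatever its name."""
    root = ET.fromstring(markup)
    links = []
    # root.iter() walks the root and all descendants in document order.
    for element in root.iter():
        for attr in LINK_ATTRIBUTES:
            value = element.get(attr)
            if value is not None:
                links.append(value)
    return links


doc = ('<document><body><a href="index.html">Home</a>'
       '<img src="images/logo.png"/><link href="skin/basic.css"/>'
       '</body></document>')
print(gather_links(doc))
```

Because the sketch checks attributes on every element, links in a non-HTML output format would be found as long as they use one of the listed attribute names.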

[ snip ]

> What is forrest run doing?

All okay in 'forrest run' mode, because cli.xconf is the
configuration for the Cocoon command-line interface,
i.e. it only applies to 'forrest site'.

-David