Posted to dev@cocoon.apache.org by David Crossley <cr...@apache.org> on 2006/03/29 08:39:31 UTC

Re: [RT] The environment abstraction, part II

On Thu, Feb 02, 2006 Daniel Fagerstrom wrote:
> Carsten Ziegeler wrote:
> >Daniel Fagerstrom wrote:
> >>AFAIK you can't call filters and listeners from within servlets, so they 
> >>are at the servlet container level, and I don't see how a block would 
> >>need them. A block could certainly need something that a listener put in 
> >>a context attribute or that a filter did to the request, but that is 
> >>another question.
> >>
> >There was recently a discussion here about adding a servlet listener for
> >some functionality in the xsp block - I don't know what, but the
> >important message is that this will happen and already happens: some
> >blocks need more than just the servlet: like listeners or filters or
> >whatever the servlet spec requires.
> 
> As I said, servlet listeners are for putting something in a context 
> attribute so that it is available to the servlet. With blocks we have a 
> sophisticated component layer that is available for all blocks, so I 
> fail to see why anyone would need to use a servlet listener at the block 
> level. You need to go into the details of the above usecase to convince me.
> 
> Filters might be usable within blocks, but it is a rather crude 
> mechanism compared to pipelines, so I need usecases to become convinced 
> that it would be worthwhile to support.
> 
> >And as soon as a block uses these
> >features you need a full servlet environment and not just an HTTP
> >request, response and context.
> 
> Question is still why a block would need them.
> 
> >In addition, I could imagine that Cocoon provides filters which might be
> >used by other web frameworks running in the same web app as Cocoon.
> >I might be wrong, but I think another issue is if you are using Spring.
> >Spring is initialized through a servlet listener and assumes a
> >servlet context to work properly. So as soon as you don't have the
> >listener, you can't use Spring in that way.
> 
> IIUC, the idea is that you start a Spring container complete with 
> component configurations and everything in a servlet listener, then it 
> makes its service manager available from within the servlets through a
> context attribute.
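
For concreteness: the integration being described is Spring's standard
web bootstrap - a ContextLoaderListener declared in web.xml parks the
Spring container in a ServletContext attribute, and servlets fish it out
again. A minimal sketch against Spring's stock web support follows; the
servlet class and bean name are made up.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.web.context.WebApplicationContext;
    import org.springframework.web.context.support.WebApplicationContextUtils;

    // Relies on web.xml declaring Spring's bootstrap listener:
    //   <listener><listener-class>
    //     org.springframework.web.context.ContextLoaderListener
    //   </listener-class></listener>
    public class SpringAwareServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse res)
                throws IOException {
            // The listener stored the container in a ServletContext
            // attribute; this helper simply looks it up again.
            WebApplicationContext spring = WebApplicationContextUtils
                    .getRequiredWebApplicationContext(getServletContext());
            Object bean = spring.getBean("someBean"); // hypothetical bean name
            res.getWriter().println(bean);
        }
    }
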
> 
> I'm rather certain that we would prefer a less primitive Spring 
> integration in the blocks architecture. One where the Spring container 
> is started within the blocks architecture and where a block can contain 
> Spring-managed components with its own component configuration.
> 
> >Of course there are other ways
> >of using Spring which would also work in a CLI but they do not leverage
> >the special web functionality. So you either don't use that or you have
> >two versions, one for the CLI and one for the web.
> 
> As described above "the special web functionality" is a hack around the 
> fact that servlets are more isolated from each other than one would 
> prefer from a system-building design POV; blocks were created to address
> these issues with the servlet architecture.
> 
> >I can imagine more scenarios for these kinds of things and we could avoid
> >all of them. The only drawback - if you want to call it a drawback - is
> >that the CLI is internally firing up a servlet engine. But I could
> >imagine that this "clarification of environments" would also make the
> >work for Forrest easier in the end.
> 
> I think it would be enough if we provide a lightweight CLI that just sets
> up context, request and response. If someone wants to use Forrest in a
> full servlet container, I wonder whether there really are any usecases for
> needing to do that from the CLI; why not just use a web server in that case?

Hi Daniel, sorry I cannot understand that last sentence.
Would you please re-phrase it.

We currently have three ways:

'forrest run'
Starts its packaged Jetty and uses Forrest/Cocoon as a webapp.

'forrest war'
Builds a projectName.war ready for deployment in a full Jetty
or Tomcat.

'forrest site'
Calls the Cocoon CLI to generate a static set of docs.

-David

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Upayavira <uv...@odoko.co.uk>.
Thorsten Scherler wrote:
> On Mon, 03-04-2006 at 12:34 +0100, Upayavira wrote:
>> Thorsten Scherler wrote:
>>> On Mon, 03-04-2006 at 09:00 +0100, Upayavira wrote:
>>>> David Crossley wrote:
>>>>> Upayavira wrote:
>>>>>> Sylvain Wallez wrote:
>>>>>>> Carsten Ziegeler wrote:
>>>>>>>> Sylvain Wallez wrote:
>>>>>>>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
>>>>>>>>> although the new crawler can be based on servlets, it will assume these
>>>>>>>>> servlets to answer to a ?cocoon-view=links :-)
>>>>>>>>>     
>>>>>>>> Hmm, I think we don't need the links view in this case anymore. A simple
>>>>>>>>  HTML crawler should be enough as it will follow all links on the page.
>>>>>>>> The view would only make sense in the case where you don't output HTML,
>>>>>>>> where the usual crawler tools would not work.
>>>>>>>>   
>>>>>>> In the case of Forrest, you're probably right. Now the links view also
>>>>>>> allows following links in pipelines producing something that's not HTML,
>>>>>>> such as PDF, SVG, WML, etc.
>>>>>>>
>>>>>>> We have to decide if we want to lose this feature.
>>>>> I am not sure if we use this in Forrest. If not
>>>>> then we probably should be using it.
>>>>>
>>>>>> In my view, the whole idea of crawling (i.e. gathering links from pages)
>>>>>> is suboptimal anyway. For example, some sites don't directly link to all
>>>>>> pages (e.g. they are accessed via javascript, or whatever) so you get
>>>>>> pages missed.
>>>>>>
>>>>>> Were I to code a new CLI, whilst I would support crawling I would mainly
>>>>>> configure the CLI to get the list of pages to visit by calling one or
>>>>>> more URLs. Those URLs would specify the pages to generate.
>>>>>>
>>>>>> Thus, Forrest would transform its site.xml file into this list of pages,
>>>>>> and drive the CLI via that.
>>>>> This is what we do do. We have a property
>>>>> "start-uri=linkmap.html"
>>>>> http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
>>>>> (we actually use corresponding xml of course).
>>>>>
>>>>> We define a few extra URIs in the Cocoon cli.xconf
>>>>>
>>>>> There are issues of course. Sometimes we want to
>>>>> include directories of files that are not referenced
>>>>> in site.xml navigation. For my sites I just use a
>>>>> DirectoryGenerator to build an index page which feeds
>>>>> the crawler. Sometimes that technique is not sufficient.
>>>>>
>>>>> We also gather links from text files (e.g. CSS)
>>>>> using Chaperon. This works nicely but introduces
>>>>> some overhead.
>>>> This more or less confirms my suggested approach - allow crawling at the
>>>> 'end-point' HTML, but more importantly, use a page/URL to identify the
>>>> pages to be crawled. The interesting thing from what you say is that
>>>> this page could itself be nothing more than HTML.
>>> Well, yes and not really, since e.g. Chaperon is text-based, not
>>> markup. You need a lex-writer to generate links for the crawler. 
>> Yes. You misunderstand me, I think.
> 
> Yes, sorry, I did misunderstand you.
> 
>>  Even if you use Chaperon etc to parse
>> markup, there'd be no difficulty expressing the links that you found as
>> an HTML page - one intended to be consumed by the CLI, not to be
>> publicly viewed.
> 
> Well, in the case of CSS you want them publicly viewed as well, but I
> got your point. ;)
> 
>>  In fact, if it were written to disc, Forrest would
>> probably delete it afterwards.
>>
>>> Forrest actually is *not* aimed at HTML-only support, and one can think
>>> of the situation where you want your site to be only txt (kind of a
>>> book). Here you need to crawl the lex-rewriter outcome and follow the
>>> links.
>> Hopefully I've shown that I had understood that already :-)
> 
> yeah ;)
> 
>>> The current limitations of Forrest regarding the crawler are IMO not
>>> caused by the crawler design but rather by our (as in Forrest) usage of
>>> it.
>> Yep, fair enough. But if the CLI is going to survive the shift that is
>> happening in Cocoon trunk, something big needs to be done by someone. It
>> cannot survive in its current form as the code it uses is changing
>> almost beyond recognition.
>>
>> Heh, perhaps the Cocoon CLI should just be a Maven plugin.
> 
> ...or a Forrest plugin. ;) This would make it possible for Cocoon, Lenya
> and Forrest committers to help.
> 
> Kind of http://svn.apache.org/viewcvs.cgi/lenya/sandbox/doco/ ;)

Well, in the end, it is the one who implements it that decides.

Upayavira

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Thorsten Scherler <th...@apache.org>.
On Mon, 03-04-2006 at 12:34 +0100, Upayavira wrote:
> Thorsten Scherler wrote:
> > On Mon, 03-04-2006 at 09:00 +0100, Upayavira wrote:
> >> David Crossley wrote:
> >>> Upayavira wrote:
> >>>> Sylvain Wallez wrote:
> >>>>> Carsten Ziegeler wrote:
> >>>>>> Sylvain Wallez wrote:
> >>>>>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
> >>>>>>> although the new crawler can be based on servlets, it will assume these
> >>>>>>> servlets to answer to a ?cocoon-view=links :-)
> >>>>>>>     
> >>>>>> Hmm, I think we don't need the links view in this case anymore. A simple
> >>>>>>  HTML crawler should be enough as it will follow all links on the page.
> >>>>>> The view would only make sense in the case where you don't output HTML,
> >>>>>> where the usual crawler tools would not work.
> >>>>>>   
> >>>>> In the case of Forrest, you're probably right. Now the links view also
> >>>>> allows following links in pipelines producing something that's not HTML,
> >>>>> such as PDF, SVG, WML, etc.
> >>>>>
> >>>>> We have to decide if we want to lose this feature.
> >>> I am not sure if we use this in Forrest. If not
> >>> then we probably should be using it.
> >>>
> >>>> In my view, the whole idea of crawling (i.e. gathering links from pages)
> >>>> is suboptimal anyway. For example, some sites don't directly link to all
> >>>> pages (e.g. they are accessed via javascript, or whatever) so you get
> >>>> pages missed.
> >>>>
> >>>> Were I to code a new CLI, whilst I would support crawling I would mainly
> >>>> configure the CLI to get the list of pages to visit by calling one or
> >>>> more URLs. Those URLs would specify the pages to generate.
> >>>>
> >>>> Thus, Forrest would transform its site.xml file into this list of pages,
> >>>> and drive the CLI via that.
> >>> This is what we do do. We have a property
> >>> "start-uri=linkmap.html"
> >>> http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
> >>> (we actually use corresponding xml of course).
> >>>
> >>> We define a few extra URIs in the Cocoon cli.xconf
> >>>
> >>> There are issues of course. Sometimes we want to
> >>> include directories of files that are not referenced
> >>> in site.xml navigation. For my sites I just use a
> >>> DirectoryGenerator to build an index page which feeds
> >>> the crawler. Sometimes that technique is not sufficient.
> >>>
> >>> We also gather links from text files (e.g. CSS)
> >>> using Chaperon. This works nicely but introduces
> >>> some overhead.
> >> This more or less confirms my suggested approach - allow crawling at the
> >> 'end-point' HTML, but more importantly, use a page/URL to identify the
> >> pages to be crawled. The interesting thing from what you say is that
> >> this page could itself be nothing more than HTML.
> > 
> > Well, yes and not really, since e.g. Chaperon is text-based, not
> > markup. You need a lex-writer to generate links for the crawler. 
> 
> Yes. You misunderstand me, I think.

Yes, sorry, I did misunderstand you.

>  Even if you use Chaperon etc to parse
> markup, there'd be no difficulty expressing the links that you found as
> an HTML page - one intended to be consumed by the CLI, not to be
> publicly viewed.

Well, in the case of CSS you want them publicly viewed as well, but I
got your point. ;)

>  In fact, if it were written to disc, Forrest would
> probably delete it afterwards.
> 
> > Forrest actually is *not* aimed at HTML-only support, and one can think
> > of the situation where you want your site to be only txt (kind of a
> > book). Here you need to crawl the lex-rewriter outcome and follow the
> > links.
> 
> Hopefully I've shown that I had understood that already :-)

yeah ;)

> 
> > The current limitations of Forrest regarding the crawler are IMO not
> > caused by the crawler design but rather by our (as in Forrest) usage of
> > it.
> 
> Yep, fair enough. But if the CLI is going to survive the shift that is
> happening in Cocoon trunk, something big needs to be done by someone. It
> cannot survive in its current form as the code it uses is changing
> almost beyond recognition.
> 
> Heh, perhaps the Cocoon CLI should just be a Maven plugin.

...or a Forrest plugin. ;) This would make it possible for Cocoon, Lenya
and Forrest committers to help.

Kind of http://svn.apache.org/viewcvs.cgi/lenya/sandbox/doco/ ;)

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Upayavira <uv...@odoko.co.uk>.
Thorsten Scherler wrote:
> On Mon, 03-04-2006 at 09:00 +0100, Upayavira wrote:
>> David Crossley wrote:
>>> Upayavira wrote:
>>>> Sylvain Wallez wrote:
>>>>> Carsten Ziegeler wrote:
>>>>>> Sylvain Wallez wrote:
>>>>>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
>>>>>>> although the new crawler can be based on servlets, it will assume these
>>>>>>> servlets to answer to a ?cocoon-view=links :-)
>>>>>>>     
>>>>>> Hmm, I think we don't need the links view in this case anymore. A simple
>>>>>>  HTML crawler should be enough as it will follow all links on the page.
>>>>>> The view would only make sense in the case where you don't output HTML,
>>>>>> where the usual crawler tools would not work.
>>>>>>   
>>>>> In the case of Forrest, you're probably right. Now the links view also
>>>>> allows following links in pipelines producing something that's not HTML,
>>>>> such as PDF, SVG, WML, etc.
>>>>>
>>>>> We have to decide if we want to lose this feature.
>>> I am not sure if we use this in Forrest. If not
>>> then we probably should be using it.
>>>
>>>> In my view, the whole idea of crawling (i.e. gathering links from pages)
>>>> is suboptimal anyway. For example, some sites don't directly link to all
>>>> pages (e.g. they are accessed via javascript, or whatever) so you get
>>>> pages missed.
>>>>
>>>> Were I to code a new CLI, whilst I would support crawling I would mainly
>>>> configure the CLI to get the list of pages to visit by calling one or
>>>> more URLs. Those URLs would specify the pages to generate.
>>>>
>>>> Thus, Forrest would transform its site.xml file into this list of pages,
>>>> and drive the CLI via that.
>>> This is what we do do. We have a property
>>> "start-uri=linkmap.html"
>>> http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
>>> (we actually use corresponding xml of course).
>>>
>>> We define a few extra URIs in the Cocoon cli.xconf
>>>
>>> There are issues of course. Sometimes we want to
>>> include directories of files that are not referenced
>>> in site.xml navigation. For my sites I just use a
>>> DirectoryGenerator to build an index page which feeds
>>> the crawler. Sometimes that technique is not sufficient.
>>>
>>> We also gather links from text files (e.g. CSS)
>>> using Chaperon. This works nicely but introduces
>>> some overhead.
>> This more or less confirms my suggested approach - allow crawling at the
>> 'end-point' HTML, but more importantly, use a page/URL to identify the
>> pages to be crawled. The interesting thing from what you say is that
>> this page could itself be nothing more than HTML.
> 
> Well, yes and not really, since e.g. Chaperon is text-based, not
> markup. You need a lex-writer to generate links for the crawler. 

Yes. You misunderstand me, I think. Even if you use Chaperon etc. to parse
markup, there'd be no difficulty expressing the links that you found as
an HTML page - one intended to be consumed by the CLI, not to be
publicly viewed. In fact, if it were written to disc, Forrest would
probably delete it afterwards.

> Forrest actually is *not* aimed at HTML-only support, and one can think
> of the situation where you want your site to be only txt (kind of a
> book). Here you need to crawl the lex-rewriter outcome and follow the
> links.

Hopefully I've shown that I had understood that already :-)

> The current limitations of Forrest regarding the crawler are IMO not
> caused by the crawler design but rather by our (as in Forrest) usage of
> it.

Yep, fair enough. But if the CLI is going to survive the shift that is
happening in Cocoon trunk, something big needs to be done by someone. It
cannot survive in its current form as the code it uses is changing
almost beyond recognition.

Heh, perhaps the Cocoon CLI should just be a Maven plugin.

Upayavira

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Thorsten Scherler <th...@apache.org>.
On Mon, 03-04-2006 at 09:00 +0100, Upayavira wrote:
> David Crossley wrote:
> > Upayavira wrote:
> >> Sylvain Wallez wrote:
> >>> Carsten Ziegeler wrote:
> >>>> Sylvain Wallez wrote:
> >>>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
> >>>>> although the new crawler can be based on servlets, it will assume these
> >>>>> servlets to answer to a ?cocoon-view=links :-)
> >>>>>     
> >>>> Hmm, I think we don't need the links view in this case anymore. A simple
> >>>>  HTML crawler should be enough as it will follow all links on the page.
> >>>> The view would only make sense in the case where you don't output HTML,
> >>>> where the usual crawler tools would not work.
> >>>>   
> >>> In the case of Forrest, you're probably right. Now the links view also
> >>> allows following links in pipelines producing something that's not HTML,
> >>> such as PDF, SVG, WML, etc.
> >>>
> >>> We have to decide if we want to lose this feature.
> > 
> > I am not sure if we use this in Forrest. If not
> > then we probably should be using it.
> > 
> >> In my view, the whole idea of crawling (i.e. gathering links from pages)
> >> is suboptimal anyway. For example, some sites don't directly link to all
> >> pages (e.g. they are accessed via javascript, or whatever) so you get
> >> pages missed.
> >>
> >> Were I to code a new CLI, whilst I would support crawling I would mainly
> >> configure the CLI to get the list of pages to visit by calling one or
> >> more URLs. Those URLs would specify the pages to generate.
> >>
> >> Thus, Forrest would transform its site.xml file into this list of pages,
> >> and drive the CLI via that.
> > 
> > This is what we do do. We have a property
> > "start-uri=linkmap.html"
> > http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
> > (we actually use corresponding xml of course).
> > 
> > We define a few extra URIs in the Cocoon cli.xconf
> > 
> > There are issues of course. Sometimes we want to
> > include directories of files that are not referenced
> > in site.xml navigation. For my sites I just use a
> > DirectoryGenerator to build an index page which feeds
> > the crawler. Sometimes that technique is not sufficient.
> > 
> > We also gather links from text files (e.g. CSS)
> > using Chaperon. This works nicely but introduces
> > some overhead.
> 
> This more or less confirms my suggested approach - allow crawling at the
> 'end-point' HTML, but more importantly, use a page/URL to identify the
> pages to be crawled. The interesting thing from what you say is that
> this page could itself be nothing more than HTML.

Well, yes and not really, since e.g. Chaperon is text-based, not
markup. You need a lex-writer to generate links for the crawler. 

Forrest actually is *not* aimed at HTML-only support, and one can think
of the situation where you want your site to be only txt (kind of a
book). Here you need to crawl the lex-rewriter outcome and follow the
links.

The current limitations of Forrest regarding the crawler are IMO not
caused by the crawler design but rather by our (as in Forrest) usage of
it.

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Upayavira <uv...@odoko.co.uk>.
David Crossley wrote:
> Upayavira wrote:
>> Sylvain Wallez wrote:
>>> Carsten Ziegeler wrote:
>>>> Sylvain Wallez wrote:
>>>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
>>>>> although the new crawler can be based on servlets, it will assume these
>>>>> servlets to answer to a ?cocoon-view=links :-)
>>>>>     
>>>> Hmm, I think we don't need the links view in this case anymore. A simple
>>>>  HTML crawler should be enough as it will follow all links on the page.
>>>> The view would only make sense in the case where you don't output HTML,
>>>> where the usual crawler tools would not work.
>>>>   
>>> In the case of Forrest, you're probably right. Now the links view also
>>> allows following links in pipelines producing something that's not HTML,
>>> such as PDF, SVG, WML, etc.
>>>
>>> We have to decide if we want to lose this feature.
> 
> I am not sure if we use this in Forrest. If not
> then we probably should be using it.
> 
>> In my view, the whole idea of crawling (i.e. gathering links from pages)
>> is suboptimal anyway. For example, some sites don't directly link to all
>> pages (e.g. they are accessed via javascript, or whatever) so you get
>> pages missed.
>>
>> Were I to code a new CLI, whilst I would support crawling I would mainly
>> configure the CLI to get the list of pages to visit by calling one or
>> more URLs. Those URLs would specify the pages to generate.
>>
>> Thus, Forrest would transform its site.xml file into this list of pages,
>> and drive the CLI via that.
> 
> This is what we do do. We have a property
> "start-uri=linkmap.html"
> http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
> (we actually use corresponding xml of course).
> 
> We define a few extra URIs in the Cocoon cli.xconf
> 
> There are issues of course. Sometimes we want to
> include directories of files that are not referenced
> in site.xml navigation. For my sites I just use a
> DirectoryGenerator to build an index page which feeds
> the crawler. Sometimes that technique is not sufficient.
> 
> We also gather links from text files (e.g. CSS)
> using Chaperon. This works nicely but introduces
> some overhead.

This more or less confirms my suggested approach - allow crawling at the
'end-point' HTML, but more importantly, use a page/URL to identify the
pages to be crawled. The interesting thing from what you say is that
this page could itself be nothing more than HTML.

Regards, Upayavira

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by David Crossley <cr...@apache.org>.
Upayavira wrote:
> Sylvain Wallez wrote:
> > Carsten Ziegeler wrote:
> >> Sylvain Wallez wrote:
> > 
> >>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
> >>> although the new crawler can be based on servlets, it will assume these
> >>> servlets to answer to a ?cocoon-view=links :-)
> >>>     
> >> Hmm, I think we don't need the links view in this case anymore. A simple
> >>  HTML crawler should be enough as it will follow all links on the page.
> >> The view would only make sense in the case where you don't output HTML,
> >> where the usual crawler tools would not work.
> >>   
> > 
> > In the case of Forrest, you're probably right. Now the links view also
> > allows following links in pipelines producing something that's not HTML,
> > such as PDF, SVG, WML, etc.
> > 
> > We have to decide if we want to lose this feature.

I am not sure if we use this in Forrest. If not
then we probably should be using it.

> In my view, the whole idea of crawling (i.e. gathering links from pages)
> is suboptimal anyway. For example, some sites don't directly link to all
> pages (e.g. they are accessed via javascript, or whatever) so you get
> pages missed.
> 
> Were I to code a new CLI, whilst I would support crawling I would mainly
> configure the CLI to get the list of pages to visit by calling one or
> more URLs. Those URLs would specify the pages to generate.
> 
> Thus, Forrest would transform its site.xml file into this list of pages,
> and drive the CLI via that.

This is what we do do. We have a property
"start-uri=linkmap.html"
http://forrest.zones.apache.org/ft/build/cocoon-docs/linkmap.html
(we actually use corresponding xml of course).

We define a few extra URIs in the Cocoon cli.xconf

There are issues of course. Sometimes we want to
include directories of files that are not referenced
in site.xml navigation. For my sites I just use a
DirectoryGenerator to build an index page which feeds
the crawler. Sometimes that technique is not sufficient.

We also gather links from text files (e.g. CSS)
using Chaperon. This works nicely but introduces
some overhead.
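
For anyone who has not seen it, the extra-URI part of cli.xconf looks
roughly like the fragment below. This is reconstructed from memory of
the Cocoon 2.1 CLI, so treat the element and attribute names as
illustrative and check the cli.xconf that ships with Cocoon:

    <cocoon verbose="false">
      <!-- the crawl seed; Forrest's start-uri property ends up here -->
      <uri src="linkmap.html"/>
      <!-- extra URIs not reachable from the site.xml navigation -->
      <uri src="images/favicon.ico"/>
    </cocoon>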

-David

> Whilst gathering links from within pipelines is clever, it always struck
> me as awkward at the same time.
> 
> Regards, Upayavira

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Carsten Ziegeler <cz...@apache.org>.
Sylvain Wallez wrote:
> Carsten Ziegeler wrote:
> In the case of Forrest, you're probably right. Now the links view also
> allows following links in pipelines producing something that's not HTML,
> such as PDF, SVG, WML, etc.
Yepp.

> 
> We have to decide if we want to lose this feature.
Right. So the question is whether someone is using this feature. :)

Carsten
-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Upayavira <uv...@odoko.co.uk>.
Sylvain Wallez wrote:
> Carsten Ziegeler wrote:
>> Sylvain Wallez wrote:
>>   
> 
>>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
>>> although the new crawler can be based on servlets, it will assume these
>>> servlets to answer to a ?cocoon-view=links :-)
>>>     
>> Hmm, I think we don't need the links view in this case anymore. A simple
>>  HTML crawler should be enough as it will follow all links on the page.
>> The view would only make sense in the case where you don't output HTML,
>> where the usual crawler tools would not work.
>>   
> 
> In the case of Forrest, you're probably right. Now the links view also
> allows following links in pipelines producing something that's not HTML,
> such as PDF, SVG, WML, etc.
> 
> We have to decide if we want to lose this feature.

In my view, the whole idea of crawling (i.e. gathering links from pages)
is suboptimal anyway. For example, some sites don't directly link to all
pages (e.g. they are accessed via javascript, or whatever) so you get
pages missed.

Were I to code a new CLI, whilst I would support crawling I would mainly
configure the CLI to get the list of pages to visit by calling one or
more URLs. Those URLs would specify the pages to generate.

Thus, Forrest would transform its site.xml file into this list of pages,
and drive the CLI via that.

Whilst gathering links from within pipelines is clever, it always struck
me as awkward at the same time.

Regards, Upayavira


Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:
> Sylvain Wallez wrote:
>   

>> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
>> although the new crawler can be based on servlets, it will assume these
>> servlets to answer to a ?cocoon-view=links :-)
>>     
> Hmm, I think we don't need the links view in this case anymore. A simple
>  HTML crawler should be enough as it will follow all links on the page.
> The view would only make sense in the case where you don't output HTML,
> where the usual crawler tools would not work.
>   

In the case of Forrest, you're probably right. Now the links view also
allows following links in pipelines producing something that's not HTML,
such as PDF, SVG, WML, etc.

We have to decide if we want to lose this feature.

Sylvain

-- 
Sylvain Wallez
http://bluxte.net
Apache Software Foundation Member


Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Carsten Ziegeler <cz...@apache.org>.
Sylvain Wallez wrote:
> Upayavira wrote:
> 
>> Ah, I wasn't getting that subtle. I was simply saying that I can agree
>> with using the servlet API for _all_ environments. The CLI becomes
>> nothing more than a custom servlet container that uses a servlet to
>> generate its pages.
>>
>> In fact, having said that, it becomes yet another tool that is actually
>> independent of Cocoon - it could be used to crawl pages generated by
>> _any_ servlet, not just the Cocoon one.
>>   
> 
> Hmm... the current CLI uses Cocoon's links view to crawl the website. So
> although the new crawler can be based on servlets, it will assume these
> servlets to answer to a ?cocoon-view=links :-)
> 
Hmm, I think we don't need the links view in this case anymore. A simple
 HTML crawler should be enough as it will follow all links on the page.
The view would only make sense in the case where you don't output HTML,
where the usual crawler tools would not work.
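
To make "a simple HTML crawler" concrete: the technique fits in a page
of Java. A sketch only, not proposed code - real use needs proper HTML
parsing, content-type checks and error handling:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.HashSet;
    import java.util.LinkedList;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SimpleCrawler {
        // Naive href extraction; the pattern also strips #fragments.
        private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

        public static void main(String[] args) throws Exception {
            URL start = new URL(args[0]); // e.g. http://localhost:8888/index.html
            Set<String> visited = new HashSet<String>();
            LinkedList<String> queue = new LinkedList<String>();
            queue.add(start.toString());
            while (!queue.isEmpty()) {
                String page = queue.removeFirst();
                if (!visited.add(page)) continue; // already fetched
                System.out.println("fetching " + page);
                StringBuilder html = new StringBuilder();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(new URL(page).openStream(), "UTF-8"));
                for (String line; (line = in.readLine()) != null; ) {
                    html.append(line).append('\n');
                }
                in.close();
                Matcher m = HREF.matcher(html);
                while (m.find()) {
                    try {
                        // resolve relative links against the current page
                        URL link = new URL(new URL(page), m.group(1));
                        if (start.getHost().equals(link.getHost())) {
                            queue.add(link.toString()); // stay on the site
                        }
                    } catch (Exception skip) {
                        // mailto:, javascript:, malformed hrefs, ...
                    }
                }
            }
        }
    }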

Carsten

-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/

Re: A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Sylvain Wallez <sy...@apache.org>.
Upayavira wrote:

> Ah, I wasn't getting that subtle. I was simply saying that I can agree
> with using the servlet API for _all_ environments. The CLI becomes
> nothing more than a custom servlet container that uses a servlet to
> generate its pages.
>
> In fact, having said that, it becomes yet another tool that is actually
> independent of Cocoon - it could be used to crawl pages generated by
> _any_ servlet, not just the Cocoon one.
>   

Hmm... the current CLI uses Cocoon's links view to crawl the website. So
although the new crawler can be based on servlets, it will assume these
servlets to answer to a ?cocoon-view=links :-)
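
For readers who have not used it: answering to ?cocoon-view=links just
means the crawler does a GET with that extra parameter and reads back a
list of links instead of the rendered page. A sketch, assuming the view
serializes one link URI per line as plain text (that is how I remember
the CLI consuming it); the class name is made up:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    public class LinksViewClient {
        // Ask the pipeline itself which links the page contains, instead
        // of scraping its serialized output.
        public static List<String> linksOf(String pageUri) throws Exception {
            String sep = pageUri.indexOf('?') < 0 ? "?" : "&";
            URL url = new URL(pageUri + sep + "cocoon-view=links");
            List<String> links = new ArrayList<String>();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                if (line.trim().length() > 0) links.add(line.trim());
            }
            in.close();
            return links;
        }
    }

That is also why the feature works for PDF or WML pipelines: the links
are gathered inside the pipeline, so there is no HTML to scrape.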

Sylvain

-- 
Sylvain Wallez
http://bluxte.net
Apache Software Foundation Member


A new CLI (was Re: [RT] The environment abstraction, part II)

Posted by Upayavira <uv...@odoko.co.uk>.
Carsten Ziegeler wrote:
> Upayavira wrote:
>> David Crossley wrote:
>>> Carsten Ziegeler wrote:
>>>> I can't speak for Daniel, but my idea/suggestion was to forget about the
>>>> different environments and let Cocoon always run in a servlet container.
>>>> The CLI would then be kind of an HTTP client which starts up Jetty and
>>>> then generates the site using HTTP requests. This would simplify some
>>>> things in Cocoon; the question is whether this would make the life of Forrest
>>>> too hard?
>>> Thanks to you all for the followup. I don't have a
>>> ready answer yet. Will make sure that the other
>>> Forrest people are aware.
>> In the end, it doesn't really matter that much, and will be up to
>> whoever volunteers to implement the new CLI.
> 
> It depends a little bit on how we see things. My opinion :) is to remove
> the environment abstraction completely and simply use the servlet
> environment, while others might think that we should only base our
> environment abstraction on the servlet API but allow running Cocoon in a
> different environment which provides *some* features of the servlet
> environment but not all. The difference might be subtle, but it's not the
> same.

Ah, I wasn't getting that subtle. I was simply saying that I can agree
with using the servlet API for _all_ environments. The CLI becomes
nothing more than a custom servlet container that uses a servlet to
generate its pages.

In fact, having said that, it becomes yet another tool that is actually
independent of Cocoon - it could be used to crawl pages generated by
_any_ servlet, not just the Cocoon one.

Regards, Upayavira

Re: [RT] The environment abstraction, part II

Posted by Carsten Ziegeler <cz...@apache.org>.
Upayavira wrote:
> David Crossley wrote:
>> Carsten Ziegeler wrote:
>>> I can't speak for Daniel, but my idea/suggestion was to forget about the
>>> different environments and let Cocoon always run in a servlet container.
>>> The CLI would then be kind of an HTTP client which starts up Jetty and
>>> then generates the site using HTTP requests. This would simplify some
>>> things in Cocoon; the question is whether this would make the life of Forrest
>>> too hard?
>> Thanks to you all for the followup. I don't have a
>> ready answer yet. Will make sure that the other
>> Forrest people are aware.
> 
> In the end, it doesn't really matter that much, and will be up to
> whoever volunteers to implement the new CLI.

It depends a little bit on how we see things. My opinion :) is to remove
the environment
abstraction completly and simply use the servlet environment while
others might think that we should only base our environment abstraction
on the servlet api but allow to run Cocoon in a different environment
which provides *some* features of the servlet environment but not all.
The difference might be subtle, but its not the same.

Carsten
-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/

Re: [RT] The environment abstraction, part II

Posted by Upayavira <uv...@odoko.co.uk>.
David Crossley wrote:
> Carsten Ziegeler wrote:
>> I can't speak for Daniel, but my idea/suggestion was to forget about the
>> different environments and let Cocoon always run in a servlet container.
>> The CLI would then be kind of an HTTP client which starts up Jetty and
>> then generates the site using HTTP requests. This would simplify some
>> things in Cocoon; the question is whether this would make the life of Forrest
>> too hard?
> 
> Thanks to you all for the followup. I don't have a
> ready answer yet. Will make sure that the other
> Forrest people are aware.

In the end, it doesn't really matter that much, and will be up to
whoever volunteers to implement the new CLI.

Having said that, I think it makes sense for the CLI to have its own,
minimal servlet container. It could just use Jetty, but it would be
better/faster if the container didn't serve over HTTP, i.e. didn't
require an HTTP client too.
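
To picture "not serving over HTTP": hand the servlet synthetic request
and response objects and call service() directly. The dynamic proxies
below are written against the servlet 2.x API, implement only what a
trivial GET needs, and fail loudly on everything else - purely a sketch
of the idea, not a design:

    import java.io.ByteArrayOutputStream;
    import java.io.PrintWriter;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import javax.servlet.Servlet;
    import javax.servlet.ServletOutputStream;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class InProcessCall {
        // Render one URI by calling the servlet directly, no sockets involved.
        public static byte[] render(Servlet servlet, final String uri) throws Exception {
            final ByteArrayOutputStream body = new ByteArrayOutputStream();
            HttpServletRequest req = (HttpServletRequest) Proxy.newProxyInstance(
                InProcessCall.class.getClassLoader(),
                new Class[] { HttpServletRequest.class },
                new InvocationHandler() {
                    public Object invoke(Object p, Method m, Object[] a) {
                        String name = m.getName();
                        if (name.equals("getMethod")) return "GET";
                        if (name.equals("getRequestURI") || name.equals("getPathInfo")) return uri;
                        if (name.equals("getQueryString") || name.equals("getParameter")) return null;
                        // Anything else the servlet asks for has to be added here.
                        throw new UnsupportedOperationException(name);
                    }
                });
            HttpServletResponse res = (HttpServletResponse) Proxy.newProxyInstance(
                InProcessCall.class.getClassLoader(),
                new Class[] { HttpServletResponse.class },
                new InvocationHandler() {
                    public Object invoke(Object p, Method m, Object[] a) {
                        String name = m.getName();
                        if (name.equals("getOutputStream")) {
                            return new ServletOutputStream() {
                                public void write(int b) { body.write(b); }
                            };
                        }
                        if (name.equals("getWriter")) return new PrintWriter(body, true);
                        if (name.startsWith("set") || name.equals("flushBuffer")) return null;
                        throw new UnsupportedOperationException(name);
                    }
                });
            servlet.service(req, res);
            return body.toByteArray();
        }
    }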

Regards, Upayavira

Re: [RT] The environment abstraction, part II

Posted by Ross Gardler <rg...@apache.org>.
David Crossley wrote:
> Carsten Ziegeler wrote:
> 
>>I can't speak for Daniel, but my idea/suggestion was to forget about the
>>different environments and let Cocoon always run in a servlet container.
>>The CLI would then be kind of an HTTP client which starts up Jetty and
>>then generates the site using HTTP requests. This would simplify some
>>things in Cocoon; the question is whether this would make the life of Forrest
>>too hard?
> 
> 
> Thanks to you all for the followup. I don't have a
> ready answer yet. Will make sure that the other
> Forrest people are aware.

Which David has done, with a post to the Forrest list - you may see a 
few of us post now.

I've reviewed this thread. In my opinion, and *only* from a Forrest
perspective, I feel it is unimportant how we generate the static site
from Forrest. To be honest, I don't really care if we end up
building/bundling a small crawler along the lines of wget in order to do
the static generation (although it is great that we won't need to).

It looks to me like the proposals here will make life inside Cocoon much 
easier. This will, undoubtedly, have a knock-on effect for projects like
Forrest. I'm sure we will have to go through a period of pain before 
reaping the rewards - but I believe it will be worth it.

---

Speaking from a wider perspective, I would ask one important question:
"how many other users, besides Forrest, make heavy use of the CLI, and
would they be damaged by this proposal?"

I would *guess* this is a low number since Cocoon is a *web* framework.

Ross

Re: [RT] The environment abstraction, part II

Posted by Thorsten Scherler <th...@apache.org>.
On Fri, 31-03-2006 at 14:32 +1100, David Crossley wrote:
> Carsten Ziegeler wrote:
> >
> > I can't speak for Daniel, but my idea/suggestion was to forget about the
> > different environments and let Cocoon always run in a servlet container.
> > The CLI would then be kind of an HTTP client which starts up Jetty and
> > then generates the site using HTTP requests. This would simplify some
> > things in Cocoon; the question is whether this would make the life of Forrest
> > too hard?
> 
> Thanks to you all for the followup. I don't have a
> ready answer yet. Will make sure that the other
> Forrest people are aware.
> 

On Wed, 29-03-2006 at 22:54 +0200, Daniel Fagerstrom wrote:
> 
> So the current CLI is a minimal command-line Processor container; we
> could have a minimal command-line Servlet container instead.

I do not see a problem. I guess no one even has to notice the difference.
There is http://jakarta.apache.org/commons/httpclient/ so I do not see a
problem for Forrest.

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


Re: [RT] The environment abstraction, part II

Posted by David Crossley <cr...@apache.org>.
Carsten Ziegeler wrote:
>
> I can't speak for Daniel, but my idea/suggestion was to forget about the
> different environments and let Cocoon always run in a servlet container.
> The CLI would then be kind of an HTTP client which starts up Jetty and
> then generates the site using HTTP requests. This would simplify some
> things in Cocoon; the question is whether this would make the life of Forrest
> too hard?

Thanks to you all for the followup. I don't have a
ready answer yet. Will make sure that the other
Forrest people are aware.

-David

Re: [RT] The environment abstraction, part II

Posted by Upayavira <uv...@odoko.co.uk>.
Daniel Fagerstrom wrote:
> Carsten Ziegeler wrote:
>> David Crossley wrote:
>>
>>  
>>> Hi Daniel, sorry I cannot understand that last sentence.
>>> Would you please re-phrase it.
>>>
>>> We currently have three ways:
>>>
>>> 'forrest run'
>>> Starts its packaged Jetty and uses Forrest/Cocoon as a webapp.
>>>
>>> 'forrest war'
>>> Builds a projectName.war ready for deployment in a full Jetty
>>> or Tomcat.
>>>
>>> 'forrest site'
>>> Calls the Cocoon CLI to generate a static set of docs.
>>>
>>>     
>> I can't speak for Daniel, but my idea/suggestion was to forget about the
>> different environments and let Cocoon always run in a servlet container.
>> The CLI would then be kind of an HTTP client which starts up Jetty and
>> then generates the site using HTTP requests. This would simplify some
>> things in Cocoon; the question is whether this would make the life of Forrest
>> too hard?
>>   
> In Cocoon today, the Cocoon object that implements the Processor
> interface is in some way the top-level interface against "Cocoon
> functionality". Then the CocoonServlet and the CLI both set up and use
> the Cocoon object. When creating the blocks fw, Processor didn't work as
> an abstraction, as it contains lots of tree processor specifics. So I
> decided to use the Servlet and javax.servlet.http set of interfaces
> instead (as discussed on the list a couple of times). This means that
> the CLI in its current state (working against the Processor interface)
> doesn't work with the blocks fw. So the CLI needs to be refactored so
> that it works with a Servlet rather than a Processor.
> 
> To some extent this is actually an advantage, as the CocoonServlet and
> the CLI have a lot of overlap and the servlet part has been maintained
> and developed while the CLI part hasn't. By using Servlet as the "top
> level" interface of Cocoon, the CLI will be much smaller and reuse more
> of the Servlet work.
> 
> Back to your question, my incomprehensible sentence was an answer to
> something like what Carsten proposes above. In many cases I agree with
> Carsten that it makes most sense to run Cocoon in a full servlet
> container, but in some cases, e.g. testing and a minimal OSGi setup,
> it makes IMO sense to have a really lightweight and minimal servlet
> container instead. I have built such a one for creating the servlet
> environment needed for running a servlet within a block and making it
> possible for it to communicate with other blocks. It is also used for
> the block protocol (the block counterpart of the Cocoon protocol). We
> could reuse part of this for the CLI.
> 
> So the current CLI is a minimal command-line Processor container; we
> could have a minimal command-line Servlet container instead.

This makes complete sense to me and is exactly how I would have proposed
implementing it.

Upayavira

Re: [RT] The environment abstraction, part II

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Carsten Ziegeler wrote:
> David Crossley wrote:
>
>   
>> Hi Daniel, sorry I cannot understand that last sentence.
>> Would you please re-phrase it.
>>
>> We currently have three ways:
>>
>> 'forrest run'
>> Starts its packaged Jetty and uses Forrest/Cocoon as a webapp.
>>
>> 'forrest war'
>> Builds a projectName.war ready for deployment in a full Jetty
>> or Tomcat.
>>
>> 'forrest site'
>> Calls the Cocoon CLI to generate a static set of docs.
>>
>>     
> I can't speak for Daniel, but my idea/suggestion was to forget about the
> different environments and let Cocoon always run in a servlet container.
> The CLI would then be kind of an HTTP client which starts up Jetty and
> then generates the site using HTTP requests. This would simplify some
> things in Cocoon; the question is whether this would make the life of Forrest
> too hard?
>   
In Cocoon today, the Cocoon object that implements the Processor
interface is in some way the top-level interface against "Cocoon
functionality". Then the CocoonServlet and the CLI both set up and use
the Cocoon object. When creating the blocks fw, Processor didn't work as
an abstraction, as it contains lots of tree processor specifics. So I
decided to use the Servlet and javax.servlet.http set of interfaces
instead (as discussed on the list a couple of times). This means that
the CLI in its current state (working against the Processor interface)
doesn't work with the blocks fw. So the CLI needs to be refactored so
that it works with a Servlet rather than a Processor.

To some extent this is actually an advantage, as the CocoonServlet and
the CLI have a lot of overlap and the servlet part has been maintained
and developed while the CLI part hasn't. By using Servlet as the "top
level" interface of Cocoon, the CLI will be much smaller and reuse more
of the Servlet work.

Back to your question, my incomprehensible sentence was an answer to
something like what Carsten proposes above. In many cases I agree with
Carsten that it makes most sense to run Cocoon in a full servlet
container, but in some cases, e.g. testing and a minimal OSGi setup,
it makes IMO sense to have a really lightweight and minimal servlet
container instead. I have built such a one for creating the servlet
environment needed for running a servlet within a block and making it
possible for it to communicate with other blocks. It is also used for
the block protocol (the block counterpart of the Cocoon protocol). We
could reuse part of this for the CLI.

So the current CLI is a minimal command-line Processor container; we
could have a minimal command-line Servlet container instead.

WDYT?

/Daniel


Re: [RT] The environment abstraction, part II

Posted by Carsten Ziegeler <cz...@apache.org>.
David Crossley wrote:

> Hi Daniel, sorry I cannot understand that last sentence.
> Would you please re-phrase it.
> 
> We currently have three ways:
> 
> 'forrest run'
> Starts its packaged Jetty and uses Forrest/Cocoon as a webapp.
> 
> 'forrest war'
> Builds a projectName.war ready for deployment in a full Jetty
> or Tomcat.
> 
> 'forrest site'
> Calls the Cocoon CLI to generate a static set of docs.
> 
I can't speak for Daniel, but my idea/suggestion was to forget about the
different environments and let Cocoon always run in a servlet container.
The CLI would then be kind of an HTTP client which starts up Jetty and
then generates the site using HTTP requests. This would simplify some
things in Cocoon; the question is whether this would make the life of Forrest
too hard?
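
Spelled out, the proposal looks roughly like this. A sketch only: it
uses a present-day embedded-Jetty API rather than the one from 2006,
instantiates CocoonServlet without the init parameters and context
configuration a real setup needs, and writes the output files naively:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;
    import org.apache.cocoon.servlet.CocoonServlet;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;
    import org.eclipse.jetty.servlet.ServletHolder;

    public class HttpCli {
        public static void main(String[] args) throws Exception {
            // 1. Boot an embedded Jetty around the Cocoon servlet.
            Server server = new Server(8888);
            ServletContextHandler ctx = new ServletContextHandler(server, "/");
            ctx.addServlet(new ServletHolder(new CocoonServlet()), "/*");
            server.start();
            try {
                // 2. Generate the site with ordinary HTTP requests.
                for (int i = 0; i < args.length; i++) { // e.g. index.html linkmap.html
                    InputStream in =
                            new URL("http://localhost:8888/" + args[i]).openStream();
                    FileOutputStream out = new FileOutputStream("site/" + args[i]);
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        out.write(buf, 0, n);
                    }
                    out.close();
                    in.close();
                }
            } finally {
                server.stop();
            }
        }
    }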

Carsten
-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/