Posted to dev@cocoon.apache.org by Upayavira <uv...@upaya.co.uk> on 2003/08/14 17:02:58 UTC

Extending the Bean (non-HTML)

[here's a non-HTML version - mailer misbehaved again :-( ]

In another message I mentioned I've done a lot of reworking of the bean/CLI. I  
thought I'd mention what I've done so far, and what I have planned. Doesn't really  
qualify for [RT] status, as it doesn't seem all that random to me! 

As to the reworking, I've: 

 * split the bean into a CocoonWrapper that handles configuring a Cocoon object  
and handling a single request, and a CocoonBean which handles crawling 

 * Made the CocoonBean use a Crawler class (derived from the one in the  
scratchpad Ant task)  

 * Moved all of the URI logic (mangling URIs etc) into the Target class (a rough 
sketch of what I mean by mangling follows this list)

 * made it report the time taken to generate a single page 
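
To give a flavour of what I mean by mangling, here's roughly the sort of thing that 
now lives in Target. The rules and method name below are just for illustration, not 
the actual code:

  // Hypothetical sketch of the kind of URI mangling that lives in Target.
  // The rules shown are illustrative only, not the actual implementation.
  public class TargetSketch {

      /** Turn a request URI into a filename suitable for writing to disk. */
      public static String mangle(String uri) {
          // "dirs/" become "dirs/index.html" so the page gets a real filename
          if (uri.endsWith("/")) {
              uri = uri + "index.html";
          }
          // fold any query string into the name: "page?p=2" -> "page_p=2"
          uri = uri.replace('?', '_');
          // strip a leading slash so the path stays relative to the dest dir
          if (uri.startsWith("/")) {
              uri = uri.substring(1);
          }
          return uri;
      }
  }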

Next I want to: 

 * move the member variables of the wrapper and bean into a Context object, so  
that the Bean can be used in a ThreadSafe environment. 

 * rework the way the bean is configured (possibly using Configuration objects) 

 * improve reporting so that it reports pages generated, time taken per page, the  
links found in a page, stack trace from errors, pages that contain broken links, and  
more. 

 * Make this reporting use SAX (to a file), so that in future it can be the basis of a  
publishing service 

 * Get caching working properly, and make it use ifModifiedSince() to determine  
whether to save the file or not. 

 * Build a simple Ant task to replace Main.java for ant driven processes (a rough 
sketch follows at the end of this list)

 * Make Cocoon work with an external Cocoon object, again for the sake of a  
PublishingService 

 * replace the contents of the cli.xconf file with correct settings for generating  
documentation from the built webapp, keeping the documentation system working 

 * implement exclude/include, a la Ant in the cli.xconf  

 * make it configurable as to which pages are scanned for links (why generate  
/docs/logo.gif?cocoon-view=links)? 

 * work out how to implement Vadim's idea for a single pipeline with an  
XMLTeePipe to generate both a link view and page view in one hit 

 * improve the cli.xconf format to be more flexible, e.g: generate multiple pages to  
a single destination, and to have links followed on some pages but not others, etc 
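
For the Ant task item above, the sort of thing I have in mind is no more than this. 
The class name and attributes are guesses at this stage, not a committed design:

  import org.apache.tools.ant.BuildException;
  import org.apache.tools.ant.Task;

  // Rough sketch of a Cocoon site-generation task for Ant.
  public class CocoonTask extends Task {

      private String xconf;    // path to a cli.xconf-style configuration file
      private String destDir;  // where generated pages should be written

      public void setXconf(String xconf) { this.xconf = xconf; }
      public void setDestDir(String destDir) { this.destDir = destDir; }

      public void execute() throws BuildException {
          if (xconf == null) {
              throw new BuildException("The 'xconf' attribute is required");
          }
          log("Generating site from " + xconf + " into " + destDir);
          // configure a CocoonBean from the xconf file and call process(),
          // much as Main.java does today
      }
  }

It would be used from a build file via a <taskdef> plus something like 
<cocoon xconf="cli.xconf" destdir="build/site"/>.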

Phew. More than I thought! And there's more I haven't mentioned... 

Regards, Upayavira 


Re: Extending the Bean (non-HTML)

Posted by Vadim Gritsenko <va...@verizon.net>.
Upayavira wrote:

>Vadim wrote:
>
>  
>
>>>* split the bean into a CocoonWrapper that handles configuring a
>>>Cocoon object  
>>>and handling a single request, and a CocoonBean which handles
>>>crawling
>>>
>>>      
>>>
>>What is the API of these new beans? Please do not forget that
>>CocoonBean is out of the door with the 2.1 release and people might
>>already be building applications with CocoonBean, meaning you can't
>>change the CocoonBean API in a backward-incompatible way without
>>properly deprecating and supporting the released functionality.
>>    
>>
>
>But we did document that the API of the bean was unstable. Doesn't that mean we 
>can change the API where necessary?
>

Ah, in this case we can. Unfortunately, the class's Javadoc does not have 
this indication.


>Of course we should minimise it as much as 
>possible. Therefore, I'll redo what I've done so far, being more thorough about 
>ensuring compatibility.
>
>I'm sure I can manage the split into two classes (which I think greatly aids clarity) 
>without breaking any interfaces.
>

Sounds good.


>>>* Made the CocoonBean use a Crawler class (derived from the one in
>>>the  
>>>scratchpad Ant task)
>>>      
>>>
>>Do you mean org.apache.cocoon.components.crawler.Crawler? I don't see
>>how it can be used in CocoonBean. Can you elaborate?
>>    
>>
>
>No. There's a scratchpad Ant task which has its own crawler. I used that.
>

CocoonCrawling.java? :)


>I'd like to 
>use o.a.c.components.crawler.Crawler, but I couldn't see how to do it, because it has 
>its own link gathering code built into it.
>

It's purely for crawling external sites via URL.
...


>>>Next I want to: 
>>>
>>>* moving the member variables of the wrapper and bean into a Context
>>>object, so  
>>>that the Bean can be used in a ThreadSafe environment.
>>>      
>>>
>>AFAIU, CocoonBean.processURI is already thread safe. All addTarget()
>>methods are obviously not. addTarget() methods can easily be made
>>threadsafe (in some sense -- call to addTarget in one thread does not
>>break bean but affects process() running in another thread) by
>>synchronizing access to the targets collection. It can be thread safe
>>in another sense too (calls to processTargets in different threads are
>>independent of each other): you just need to add
>>processTargets(targets) method.
>>    
>>
>
>All of the crawler data is in member variables that will be shared between threads. 
>Therefore processTargets(targets) wouldn't in itself be enough.
>
>I can add a crawler which encapsulates the necessary data. Then a 
>processTargets(targets) could be threadsafe.
>

Agreed.


>>>* rework the way the bean is configured (possibly using
>>>Configuration objects)
>>>      
>>>
>>Why would you need those Configuration objects?
>>    
>>
>
>Er. Good point :-)
>
>I'll stick with what we've got until we've got a good reason to change it. (The original, 
>now redundant, reason for this was to share xconf reading code between Main.java 
>and an Ant class, but that isn't really possible as far as I can see).
>

:)

...

>>>* Get caching working properly, and make it use ifModifiedSince() to
>>>determine  
>>>whether to save the file or not.
>>>      
>>>
>>Must-have feature. Top priority. I hope you've seen my emails on
>>persistent store subject.
>>    
>>
>
>I certainly did. I got your code, and downloaded and compiled the latest Excalibur 
>Store. Unfortunately, on first tests, the CLI seems to have actually got slower. I did 
>those tests without stepping through the code, so I've got to check out more of what's 
>going on. I agree this is a top priority. I guess I just got a little downhearted at those 
>results and needed a few days to recover my enthusiasm!
>
>  
>
>>>* Build a simple Ant task to replace Main.java for ant driven
>>>processes
>>>      
>>>
>>Good.
>>
>>    
>>
>>>* Make Cocoon work with an external Cocoon object, again for the
>>>sake of a  
>>>PublishingService
>>>      
>>>
>>I don't get this. What Cocoon with which external Cocoon?
>>    
>>
>
>This is something that Unico talked about in relation to a publishing service running 
>within a Cocoon servlet. Again, I'll wait until we've got an actual plan for such a 
>service.
>

Ah, I see. But there, you will have to go over the wire, as Crawler 
does. Right?


>>>* replace the contents of the cli.xconf file with correct settings
>>>for generating  
>>>documentation from the built webapp, keeping the documentation system
>>>working
>>>      
>>>
>>Don't know what you mean.
>>    
>>
>
>At the moment, $COCOON/cli.xconf is set up for use by the documentation building 
>system (in build/cocoon-x.x/documentation/). That is a very specific use, and thus 
>should have a cli.xconf of its own (if that system is still required). The cli.xconf in the 
>root should, IMO, show how to generate sites from within build/webapp, for example 
>generating from the documentation that is in build/webapp/docs/. That would be 
>much more sensible for users trying to work out how to use a cli.xconf to configure 
>the CLI.
>

Got it.
...


>>>* work out how to implement Vadim's idea for a single pipeline with
>>>an  
>>>XMLTeePipe to generate both a link view and page view in one hit
>>>      
>>>
>>Yep. Should increase performance and conformance!
>>    
>>
>
>I've spent some time trying to work out how to do this. It seems quite complicated. As 
>each pipeline, when built, is made up of a generator, a set of transformers and a 
>serializer, building a pipeline which splits into two, one half completing normally and 
>the other going off into a separate 'link-view' pipeline, would require a specifically 
>built Pipeline class, and changes to the treeprocessor to be able to build it. Am I 
>right, or do you know of a simpler way?
>

You are right. Just as the current sitemap implementation adds the link 
gatherer automagically, the links view should be automagically 
assembled and attached at the branch point.
...


>>>Phew. More than I thought! And there's more I haven't mentioned...
>>>      
>>>
>>I'm scared! :)
>>    
>>
>
>No need to worry, I'm going to follow your incremental steps idea, so you'll see it all 
>as it comes along :-)
>
>Thanks for taking the time to reply. I appreciate it.
>  
>

Thanks, and you are welcome.

Vadim



Re: Extending the Bean (non-HTML)

Posted by Upayavira <uv...@upaya.co.uk>.
Vadim wrote:

> OT: Have you tried mozilla mail client?

Installing as we speak.

> > * split the bean into a CocoonWrapper that handles configuring a
> > Cocoon object  
> >and handling a single request, and a CocoonBean which handles
> >crawling 
> >
> 
> What is the API of these new beans? Please do not forget that
> CocoonBean is out of the door with the 2.1 release and people might
> already be building applications with CocoonBean, meaning you can't
> change the CocoonBean API in a backward-incompatible way without
> properly deprecating and supporting the released functionality.

But we did document that the API of the bean was unstable. Doesn't that mean we 
can change the API where necessary? Of course we should minimise it as much as 
possible. Therefore, I'll redo what I've done so far, being more thorough about 
ensuring compatibility.

I'm sure I can manage the split into two classes (which I think greatly aids clarity) 
without breaking any interfaces.
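
To give an idea of the shape of the split (the method names below are indicative only; 
the final shape will be driven by not breaking the existing API):

  // Indicative sketch only: not the final API.
  // The wrapper owns a Cocoon instance and processes one URI at a time.
  public class CocoonWrapperSketch {
      public void initialize() { /* create and configure the Cocoon object */ }
      public void processURI(String uri, java.io.OutputStream out) {
          /* run a single request through Cocoon and write the result */
      }
      public void dispose() { /* shut the Cocoon object down */ }
  }

  // In a separate file: everything to do with crawling sits on top of it.
  public class CocoonBeanSketch extends CocoonWrapperSketch {
      private final java.util.List targets = new java.util.ArrayList();
      public void addTarget(String sourceURI, String destURI) {
          targets.add(new String[] { sourceURI, destURI });
      }
      public void process() {
          /* crawl: process each target, gather its links, queue new ones */
      }
  }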

> > * Made the CocoonBean use a Crawler class (derived from the one in
> > the  
> >scratchpad Ant task)
> 
> Do you mean org.apache.cocoon.components.crawler.Crawler? I don't see
> how it can be used in CocoonBean. Can you elaborate?

No. There's a scratchpad Ant task which has its own crawler. I used that. I'd like to 
use o.a.c.components.crawler.Crawler, but I couldn't see how to do it, because it has 
its own link gathering code built into it.
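
For context, the crawler I took over is really just a queue of URIs plus a 'seen' set; 
the link gathering itself is left to Cocoon's links view. Something along these lines 
(simplified, not the actual code):

  import java.util.ArrayList;
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;

  // Simplified illustration: the crawler only manages the URI queue; the
  // links for each page come from Cocoon's links view, not from the crawler.
  public class CrawlerSketch {
      private final List pending = new ArrayList();  // URIs still to process
      private final Set seen = new HashSet();        // URIs already queued

      public void add(String uri) {
          if (seen.add(uri)) {      // add() returns false if already present
              pending.add(uri);
          }
      }

      public boolean hasNext() {
          return !pending.isEmpty();
      }

      public String next() {
          return (String) pending.remove(0);
      }
  }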

> > * Moved all of the URI logic (mangling URIs etc) into the Target
> > class
> 
> Sounds good.
> 
> > * made it report the time taken to generate a single page
> 
> Ok.
>
> >Next I want to: 
> >
> > * moving the member variables of the wrapper and bean into a Context
> > object, so  
> >that the Bean can be used in a ThreadSafe environment.
> 
> AFAIU, CocoonBean.processURI is already thread safe. All addTarget()
> methods are obviously not. addTarget() methods can easily be made
> threadsafe (in some sense -- call to addTarget in one thread does not
> break bean but affects process() running in another thread) by
> synchronizing access to the targets collection. It can be thread safe
> in another sense too (calls to processTargets in different threads are
> independent of each other): you just need to add
> processTargets(targets) method.

All of the crawler data is in member variables that will be shared between threads. 
Therefore processTargets(targets) wouldn't in itself be enough.

I can add a crawler which encapsulates the necessary data. Then a 
processTargets(targets) could be threadsafe.
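
To make that concrete, the idea is roughly the following: all per-crawl state lives in 
local variables (or in a crawler object created inside the call), so two threads can 
call processTargets() at once without sharing anything. A sketch, not the real method:

  import java.util.HashSet;
  import java.util.LinkedList;
  import java.util.List;
  import java.util.Set;

  // Sketch only: no member variables are touched during a crawl.
  public class ThreadSafeBeanSketch {

      public void processTargets(List targets) {
          LinkedList pending = new LinkedList(targets);  // URIs still to do
          Set seen = new HashSet(targets);               // URIs already queued
          while (!pending.isEmpty()) {
              String uri = (String) pending.removeFirst();
              // 1. process the page for 'uri' via the wrapper
              // 2. for each link the links view reports, queue it if new:
              //    if (seen.add(link)) { pending.addLast(link); }
          }
      }
  }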

> > * rework the way the bean is configured (possibly using
> > Configuration objects)
> 
> Why would you need those Configuration objects?

Er. Good point :-)

I'll stick with what we've got until we've got a good reason to change it. (The original, 
now redundant, reason for this was to share xconf reading code between Main.java 
and an Ant class, but that isn't really possible as far as I can see).

> > * improve reporting so that it reports pages generated, time taken
> > per page, the  
> >links found in a page, stack trace from errors, pages that contain
> >broken links, and  more.
>
> Ok.
>
> >  * Make this reporting use SAX (to a file), so that in future it can
> >  be the basis of a  
> >publishing service
> 
> I think that's overkill. Especially writing to the file part. Extend
> BeanListener interface if you like, implement FileBeanListener if you
> need, but I don't think SAX is really what you need here.

Again, I'll leave this until I have a real need.
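
(If I do pick it up, I imagine the listener route looking something like this. The 
callback signatures here are invented purely for illustration; they are not the real 
BeanListener methods.)

  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;

  // Hypothetical file-based reporter with made-up callback signatures,
  // just to show the shape of a FileBeanListener-style class.
  public class FileReportListenerSketch {

      private final PrintWriter out;

      public FileReportListenerSketch(String reportFile) throws IOException {
          this.out = new PrintWriter(new FileWriter(reportFile));
      }

      public void pageGenerated(String uri, long millis, int linksFound) {
          out.println("OK      " + uri + " (" + millis + "ms, "
                  + linksFound + " links)");
      }

      public void brokenLinkFound(String uri, String message) {
          out.println("BROKEN  " + uri + " : " + message);
      }

      public void complete() {
          out.close();
      }
  }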

> > * Get caching working properly, and make it use ifModifiedSince() to
> > determine  
> >whether to save the file or not.
> 
> Must-have feature. Top priority. I hope you've seen my emails on
> persistent store subject.

I certainly did. I got your code, and downloaded and compiled the latest Excalibur 
Store. Unfortunately, on first tests, the CLI seems to have actually got slower. I did 
those tests without stepping through the code, so I've got to check out more of what's 
going on. I agree this is a top priority. I guess I just got a little downhearted at those 
results and needed a few days to recover my enthusiasm!
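
(The saving side of it is simple enough; roughly the check below before writing each 
file. The interesting part is getting a meaningful last-modified date out of the 
pipeline in the first place, which is where the store work comes in. Sketch only.)

  import java.io.File;

  // Sketch: only rewrite the destination file if the source has changed
  // since the file was last written. This just shows where the check sits.
  public class SaveIfModifiedSketch {

      public boolean needsSaving(File destination, long sourceLastModified) {
          if (!destination.exists()) {
              return true;                     // never generated before
          }
          if (sourceLastModified <= 0) {
              return true;                     // no date available: play safe
          }
          return sourceLastModified > destination.lastModified();
      }
  }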

> > * Build a simple Ant task to replace Main.java for ant driven
> > processes
>
> Good.
> 
> > * Make Cocoon work with an external Cocoon object, again for the
> > sake of a  
> >PublishingService
> 
> I don't get this. What Cocoon with which external Cocoon?

This is something that Unico talked about in relation to a publishing service running 
within a Cocoon servlet. Again, I'll wait until we've got an actual plan for such a 
service.

> > * replace the contents of the cli.xconf file with correct settings
> > for generating  
> >documentation from the built webapp, keeping the documentation system
> >working
> 
> Don't know what you mean.

At the moment, $COCOON/cli.xconf is set up for use by the documentation building 
system (in build/cocoon-x.x/documentation/). That is a very specific use, and thus 
should have a cli.xconf of its own (if that system is still required). The cli.xconf in the 
root should, IMO, show how to generate sites from within build/webapp, for example 
generating from the documentation that is in build/webapp/docs/. That would be 
much more sensible for users trying to work out how to use a cli.xconf to configure 
the CLI.

> > * implement exclude/include, a la Ant in the cli.xconf
> 
> Ok.
> 
> > * make it configurable as to which pages are scanned for links (why
> > generate  
> >/docs/logo.gif?cocoon-view=links)?
> 
> Set of extensions which are not queried for the links (configuration
> parameter don't-follow-links=gif, jpg, png)?

Exactly.
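
In other words, something as simple as this when deciding whether to ask for the links 
view of a page (the parsing below is illustrative; the parameter name is as per your 
suggestion):

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;

  // Illustrative check for a don't-follow-links=gif, jpg, png style setting:
  // pages whose extension is listed are generated but never scanned for links.
  public class LinkScanFilterSketch {

      private final Set skipExtensions = new HashSet();

      public LinkScanFilterSketch(String dontFollowLinks) {
          String[] exts = dontFollowLinks.toLowerCase().split("\\s*,\\s*");
          skipExtensions.addAll(Arrays.asList(exts));
      }

      public boolean shouldScanForLinks(String uri) {
          int dot = uri.lastIndexOf('.');
          if (dot == -1) {
              return true;                     // no extension: scan it
          }
          String ext = uri.substring(dot + 1).toLowerCase();
          return !skipExtensions.contains(ext);
      }
  }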

> > * work out how to implement Vadim's idea for a single pipeline with
> > an  
> >XMLTeePipe to generate both a link view and page view in one hit
> 
> Yep. Should increase performance and conformance!

I've spent some time trying to work out how to do this. It seems quite complicated. As 
each pipeline, when built, is made up of a generator, a set of transformers and a 
serializer, building a pipeline which splits into two, one half completing normally and 
the other going off into a separate 'link-view' pipeline, would require a specifically 
built Pipeline class, and changes to the treeprocessor to be able to build it. Am I 
right, or do you know of a simpler way?
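
(Conceptually the tee itself is the easy bit: a consumer that forwards every SAX event 
to two handlers, roughly as below, with only a few representative events shown. The 
hard part is the Pipeline/treeprocessor surgery needed to attach the second branch.)

  import org.xml.sax.Attributes;
  import org.xml.sax.ContentHandler;
  import org.xml.sax.SAXException;
  import org.xml.sax.helpers.DefaultHandler;

  // Conceptual tee: events are forwarded to two consumers, e.g. the normal
  // serializer branch and the link-gathering branch. A real tee forwards
  // the complete SAX event set; only three methods are shown here.
  public class TeeSketch extends DefaultHandler {

      private final ContentHandler first;
      private final ContentHandler second;

      public TeeSketch(ContentHandler first, ContentHandler second) {
          this.first = first;
          this.second = second;
      }

      public void startElement(String uri, String local, String qName,
                               Attributes atts) throws SAXException {
          first.startElement(uri, local, qName, atts);
          second.startElement(uri, local, qName, atts);
      }

      public void endElement(String uri, String local, String qName)
              throws SAXException {
          first.endElement(uri, local, qName);
          second.endElement(uri, local, qName);
      }

      public void characters(char[] ch, int start, int len)
              throws SAXException {
          first.characters(ch, start, len);
          second.characters(ch, start, len);
      }
  }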

> > * improve the cli.xconf format to be more flexible, e.g: generate
> > multiple pages to  
> >a single destination, and to have links followed on some pages but
> >not others, etc
> 
> Ok.
> 
> >Phew. More than I thought! And there's more I haven't mentioned...
> 
> I'm scared! :)

No need to worry, I'm going to follow your incremental steps idea, so you'll see it all 
as it comes along :-)

Thanks for taking the time to reply. I appreciate it.

Regards, Upayavira


Re: Extending the Bean (non-HTML)

Posted by Vadim Gritsenko <va...@verizon.net>.
Upayavira wrote:

>[here's a non-HTML version - mailer misbehaved again :-( ]
>

OT: Have you tried mozilla mail client?


> * split the bean into a CocoonWrapper that handles configuring a Cocoon object  
>and handling a single request, and a CocoonBean which handles crawling 
>

What is the API of these new beans? Please do not forget that CocoonBean 
is out of the door with the 2.1 release and people might already be 
building applications with CocoonBean, meaning you can't change the 
CocoonBean API in a backward-incompatible way without properly 
deprecating and supporting the released functionality.


> * Made the CocoonBean use a Crawler class (derived from the one in the  
>scratchpad Ant task)
>

Do you mean org.apache.cocoon.components.crawler.Crawler? I don't see 
how it can be used in CocoonBean. Can you elaborate?


> * Moved all of the URI logic (mangling URIs etc) into the Target class
>

Sounds good.


> * made it report the time taken to generate a single page
>

Ok.


>Next I want to: 
>
> * moving the member variables of the wrapper and bean into a Context object, so  
>that the Bean can be used in a ThreadSafe environment.
>

AFAIU, CocoonBean.processURI is already thread safe. All addTarget() 
methods are obviously not. addTarget() methods can easily be made 
threadsafe (in some sense -- call to addTarget in one thread does not 
break bean but affects process() running in another thread) by 
synchronizing access to the targets collection. It can be thread safe in 
another sense too (calls to processTargets in different threads are 
independent of each other): you just need to add processTargets(targets) 
method.


> * rework the way the bean is configured (possibly using Configuration objects)
>

Why would you need those Configuration objects?


> * improve reporting so that it reports pages generated, time taken per page, the  
>links found in a page, stack trace from errors, pages that contain broken links, and  
>more.
>

Ok.


>  * Make this reporting use SAX (to a file), so that in future it can be the basis of a  
>publishing service
>

I think that's overkill. Especially writing to the file part. Extend 
BeanListener interface if you like, implement FileBeanListener if you 
need, but I don't think SAX is really what you need here.


> * Get caching working properly, and make it use ifModifiedSince() to determine  
>whether to save the file or not.
>

Must-have feature. Top priority. I hope you've seen my emails on 
persistent store subject.


> * Build a simple Ant task to replace Main.java for ant driven processes
>

Good.


> * Make Cocoon work with an external Cocoon object, again for the sake of a  
>PublishingService
>

I don't get this. What Cocoon with which external Cocoon?


> * replace the contents of the cli.xconf file with correct settings for generating  
>documentation from the built webapp, keeping the documentation system working
>

Don't know what you mean.


> * implement exclude/include, a la Ant in the cli.xconf
>

Ok.


> * make it configurable as to which pages are scanned for links (why generate  
>/docs/logo.gif?cocoon-view=links)?
>

Set of extensions which are not queried for the links (configuration 
parameter don't-follow-links=gif, jpg, png)?


> * work out how to implement Vadim's idea for a single pipeline with an  
>XMLTeePipe to generate both a link view and page view in one hit
>

Yep. Should increase performance and conformance!


> * improve the cli.xconf format to be more flexible, e.g: generate multiple pages to  
>a single destination, and to have links followed on some pages but not others, etc
>

Ok.


>Phew. More than I thought! And there's more I haven't mentioned...
>

I'm scared! :)

Vadim