Posted to dev@cocoon.apache.org by Vadim Gritsenko <va...@verizon.net> on 2003/07/01 20:47:35 UTC

Re: Link view goodness (Re: residuals of MIME type bug ?)

Jeff Turner wrote:

>I'm not very familiar with the code; is there some cost in keeping the
>two-pass CLI alive, in the faint hope that caching comes to its rescue
>one day?
>

Guys,

Before you implement some approach here... Let me suggest something.

Right now the sitemap implementation automatically adds a link gatherer to the 
pipeline when it is invoked by the CLI. This link gatherer is in fact a 
"hard-coded links view". I suggest replacing this "hard-coded links view", 
a.k.a. the link gatherer, with the "real" links view, BUT attaching it as a 
tee to the main pipeline instead of running it as a pipeline by itself. As a 
result, the links view "baby" will be kept, the two-pass "water" will be 
drained, and the sitemap syntax will stay the same. Moreover, the links view 
will still be accessible from the outside, meaning that you can spider the 
site using out-of-process spiders.

Example:
Given the pipeline:
  G --> T1 (label="content") --> T2 --> S,

And the links view:
  from-label="content" --> T3 --> LinkSerializer,

The pipeline built for the CLI request should be:
  G --> T1 --> Tee --> T2 --> S --> OutputStream
                 \
                   --> LinkSerializer --> NullOutputStream
                           \
                             --> List of links in environment

In one request, you will get:
 * Regular output of the pipeline, which goes to the destination Source
 * The list of links in the environment, which is what the link gatherer was made for
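
For reference, here is roughly how that example would look in sitemap syntax 
(a sketch only: the stylesheet and file names are placeholders, and it assumes 
the LinkSerializer is declared under the serializer name "links"). The point 
is that nothing here is CLI-specific; the CLI would only change how the 
pipeline is assembled internally, not how it is written in the sitemap:

  <map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

    <map:views>
      <!-- The "real" links view: resumes from the "content" label,
           runs T3 and the LinkSerializer -->
      <map:view name="links" from-label="content">
        <map:transform src="stylesheets/t3.xsl"/>
        <map:serialize type="links"/>
      </map:view>
    </map:views>

    <map:pipelines>
      <map:pipeline>
        <map:match pattern="**.html">
          <map:generate src="content/{1}.xml"/>           <!-- G  -->
          <map:transform src="stylesheets/t1.xsl"
                         label="content"/>                <!-- T1, the tee point -->
          <map:transform src="stylesheets/t2.xsl"/>       <!-- T2 -->
          <map:serialize type="html"/>                    <!-- S  -->
        </map:match>
      </map:pipeline>
    </map:pipelines>

  </map:sitemap>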

Comments?

Vadim



Re: Link view goodness (Re: residuals of MIME type bug ?)

Posted by Upayavira <uv...@upaya.co.uk>.
On 1 Jul 2003 at 14:47, Vadim Gritsenko wrote:

> Jeff Turner wrote:
> 
> >I'm not very familiar with the code; is there some cost in keeping
> >the two-pass CLI alive, in the faint hope that caching comes to its
> >rescue one day?
> >
> 
> Guys,
> 
> Before you implement some approach here... Let me suggest something.
> 
> Right now the sitemap implementation automatically adds a link gatherer
> to the pipeline when it is invoked by the CLI. This link gatherer is in
> fact a "hard-coded links view". I suggest replacing this "hard-coded
> links view", a.k.a. the link gatherer, with the "real" links view, BUT
> attaching it as a tee to the main pipeline instead of running it as a
> pipeline by itself. As a result, the links view "baby" will be kept, the
> two-pass "water" will be drained, and the sitemap syntax will stay the
> same. Moreover, the links view will still be accessible from the outside,
> meaning that you can spider the site using out-of-process spiders.
> 
> Example:
> Given the pipeline:
>   G --> T1 (label="content") --> T2 --> S,
> 
> And the links view:
>   from-label="content" --> T3 --> LinkSerializer,
> 
> The pipeline built for the CLI request should be:
>   G --> T1 --> Tee --> T2 --> S --> OutputStream
>                  \
>                    --> LinkSerializer --> NullOutputStream
>                            \
>                              --> List of links in environment
> 
> In one request, you will get:
>  * Regular output of the pipeline, which goes to the destination Source
>  * The list of links in the environment, which is what the link gatherer
>    was made for

Splendid. I think that is exactly what I would want to do. We'd then have single(ish) 
pass generation with the benefits of link view. And if you just feed directly from the 
label into a serializer, it'll be pretty much the same in terms of performance as the 
LinkGatherer that we have now.
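
For what it's worth, the "Tee" node itself can be very small at the SAX level. 
A minimal sketch (plain SAX, not tied to Cocoon's internal pipeline classes, 
and the class name is made up for illustration) that forwards every event to 
two consumers, so the rest of the main pipeline and the links-view branch both 
see the same stream:

  import org.xml.sax.Attributes;
  import org.xml.sax.ContentHandler;
  import org.xml.sax.Locator;
  import org.xml.sax.SAXException;

  /** Forwards every SAX event to two downstream handlers, so the main
      serializer branch and the links-view branch see the same stream. */
  public class TeeContentHandler implements ContentHandler {

      private final ContentHandler a;
      private final ContentHandler b;

      public TeeContentHandler(ContentHandler a, ContentHandler b) {
          this.a = a;
          this.b = b;
      }

      public void setDocumentLocator(Locator l) {
          a.setDocumentLocator(l); b.setDocumentLocator(l);
      }
      public void startDocument() throws SAXException {
          a.startDocument(); b.startDocument();
      }
      public void endDocument() throws SAXException {
          a.endDocument(); b.endDocument();
      }
      public void startPrefixMapping(String p, String uri) throws SAXException {
          a.startPrefixMapping(p, uri); b.startPrefixMapping(p, uri);
      }
      public void endPrefixMapping(String p) throws SAXException {
          a.endPrefixMapping(p); b.endPrefixMapping(p);
      }
      public void startElement(String uri, String loc, String raw, Attributes atts)
              throws SAXException {
          a.startElement(uri, loc, raw, atts); b.startElement(uri, loc, raw, atts);
      }
      public void endElement(String uri, String loc, String raw) throws SAXException {
          a.endElement(uri, loc, raw); b.endElement(uri, loc, raw);
      }
      public void characters(char[] c, int start, int len) throws SAXException {
          a.characters(c, start, len); b.characters(c, start, len);
      }
      public void ignorableWhitespace(char[] c, int start, int len) throws SAXException {
          a.ignorableWhitespace(c, start, len); b.ignorableWhitespace(c, start, len);
      }
      public void processingInstruction(String target, String data) throws SAXException {
          a.processingInstruction(target, data); b.processingInstruction(target, data);
      }
      public void skippedEntity(String name) throws SAXException {
          a.skippedEntity(name); b.skippedEntity(name);
      }
  }

Cocoon may already ship a helper along these lines, so the real work is 
probably in wiring it into the pipeline assembly rather than in the tee itself.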

I would need help implementing this. Are you able to explain how?

There's a lot of pipeline building there that I wouldn't yet know how to do (but I'm 
willing to give it a go with guidance).

If we're to use my current approach, we'd add a different serializer at the end of the 
second sub-pipe, which would take the links and put them into a specific List in the 
ObjectModel. In fact, we could create a LinkGatheringOutputStream that'd be handed 
to the LinkSerializer to do that. That would leave most of the complexity simply in 
building the pipeline.
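
A rough sketch of what that LinkGatheringOutputStream could look like 
(hypothetical, pre-generics Java; it assumes the LinkSerializer writes one 
URI per line, so the stream just buffers the bytes and splits them into a 
caller-supplied List on close):

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;
  import java.util.List;

  /** Buffers whatever the LinkSerializer writes and, on close(), splits
      it into one link per line and stores the links in the given List. */
  public class LinkGatheringOutputStream extends OutputStream {

      private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      private final List links;

      public LinkGatheringOutputStream(List links) {
          this.links = links;
      }

      public void write(int b) throws IOException {
          buffer.write(b);
      }

      public void write(byte[] b, int off, int len) throws IOException {
          buffer.write(b, off, len);
      }

      public void close() throws IOException {
          // Assumes the serializer emits one URI per line (text/uri-list style).
          String[] lines = buffer.toString("UTF-8").split("\\r?\\n");
          for (int i = 0; i < lines.length; i++) {
              String line = lines[i].trim();
              if (line.length() > 0) {
                  links.add(line);
              }
          }
          super.close();
      }
  }

Presumably the List handed in would be the same one the environment already 
exposes for gathered links, so the CLI code that consumes them wouldn't need 
to change.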

Can you guarantee that cocoon.process() will not complete until both sub-pipelines 
have completed their work?

I'll take a bit of a look into the pipeline building code (if I can find it) to see what I can 
work out.

This approach excites me. With help, I'd like to see if I can make it happen.

Regards, Upayavira