Posted to dev@cocoon.apache.org by Tony Collen <tc...@neuagency.com> on 2003/04/18 18:56:37 UTC

[RT] Decentralized Content Pipelines

Here's some guy's [RT] about flowing RSS and other information
around the Net through a decentralized system of filters.  Sounds like a
gigantic P2P Cocoon :)

http://radio.weblogs.com/0108971/2003/04/11.html#a154




Tony

--
Tony Collen
ICQ: 12410567
--
Cocoon: Internet Glue (A Cocoon Weblog)
http://manero.org/weblog/
--


Re: [RT] Decentralized Content Pipelines

Posted by Stefano Mazzocchi <st...@apache.org>.
on 4/18/03 6:56 PM Tony Collen wrote:


> Here's some guy's [RT] about flowing RSS and other information
> around the Net through a decentralized system of filters.  Sounds like a
> gigantic P2P Cocoon :)
> 
> http://radio.weblogs.com/0108971/2003/04/11.html#a154

He wants to send the routing information *within* the pipeline itself.
This is the infamous reactor pattern that cocoon 1.x used.

Cocoon 2.x has shown that such a pattern can be considered harmful, but
cocoon is not a distributed environment.
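
To make the contrast concrete, here is a minimal Java sketch of what
"routing information inside the payload" could look like; the classes
and names are purely illustrative, not Cocoon 1.x code and not the blog
author's actual design:

  // Illustrative sketch only: the route travels *with* the message,
  // reactor-style, instead of being decided by an external sitemap
  // that the originator controls.
  import java.util.LinkedList;
  import java.util.Queue;

  class RoutedMessage {
      final Queue<String> remainingHops = new LinkedList<String>(); // e.g. "c", "a"
      String payload;                                               // the invoice, say
  }

  class Node {
      private final String name;
      Node(String name) { this.name = name; }

      void receive(RoutedMessage msg) {
          msg.payload = transform(msg.payload);   // each stage rewrites the payload
          String next = msg.remainingHops.poll(); // the route is read from the message itself
          if (next != null) {
              forward(next, msg);                 // nothing goes back to the origin
          }
      }

      private String transform(String payload) { return payload + " [touched by " + name + "]"; }
      private void forward(String nextHop, RoutedMessage msg) { /* network send, omitted */ }
  }

  public class ReactorSketch {
      public static void main(String[] args) {
          RoutedMessage msg = new RoutedMessage();
          msg.payload = "<invoice/>";
          msg.remainingHops.add("c");
          msg.remainingHops.add("a");
          new Node("b").receive(msg);      // b transforms, then would forward to c
          System.out.println(msg.payload);
      }
  }

The only point of the sketch is that once the route lives inside the
message, the originator has no hook left for monitoring, debugging or
error handling.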

I admit I've thought about a distributed p2p pipeline system, but it
always smelled like FS to me. Why? Well, while I do see the reason for a
request/response web service, I don't see the reason for a
request/process/forward type of processing where several servers are
involved.

Consider

 a ---> b ---> c ---> a

where

a sends the invoice to b, which fills it with its own data and forwards
it to c, which then does some processing and bounces it back to a.

Assuming that you can trust all parties involved (and this is a *big*
if) how would you debug the above if you don't have access to the
intermediate streams?

With a direct request/response topology instead, the above is
transformed into

 a ----> b ----> a ----> c -----> a

In general, for N processing steps involved, a pipeline approach
requires at least N+1 transmissions, while the direct request/response
equivalent requires at least 2*N transmissions.
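
For example, with N = 3 processing steps (b, c, d) the two topologies
count out as:

 pipeline:  a ---> b ---> c ---> d ---> a                 4 transmissions (N+1)
 direct:    a ---> b ---> a ---> c ---> a ---> d ---> a   6 transmissions (2*N)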

It is obvious that a pipeline system would be much more efficient in
terms of network consumption, but in practice such a system is simply
too weak, because it depends on *all* parties respecting their
contracts while no intermediate information flows back to the
originator.

It is true that email, for example, works using a decentralized pipeline
system. The same is true for HTTP proxying. Still, the contracts between
those systems don't involve messing significantly with the payload,
only with the headers, and in a much more fault-tolerant way.
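
For instance, an SMTP relay only prepends a Received: trace header, and
an HTTP proxy only adds itself to the Via: header, while the body
passes through untouched (the hostnames below are made up):

 Received: from relay.example.org by mx.example.com; Fri, 18 Apr 2003 ...
 Received: from sender.example.net by relay.example.org; Fri, 18 Apr 2003 ...

 Via: 1.1 proxy1.example.com, 1.1 proxy2.example.com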

Any serious web service is very likely to mess around with your data
rather drastically, increasing the chance of a schema failure
exponentially with the payload size raised to the power of the number
of stages involved.
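
As a back-of-the-envelope model, assume each of the N stages
independently preserves each of the S elements of the payload with
probability p; then the chance that the whole payload comes back
intact is about

 p^(S*N) = (p^S)^N

which collapses quickly as either the payload or the number of stages
grows.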

The chances of your data coming back as you expected are, well, very
slim :-) and you don't even have a way to understand what went wrong
and at what stage, unless each stage sends you a report back before
sending to the next stage; at that point, the number of transmissions
required becomes 2*N, just like in the direct request-response
topology.
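
Counting it out for N = 3 stages (b, c, d), with a report back after
each intermediate stage:

 a ---> b (1), b ---> a report (2), b ---> c (3), c ---> a report (4),
 c ---> d (5), d ---> a result (6)

i.e. 6 transmissions = 2*N.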

The conclusions of all the above are left as an exercise to the reader :-)

-- 
Stefano.