Posted to dev@cocoon.apache.org by Daniel Fagerstrom <da...@nada.kth.se> on 2002/12/17 01:43:58 UTC
[RT] Input Pipelines (long)
Input Pipelines
===============
There is, IMO, a need for better support for input handling in
Cocoon. I believe that the introduction of "input pipelines" can be an
important step in this direction. In the rest of this (long) RT I will
discuss use cases for them, a possible definition of input pipelines,
compare them with the existing pipeline concept in Cocoon (henceforth
called output pipelines), discuss what kinds of components would be
useful in them and how they can be used in the sitemap and from
flowscripts, and relate them to the current discussion about how to
reuse functionality ("Cocoon services") between blocks.
Use cases
---------
There is an ongoing trend of packaging all kinds of applications as web
applications or decomposing them into sets of web services. At the same
time, web browsers are more and more becoming a universal GUI for all
kinds of applications (e.g. XUL).
This leads to an increasing need for handling structured input data
in web applications. SOAP may be the most important example; we also
have XML-RPC and certainly numerous home-brewed formats, some of which
may even be binary non-xml legacy formats. WebDAV is another example of
xml input, and the next generation of form handling, XForms, uses xml
as its transport format.
As people build more and more advanced Cocoon systems, there is
also a growing need for reusing functionality in a structured way;
there have been discussions about how to package and reuse "Cocoon
services" in the context of blocks ([1] and [2]). Here, too, there is a
need for handling xml input.
The company I work for builds data warehouses, and some of our customers
are starting to get interested in using the functionality of the data
warehouses not only from the web interfaces that we usually build
but also as parts of their own webapps. This means that we want,
besides Cocoon's flexibility in presenting data in different forms,
flexibility in asking for the data through different input formats.
There is thus a world of input beyond the request parameters, and a
world of rapidly growing importance.
Does Cocoon support the above-mentioned use cases? Yes and no: there
are numerous components that implement SOAP, WebDAV, parts of XForms,
etc. But while the components designed for publishing are highly
reusable in various contexts, this is not the case for input
components. IMO the reason for this is that Cocoon as a framework does
not have much support for input handling.
IMO Cocoon could be as good at handling input as it currently is at
creating output, by reusing exactly the same concept: pipelines. We
cannot, however, use the existing "output pipelines" as is; there are
some asymmetries in their design that make them unsuitable for input.
The term "input pipeline" has sometimes been used on the list; it is
time to try to define what it could be.
What is an Input Pipeline
-------------------------
An input pipeline typically starts by reading octet data from the
input stream of the request object. The input data could be xml,
tab-separated data, text that is structured according to a certain
grammar, binary legacy formats like Excel or Word, or anything else
that can be translated to xml. The first step in the input pipeline
is an adapter from octet data to SAX events. This sounds quite
similar to a generator; we will return to this in the next section.
The structure of the xml from the first step in the pipeline might not
be in a form that is suitable for the data model that we would like to
use internally in the system. Reasons for this can be that the xml
input is supposed to follow some standard or some customer-defined
format. Input adapters for legacy formats will probably produce xml
that mirrors the input format and repeats all kinds of
idiosyncrasies from that format. There is thus a need to transform the
input xml to an xml format more suited to our application-specific
needs. One or several xslt transformer steps would therefore be
useful in the input pipeline.
As a last step in the input pipeline, the SAX events should be adapted
to some binary format so that e.g. the business logic in the system
can be applied to them. The xml input could, for example, be serialized
to an octet stream for storage in a file (as text, xml, pdf, images,
...), transformed to Java objects for storage in the session object, or
put into an xml database or a relational database.
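To make the three stages concrete, here is a minimal Java sketch (class
and method names are mine, not Cocoon components): octet input is parsed
and pushed through a JAXP identity transform into a DOM, standing in for
an input pipeline whose transformer sequence happens to be empty.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Miniature input pipeline: octet stream -> xml parse -> (identity)
// transform -> application-side DOM. Illustrative names only.
public class InputPipelineSketch {

    public static Document run(InputStream octets) throws Exception {
        // Stages 1+2: parse the octet stream and push it through a
        // transformer (an xslt step would slot in here). Stage 3:
        // "deserialize" the SAX events into a DOM for the business logic.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        t.transform(new StreamSource(octets), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
            "<order><item id=\"1\"/></order>".getBytes("UTF-8"));
        Document doc = run(in);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "order"
    }
}
```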
Isn't this exactly what an output pipeline does?
Comparison to Output Pipelines
------------------------------
Both an input pipeline and an output pipeline consist of an adapter from
a binary format to SAX events, followed by a (possibly empty) sequence
of transformers that take SAX events as input as well as output. The
last step is an adapter from SAX events to a binary format. The main
difference (and the one I will focus on) is how the binary input and
output are connected to the pipeline.
Let us look at an example of an output pipeline:
<match pattern="*.html">
<generate type="xml" src="{1}.xml"/>
<transform type="xsl" src="foo.xsl"/>
<serialize type="html"/>
</match>
The input to the pipeline is controlled from the sitemap by the src
attribute in the generator, while the output from the serializer can't
be controlled from the sitemap; the context in which the sitemap is
used is responsible for directing the output to an appropriate
place. If the pipeline is used from a servlet, the output will be
directed to the output stream of the response object in the servlet. If
it is used from the command line, the output will be redirected to a
file. If it is used in the cocoon: protocol, the output will be
redirected to be used as input for the src attribute of e.g. a
generator or a transformer (cf. Carsten's and my posts in
[1] about the semantics of the cocoon: protocol).
Here is another example:
<match pattern="bar.pdf">
<generate type="xsp" src="bar.xsp"/>
<transform type="xsl" src="foo.xsl"/>
<serialize type="pdf"/>
</match>
In this case the binary input is taken from the object model and the
component manager in Cocoon, and the input file to the generator,
"bar.xsp", describes how to extract the input and how to structure it
as an xml document.
If we compare a Cocoon output pipeline with a unix pipeline, it always
ignores standard input and always writes to standard output. An input
pipeline would be the opposite: it would always read from standard
input and ignore standard output. In Cocoon this would mean that the
input source is set by the context. In a servlet, input would be
taken from the input stream of the request object. We could also have
a writable cocoon: protocol where the input stream is set by the
user of the protocol; more about that later (see also my post in the
thread [1]).
An example:
<match pattern="**.xls">
<generate type="xls"/>
<transform type="xsl" src="foo.xsl"/>
<serialize type="xml" dest="context://repository/{1}.xml"/>
</match>
Here the generator reads an Excel document from the input stream that
is supplied by the context and translates it to some xml format. The
serializer writes its xml input to the file system. I reused the names
generator and serializer partly because I didn't find any good names
(deserializer is the inverse of serializer, but what is the inverse of
a generator?), and partly because it would IMO be the best solution if
the generator and serializer from output pipelines could be extended to
be usable in input pipelines as well. Several of the existing
generators would be highly usable in input pipelines if they were
modified in such a way that they read from "standard input" when no
src attribute is given. There are also some serializers that would be
useful in input pipelines; in this case the output stream
given in the dest attribute should be used instead of the one
supplied by the context. It can of course be problematic to extend the
definition of generators and serializers, as it might lead to backward
compatibility problems.
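The proposed fallback could look something like this sketch (the class
name, the resolve method, and its arguments are all hypothetical, not
the real Cocoon generator contract):

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Hypothetical sketch of the proposed generator behavior: if a src
// attribute is present, open it; otherwise fall back to the "standard
// input" supplied by the environment (the request input stream in the
// servlet case).
public class GeneratorInputSketch {

    /** Pick the octet source the generator should parse. */
    public static InputStream resolve(String src, InputStream environmentInput)
            throws Exception {
        if (src != null) {
            return new FileInputStream(src);   // explicit source wins, as today
        }
        return environmentInput;               // proposed fallback: request body
    }

    public static void main(String[] args) throws Exception {
        InputStream requestBody =
            new ByteArrayInputStream("<xls-data/>".getBytes("UTF-8"));
        InputStream in = resolve(null, requestBody);
        System.out.println(in == requestBody); // prints "true"
    }
}
```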
Another example of an input pipeline:
<match pattern="in">
<generate type="textparser">
<parameter name="grammar" value="example.txt"/>
</generate>
<transform type="xsl" src="foo.xsl"/>
<serialize type="xsp" src="toSql.xsp"/>
</match>
In this example the serializer modifies the content of components that
can be found through the object model and the component manager. We use
a hypothetical "output xsp" language to describe how to modify the
environment. Such a language could be a little like xslt in the
sense that it recursively applies templates (rules) with matching
xpath patterns, but the templates would contain custom tags that have
side effects instead of just emitting xml. Could such a language be
implemented in Jelly? It would be useful to have custom tags that modify
the session object, write to sql databases, connect to business
logic, and so on.
Error Handling
--------------
Error handling in input pipelines is even more important than in
output pipelines: we must protect the system against non-well-formed
input, and the user must be given detailed enough information about
what is wrong, while in many cases having no access to log files or
to the internals of the system.
Examples of things that can go wrong are that the input is not parsable
or that it is not valid with respect to some grammar or schema. If we
want input pipelines to work in streaming mode, without unnecessary
buffering, it is impossible to know that the input data is correct until
all of it is processed. This means that a serializer might already have
stored some parts of the pipeline data when an error is detected. I
think that serializers for which faulty input data would be unacceptable
should use some kind of transactions, and that they should be notified
when something goes wrong earlier in the pipeline so that they are
able to roll back the transaction.
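A minimal sketch of such a transaction-like serializer (illustrative
names, not a real Cocoon interface): output is staged in a buffer and
only committed to the real destination when the whole pipeline has
finished without errors.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Transaction-like serializer sketch: everything written is staged in a
// buffer; an error notification from an earlier stage rolls it back.
public class TransactionalSerializerSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final OutputStream destination;
    private boolean failed = false;

    public TransactionalSerializerSketch(OutputStream destination) {
        this.destination = destination;
    }

    public void write(byte[] data) {
        buffer.write(data, 0, data.length);    // staged, not yet visible
    }

    /** Called when an earlier pipeline stage reports broken input. */
    public void pipelineFailed() {
        failed = true;
        buffer.reset();                        // roll back the staged output
    }

    /** Called at the end of the pipeline; commits only on success. */
    public void endDocument() throws Exception {
        if (!failed) {
            buffer.writeTo(destination);       // the "commit"
        }
    }
}
```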
I have not studied the error handling system in Cocoon, maybe there
already are mechanisms that could be used in input pipelines as well?
In Sitemaps
-----------
In a sitemap an input pipeline could be used e.g. for implementing a
web service:
<match pattern="myservice">
<generate type="xml">
<parameter name="scheme" value="myInputFormat.scm"/>
</generate>
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
<serialize type="dom-session" non-terminating="true">
<parameter name="dom-name" value="input"/>
</serialize>
<select type="pipeline-state">
<when test="success">
<act type="my-business-logic"/>
<generate type="xsp" src="collectTheResult.xsp"/>
<serialize type="xml"/>
</when>
<when test="non-valid">
<!-- produce an error document -->
</when>
</select>
</match>
Here we first have an input pipeline that reads and validates xml
input, transforms it to some appropriate format, and stores the result
as a DOM tree in a session attribute. A serializer normally means that
the pipeline should be executed and the sitemap thereafter exited. I
used the attribute non-terminating="true" to mark that the input
pipeline should be executed but that there is more to do in the sitemap
afterwards.
After the input pipeline there is a selector that selects the output
pipeline depending on whether the input pipeline succeeded or not. This
use of selection has some relation to the discussion about pipe-aware
selection (see [3] and the references therein). It would solve at
least my main use cases for pipe-aware selection, without having its
drawbacks: Stefano considered pipe-aware selection a mixing of
concerns; selection should be based on metadata (pipeline state) rather
than on data (pipeline content). There were also some people who didn't
like my use of buffering of all input to the pipe-aware selector. IMO
the use of selectors above solves both of these issues.
The output pipeline starts with an action that takes care of the
business logic for the application. This is IMHO a more legitimate use
for actions than the current mix of input handling and business logic.
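The two-phase idea can be simulated in a few lines of plain Java (all
names are invented for illustration, and the "validation" is a trivial
stand-in): an input phase stores its result and a status flag in an
"object model", and the selection then branches on that status, i.e. on
metadata rather than on pipeline content.

```java
import java.util.HashMap;
import java.util.Map;

// Toy simulation of input pipeline + state-based selection + output
// pipeline. Not Cocoon code; the object model is just a Map here.
public class TwoPhaseSketch {

    public static String handle(String requestBody) {
        Map<String, Object> objectModel = new HashMap<>();

        // Phase 1: the "input pipeline" validates and stores the input.
        boolean valid = requestBody.startsWith("<") && requestBody.endsWith(">");
        objectModel.put("pipeline-state", valid ? "success" : "non-valid");
        if (valid) {
            objectModel.put("input", requestBody);    // e.g. a session DOM
        }

        // Phase 2: selection on pipeline *state* (metadata), not content.
        if ("success".equals(objectModel.get("pipeline-state"))) {
            return "<result>stored " + objectModel.get("input") + "</result>";
        }
        return "<error>input was not valid</error>";
    }
}
```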
In Flowscripts
--------------
IIRC, the discussion and examples of input for flowscripts have so far
mainly dealt with request-parameter-based input. If we want to use
flowscripts for describing e.g. web service flows, more advanced input
handling is needed. IMO it would be an excellent separation of concerns
to use output pipelines for the presentation of the data used in the
system, input pipelines for going from input to system data, Java
objects (or some other programming language) for describing the
business logic working on the data within the system, and flowscripts
for connecting all this in an appropriate temporal order.
For Reuseability Between Blocks
-------------------------------
There have been some discussions about how to reuse functionality
between blocks in Cocoon (see the threads [1] and [2] for
background). IMO (cf. my post in the thread [1]), a natural way of
exporting pipeline functionality is to extend the cocoon: pseudo-
protocol so that it accepts input as well as produces output. The
protocol should also be extended so that both input and output can
be any octet stream, not just xml.
If we extend generators so that their input can be set by the
environment (as proposed in the discussion about input pipelines), we
have what is needed for creating a writable cocoon protocol. The web
service example in the section "In Sitemaps" could also be used as an
internal service, exported from a block.
Both input and output for the extended cocoon protocol can be either
xml or non-xml, which gives us four cases:
- xml input, xml output: could be used from a "pipeline" transformer;
the input to the transformer is redirected to the protocol and the
output from the protocol is redirected to the output of the
transformer.
- non-xml input, xml output: could be used from a generator.
- xml input, non-xml output: could be used from a serializer.
- non-xml input, non-xml output: could be used from a reader if the
input is ignored, from a "writer" if the output is ignored, and from a
"reader-writer" if both are used.
Generators that accept xml should of course also accept SAX events
for efficiency reasons, and serializers that produce xml should, for
the same reason, also be able to produce SAX events.
Conclusion
----------
The ability to handle structured input (e.g. xml) in a convenient way,
will probably be an important requirement on webapp frameworks in the
near future.
By removing the asymmetry between generators and serializers (letting
the input of a generator be set by the context and the output of a
serializer be set from the sitemap), Cocoon could IMO be as good at
handling input as it is today at producing output.
This would also make it possible to introduce a writable as well as
readable Cocoon pseudo-protocol, which would be a good way to export
functionality from blocks.
There are of course many open questions, e.g. how to implement these
ideas without introducing too much backward incompatibility.
What do you think?
/Daniel Fagerstrom
References
----------
[1] [RT] Using pipeline as sitemap components (long)
http://marc.theaimsgroup.com/?t=103787330400001&r=1&w=2
[2] [RT] reconsidering pipeline semantics
http://marc.theaimsgroup.com/?t=102562575200001&r=2&w=2
[3] [Contribution] Pipe-aware selection
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2
Re: [RT] Input Pipelines: Storage and Selection (was Re: [RT] Input
Pipelines (long))
Posted by Stefano Mazzocchi <st...@apache.org>.
Sorry for taking so long.
Daniel Fagerstrom wrote:
> The discussion about input pipelines can be divided in two parts:
> 1. Improving the handling of the input stream in Cocoon. This is needed
> for web services, it is also needed for making it possible to implement
> a writable cocoon:-protocol, something that IMO would be very useful for
> reusing functionality in Cocoon, especially from blocks.
>
> 2. The second part of the proposal is to use two pipelines, executed in
> sequence, to respond to input in Cocoon. The first pipeline (called the
> input pipeline) is responsible for reading the input from request
> parameters or from the input stream, transforming it to an appropriate
> format and storing it in e.g. a session parameter, a file or a db. After
> the input pipeline there is an ordinary (output) pipeline that is
> responsible for generating the response. The output pipeline is executed
> after the execution of the input pipeline is completed; as a
> consequence, actions and selections in the output pipeline can depend
> e.g. on whether the handling of input succeeded and on the
> data that was stored by the input pipeline.
>
> Here I will focus on your comments on the second part of the proposal.
Ok.
I'm leaving a bunch of stuff uncut because I don't know where to cut the
context.
> > Daniel Fagerstrom wrote:
> <snip/>
> >> In Sitemaps
> >> -----------
> >>
> >> In a sitemap an input pipeline could be used e.g. for implementing a
> >> web service:
> >>
> >> <match pattern="myservice">
> >> <generate type="xml">
> >> <parameter name="scheme" value="myInputFormat.scm"/>
> >> </generate>
> >> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> >> <serialize type="dom-session" non-terminating="true">
> >> <parameter name="dom-name" value="input"/>
> >> </serialize>
> >> <select type="pipeline-state">
> >> <when test="success">
> >> <act type="my-business-logic"/>
> >> <generate type="xsp" src="collectTheResult.xsp"/>
> >> <serialize type="xml"/>
> >> </when>
> >> <when test="non-valid">
> >> <!-- produce an error document -->
> >> </when>
> >> </select>
> >> </match>
> >>
> >> Here we have first an input pipeline that reads and validates xml
> >> input, transforms it to some appropriate format and store the result
> >> as a dom-tree in a session attribute. A serializer normally means that
> >> the pipeline should be executed and thereafter an exit from the
> >> sitemap. I used the attribute non-terminating="true", to mark that
> >> the input pipeline should be executed but that there is more to do in
> >> the sitemap afterwards.
> >>
> >> After the input pipeline there is a selector that select the output
> >> pipeline depending of if the input pipeline succeed or not. This use
> >> of selection have some relation to the discussion about pipe-aware
> >> selection (see [3] and the references therein). It would solve at
> >> least my main use cases for pipe-aware selection, without having its
> >> drawbacks: Stefano considered pipe-aware selection mix of concern,
> >> selection should be based on meta data (pipeline state) rather than on
> >> data (pipeline content). There were also some people who didn't like
> >> my use of buffering of all input to the pipe-aware selector. IMO the
> >> use of selectors above solves booth of these issues.
> >>
> >> The output pipeline start with an action that takes care about the
> >> business logic for the application. This is IMHO a more legitimate use
> >> for actions than the current mix of input handling and business logic.
> >
> >
> > Wouldn't the following pipeline achieve the same functionality you want
> > without requiring changes to the architecture?
> >
> > <match pattern="myservice">
> > <generate type="payload"/>
> > <transform type="validator">
> > <parameter name="scheme" value="myInputFormat.scm"/>
> > </transform>
> > <select type="pipeline-state">
> > <when test="valid">
> > <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> > <transform type="my-business-logic"/>
> > <serialize type="xml"/>
> > </when>
> > <otherwise>
> > <!-- produce an error document -->
> > </otherwise>
> > </select>
> > </match>
>
> Yes, it would achieve about the same functionality as I want and it
> could easily be implemented with the help of the small extensions of the
> sitemap interpreter that I implemented for pipe aware selection [3].
>
> I think it could be interesting to do a detailed comparison between the
> differences in our proposals: How the input stream and validation is
> handled, how the selection based on pipeline state is performed, if
> storage of the input is done in a serializer or in a transformer, and
> how the new output is created.
Ok, let's go.
> Input Stream
> ------------
>
> For input stream handling you used
>
> <generate type="payload"/>
>
> Is the payload generator equivalent to the StreamGenerator? Or does it
> do something more, like switching parsers depending on the mime type of
> the input stream?
I really don't think this is important. We are basically discussing if
the current sitemap architecture is good enough for what you want.
Once the Cocoon Environment is more balanced toward input, you can have
a uber-payload-generator that does everything and brews beer, or you can
have your own small personal generator that does what you want.
My point was: why ask for two pipelines when you can do the same
thing with one?
> I used
>
> <generate type="xml"/>
>
> The idea is that if no src attribute is given, the sitemap interpreter
> automatically connects the generator to the input stream of the
> environment (the input stream from the http request in the servlet case;
> in other cases it is more unclear). This behavior was inspired by the
> handling of std input in unix pipelines.
Hmmm, an interesting concept indeed, but I wonder if it's really
meaningful in our context. I mean, maybe there are generators that don't
need src and don't rely on input. But an idiotic TimeGenerator is the
only one I can think of... and that really doesn't stand up as an
argument, does it?
> Nicola Ken proposed:
>
> <generate type="xml" src="inputstream://"/>
>
> I prefer this solution to mine as it doesn't require any change
> to the sitemap interpreter, and I also believe that it is easier to
> understand as it is more explicit. It also (as Nicola Ken has explained)
> gives a good SoC: the uri in the src attribute describes where to read
> the resource from, e.g. input stream, file, cvs, http, ftp, etc., and the
> generator is responsible for how to parse the resource. If we develop an
> input stream protocol, all the work invested in the existing generators
> can immediately be reused in web services.
It is true that it reduces the number of required generators. But there
is something about this that disturbs me even if I can't really tell you
what it is rationally... hmmm...
> Validation
> ----------
>
> Should validation be part of the parsing of input as in:
>
> <generate type="xml">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </generate>
>
> or should it be a separate transformation step:
>
> <transform type="validator">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </transform>
>
> or maybe the responsibility of the protocol as Nicola Ken proposed in
> one of his posts:
>
> <generate type="xml" src="inputstream:myInputFormat.scm"/>
>
> This is not a question about architecture but rather one about finding
> "best practices".
>
> I don't think validation should be part of the protocol.
I disagree. Quite strongly, actually. Consider xinclude or any xml
expansion that changes the stream infoset. You could have valid
templates and valid fragments and still have invalid results (namespaces
make the whole thing very tricky... and in the future we'll need the
ability to mix tons of them, think FO+SVG+MathML for a normal example)
Now, if our xml-processing architecture is balanced enough, people might
want to use xinclude transformers to juice-up their SOAP-processing
pipelines. At that point, where do you validate?
Keeping the validation at a separate level helps because:
1) validation becomes explicit and infoset-transparent, in the spirit
of RelaxNG.
2) multiple validation is possible (in the spirit of Xpipe)
3) pipeline authors are more aware of validation issues as pipeline
processing stages.
> It means that
> the protocol has to take care of the parsing, and that would muddle the
> SoC where the protocol is responsible for locating and delivering the
> stream and the generator is responsible for parsing it, which Nicola Ken
> has argued for in his other posts.
Well, the problem is that relating the concept of validation to the
concept of parsing and infoset production/augmentation is a *MISTAKE*
that the XML specification perpetuated from the SGML days.
Please, let's stop it once and for all. Putting validation in as an
implicit stage of parsing would set us back at least 5 years in markup
technology design.
> Should validation be part of the generator or a transform step? I don't
> know.
Transformation, for the simple reason that you might need to validate a
pipeline more than once.
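As a present-day illustration of validation as its own repeatable stage,
decoupled from parsing (note: javax.xml.validation postdates this
thread, so this is an anachronistic sketch with invented class names),
the same document can be pushed through a Validator any number of times,
at any point in a processing chain:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Validation as a standalone stage: parseable input is one question,
// validity against a schema is a separate, repeatable one.
public class ValidationStageSketch {

    public static boolean isValid(String schemaXsd, String document) {
        try {
            SchemaFactory f =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                f.newSchema(new StreamSource(new StringReader(schemaXsd)));
            Validator v = schema.newValidator();
            v.validate(new StreamSource(new StringReader(document)));
            return true;
        } catch (Exception e) {
            // a real pipeline would put a report object in the object model
            return false;
        }
    }
}
```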
> If the input is not xml, as for the ParserGenerator, I guess that
> the validation must take place in the generator. If the xml parser
> validates the input as a part of the parsing, it is more practical to let
> the generator be responsible for validation (IIRC Xerces2 has an
> internal pipeline structure and performs validation in a transformer-
> like way, so for Xerces2 it would probably be as efficient to do
> validation in a transformer as in a generator).
Note that the fact of including the *location* of a schema inside a
document is another huge mistake perpetuated because XML failed to
describe schema catalogs.
A document should indicate what "type" of document it is (something like
the public DTD identifier) and let the system find out *how* to validate
that document type.
> Otherwise it seems to
> give better SoC to separate the parsing and the validation steps, so that
> we can have one validation transformer for each schema language.
No, if the description of the document is done properly (NOTE: even
JClark hasn't yet figured out a way to address the issue).
I would do it like this
<?xml version="1.0"?>
<document xml:type="http://apache.org/document/1.1/">
...
</document>
and then it's up to the processor to understand how to validate a
document type indicated by that URI.
NOTE: it's not a namespace URI, but an identifier for the type of
document that we are using. Of course, the same identifier can be used
in both cases. For example
<?xml version="1.0"?>
<d:document
xml:type="http://apache.org/document/1.1/"
xmlns:d="http://apache.org/document/1.1/">
...
</d:document>
> In some cases it might be practical to augment the xml document with
> error information to be able to give more exact user feedback on where
> the errors are located. For such applications it seem more natural to me
> to have validation in a transformer.
>
> A question that might have architectural consequences is how the
> validation step should report validation errors.
Agreed.
> If the input is not
> parseable at all there is not much more to do than throwing an exception
> and letting the ordinary internal error handler report the situation. If
> some of the elements or attributes in the input has the wrong type we
> probably want to return more detailed feedback than just the internal
> error page. Some possible validation error report mechanisms are:
> storing an error report object in the environment e.g. in the object
> model, augmenting the xml document with error reporting attributes or
> elements, throwing an exception object that contains a detailed error
> description object or a combination of some of these mechanisms.
>
> Mixing data and state information was considered to be a bad practice in
> the discussion about pipe-aware selection (see references in [3]); that
> rules out using only augmentation of the xml document as the error
> reporting mechanism. Throwing an exception would AFAIU lead to
> difficulties in giving customized error reports. So I believe it would
> be best to put some kind of state-describing object in the environment
> and possibly combine this with augmentation of the xml document.
Yes, that would be my assumption too. And in case there is the need to
incorporate those validation mistakes back into the content, a
transformer (maybe even an XSLT stylesheet) can do that.
This seems the cleanest solution to me.
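A sketch of such a state-describing report object (assuming DTD
validation for simplicity; class and method names are invented):
validation problems are collected into a list that a later stage could
place in the object model, or that an XSLT step could merge back into
the content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Collects validation errors into a report instead of aborting, so the
// response pipeline can give the user detailed feedback.
public class ValidationReportSketch {

    public static List<String> collectErrors(String xmlWithDtd) throws Exception {
        final List<String> report = new ArrayList<>();
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setValidating(true);                       // DTD validation
        XMLReader reader = f.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { report.add(e.getMessage()); }
            public void error(SAXParseException e)   { report.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException {
                report.add(e.getMessage());
                throw e;                             // not even well-formed
            }
        });
        reader.parse(new InputSource(new StringReader(xmlWithDtd)));
        return report;
    }
}
```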
> Pipe State Dependent Selection
> ------------------------------
>
> For selecting response based on if the input document is valid or not
> you suggest the following:
>
> ...
> <transform type="validator">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </transform>
> <select type="pipeline-state">
> <when test="valid">
> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> ...
>
> As I mentioned earlier this could easily be implemented with the
> "pipe-aware selection" code I submitted in [3]. Let us see how it would
> work:
>
> The PipelineStateSelector can not be executed at pipeline construction
> time as for ordinary selectors.
Gosh, you're right, I didn't think about that.
> The pipeline before the selector
> including the ValidatorTransformer must have been executed before the
> selection is performed. This can be implemented by letting the
> PipelineStateSelector implement a special marker interface, say
> PipelineStateAware, so that it can have special treatment in the
> selection part of the sitemap interpreter.
yes
> When the sitemap interpreter gets a PipelineStateAware selector, it first
> ends the currently constructed pipeline with a serializer that stores its
> sax input in e.g. a dom tree; the pipeline is processed and the dom
> tree with the cached result is stored in e.g. the object model. In the
> next step the selector is executed and it can base its decision on the
> result from the first part of the pipeline. If the ValidationTransformer
> puts a validation result descriptor in the object model, the
> PipelineStateSelector can perform tests on this result descriptor. In
> the last step a new pipeline is constructed where the generator reads
> from the stored dom tree, and in the example above, the first
> transformer will be an XSLTransformer.
we are reaching the point where pipeline selection cannot be processed
"a-priori" but must include information on the run-time environment.
As much as I didn't like pipe-aware selection, I do agree that
validation-aware selection is a special pipe-aware selection but it *IS*
very important and must be taken in to consideration.
Hmmm, this kinda sheds a totally different light on the concept of
selection. (Which has an interesting side effect in making selectors and
matchers even more different than they are today.)
> An alternative and more explicit way to describe the pipeline state
> dependent selection above, is:
>
> ...
> <transform type="validator">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </transform>
> <serialize type="object-model-dom" non-terminating="true">
> <parameter name="name" value="validated-input"/>
> </serialize>
> <select type="pipeline-state">
> <when test="valid">
> <generate type="object-model-dom">
> <parameter name="name" value="validated-input"/>
> </generate>
> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> ...
>
> Here the extensions to the current Cocoon semantics are put in the
> serializer instead of the selector. The sitemap interpreter treats a
> non-terminating serializer as an ordinary serializer in the sense that it
> puts the serializer at the end of the current pipeline and executes it.
> The difference is that instead of returning to the caller of the
> sitemap interpreter, it creates a new current pipeline and continues to
> interpret the components after the serializer, in this case a selector.
> The sitemap interpreter will also ignore the output stream of the
> serializer; the serializer is supposed to have side effects. The new
> current pipeline will then get an ObjectModelDOMGenerator as generator
> and an XSLTTransformer as its first transformer.
No, I'm sorry, but I don't like this. I totally don't like the abuse of
serializers for this concept of 'intermediate-non-sax-stream' components.
It's potentially very dangerous; I see an incredible potential for abuse.
What do others think about this concept of pipelining pipelines? Isn't
this kind of recursion the mark of FS?
> I prefer this construction to the more implicit one because it
> is more obvious what it does, and also because it gives more freedom in
> how to store the user input.
True, but it also gives people more ability to abuse the system. Think
about internal pipelines, and views, and resources and aggregation...
have you thought about all the potential uses of this pipeline
pipelining in all current sitemap usecases?
You are, in fact, proposing a *MAJOR* change in the way the pipelines
are set up. In short, more freedom and less pipeline granularity... but
sometimes it's good to make it harder for people to come up with
something... so they *THINK* about it.
Maybe I'm being too conservative, but I'm very afraid of all those
unplanned (and unwanted) changes that these new chained pipelines could
produce... (besides, how do you stop them from wanting more than two
pipelines? should we? would you also like to chain a pipeline with a
reader and then another pipeline?)
> Some people seem to prefer to store user input
> in Java beans; in some applications session parameters might be a better
> place than the object model.
I've seen the ugliest sitemaps coming out of exactly that concept of
storing everything in the sitemap and then parsing it back into the
pipeline... believe me, it's more abused than used correctly as it is
right now.
>
> Pipelines with Side Effects
> ---------------------------
>
> A common pattern in pipelines that handle input (at least in the
> application that I write) is that the first half of the pipeline takes
> care of the input and ends with a transformer that stores the input. The
> transformer can be e.g. the SQLTransformer (with insert or update
> statements), the WriteDOMSessionTransformer, the
> SourceWritingTransformer. These transformers have side effects: they
> store something and return an xml document that tells if it succeeded
> or not. A conclusion from the threads about pipe aware selection was
> that sending meta data, like whether the operation succeeded or not, in
> the pipeline is a bad practice, and especially that we should not allow
> selection based on such content. Given that these transformers basically
> translate xml input to a binary format and generate an xml output that
> we are supposed to ignore, it would IMO be more natural to see them as
> some kind of serializer.
>
> The next half of the pipeline creates the response, here it is less
> obvious what transformer to use. I normally use an XSLTTransformer and
> typically ignore its input stream and only create an xml document that
> is rendered into e.g. html in a subsequent transformer.
>
> I think that it would be more natural to replace the pattern:
>
> ...
> <transform type="store something, return state info"/>
> <transform type="create a response document, ignore input"/>
> ...
>
> with
>
> ...
> <serialize type="store something, put state info in the environment"
> non-terminating="true"/>
> <generate type="create a response document" src="response document"/>
> ...
>
> If we give the serializer a destination attribute as well, all the
> existing serializers could be used for storing input in files etc as well.
>
> ...
> <serialize type="xml" dest="xmldb://..." non-terminating="true"/>
Now, let me ask you something: how much have you been playing with the
FlowScript?
A while ago I proposed the ability to call a pipeline from the
flowscript while specifying the outputstream that the serializer should
use. Basically, the flow can now use a pipeline as a tool to do stuff
without necessarily being tied to the client.
In all your discussion you have been placing a bunch of flow logic (how
to move from one pipeline to the next) into the sitemap. I'd suggest to
move it where it belongs (the flow) and let the sitemap do its job
(defining pipelines that others can use).
Why? Well, while the concept of stateless output is inherently
declarative, the concept of stateless input + output is declarative for
the match and procedural for its internals.
So, I wonder, why don't we leave the declarative part to the sitemaps
and use the flow as our procedural glue?
> ...
>
> This would give the same SoC that i argued in favour of in the context
> of input: The serializer is responsible for how to serialize from xml to
> the binary data format and the destination is responsible for where to
> store the data.
This can be achieved with a flow method that includes a way to specify
the output stream (or a WriteableSource, probably better) that the
serializer has to use.
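A minimal sketch of what such a flow-level helper could look like. All names here (Pipeline, processTo) are invented for illustration; this is not an actual Cocoon or flowscript API, just the shape of the idea: the caller picks the destination stream, so a pipeline can serialize to a file, an xmldb, or a buffer instead of the client.

```java
// Sketch: let the caller choose the stream a pipeline serializes to,
// instead of hardwiring the client response stream.
// Pipeline and processTo are illustrative names, not Cocoon APIs.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class DetachedSerialization {
    // A pipeline reduced to "something that writes bytes to a stream".
    interface Pipeline {
        void process(OutputStream out) throws IOException;
    }

    // The flow-level helper: run the pipeline against any destination,
    // e.g. a file, an xmldb, or an in-memory buffer -- not just the client.
    static void processTo(Pipeline p, OutputStream dest) throws IOException {
        p.process(dest);
        dest.flush();
    }

    public static void main(String[] args) throws IOException {
        Pipeline p = out -> out.write(
            "<result>stored</result>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        processTo(p, buffer);
        System.out.println(buffer.toString("UTF-8"));
    }
}
```

The point of the sketch is only the inversion of control: the serializer stays unchanged, and the flow decides where its bytes go.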
> Conclusion
> ----------
>
> I am afraid that I pose more questions than I answer in this RT. Many of
> them are of a "best practice" character, have no architectural
> consequences, and do not have to be answered right now. There are
> however some questions that need an answer:
>
> How should pipeline components, like the validation transformer, report
> state information? Placing some kind of state object in the object model
> would be one possibility, but I don't know.
The real problem is not where to store the data, IMO, but the fact that
you showed that there is a serious need for run-time selection that
can't be addressed with our today's architecture.
> We seem to agree that there is a need for selection in pipelines
> based on the state of the computation in the pipeline that precedes the
> selection.
Yes. I finally got to this conclusion.
> Here we have two proposals:
>
> 1. Introduce pipeline state aware selectors (e.g. by letting the
> selector implement a marker interface), and give such selectors special
> treatment in the sitemap interpreter.
>
> 2. Extend the semantics of serializers so that the sitemap interpreter
> can continue to interpret the sitemap after a serializer, (e.g. by a new
> non-terminating attribute for serializers).
>
> I prefer the second proposal.
I prefer the first :)
> Both proposals can be implemented with no back compatibility problems
> at all by requiring the selectors or serializers that need the extended
> semantics to implement a special marker interface, and by adding code
> that reacts on the marker interface in the sitemap interpreter.
Yes, I see that.
> To use serializers more generally for storing things, as I proposed
> above, the Serializer interface would need to extend the
> SitemapModelComponent interface.
Don't know about that. I like serializers the way they are, but I'd like
to be able to detach them from the client output stream by using the
flowscript.
--
Stefano Mazzocchi <st...@apache.org>
--------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
[RT] Input Pipelines: Storage and Selection (was Re: [RT] Input Pipelines
(long))
Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Stefano Mazzocchi wrote:
> Hmmm, maybe deep architectural discussions are good during holiday
> seasons... we'll see :)
Not for me, I've been away from computers for a while. But you and
Nicola Ken seem to have had an interesting discussion :)
The discussion about input pipelines can be divided in two parts:
1. Improving the handling of the input stream in Cocoon. This is needed
for web services, it is also needed for making it possible to implement
a writable cocoon:-protocol, something that IMO would be very useful for
reusing functionality in Cocoon, especially from blocks.
2. The second part of the proposal is to use two pipelines, executed in
sequence, to respond to input in Cocoon. The first pipeline (called the
input pipeline) is responsible for reading the input, from request
parameters or from the input stream, transforming it to an appropriate
format and storing it in e.g. a session parameter, a file or a db. After
the input pipeline there is an ordinary (output) pipeline that is
responsible for generating the response. The output pipeline is executed
after the execution of the input pipeline is completed; as a
consequence, actions and selections in the output pipeline can depend
e.g. on whether the handling of input succeeded and on the
data that was stored by the input pipeline.
Here I will focus on your comments on the second part of the proposal.
> Daniel Fagerstrom wrote:
<snip/>
>> In Sitemaps
>> -----------
>>
>> In a sitemap an input pipeline could be used e.g. for implementing a
>> web service:
>>
>> <match pattern="myservice">
>> <generate type="xml">
>> <parameter name="scheme" value="myInputFormat.scm"/>
>> </generate>
>> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>> <serialize type="dom-session" non-terminating="true">
>> <parameter name="dom-name" value="input"/>
>> </serialize>
>> <select type="pipeline-state">
>> <when test="success">
>> <act type="my-business-logic"/>
>> <generate type="xsp" src="collectTheResult.xsp"/>
>> <serialize type="xml"/>
>> </when>
>> <when test="non-valid">
>> <!-- produce an error document -->
>> </when>
>> </select>
>> </match>
>>
>> Here we have first an input pipeline that reads and validates xml
>> input, transforms it to some appropriate format and stores the result
>> as a dom-tree in a session attribute. A serializer normally means that
>> the pipeline should be executed and the sitemap thereafter exited. I
>> used the attribute non-terminating="true" to mark that
>> the input pipeline should be executed but that there is more to do in
>> the sitemap afterwards.
>>
>> After the input pipeline there is a selector that selects the output
>> pipeline depending on whether the input pipeline succeeded or not. This
>> use of selection has some relation to the discussion about pipe-aware
>> selection (see [3] and the references therein). It would solve at
>> least my main use cases for pipe-aware selection, without having its
>> drawbacks: Stefano considered pipe-aware selection a mix of concerns,
>> selection should be based on meta data (pipeline state) rather than on
>> data (pipeline content). There were also some people who didn't like
>> my use of buffering of all input to the pipe-aware selector. IMO the
>> use of selectors above solves both of these issues.
>>
>> The output pipeline starts with an action that takes care of the
>> business logic for the application. This is IMHO a more legitimate use
>> for actions than the current mix of input handling and business logic.
>
>
> Wouldn't the following pipeline achieve the same functionality you want
> without requiring changes to the architecture?
>
> <match pattern="myservice">
> <generate type="payload"/>
> <transform type="validator">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </transform>
> <select type="pipeline-state">
> <when test="valid">
> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> <transform type="my-business-logic"/>
> <serialize type="xml"/>
> </when>
> <otherwise>
> <!-- produce an error document -->
> </otherwise>
> </select>
> </match>
Yes, it would achieve about the same functionality as I want and it
could easily be implemented with the help of the small extensions of the
sitemap interpreter that I implemented for pipe aware selection [3].
I think it could be interesting to do a detailed comparison between the
differences in our proposals: How the input stream and validation is
handled, how the selection based on pipeline state is performed, if
storage of the input is done in a serializer or in a transformer, and
how the new output is created.
Input Stream
------------
For input stream handling you used
<generate type="payload"/>
Is the payload generator equivalent to the StreamGenerator? Or does it
do something more, like switching parsers depending on the mime type of
the input stream?
I used
<generate type="xml"/>
The idea is that if no src attribute is given, the sitemap interpreter
automatically connects the generator to the input stream of the
environment (the input stream from the http request in the servlet case;
in other cases it is less clear). This behavior was inspired by the
handling of std input in unix pipelines.
Nicola Ken proposed:
<generate type="xml" src="inputstream://"/>
I prefer this solution to mine as it doesn't require any change
to the sitemap interpreter; I also believe that it is easier to
understand as it is more explicit. It also (as Nicola Ken has explained)
gives a good SoC: the uri in the src attribute describes where to read
the resource from, e.g. input stream, file, cvs, http, ftp, etc., and the
generator is responsible for how to parse the resource. If we develop an
input stream protocol, all the work invested in the existing generators
can immediately be reused in web services.
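The division of labor described above can be sketched as follows. StreamInputGenerator and rootElement are invented names for illustration, not actual Cocoon components: the idea is only that a generator, given no src, parses the request's own InputStream, much like a Unix filter reading stdin.

```java
// Sketch: a generator that parses the request's InputStream directly
// when no src attribute is given (illustrative, not a Cocoon API).
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamInputGenerator {
    // Parse the stream and return the root element name, standing in
    // for streaming SAX events into the rest of the pipeline.
    public static String rootElement(InputStream body) throws Exception {
        final String[] root = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(body),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    if (root[0] == null) root[0] = qName;
                }
            });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for request.getInputStream() in a servlet environment.
        InputStream body = new ByteArrayInputStream(
            "<order><item/></order>".getBytes(StandardCharsets.UTF_8));
        System.out.println(rootElement(body)); // prints "order"
    }
}
```

With an inputstream:// protocol as Nicola Ken proposes, the locating part would move out of the generator entirely, and any existing parsing generator could consume the request body unchanged.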
Validation
----------
Should validation be part of the parsing of input as in:
<generate type="xml">
<parameter name="scheme" value="myInputFormat.scm"/>
</generate>
or should it be a separate transformation step:
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
or maybe the responsibility of the protocol as Nicola Ken proposed in
one of his posts:
<generate type="xml" src="inputstream:myInputFormat.scm"/>
This is not a question about architecture but rather one about finding
"best practices".
I don't think validation should be part of the protocol. It means that
the protocol has to take care of the parsing, and that would muddle the
SoC, where the protocol is responsible for locating and delivering the
stream and the generator is responsible for parsing it, that Nicola Ken
has argued for in his other posts.
Should validation be part of the generator or a transform step? I don't
know. If the input is not xml, as for the ParserGenerator, I guess that
the validation must take place in the generator. If the xml parser
validates the input as a part of the parsing, it is more practical to let
the generator be responsible for validation (IIRC Xerces2 has an
internal pipeline structure and performs validation in a transformer
like way, so for Xerces2 it would probably be as efficient to do
validation in a transformer as in a generator). Otherwise it seems to
give better SoC to separate the parsing and the validation step, so that
we can have one validation transformer for each scheme language.
In some cases it might be practical to augment the xml document with
error information, to be able to give more exact user feedback on where
the errors are located. For such applications it seems more natural to me
to have validation in a transformer.
A question that might have architectural consequences is how the
validation step should report validation errors. If the input is not
parseable at all, there is not much more to do than throwing an exception
and letting the ordinary internal error handler report the situation. If
some of the elements or attributes in the input have the wrong type, we
probably want to return more detailed feedback than just the internal
error page. Some possible validation error report mechanisms are:
storing an error report object in the environment, e.g. in the object
model; augmenting the xml document with error reporting attributes or
elements; throwing an exception object that contains a detailed error
description object; or a combination of these mechanisms.
Mixing data and state information was considered to be a bad practice in
the discussion about pipe-aware selection (see references in [3]); that
rules out using only augmentation of the xml document as the error
reporting mechanism. Throwing an exception would AFAIU lead to
difficulties in giving customized error reports. So I believe it would be
best to put some kind of state-describing object in the environment and
possibly combine this with augmentation of the xml document.
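A minimal sketch of the state-object option. ValidationReport and the object-model key used here are invented for illustration; nothing like this exists in Cocoon as-is:

```java
// Sketch: a validation transformer drops a report object into the
// object model; a later pipeline-state selector tests it.
// ValidationReport and the key are illustrative, not Cocoon APIs.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValidationReport {
    public static final String OBJECT_MODEL_KEY = "validation-report";

    private final List<String> errors = new ArrayList<>();

    public void addError(String message) { errors.add(message); }
    public boolean isValid() { return errors.isEmpty(); }
    public List<String> getErrors() { return errors; }

    public static void main(String[] args) {
        // The validation transformer would populate the report...
        ValidationReport report = new ValidationReport();
        report.addError("element 'price': expected xs:decimal");

        // ...and put it in the object model for a later selector.
        Map<String, Object> objectModel = new HashMap<>();
        objectModel.put(OBJECT_MODEL_KEY, report);

        // A pipeline-state selector could then test it:
        ValidationReport seen =
            (ValidationReport) objectModel.get(OBJECT_MODEL_KEY);
        System.out.println(seen.isValid() ? "valid" : "non-valid");
    }
}
```

Keeping per-error messages in the report also leaves room for the document-augmentation option: the same data can be merged back into the xml for detailed user feedback.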
Pipe State Dependent Selection
------------------------------
For selecting response based on if the input document is valid or not
you suggest the following:
...
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
<select type="pipeline-state">
<when test="valid">
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
...
As I mentioned earlier this could easily be implemented with the
"pipe-aware selection" code I submitted in [3]. Let us see how it would
work:
The PipelineStateSelector cannot be executed at pipeline construction
time as ordinary selectors are. The pipeline before the selector,
including the ValidatorTransformer, must have been executed before the
selection is performed. This can be implemented by letting the
PipelineStateSelector implement a special marker interface, say
PipelineStateAware, so that it can get special treatment in the
selection part of the sitemap interpreter.
When the sitemap interpreter gets a PipelineStateAware selector, it first
ends the currently constructed pipeline with a serializer that stores its
sax input in e.g. a dom-tree; the pipeline is processed and the dom
tree with the cached result is stored in e.g. the object model. In the
next step the selector is executed, and it can base its decision on the
result from the first part of the pipeline. If the ValidationTransformer
puts a validation result descriptor in the object model, the
PipelineStateSelector can perform tests on this result descriptor. In
the last step a new pipeline is constructed where the generator reads
from the stored dom tree; in the example above, the first
transformer will be an XSLTransformer.
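The marker-interface treatment can be sketched like this. All names are illustrative stand-ins, not the actual sitemap interpreter code: the only point is that the interpreter, on seeing the marker, runs the pipeline built so far (with its side effects on the object model) before asking the selector to decide.

```java
// Sketch of the marker-interface approach (illustrative names only).
import java.util.HashMap;
import java.util.Map;

public class MarkerSelection {
    interface Selector {
        boolean select(String expression, Map<String, Object> objectModel);
    }

    // Marker interface: no methods, just a signal to the interpreter.
    interface PipelineStateAware {}

    static class PipelineStateSelector
            implements Selector, PipelineStateAware {
        public boolean select(String expr, Map<String, Object> om) {
            return expr.equals(om.get("pipeline-state"));
        }
    }

    // Simplified stand-in for the interpreter's selection step.
    static boolean interpretSelect(Selector s, String expr,
            Map<String, Object> om, Runnable executePipelineSoFar) {
        if (s instanceof PipelineStateAware) {
            // Special treatment: execute the first half of the pipeline
            // (caching its output) before evaluating the test.
            executePipelineSoFar.run();
        }
        return s.select(expr, om);
    }

    public static void main(String[] args) {
        Map<String, Object> om = new HashMap<>();
        // The "pipeline so far" stores state info as a side effect.
        Runnable firstHalf = () -> om.put("pipeline-state", "valid");
        boolean taken = interpretSelect(
            new PipelineStateSelector(), "valid", om, firstHalf);
        System.out.println(taken); // prints "true"
    }
}
```

An ordinary selector, lacking the marker, would be evaluated at construction time exactly as today, which is what keeps the change backward compatible.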
An alternative and more explicit way to describe the pipeline state
dependent selection above, is:
...
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
<serialize type="object-model-dom" non-terminating="true">
<parameter name="name" value="validated-input"/>
</serialize>
<select type="pipeline-state">
<when test="valid">
<generate type="object-model-dom">
<parameter name="name" value="validated-input"/>
</generate>
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
...
Here the extensions to the current Cocoon semantics are put in the
serializer instead of the selector. The sitemap interpreter treats a
non-terminating serializer as ordinary serializer in the sense that it
puts the serializer in the end of the current pipeline and executes it.
The difference is that, instead of returning to the caller of the
sitemap interpreter, it creates a new current pipeline and continues to
interpret the components after the serializer, in this case a selector.
The sitemap interpreter will also ignore the output stream of the
serializer; the serializer is supposed to have side effects. The new
current pipeline will then get an ObjectModelDOMGenerator as generator
and an XSLTTransformer as its first transformer.
I prefer this construction to the more implicit one because it
is more obvious what it does, and also because it gives more freedom in
how to store the user input. Some people seem to prefer to store user
input in Java beans; in some applications session parameters might be a
better place than the object model.
Pipelines with Side Effects
---------------------------
A common pattern in pipelines that handle input (at least in the
application that I write) is that the first half of the pipeline takes
care of the input and ends with a transformer that stores the input. The
transformer can be e.g. the SQLTransformer (with insert or update
statements), the WriteDOMSessionTransformer, the
SourceWritingTransformer. These transformers have side effects: they
store something and return an xml document that tells if it succeeded
or not. A conclusion from the threads about pipe aware selection was
that sending meta data, like whether the operation succeeded or not, in
the pipeline is a bad practice, and especially that we should not allow
selection based on such content. Given that these transformers basically
translate xml input to a binary format and generate an xml output that
we are supposed to ignore, it would IMO be more natural to see them as
some kind of serializer.
The second half of the pipeline creates the response; here it is less
obvious what transformer to use. I normally use an XSLTTransformer and
typically ignore its input stream and only create an xml document that
is rendered into e.g. html in a subsequent transformer.
I think that it would be more natural to replace the pattern:
...
<transform type="store something, return state info"/>
<transform type="create a response document, ignore input"/>
...
with
...
<serialize type="store something, put state info in the environment"
non-terminating="true"/>
<generate type="create a response document" src="response document"/>
...
If we give the serializer a destination attribute as well, all the
existing serializers could be used for storing input in files etc.
...
<serialize type="xml" dest="xmldb://..." non-terminating="true"/>
...
This would give the same SoC that I argued in favour of in the context
of input: the serializer is responsible for how to serialize from xml to
the binary data format and the destination is responsible for where to
store the data.
Conclusion
----------
I am afraid that I pose more questions than I answer in this RT. Many of
them are of a "best practice" character, have no architectural
consequences, and do not have to be answered right now. There are
however some questions that need an answer:
How should pipeline components, like the validation transformer, report
state information? Placing some kind of state object in the object model
would be one possibility, but I don't know.
We seem to agree that there is a need for selection in pipelines
based on the state of the computation in the pipeline that precedes the
selection. Here we have two proposals:
1. Introduce pipeline state aware selectors (e.g. by letting the
selector implement a marker interface), and give such selectors special
treatment in the sitemap interpreter.
2. Extend the semantics of serializers so that the sitemap interpreter
can continue to interpret the sitemap after a serializer, (e.g. by a new
non-terminating attribute for serializers).
I prefer the second proposal.
Both proposals can be implemented with no back compatibility problems
at all by requiring the selectors or serializers that need the extended
semantics to implement a special marker interface, and by adding code
that reacts on the marker interface in the sitemap interpreter.
To use serializers more generally for storing things, as I proposed
above, the Serializer interface would need to extend the
SitemapModelComponent interface.
------
What do you think?
Daniel Fagerstrom
<snip/>
[3] [Contribution] Pipe-aware selection
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2
Re: [RT] Input Pipelines (long)
Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
> Nicola Ken Barozzi wrote:
>
>>> This said, do we really want to abstract our Environment objects so
>>> that they are capable of handling all web, CLI and mail environments?
>>> Isn't this FS?
>>
>> Is the Environment itself FS?
>
> Good question.
>
> Many people that don't like Cocoon told me so. I'm still debating in
> between myself since we came out with that concept two years ago. I
> still haven't decided.
IMHO it's useful, but it still needs a bit of work for the new
environments that will be done now.
>> We have been using it just to make a CLI that users seems to hate
>> because it's slow
>
> *some* users.
Yes, some.
>> while making angry many developers who had to change all the objects
>> that had HttpXXX servlet APIs hardcoded, to use our abstraction.
>
> No, I don't buy that. The reason why we provided a way to obtain the
> original Servlet request was exactly to avoid them having to do it.
Yeah, but *if* they found out how to do it (not everyone did, easily or
ever), they had to change the code that got that, because originally
they just got the Request, which afterwards was not the right object...
>> And now, we should let the dependency leak in again?
>
> Nicola, just because you didn't know the dependency was there it doesn't
> mean that it's *leaking* in. It has been that way since the day we
> created the environment.
Yup, but it was deemed a minor hack just to get a specific feature
used. But now it's not so specific, since getting the inputstream from
the Request is not only for servlets. I don't want this hack, which was
used specifically for one case, to leak into the general abstraction
definition and be used as a normal worksforme.
> Looks hacky? well, yes and no. When the JDK introduced Java2D they came
> out with a new Graphics2D object, but the paint() method still passes a
> Graphics object. So it's up to you to down-cast it.
>
> void paint(Graphics g) {
> Graphics2D g2d = (Graphics2D) g;
> ...
> }
>
> It's terrible, I know. It hurts my elegance feeling like nuts.
>
> So does our abstracted Environment... but the Servlet API is not
> abstracted enough, and the abstraction job is a stinking hard one!
> especially if you have to provide back-compatibility.
>
> I'm all in favor of adding input capabilities to the environment, but
> only after a sound and well-thoughtout discussion.
I agree. That's why I was discussing with Vadim, and now with you :-)
>> If we really want an evvironment, we should make it as generic as
>> *reasonably* possible.
>
> I agree.
>
>> Now the HTTPServlet request has a getInputStream. We don't.
>> The day we will make Cocoon work directly in Avalon, we will break
>> every Cocoon app using it, unless the Avalon container implements the
>> same HTTPServlet classes... which simply makes out environment
>> abstraction unnecessary, since the HTTPServlet classes become the used
>> abstraction.
>
>
> Look, I agree. I just don't want to add things to such a critical
> contract without *extremely* careful thinking.
Same here.
>> Aha, here you say it too.
>>
>> Environment = Request + ServletRequest
>
>
> Oh, yes. I've always known that Request didn't have a way to get
> input... but people stated that it was *impossible* to do so and this is
> where I got nervous.
>
>> So Cocoon is instrincially asymetric unless we are in a servlet
>> environment?
>
>
> Today? yes.
>
> Must it be so? no.
>
> Is it easy to abstract the input out of any possible client/server
> architecture environment? god no!
Yup. In Morphos I abstracted it by using "Object", but it's really a
leaky abstraction generally speaking.
> Is it true that all client/server architectures are symmetric? NO!!!
>
>> Why *servlet* and not *web*? Shall we decide that all symmetric
>> environments give a ServletRequest? Is the ServletRequest then part of
>> the contract?
>
> No, I'd much rather see input abstracted in our Environment. I'm just
> concerned about careful thinking.
>
[...]
>> The fact is that Generators should not care where the source comes
>> from, just take an object and transform it to xml.
>
>
> If that were the case, we wouldn't need pluggable generators, but just
> different sources and one parsing generator. But we would be back to the
> same thing, just with different names and sitemap semantics.
I disagree. The above text is correct only if the source gives you xml
data, which is not necessarily the case.
A source can give you a stream that can contain xml, html, pdf, doc,
whatever, and all of these need different generators.
>> By mixing the locator phase with the generator phase, we lose
>> easy-to-get flexibility.
>
> Careful here: I agree that the difference between a source and generator
> is subtle, especially since we added a method for a source to generate
> sax events directly.
>
> But I find that the concept of 'locating' a resource is very weak in our
> current sitemap context.
I don't understand what you mean here.
>> In fact, I would not see as bad this:
>>
>> <map:locate src="blah.xml"/>
>> <map:generate type="xml"/>
>> <map:transform src="cocoon:/givemetheinput">
>> <map:serialize/>
>>
>> This has come out of the Morphos effort, where it has been more than
>> evident that locating a resource and injecting it into the pipeline
>> are different concerns.
>
> I don't see this 'more than evident'-ness.
Errr, the 'more than evident'-ness came from creating Morphos, not from
the above snippet.
Let me try to explain.
When I want to generate a SAX event stream, I have to do two things:
1) get the stuff
2) transform the stuff to xml
For example, if I want to do it with an XML parser, I can do:
//what to get
String urlString = "...";
//get it
URL location = new URL(urlString);
//parse it
xmlparser.parse(location);
Imagine that I want the parser to parse from an xmldb.
I just need to be able to make the URL open the correct stream and give
it to the parser.
The URL is not *the* data, but a *handle* to the data.
The string, instead, is nothing. Just a string.
The URL, the "locator", is what takes the string and is able to get the
stream that the string points to.
The Parser just takes a URL and generates SAX events from the stuff that
the URL (locator) gets for it.
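The locator/parser separation can be sketched as follows. Locator and generate are hypothetical names, not Morphos or Cocoon APIs; the sketch only shows that swapping where the stuff comes from never touches the code that turns the stuff into output.

```java
// Sketch of the locator/generator SoC (hypothetical names):
// the locator turns a uri string into a stream; one "generator"
// is reused unchanged for every location scheme.
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class LocatorDemo {
    // "Locator": only knows how to get the stuff.
    interface Locator {
        InputStream open(String uri) throws IOException;
    }

    // "Generator": only knows how to turn a stream into output, here
    // trivially reading it as text instead of emitting SAX events.
    static String generate(Locator locator, String uri) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(locator.open(uri),
                                      StandardCharsets.UTF_8))) {
            return r.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Swap locators (inputstream://, xmldb://, file:, ...) freely;
        // the generator never changes.
        Locator inMemory = uri -> new ByteArrayInputStream(
            ("<doc from='" + uri + "'/>").getBytes(StandardCharsets.UTF_8));
        System.out.println(generate(inMemory, "inputstream://"));
    }
}
```

This is essentially what the Source abstraction already does, which is why Nicola Ken concedes below that no separate "locator" step is really needed.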
> What does your above locator do? what is the difference between
> that and a Reader?
Good question. Not much, other than the fact that a locator should only
get the source, while a reader can be made to be a stream "transformer".
It's quite easy, if we want, to make multiple readers in a pipeline, and
that would be really different from a locator.
Anyway, the sources are good enough; no real need for a "locator". I just
put it there to try and explain the separation between locating a resource
and generating SAX events from it.
>> The cocoon protocol is roughly the equivalent of the locator.
>
> Maybe I'm dumb, but I don't get this.
Errr I meant the Cocoon Sources, sorry.
>> The mailet wrapper is something I'm writing now, since I'm using james
>> in my intranet, and I see the pain of not having it easy to make a
>> Cocoon mailet.
>
> That's great. We were waiting for people to be willing to use cocoon in
> their mail system before attacking the SMTP part of the Environment
> abstraction.
>
> And *that* will require careful thinking about input, since that's where
> SMTP is focused on. Unlike HTTP that is focused on output.
Yup.
>> Let's not talk about using it as a bean! How can I simply give cocoon
>> a stream to process!
>
> I'm in favor of a discussion about abstracting the Environment further to
> be more input-friendly also for mail environments, but this must come
> out of a deep discussion *and* after some *real-life* requirements.
>
> What I'm opposed to is symmetry-driven architectural design.
Listen, my needs came from real use-cases, not symmetry-driven
architectural design. I just happened to chime in this thread because
part of what was discussed here matched with my needs.
>>> Interface Elegance driven design is one step too close to FS from
>>> where I stand.
>>>
>>> But if there are *real* needs (means stuff that can't be done nicely
>>> today with what we have), I'm more than welcome to discuss how to
>>> move forward.
>>
>> As I said, moving from a servlet container to a non-servlet web
>> container would break things, unless we have it implement the
>> httpservlet methods.
>>
>> You say that not all evnironments have the need of it, and it's true,
>> but a *class* of environments do.
>
>
> Correct.
>
> Summarizing this thread a little:
>
> 1) I don't think Cocoon pipelines are asymetric.
It's irrelevant anyway. Even if they were, who cares, as long as it
works well.
> 2) I agree that the Environment is asymmetric.
>
> 3) I would like to see an effort to make Environment more symmetric in
> respect of input
(*)
Actually, the input pipelines discussion, as I understood it, is simply
about the possibility of executing two pipelines per request, with the
flow in the middle.
Cocoon2 is something that gets data from a Request, mainly the URL and
some params, and generates a response with some xml stuff.
In web services though, the "Request" actually needs to be created from
an incoming xml stream.
Hence the talk about input pipelines, that would be those pipelines that
work on the request stream to generate the request that would drive the
normal Cocoon process.
By separating the processing into two steps, it has been shown how we
fulfill the need for selecting based on pipeline content without
actually doing it: first we process the xml with an input pipeline and
create an intermediate "Request", and then we select based on that data.
This two-step process made Cocoon seem asymmetric because it currently
cannot do this explicitly, while the two-step approach seems more
symmetric, etc etc etc.
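As a rough sitemap-style sketch of that two-step idea (the element names,
the "payload" generator, the stylesheet, and the `map:call` dispatch below
are all hypothetical illustrations, not current sitemap syntax):

```xml
<!-- Step 1 (hypothetical): an input pipeline turns the request
     stream into an intermediate "request" document and hands
     control to the flow. -->
<map:match pattern="myservice">
  <map:generate type="payload"/>
  <map:transform src="soap2internal.xsl"/>
  <map:call function="dispatch"/>   <!-- flow in the middle -->
</map:match>

<!-- Step 2: the flow selects an ordinary output pipeline based
     on the intermediate data. -->
<map:match pattern="internal/order">
  <map:generate src="orders.xml"/>
  <map:serialize type="xml"/>
</map:match>
```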
> 4) I would like to see Environment abstract enough to work in a Mailet
> environment
>
> 5) I would like this effort to be driven by real-life needs rather than
> purity and symmetry-driven architectural design (since we've seen that
> it often leads to very bad mistakes!)
Ok, wait for a new thread on this. Let's keep this thread for the real
input pipeline discussion. (*)
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
---------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
Re: [RT] Input Pipelines (long)
Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
>> This said, do we really want to abstract our Environment objects so
>> that they are capable of handling all web, CLI and mail environments?
>> Isn't this FS?
>
>
> Is the Environment itself FS?
Good question.
Many people who don't like Cocoon have told me so. I've been debating
with myself since we came up with that concept two years ago. I still
haven't decided.
> We have been using it just to make a CLI that users seem to hate
> because it's slow
*some* users.
> while angering many developers who had to change
> all the objects that had HttpXXX servlet APIs hardcoded to use our
> abstraction.
No, I don't buy that. The reason why we provided a way to obtain the
original Servlet request was exactly to avoid them having to do it.
> And now, we should let the dependency leak in again?
Nicola, just because you didn't know the dependency was there it doesn't
mean that it's *leaking* in. It has been that way since the day we
created the environment.
Looks hacky? Well, yes and no. When the JDK introduced Java2D they came
out with a new Graphics2D class, but the paint() method still passed a
Graphics object. So it's up to you to down-cast it:

void paint(Graphics g) {
    Graphics2D g2d = (Graphics2D) g;
    ...
}
It's terrible, I know. It hurts my sense of elegance like nuts.
So does our abstracted Environment... but the Servlet API is not
abstracted enough, and the abstraction job is a stinking hard one!
Especially if you have to provide back-compatibility.
I'm all in favor of adding input capabilities to the environment, but
only after a sound and well-thought-out discussion.
> If we really want an environment, we should make it as generic as
> *reasonably* possible.
I agree.
> Now the HTTPServlet request has a getInputStream. We don't.
> The day we will make Cocoon work directly in Avalon, we will break every
> Cocoon app using it, unless the Avalon container implements the same
> HTTPServlet classes... which simply makes our environment abstraction
> unnecessary, since the HTTPServlet classes become the used abstraction.
Look, I agree. I just don't want to add things to such a critical
contract without *extremely* careful thinking.
> Aha, here you say it too.
>
> Environment = Request + ServletRequest
Oh, yes. I've always known that Request didn't have a way to get
input... but people stated that it was *impossible* to do so and this is
where I got nervous.
> So Cocoon is intrinsically asymmetric unless we are in a servlet
> environment?
Today? yes.
Must it be so? no.
Is it easy to abstract the input out of any possible client/server
architecture environment? god no!
Is it true that all client/server architectures are symmetric? NO!!!
> Why *servlet* and not *web*? Shall we decide that all
> symmetric environments give a ServletRequest? Is the ServletRequest then
> part of the contract?
No, I'd much rather see input abstracted in our Environment. I'm just
concerned about careful thinking.
>> Hmmm, between
>>
>> <map:generate type="file" src="input:web:/"/>
>>
>> and
>>
>> <map:generate type="payload"/>
>>
>> I would choose the second.
>>
>> A full URI scheme for simply getting an input stream is too much and
>> it might be *very* dangerous since people will very easily abuse it
>> like this
>>
>> <map:generate src="blah.xml"/>
>> <map:transform src="input:web:/">
>>
>> which might lead to *serious* security concerns and cross-site
>> scripting problems with injected XSLT
>
>
> Nobody prevents us from making it usable only from generators.
Yes, something does: usability coherence. You can't make a protocol
available everywhere and another available only in some spots.
> And BTW,
> what you call a *serious* security concern is in fact something that has
> been asked for.
This is not an argument.
> But you still cannot prevent people from shooting
> themselves in the foot, and use
>
> <map:generate type="payload"/>
> <map:serialize/>
>
> and then calling it here
>
> <map:generate src="blah.xml"/>
> <map:transform src="cocoon:/givemetheinput">
> <map:serialize/>
Oh, totally. I can't prevent people from doing something I consider a
mistake, but I can avoid making it *easy* for them to do it.
This is what designing a framework is all about: it's not the "there is
always more than one way of doing it" FS-inflated paradigm, it's the
"this is the way we consider best, if you don't like it, use something
else or convince us of a better alternative".
> The fact is that Generators should not care where the source comes from,
> just take an object and transform it to xml.
If that were the case, we wouldn't need pluggable generators, but just
different sources and one parsing generator. But we would be back to the
same thing, just with different names and sitemap semantics.
> By mixing the locator phase with the generator phase, we lose
> easy-to-get flexibility.
Careful here: I agree that the difference between a source and a
generator is subtle, especially since we added a method for a source to
generate SAX events directly.
But I find that the concept of 'locating' a resource is very weak in our
current sitemap context.
> In fact, I would not see as bad this:
>
> <map:locate src="blah.xml"/>
> <map:generate type="xml"/>
> <map:transform src="cocoon:/givemetheinput">
> <map:serialize/>
>
> This has come out of the Morphos effort, where it has been more than
> evident that locating a resource and injecting it into the pipeline are
> different concerns.
I don't see this 'more than evident'-ness.
What does your above locator do? What is the difference between that and
a Reader?
> The cocoon protocol is roughly the equivalent of the locator.
Maybe I'm dumb, but I don't get this.
> The mailet wrapper is something I'm writing now, since I'm using james
> in my intranet, and I feel the pain of it not being easy to make a
> Cocoon mailet.
That's great. We were waiting for people to be willing to use cocoon in
their mail system before attacking the SMTP part of the Environment
abstraction.
And *that* will require careful thinking about input, since that's what
SMTP is focused on, unlike HTTP, which is focused on output.
> Let's not talk about using it as a bean! How can I simply give cocoon a
> stream to process!
I'm in favor of a discussion about abstracting the Environment further to
be more input-friendly also for mail environments, but this must come
out of a deep discussion *and* after some *real-life* requirements.
What I'm opposed to is symmetry-driven architectural design.
>> Interface Elegance driven design is one step too close to FS from
>> where I stand.
>>
>> But if there are *real* needs (means stuff that can't be done nicely
>> today with what we have), I'm more than welcome to discuss how to move
>> forward.
>
>
> As I said, moving from a servlet container to a non-servlet web
> container would break things, unless we have it implement the
> httpservlet methods.
>
> You say that not all environments have the need of it, and it's true,
> but a *class* of environments do.
Correct.
Summarizing this thread a little:
1) I don't think Cocoon pipelines are asymmetric.
2) I agree that the Environment is asymmetric.
3) I would like to see an effort to make Environment more symmetric
with respect to input
4) I would like to see Environment abstract enough to work in a Mailet
environment
5) I would like this effort to be driven by real-life needs rather
than purity and symmetry-driven architectural design (since we've seen
that it often leads to very bad mistakes!)
--
Stefano Mazzocchi <st...@apache.org>
Re: [RT] Input Pipelines (long)
Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
> Nicola Ken Barozzi wrote:
>
>>
>> (my comments, based on the discussions that are going on lately and my
>> work on the blocks move and doc writing)
>
>
> cool. thanks for sharing.
>
>> Stefano Mazzocchi wrote:
>> [...]
>>
>>>> If we compare a Cocoon output pipeline with a unix pipeline, it always
>>>> ignores standard input and always writes to standard output.
>>>
>>>
>>>
>>>
>>> Sorry, but this is plain wrong.
>>>
>>> Cocoon already ships generators that do *NOT* ignore the request input.
>>
>> Look at the Request interface.
>> There is no method to get the input.
>
> Right. But the above sentence remains wrong: maybe the Cocoon Request
> object doesn't have a method to get input that is abstracted from the
> context, but it's *wrong* to say that there is no way to get input from
> the user.
Being picky here, but "There is no method to get the input." is correct
in the sense that there is no class method to get the input.
> The fact that the Request object doesn't contain input is because we
> couldn't agree on what *input* meant in a context-abstracted situation.
>
> So, like Vadim, I agree that we should get it in only *after* we know
> what context-abstracted input means.
Yup, Vadim has easily convinced me too :-)
[...]
>> Vadim has proposed, after some discussion, to add the possibility of
>> returning n streams, that can be used for example in mails or in any
>> system that inputs multitype data.
>
>
> There are two ways of implementing an API:
>
> 1) forcing common ground: that is creating a sufficiently abstracted
> way to look at the problem
>
> 2) leaving context-specific hooks: the component connects to the
> context-specific hooks.
>
> Java has a known history of using pattern #1, but recently this has been
> challenged very seriously (see Eclipse SWT vs. Swing) and with some
> *great* achievements.
Ok, then let's ditch the environment altogether 8->
> This said, do we really want to abstract our Environment objects so that
> they are capable of handling all web, CLI and mail environments? Isn't
> this FS?
Is the Environment itself FS?
We have been using it just to make a CLI that users seem to hate
because it's slow, while angering many developers who had to change
all the objects that had HttpXXX servlet APIs hardcoded to use our
abstraction.
And now, we should let the dependency leak in again?
If we really want an environment, we should make it as generic as
*reasonably* possible.
Now the HTTPServlet request has a getInputStream. We don't.
The day we will make Cocoon work directly in Avalon, we will break every
Cocoon app using it, unless the Avalon container implements the same
HTTPServlet classes... which simply makes our environment abstraction
unnecessary, since the HTTPServlet classes become the used abstraction.
> I'm not stating, just asking.
[...]
>> The fact is that the request is (down-to-earth) a URI, and a response
>> is a stream. This is not symmetry.
>
>
> ??? what about those PUT WebDAV requests that might have a 10Mb payload
> and return a simple two line http response with an error code?
>
> Lack of symmetry is perceived because of the way the web currently
> works: 90% of HTTP requests are probably GET, 9.99% POST and 0.01%
> all the other HTTP actions.
>
> But there is *nothing* intrinsically asymmetric in the web nor in how
> cocoon pipelines work (if you consider your Environment as Request +
> ServletRequest)
Aha, here you say it too.
Environment = Request + ServletRequest
So Cocoon is intrinsically asymmetric unless we are in a servlet
environment? Why *servlet* and not *web*? Shall we decide that all
symmetric environments give a ServletRequest? Is the ServletRequest then
part of the contract?
>>> 2) what is this pipeline returning to the requesting client? This is
>>> not SMTP, we have to return something. Sure, we might simply return
>>> an HTTP header with some error code depending on the result of the
>>> serialization, but then people will ask how to control that part.
>>
>> [...]
>>
>>>> Several of the existing
>>>> generators would be highly usable in input pipelines if they were
>>>> modified in such a way that they read from "standard input" when no
>>>> src attribute is given.
>>>
>>> I lost you here.
>>
>> My take: If you use a Generator with a source protocol, it's more
>> flexible. Add a protocol that gets data from request input, and you're
>> done.
>
> Hmmm, between
>
> <map:generate type="file" src="input:web:/"/>
>
> and
>
> <map:generate type="payload"/>
>
> I would choose the second.
>
> A full URI scheme for simply getting an input stream is too much and it
> might be *very* dangerous since people will very easily abuse it like this
>
> <map:generate src="blah.xml"/>
> <map:transform src="input:web:/">
>
> which might lead to *serious* security concerns and cross-site scripting
> problems with injected XSLT
Nobody prevents us from making it usable only from generators. And BTW,
what you call a *serious* security concern is in fact something that has
been asked for. But you still cannot prevent people from shooting
themselves in the foot, and use
<map:generate type="payload"/>
<map:serialize/>
and then calling it here
<map:generate src="blah.xml"/>
<map:transform src="cocoon:/givemetheinput">
<map:serialize/>
The fact is that Generators should not care where the source comes from,
just take an object and transform it to xml.
By mixing the locator phase with the generator phase, we lose
easy-to-get flexibility.
In fact, I would not see as bad this:
<map:locate src="blah.xml"/>
<map:generate type="xml"/>
<map:transform src="cocoon:/givemetheinput">
<map:serialize/>
This has come out of the Morphos effort, where it has been more than
evident that locating a resource and injecting it into the pipeline are
different concerns.
The cocoon protocol is roughly the equivalent of the locator.
>> [...]
>>
>>
>>> Wouldn't the following pipeline achieve the same functionality you
>>> want without requiring changes to the architecture?
>>>
>>> <match pattern="myservice">
>>> <generate type="payload"/>
>>> <transform type="validator">
>>> <parameter name="scheme" value="myInputFormat.scm"/>
>>> </transform>
>>> <select type="pipeline-state">
>>> <when test="valid">
>>> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>>> <transform type="my-business-logic"/>
>>> <serialize type="xml"/>
>>> </when>
>>> <otherwise>
>>> <!-- produce an error document -->
>>> </otherwise>
>>> </select>
>>> </match>
>>
>>
>>
>> I basically asked the same thing... but we cannot have a generic
>> payload generator yet.
>
>
> Who said we should? Is there a *real* (non theory-driven) need for such
> a thing?
> I've been using the request generator with good satisfaction even for
> web services-like stuff and I don't need to send any input from the
> command line (do you?) and a Mailet wrapper for Cocoon is yet to be seen.
The mailet wrapper is something I'm writing now, since I'm using james
in my intranet, and I feel the pain of it not being easy to make a
Cocoon mailet.
Let's not talk about using it as a bean! How can I simply give cocoon a
stream to process!
> Interface Elegance driven design is one step too close to FS from where
> I stand.
>
> But if there are *real* needs (means stuff that can't be done nicely
> today with what we have), I'm more than welcome to discuss how to move
> forward.
As I said, moving from a servlet container to a non-servlet web
container would break things, unless we have it implement the
httpservlet methods.
You say that not all environments have the need of it, and it's true,
but a *class* of environments do.
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
Re: [RT] Input Pipelines (long)
Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
>
> (my comments, based on the discussions that are going on lately and my
> work on the blocks move and doc writing)
cool. thanks for sharing.
> Stefano Mazzocchi wrote:
> [...]
>
>>> If we compare a Cocoon output pipeline with a unix pipeline, it always
>>> ignores standard input and always writes to standard output.
>>
>>
>>
>> Sorry, but this is plain wrong.
>>
>> Cocoon already ships generators that do *NOT* ignore the request input.
>
>
> Look at the Request interface.
> There is no method to get the input.
Right. But the above sentence remains wrong: maybe the Cocoon Request
object doesn't have a method to get input that is abstracted from the
context, but it's *wrong* to say that there is no way to get input from
the user.
The fact that the Request object doesn't contain input is because we
couldn't agree on what *input* meant in a context-abstracted situation.
So, like Vadim, I agree that we should get it in only *after* we know
what context-abstracted input means.
>> Extending those components to perform higher-level functionality is
>> *NOT* an architectural problem. Or at least, I don't see why it should
>> be.
>
>
> If a Request has input, we should at least put it in the interface.
See above.
> Vadim has proposed, after some discussion, to add the possibility of
> returning n streams, that can be used for example in mails or in any
> system that inputs multitype data.
There are two ways of implementing an API:
1) forcing common ground: that is creating a sufficiently abstracted
way to look at the problem
2) leaving context-specific hooks: the component connects to the
context-specific hooks.
Java has a known history of using pattern #1, but recently this has been
challenged very seriously (see Eclipse SWT vs. Swing) and with some
*great* achievements.
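The two patterns above can be sketched side by side. All interfaces and
class names below are hypothetical stand-ins for illustration, not the
actual Cocoon Environment API:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/**
 * Sketch of the two API styles discussed: pattern #1 forces a common
 * abstraction, pattern #2 exposes a context-specific escape hatch.
 * All names here are hypothetical, not Cocoon's real Environment API.
 */
public class EnvironmentStyles {

    /** Pattern #1: common ground -- every environment must model input. */
    interface AbstractEnvironment {
        InputStream getInputStream();      // forced common abstraction
    }

    /** Pattern #2: hook -- components cast to the native object they know. */
    interface HookEnvironment {
        Object getUnderlyingRequest();     // e.g. an HttpServletRequest
    }

    /** A fake servlet-style request used by the hook example. */
    static class FakeServletRequest {
        InputStream getInputStream() {
            return new ByteArrayInputStream("payload".getBytes());
        }
    }

    public static void main(String[] args) throws Exception {
        // Pattern #2 in use: the component knows its context and casts,
        // just like the Graphics -> Graphics2D cast mentioned earlier.
        HookEnvironment env = new HookEnvironment() {
            public Object getUnderlyingRequest() {
                return new FakeServletRequest();
            }
        };
        FakeServletRequest req = (FakeServletRequest) env.getUnderlyingRequest();
        byte[] buf = new byte[16];
        int n = req.getInputStream().read(buf);
        System.out.println(new String(buf, 0, n));
    }
}
```

Pattern #1 keeps components portable but demands that every environment
can model input; pattern #2 keeps the core small but ties components to
one context.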
This said, do we really want to abstract our Environment objects so that
they are capable of handling all web, CLI and mail environments? Isn't
this FS?
I'm not stating, just asking.
> [...]
>
>>> In a servlet, input would be
>>> taken from the input stream of the request object. We could also have
>>> a writable cocoon: protocol where the input stream would be set by the
>>> user of the protocol, more about that later, (see also my post in the
>>> thread [1]).
>>>
>>> An example:
>>>
>>> <match pattern="**.xls">
>>>   <generate type="xls"/>
>>>   <transform type="xsl" src="foo.xsl"/>
>>>   <serialize type="xml" dest="context://repository/{1}.xml"/>
>>> </match>
>>
>>
>>
>> I see two things here:
>>
>> 1) the current pipeline components don't seem to be asymmetric (and
>> this goes somewhat against what you wrote at the beginning of your
>> email), the asymmetry is in the fact that the serializer output is
>> *always* bound to the client response. Am I right on this assumption?
>
>
> The fact is that the request is (down-to-earth) a URI, and a response is
> a stream. This is not symmetry.
??? what about those PUT WebDAV requests that might have a 10Mb payload
and return a simple two line http response with an error code?
Lack of symmetry is perceived because of the way the web currently
works: 90% of HTTP requests are probably GET, 9.99% POST and 0.01%
all the other HTTP actions.
But there is *nothing* intrinsically asymmetric in the web nor in how
cocoon pipelines work (if you consider your Environment as Request +
ServletRequest)
>> 2) what is this pipeline returning to the requesting client? This is
>> not SMTP, we have to return something. Sure, we might simply return an
>> HTTP header with some error code depending on the result of the
>> serialization, but then people will ask how to control that part.
>
>
> [...]
>
>
>>> Several of the existing
>>> generators would be highly usable in input pipelines if they were
>>> modified in such a way that they read from "standard input" when no
>>> src attribute is given.
>>
>>
>> I lost you here.
>
>
> My take: If you use a Generator with a source protocol, it's more
> flexible. Add a protocol that gets data from request input, and you're
> done.
Hmmm, between
<map:generate type="file" src="input:web:/"/>
and
<map:generate type="payload"/>
I would choose the second.
A full URI scheme for simply getting an input stream is too much and it
might be *very* dangerous since people will very easily abuse it like this
<map:generate src="blah.xml"/>
<map:transform src="input:web:/">
which might lead to *serious* security concerns and cross-site scripting
problems with injected XSLT
> [...]
>
>
>> Wouldn't the following pipeline achieve the same functionality you
>> want without requiring changes to the architecture?
>>
>> <match pattern="myservice">
>> <generate type="payload"/>
>> <transform type="validator">
>> <parameter name="scheme" value="myInputFormat.scm"/>
>> </transform>
>> <select type="pipeline-state">
>> <when test="valid">
>> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>> <transform type="my-business-logic"/>
>> <serialize type="xml"/>
>> </when>
>> <otherwise>
>> <!-- produce an error document -->
>> </otherwise>
>> </select>
>> </match>
>
>
> I basically asked the same thing... but we cannot have a generic payload
> generator yet.
Who said we should? Is there a *real* (non theory-driven) need for such
a thing?
I've been using the request generator with good satisfaction even for
web services-like stuff and I don't need to send any input from the
command line (do you?) and a Mailet wrapper for Cocoon is yet to be seen.
Interface Elegance driven design is one step too close to FS from where
I stand.
But if there are *real* needs (means stuff that can't be done nicely
today with what we have), I'm more than welcome to discuss how to move
forward.
--
Stefano Mazzocchi <st...@apache.org>
Re: [RT] Input Pipelines (long)
Posted by Nicola Ken Barozzi <ni...@apache.org>.
(my comments, based on the discussions that are going on lately and my
work on the blocks move and doc writing)
Stefano Mazzocchi wrote:
[...]
>> If we compare a Cocoon output pipeline with a unix pipeline, it always
>> ignores standard input and always writes to standard output.
>
>
> Sorry, but this is plain wrong.
>
> Cocoon already ships generators that do *NOT* ignore the request input.
Look at the Request interface.
There is no method to get the input.
> Extending those components to perform higher-level functionality is
> *NOT* an architectural problem. Or at least, I don't see why it should be.
If a Request has input, we should at least put it in the interface.
Vadim has proposed, after some discussion, to add the possibility of
returning n streams, that can be used for example in mails or in any
system that inputs multitype data.
[...]
>> In a servlet, input would be
>> taken from the input stream of the request object. We could also have
>> a writable cocoon: protocol where the input stream would be set by the
>> user of the protocol, more about that later, (see also my post in the
>> thread [1]).
>>
>> An example:
>>
>> <match pattern="**.xls">
>>   <generate type="xls"/>
>>   <transform type="xsl" src="foo.xsl"/>
>>   <serialize type="xml" dest="context://repository/{1}.xml"/>
>> </match>
>
>
> I see two things here:
>
> 1) the current pipeline components don't seem to be asymmetric (and this
> goes somewhat against what you wrote at the beginning of your email),
> the asymmetry is in the fact that the serializer output is *always*
> bound to the client response. Am I right on this assumption?
The fact is that the request is (down-to-earth) a URI, and a response is
a stream. This is not symmetry.
> 2) what is this pipeline returning to the requesting client? This is not
> SMTP, we have to return something. Sure, we might simply return an HTTP
> header with some error code depending on the result of the
> serialization, but then people will ask how to control that part.
[...]
>> Several of the existing
>> generators would be highly usable in input pipelines if they were
>> modified in such a way that they read from "standard input" when no
>> src attribute is given.
>
> I lost you here.
My take: If you use a Generator with a source protocol, it's more
flexible. Add a protocol that gets data from request input, and you're done.
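What such an input protocol amounts to can be sketched independently of
Cocoon: resolving the URI just hands back the request's payload as a
stream, so any stream-consuming generator can read it. The interfaces
below are hypothetical stand-ins, not the real Cocoon Source API:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of an "input:" pseudo-protocol: resolving the URI hands back
 * the raw request payload as a stream. Hypothetical interfaces, not
 * the Cocoon API.
 */
public class InputProtocolSketch {

    /** Minimal stand-in for a resolvable source. */
    interface Source {
        InputStream getInputStream() throws IOException;
    }

    /** Maps URIs to payload streams; "input:web:/" -> request body. */
    static class Resolver {
        private final Map<String, InputStream> sources = new HashMap<String, InputStream>();

        void mount(String uri, InputStream stream) {
            sources.put(uri, stream);
        }

        Source resolve(String uri) {
            final InputStream stream = sources.get(uri);
            if (stream == null) {
                throw new IllegalArgumentException("unknown source: " + uri);
            }
            return new Source() {
                public InputStream getInputStream() {
                    return stream;
                }
            };
        }
    }

    public static void main(String[] args) throws IOException {
        Resolver resolver = new Resolver();
        // Pretend this is the POSTed request body.
        resolver.mount("input:web:/",
                new ByteArrayInputStream("<payload/>".getBytes("UTF-8")));

        InputStream in = resolver.resolve("input:web:/").getInputStream();
        byte[] buf = new byte[64];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, "UTF-8"));
    }
}
```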
[...]
> Wouldn't the following pipeline achieve the same functionality you want
> without requiring changes to the architecture?
>
> <match pattern="myservice">
>   <generate type="payload"/>
>   <transform type="validator">
>     <parameter name="scheme" value="myInputFormat.scm"/>
>   </transform>
>   <select type="pipeline-state">
>     <when test="valid">
>       <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>       <transform type="my-business-logic"/>
>       <serialize type="xml"/>
>     </when>
>     <otherwise>
>       <!-- produce an error document -->
>     </otherwise>
>   </select>
> </match>
I basically asked the same thing... but we cannot have a generic payload
generator yet.
[...]
>> The ability to handle structured input (e.g. xml) in a convenient way,
>> will probably be an important requirement on webapp frameworks in the
>> near future.
>
>
> Agreed.
>
>> By removing the asymmetry between generators and serializers, by letting
>> the input of a generator be set by the context and the output of a
>> serializer be set from the sitemap, Cocoon could IMO be as good in
>> handling input as it is today in producing output.
>
>
> I don't understand what you mean by 'setting the input by the context'.
>
> As far as allowing the serializer to have a destination semantic in the
> sitemap, I'd be against it because I see it more harmful than useful.
>
> I do agree that serializers should not be connected only to the servlet
> output stream, but this is not a concern of the pipeline itself, but of
> who assembles the pipeline... and, IMO, the flow logic is the
> closest thing to that that we have today.
>
>> This would also make it possible to introduce a writable as well as
>> readable Cocoon pseudo protocol, that would be a good way to export
>> functionality from blocks.
>
> I agree that a writeable cocoon: protocol is required, especially for
> blocks, but this doesn't mean we have to change the sitemap semantics
> for that.
>
> There are of course many open questions, e.g. how to implement those
> ideas without introducing too much backward incompatibility.
>
> The best idea is to avoid changing what doesn't require changes and
> to work to minimize architectural changes from that point on.
Yup, exactly.
> But enough for now.
>
> And thanks for keeping up with the input-oriented discussions :-)
Indeed.
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
Re: [RT] Input Pipelines (long)
Posted by Stefano Mazzocchi <st...@apache.org>.
Hmmm, maybe deep architectural discussions are good during holiday
seasons... we'll see :)
Daniel Fagerstrom wrote:
> Input Pipelines
> ===============
>
> There is, IMO, a need for better support for input handling in
> Cocoon. I believe that the introduction of "input pipelines" can be an
> important step in this direction. In the rest of this (long) RT I will
> discuss use cases for them, a possible definition of input pipelines,
> compare them with the existing pipeline concept in Cocoon (henceforth
> called output pipelines), discuss what kind of components that would
> be useful in them, how they can be used in the sitemap and from
> flowscripts, and also relate them to the current discussion about how
> to reuse functionality "Cocoon services" between blocks.
Cool, let's rock and roll.
> Use cases
> ---------
>
> There is an ongoing trend of packaging all kinds of applications as web
> applications or decomposing them into sets of web services. At the same
> time web browsers are more and more becoming a universal GUI for all
> kinds of applications (e.g. XUL).
>
> This leads to an increasing need for handling of structured input data
> in web applications. SOAP might be the most important example; we also
> have XML-RPC and most certainly numerous home-brewed formats, some might
> even be binary non-xml legacy formats. WebDAV is another example of
> xml-input, and next-generation form handling, XForms, uses xml as the
> transport format.
>
> As people are building more and more advanced Cocoon systems there is
> also a growing need for reusing functionality in a structured way;
> there have been discussions about how to package and reuse "Cocoon
> services" in the context of blocks [1] and [2]. Here there is also a
> need for handling xml-input.
>
> The company I work for builds data warehouses; some of our customers are
> starting to get interested in using the functionality of the data
> warehouses, not only from the web interfaces that we usually build
> but also as parts of their own webapps. This means that we want,
> besides Cocoon's flexibility in presenting data in different forms,
> also flexibility in asking for the data through different input
> formats.
>
> There is thus a world of input beyond the request parameters, and a
> world of rapidly growing importance.
I acknowledge that and I think everybody here does.
> Does Cocoon support the above-mentioned use cases? Yes and no: there
> are numerous components that implement SOAP, WebDAV, parts of XForms
> etc. But while the components designed for publishing are highly
> reusable in various contexts, this is not the case for input
> components.
Stop.
Before we go on I would like to point out that there is a *huge*
difference between poor 'reusability of components' due to their
implementation and poor reusability due to architectural limitations of
the component framework.
> IMO the reason for this is that Cocoon as a framework does
> not have much support for input handling.
This is obviously debatable, but I do agree with you that it's worth
considering to challenge the very architecture of the framework and test
its balance between input and output.
So, no matter what result this discussion will bring, it will be a good
design challenge.
> IMO Cocoon could be as good at handling input as it currently is at
> creating output, by reusing exactly the same concept: pipelines. We
> cannot, however, use the existing "output pipelines" as is; there are
> some asymmetries in their design that make them unsuitable for input.
I fail to see the asymmetries, but let's keep going.
> The term "input pipeline" has sometimes been used on the list, it is
> time to try to define what it could be.
>
> What is an Input Pipeline
> -------------------------
>
> An input pipeline typically starts by reading octet data from the
> input stream of the request object. The input data could be xml,
> tab-separated data, text that is structured according to a certain
> grammar, binary legacy formats like Excel or Word, or anything else
> that can be translated to xml. The first step in the input pipeline
> is an adapter from octet data to sax events. This sounds quite
> similar to a generator; we will return to this in the next section.
This sounds so similar to a generator that I fail to see any difference
from what a generator is... that is: would you need any additional method
in an interface that describes such a 'generator for input pipelines'?
I'm not being ironic, but honestly curious.
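For what it's worth, here is a minimal sketch of the idea under discussion, using only JDK SAX classes. The class name StreamGenerator and its contract are hypothetical, not an existing Cocoon interface: a generator that pushes sax events parsed from a caller-supplied stream ("standard input") instead of from a sitemap src attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch (not an existing Cocoon interface): a generator
// whose input is the request body stream rather than a sitemap src.
public class StreamGenerator {
    private final InputStream input;

    public StreamGenerator(InputStream input) {
        this.input = input;
    }

    // Push SAX events from the stream into the consumer, exactly as an
    // ordinary generator pushes events parsed from its src attribute.
    public void generate(DefaultHandler consumer) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(input), consumer);
    }

    // Demo helper: count the elements seen in a posted document.
    public static int countElements(String postedXml) throws Exception {
        final int[] n = {0};
        InputStream body = new ByteArrayInputStream(postedXml.getBytes("UTF-8"));
        new StreamGenerator(body).generate(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                    Attributes atts) {
                n[0]++;
            }
        });
        return n[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<order><item/><item/></order>"));
    }
}
```

Note that the generate() signature is exactly what a generator already does; only the origin of the octets changes, which is the heart of the question above.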
> The structure of the xml from the first step in the pipeline might not
> be in a form that is suitable for the data model that we would like to
> use internally in the system. Reasons for this can be that the xml
> input is supposed to follow some standard or some customer-defined
> format. Input adapters for legacy formats will probably produce xml
> that is similar to the input format and repeat all kinds of
> idiosyncrasies from that format. There is thus a need to transform the
> input xml to an xml format more suited to our application-specific
> needs. One or several xslt transformer steps would therefore be
> useful in the input pipeline.
And these sound like transformers to me, unless I'm really missing a
big piece of the puzzle.
> As a last step in the input pipeline the sax events should be adapted
> to some binary format so that e.g. the business logic in the system
> can be applied to it. The xml input could e.g. be serialized to an
> octet stream for storage in a file (as text, xml, pdf, images, ...),
> transformed to java objects for storage in the session object, put
> into an xml db or into a relational db.
Ah, now I'm starting to get it: you want to detach the pipeline output
from the response!
Yes, I've been thinking about this a lot and I think I do have a
solution (more below).
> Isn't this exactly what an output pipeline does?
>
> Comparison to Output Pipelines
> ------------------------------
>
> Both an input and an output pipeline consist of an adaptor from
> a binary format to sax events, followed by a (possibly empty) sequence
> of transformers that take sax events as input as well as output. The
> last step is an adaptor from sax events to a binary format. The main
> difference (and the one I will focus on) is how the binary input and
> output are connected to the pipeline.
>
> Let us look at an example of an output pipeline:
>
> <match pattern="*.html">
> <generate type="xml" src="{1}.xml"/>
> <transform type="xsl" src="foo.xsl"/>
> <serialize type="html"/>
> </match>
>
> The input to the pipeline is controlled from the sitemap by the src
> attribute of the generator, while the output from the serializer can't
> be controlled from the sitemap; the context in which the sitemap is
> used is responsible for directing the output to an appropriate
> place. If the pipeline is used from a servlet, the output will be
> directed to the output stream of the response object in the servlet. If
> it is used from the command line, the output will be redirected to a
> file. If it is used in the cocoon: protocol the output will be
> redirected to be used as input for the src attribute of e.g. a
> generator or a transformer (cf. Carsten's and my posts in
> [1] about the semantics of the cocoon: protocol).
>
> Here is another example:
>
> <match pattern="bar.pdf">
> <generate type="xsp" src="bar.xsp"/>
> <transform type="xsl" src="foo.xsl"/>
> <serialize type="pdf"/>
> </match>
>
> In this case the binary input is taken from the object model and the
> component manager in Cocoon, and the input file to the generator,
> "bar.xsp", describes how to extract the input and how to structure it
> as an xml document.
>
> If we compare a Cocoon output pipeline with a Unix pipeline, it always
> ignores standard input and always writes to standard output.
Sorry, but this is plain wrong.
Cocoon already ships generators that do *NOT* ignore the request input.
Extending those components to perform higher-level functionality is
*NOT* an architectural problem. Or at least, I don't see why it should be.
> An input
> pipeline would be the opposite: it would always read from standard
> input and ignore standard output. In Cocoon this would mean that the
> input source would be set by the context.
What context? do you imply that input pipelines don't work out of
request parameter matching?
> In a servlet, input would be
> taken from the input stream of the request object. We could also have
> a writable cocoon: protocol where the input stream would be set by the
> user of the protocol; more about that later (see also my post in the
> thread [1]).
>
> An example:
>
> <match pattern="**.xls">
> <generate type="xls"/>
> <transform type="xsl" src="foo.xsl"/>
> <serialize type="xml" dest="context://repository/{1}.xml"/>
> </match>
I see two things here:
1) the current pipeline components don't seem to be asymmetric (and this
goes somewhat against what you wrote at the beginning of your email);
the asymmetry is in the fact that the serializer output is *always*
bound to the client response. Am I right in this assumption?
2) what is this pipeline returning to the requesting client? This is not
SMTP, we have to return something. Sure, we might simply return an HTTP
header with some error code depending on the result of the
serialization, but then people will ask how to control that part.
> Here the generator reads an Excel document from the input stream that
> is supplied by the context, and translates it to some xml format. The
> serializer writes its xml input to the file system. I reused the names
> generator and serializer partly because I didn't find any good names
> (deserializer is the inverse of serializer, but what is the inverse of
> a generator?)
There is none, because the opposite of generation would be destruction
and you are definitely *not* destroying something, but still *generating*
it. Where the data the generator uses comes from is *not* an
architectural concern and should not modify the component's name.
>, and partly because it IMO would be the best solution if
> the generator and serializer from output pipelines can be extended to
> be usable in input pipelines as well.
I don't see the need to change anything in pipeline components. IoC
keeps serializers totally unaware of where they are writing and
Generators already have access to all request input.
> Several of the existing
> generators would be highly usable in input pipelines if they were
> modified in such a way that they read from "standard input" when no
> src attribute is given.
I lost you here.
> There are also some serializers that would be
> useful in input pipelines as well; in this case the output stream
> given in the dest attribute should be used instead of the one
> supplied by the context. It can of course be problematic to extend the
> definition of generators and serializers, as it might lead to
> backward compatibility problems.
Please, tell me what kind of changes to those interfaces you think you'd
require to implement what you are proposing. It will be much easier to
follow.
> Another example of an input pipeline:
>
> <match pattern="in">
> <generate type="textparser">
> <parameter name="grammar" value="example.txt"/>
> </generate>
> <transform type="xsl" src="foo.xsl"/>
> <serialize type="xsp" src="toSql.xsp"/>
> </match>
>
> In this example the serializer modifies the content of components that
> can be found in the object model and the component manager. We use a
> hypothetical "output xsp" language to describe how to modify the
> environment. Such a language could be a little bit like xslt in the
> sense that it recursively applies templates (rules) with matching
> xpath patterns. But the templates would contain custom tags that have
> side effects instead of just emitting xml. Could such a language be
> implemented in Jelly? It would be useful to have custom tags that modify
> the session object, write to sql databases, connect to business
> logic and so on.
This example is a security nightmare.
> Error Handling
> --------------
>
> Error handling in input pipelines is even more important than in
> output pipelines: we must protect the system against non-well-formed
> input, and the user must be given detailed enough information about
> what's wrong, even though in many cases they have no access to log
> files or to the internals of the system.
>
> Examples of things that can go wrong are that the input is not
> parsable or that it isn't valid with respect to some grammar or
> schema. If we want input pipelines to work in streaming mode, without
> unnecessary buffering, it is impossible to know that the input data is
> correct until all of it is processed. This means that a serializer
> might already have stored some parts of the pipeline data when an
> error is detected. I think that serializers for which faulty input
> data would be unacceptable should use some kind of transactions, and
> that they should be notified when something goes wrong earlier in the
> pipeline so that they are able to roll back the transaction.
>
> I have not studied the error handling system in Cocoon, maybe there
> already are mechanisms that could be used in input pipelines as well?
It's entirely possible to have 'ValidationTransformers' that trigger an
exception if something is wrong, and this exception will be picked up by
the usual error handler.
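As a toy sketch of such a 'ValidationTransformer', using only JDK SAX classes: the class name ValidatingFilter and its root-element check are purely illustrative (a real one would validate against a schema), but it shows the mechanism Stefano describes, an exception thrown from within the SAX stream propagates out of the parse and can be picked up by the usual error handler.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of the 'ValidationTransformer' idea: a SAX filter
// that rejects input whose root element is not the expected one. A real
// implementation would validate against a grammar or schema; the point
// here is only that throwing from the SAX stream reaches the usual
// error handling.
public class ValidatingFilter extends XMLFilterImpl {
    private final String expectedRoot;
    private boolean seenRoot = false;

    public ValidatingFilter(XMLReader parent, String expectedRoot) {
        super(parent);
        this.expectedRoot = expectedRoot;
    }

    @Override
    public void startElement(String uri, String local, String qName,
            Attributes atts) throws SAXException {
        if (!seenRoot) {
            seenRoot = true;
            if (!qName.equals(expectedRoot)) {
                throw new SAXException("unexpected root element: " + qName);
            }
        }
        super.startElement(uri, local, qName, atts);
    }

    // Returns true when the document passes the (toy) validation.
    public static boolean accepts(String xml, String root) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
        ValidatingFilter filter = new ValidatingFilter(reader, root);
        try {
            filter.parse(new InputSource(
                    new ByteArrayInputStream(xml.getBytes("UTF-8"))));
            return true;
        } catch (SAXException invalid) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts("<order/>", "order"));
        System.out.println(accepts("<invoice/>", "order"));
    }
}
```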
>
> In Sitemaps
> -----------
>
> In a sitemap an input pipeline could be used e.g. for implementing a
> web service:
>
> <match pattern="myservice">
> <generate type="xml">
> <parameter name="scheme" value="myInputFormat.scm"/>
> </generate>
> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
> <serialize type="dom-session" non-terminating="true">
> <parameter name="dom-name" value="input"/>
> </serialize>
> <select type="pipeline-state">
> <when test="success">
> <act type="my-business-logic"/>
> <generate type="xsp" src="collectTheResult.xsp"/>
> <serialize type="xml"/>
> </when>
> <when test="non-valid">
> <!-- produce an error document -->
> </when>
> </select>
> </match>
>
> Here we have first an input pipeline that reads and validates xml
> input, transforms it to some appropriate format and stores the result
> as a dom-tree in a session attribute. A serializer normally means that
> the pipeline should be executed and thereafter an exit from the
> sitemap. I used the attribute non-terminating="true" to mark that
> the input pipeline should be executed but that there is more to do in
> the sitemap afterwards.
>
> After the input pipeline there is a selector that selects the output
> pipeline depending on whether the input pipeline succeeded or not.
> This use of selection has some relation to the discussion about
> pipe-aware selection (see [3] and the references therein). It would
> solve at least my main use cases for pipe-aware selection, without
> having its drawbacks: Stefano considered pipe-aware selection a mix of
> concerns; selection should be based on meta data (pipeline state)
> rather than on data (pipeline content). There were also some people
> who didn't like my use of buffering of all input to the pipe-aware
> selector. IMO the use of selectors above solves both of these issues.
>
> The output pipeline starts with an action that takes care of the
> business logic for the application. This is IMHO a more legitimate use
> for actions than the current mix of input handling and business logic.
Wouldn't the following pipeline achieve the same functionality you want
without requiring changes to the architecture?
<match pattern="myservice">
<generate type="payload"/>
<transform type="validator">
<parameter name="scheme" value="myInputFormat.scm"/>
</transform>
<select type="pipeline-state">
<when test="valid">
<transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
<transform type="my-business-logic"/>
<serialize type="xml"/>
</when>
<otherwise>
<!-- produce an error document -->
</otherwise>
</select>
</match>
> In Flowscripts
> --------------
>
> IIRC the discussion and examples of input for flowscripts thus far
> have mainly dealt with request-parameter-based input. If we want to
> use flowscripts for describing e.g. web service flow, more advanced
> input handling is needed. IMO it would be an excellent separation of
> concerns to use output pipelines for the presentation of the data used
> in the system, input pipelines for going from input to system data,
> java objects (or some other programming language) for describing
> business logic working on the data within the system, and flowscripts
> for connecting all this in an appropriate temporal order.
A while ago, I proposed the addition of a new flowscript method that
would be something like this:
invoquePipeline(uri, parameters, outputStream)
that means that the flow will be calling the pipeline associated with
the given URI, but the serializer will write on the given outputStream.
Since there were already too many irons in the fire, I wanted to see the
flowscript settle down before starting to push for this again, but your
RT brings back pressure on this concept and I think this is all we need
to remove the asymmetry from cocoon pipelines.
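As a toy model of that calling convention (everything here is a stand-in except the shape of the proposed invoquePipeline(uri, parameters, outputStream) signature; the registry and the uppercasing "pipeline" are invented for illustration): the pipeline named by a URI writes to whatever stream the caller supplies, not to the servlet response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Toy model of the proposed invoquePipeline(uri, parameters, outputStream):
// the pipeline named by a URI writes to a caller-supplied stream instead
// of the servlet response. Only the calling convention is the point.
public class FlowPipelineCall {

    interface Pipeline {
        void process(InputStream in, OutputStream out) throws IOException;
    }

    // A stand-in pipeline lookup: every URI maps to a pipeline that
    // copies input to output, uppercased.
    static Pipeline lookup(String uri) {
        return (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(Character.toUpperCase(b));
            }
        };
    }

    // Flow-level call: serializer output goes wherever the caller says.
    static void invokePipeline(String uri, InputStream body, OutputStream out)
            throws IOException {
        lookup(uri).process(body, out);
    }

    // Demo helper: call a pipeline and capture its output as a string.
    public static String callToString(String uri, String body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        invokePipeline(uri, new ByteArrayInputStream(body.getBytes("UTF-8")), sink);
        return sink.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(callToString("cocoon:/myservice", "<doc/>"));
    }
}
```

The design point is that the pipeline stays completely unaware of its destination; the flow (or whoever assembles the call) decides where the bytes go.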
> For Reuseability Between Blocks
> -------------------------------
>
> There have been some discussions about how to reuse functionality
> between blocks in Cocoon (see the threads [1] and [2] for
> background). IMO (cf. my post in the thread [1]), a natural way of
> exporting pipeline functionality is by extending the cocoon pseudo
> protocol, so that it accepts input as well as produces output. The
> protocol should also be extended so that input as well as output can
> be any octet stream, not just xml.
The above flowscript method could use the URI to connect to
block-contained pipelines... but I'm not sure whether this would cover
the entire solution space.
> If we extend generators so that their input can be set by the
> environment (as proposed in the discussion about input pipelines), we
> have what is needed for creating a writable cocoon protocol. The web
> service example in the section "In Sitemaps" could also be used as an
> internal service, exported from a block.
>
> Both input and output for the extended cocoon protocol can be either
> xml or non-xml, which gives us four cases:
>
> xml input, xml output: could be used from a "pipeline"-transformer,
> the input to the transformer is redirected to the protocol and the
> output from the protocol is redirected to the output of the
> transformer.
>
> non-xml input, xml output: could be used from a generator.
>
> xml input, non-xml output: could be used from a serializer.
>
> non-xml input, non-xml output: could be used from a reader if the
> input is ignored, from a "writer" if the output is ignored, and from a
> "reader-writer" if both are used.
>
> Generators that accept xml should of course also accept sax events
> for efficiency reasons, and serializers that produce xml should, for
> the same reason, also be able to produce sax events.
I still can't see any difference between a reader and a writer (or an
input-generator vs. output-generator) in terms of interface methods.
They look totally similar to me. It's the way the sitemap uses them that
changes their behavior. IoC should enforce that.
> Conclusion
> ----------
>
> The ability to handle structured input (e.g. xml) in a convenient way,
> will probably be an important requirement on webapp frameworks in the
> near future.
Agreed.
> By removing the asymmetry between generators and serializers, by letting
> the input of a generator be set by the context and the output of a
> serializer be set from the sitemap, Cocoon could IMO be as good in
> handling input as it is today in producing output.
I don't understand what you mean by 'setting the input by the context'.
As far as allowing the serializer to have a destination semantic in the
sitemap, I'd be against it because I see it more harmful than useful.
I do agree that serializers should not be connected only to the servlet
output stream, but this is not a concern of the pipeline itself, but of
whoever assembles the pipeline... and, IMO, the flow logic is what
comes closest to that today.
> This would also make it possible to introduce a writable as well as
> readable Cocoon pseudo protocol, that would be a good way to export
> functionality from blocks.
I agree that a writable cocoon: protocol is required, especially for
blocks, but this doesn't mean we have to change the sitemap semantics
for that.
> There are of course many open questions, e.g. how to implement those
> ideas without introducing too much backward incompatibility.
The best idea is to avoid changing what doesn't require changes and
work to minimize architectural changes from that point on.
But enough for now.
And thanks for keeping up with the input-oriented discussions :-)
--
Stefano Mazzocchi <st...@apache.org>
--------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
Re: [RT] Input Pipelines (long)
Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Nicola Ken Barozzi wrote:
>
> Daniel Fagerstrom wrote:
> [...]
>
> Cocoon is symmetric, if you see it as it really is, a system that
> transforms a Request in a Response.
>
> The problem arises in the way we have defined the request and the
> response: The Request is a URL, the response is a Stream.
>
> So actually Cocoon transforms URIs into a stream.
>
> The sitemap is the system that demultiplexes URIs by associating them
> with the actual source of the data. This makes cocoon richer than a
> system that just hands an entity to transform: Cocoon uses indirect
> references (URLs) instead.
>
> The Stream as an input is a specialization, so I can say in the request
> to get stuff from the stream.
>
> More on this later.
>
>> In a sitemap an input pipeline could be used e.g. for implementing a
>> web service:
>>
>> <match pattern="myservice">
>> <generate type="xml">
>> <parameter name="scheme" value="myInputFormat.scm"/>
>> </generate>
>> <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>> <serialize type="dom-session" non-terminating="true">
>> <parameter name="dom-name" value="input"/>
>> </serialize>
>> <select type="pipeline-state">
>> <when test="success">
>> <act type="my-business-logic"/>
>> <generate type="xsp" src="collectTheResult.xsp"/>
>> <serialize type="xml"/>
>> </when>
>> <when test="non-valid">
>> <!-- produce an error document -->
>> </when>
>> </select>
>> </match>
>
>
> What you correctly point out is that the Generation phase does not get
> the source, but just transforms it to SAX.
<snip/>
> But IMHO this has the deficiency of fixing the source to the input.
My intention was that when no src attribute is given, the generator
should read the input stream.
> Think about having good Source Protocols.
>
> We could write:
>
> <match pattern="myservice">
> <generate type="xml" src="inputstream:myInputFormat.scm"/>
> ...
> </match>
>
> This can easily make all my Generators able to work with the new system
> right away.
This seems to be a better solution. Can you please expand on why you put
the scheme in the inputstream: protocol?
>
>> Here we have first an input pipeline that reads and validates xml
>> input, transforms it to some appropriate format and stores the result
>> as a dom-tree in a session attribute. A serializer normally means that
>> the pipeline should be executed and thereafter an exit from the
>> sitemap. I used the attribute non-terminating="true" to mark that
>> the input pipeline should be executed but that there is more to do in
>> the sitemap afterwards.
>
>
> Pipelines can already call one another.
> We add the serializer at the end, but it's basically skipped, thus
> making your pipeline example.
The idea is to use two pipelines, executed in sequence, for processing a
post. First an input pipeline is responsible for reading the input data,
transforming it to an appropriate format and storing it. After that, the
stored data can be used by the business logic, which can be called from
an action. After the action, an ordinary output pipeline is executed for
publishing the result of the business logic, for sending the next form
page etc.
In this scenario the serializer in the input pipeline is responsible for
storing the input data and can thus not be skipped. Furthermore, as we
are going to execute two pipelines in sequence, the first serializer
must not mean an exit from the sitemap as it normally would.
I think it is better SoC and reuse of components to let a serializer be
responsible for storing input data than to use transformers for that.
The WriteDOMSession transformer, the source-writing transformer, the
SQLTransformer used for inserting data, and the session transformer
would IMHO be more natural as serializers.
> I would think that with the blocks discussion there has been some
> advancement on the definition of pipeline fragments.
> I didn't follow it closely though, anyone care to comment?
>
>> After the input pipeline there is a selector that selects the output
>> pipeline depending on whether the input pipeline succeeded or not.
>> This use of selection has some relation to the discussion about
>> pipe-aware selection (see [3] and the references therein). It would
>> solve at least my main use cases for pipe-aware selection, without
>> having its drawbacks: Stefano considered pipe-aware selection a mix of
>> concerns; selection should be based on meta data (pipeline state)
>> rather than on data (pipeline content). There were also some people
>> who didn't like my use of buffering of all input to the pipe-aware
>> selector. IMO the use of selectors above solves both of these issues.
>
>
> I don't see this. Can you please expand here?
1. Selection should be based on pipeline state instead of pipeline data.
First the input pipeline is executed and is able to set the state of
the pipeline. After that, ordinary selects can be used for deciding how
to construct the output pipeline. The selectors for the output pipeline
have no access to pipeline content and are used in exactly the same way
as selectors always are.
2. No use of buffering within the pipeline. IIRC some people were
concerned that pipe-aware selection based on buffering the sax events
before the selection could be very inefficient if there is much data in
the pipeline. As my main use case for pipe-aware selection was to use it
after transformers with side effects, and after validation of
user-submitted input data, I never saw this as a problem, since the
amount of data in the mentioned cases typically is quite small. Anyway,
with input pipelines, selection is restricted to cases where the input
was going to be stored by the system anyhow.
> [...]
>
>> In Flowscripts
>> --------------
>>
>> IIRC the discussion and examples of input for flowscripts thus far
>> have mainly dealt with request-parameter-based input. If we want to
>> use flowscripts for describing e.g. web service flow, more advanced
>> input handling is needed. IMO it would be an excellent separation of
>> concerns to use output pipelines for the presentation of the data used
>> in the system, input pipelines for going from input to system data,
>> java objects (or some other programming language) for describing
>> business logic working on the data within the system, and flowscripts
>> for connecting all this in an appropriate temporal order.
>
>
> Hmmm, this seems like a compelling use case.
> Could you please add a concrete use-case/example for this?
> Thanks :-)
One use case (if combined with persistent storage of continuations)
would be a workflow system.
Besides that, input pipelines are IMO very useful for handling request
parameters from forms as well. In all webapps that we build at my
company, we use absolute xpaths as request parameter names and then use
a generator that builds an xml document from the name/value pairs. This
xml input is then possibly transformed to another format and thereafter
stored in a db or as a dom tree in a session attribute.
A flowscript that uses input pipelines might look like:
handleForm("formPage1.html", "storeData1");
if (objectModel["state"] == "success")
doBusinessLogic1(...);
...
Where formPage1.html is an output pipeline that produces a form and
storeData1 handles and stores the input.
>
>> For Reuseability Between Blocks
>> -------------------------------
>>
>> There have been some discussions about how to reuse functionality
>> between blocks in Cocoon (see the threads [1] and [2] for
>> background). IMO (cf. my post in the thread [1]), a natural way of
>> exporting pipeline functionality is by extending the cocoon pseudo
>> protocol, so that it accepts input as well as produces output. The
>> protocol should also be extended so that input as well as output can
>> be any octet stream, not just xml.
>>
>> If we extend generators so that their input can be set by the
>> environment (as proposed in the discussion about input pipelines), we
>> have what is needed for creating a writable cocoon protocol. The web
>> service example in the section "In Sitemaps" could also be used as an
>> internal service, exported from a block.
>>
>> Both input and output for the extended cocoon protocol can be either
>> xml or non-xml, which gives us four cases:
>>
>> xml input, xml output: could be used from a "pipeline"-transformer,
>> the input to the transformer is redirected to the protocol and the
>> output from the protocol is redirected to the output of the
>> transformer.
>>
>> non-xml input, xml output: could be used from a generator.
>>
>> xml input, non-xml output: could be used from a serializer.
>>
>> non-xml input, non-xml output: could be used from a reader if the
>> input is ignored, from a "writer" if the output is ignored, and from a
>> "reader-writer" if both are used.
>>
>> Generators that accept xml should of course also accept sax events
>> for efficiency reasons, and serializers that produce xml should, for
>> the same reason, also be able to produce sax events.
>
>
> Also this seems interesting.
>
> Please add concrete examples here to, possibly applied to blocks.
> I know it's hard, but it would really help.
What I tried to describe is just a somewhat different approach to
describing reusable pipeline fragments between blocks, so for use cases
please see Sylvain's and Stefano's original posts in the threads [1]
and [2].
Let's take a look at an example from Sylvain's post (in [1]) to
illustrate what I have in mind:
<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="xdoc2skinnedHtml"/>
<map:serialize type="html"/>
</map:match>
<map:match pattern="xdoc2skinnedHtml">
<map:generate type="dont_care"/>
<map:transform type="i18n"/>
<map:transform type="xdoc2html.xsl"/>
<map:transform type="htmlskin.xsl"/>
<map:serialize type="dont_care"/>
</map:match>
Here the idea is that when xdoc2skinnedHtml is used from a pipeline
transformer, the generator and the serializer are not used; only the
sub-pipeline consisting of the three transformers in the middle is used.
This behaviour is inspired by the cocoon: protocol, where the serializer
is skipped.
Several people found it confusing that parts of the pipeline (the
generator and the serializer) are dropped depending on the usage
context of the pipeline. Carsten wrote that:
"It is correct, that internally in most cases the serializer
of a pipeline is ignored, when the cocoon protocol is used.
But this is only because of performance."
And that a pipeline used from the cocoon protocol is supposed to end
with an xml serializer. I agree with this and think that it would be
better to express the example above as (cf. my post in [1]):
<map:match pattern="a_page">
<map:generate src="an_xdoc.xml"/>
<map:transform type="pipeline" src="cocoon:xdoc2skinnedHtml"/>
<map:serialize type="html"/>
</map:match>
<map:match pattern="xdoc2skinnedHtml">
<map:generate src="inputstream:xdoc.scm"/>
<map:transform type="i18n"/>
<map:transform type="xdoc2html.xsl"/>
<map:transform type="htmlskin.xsl"/>
<map:serialize type="xml"/>
</map:match>
Here the cocoon: protocol is supposed to be a writable source. The
function of the pipeline transformer is that it serializes its xml
input, redirects it to the writable source in the src attribute, parses
the xml output stream from the source, and outputs the result from the
parser as sax events. Of course the serialize-parse steps should be
optimized away, but this should be considered an implementation detail,
not part of the semantics.
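The serialize-parse roundtrip described above can be sketched with plain JAXP (the class name is made up, and a real implementation would of course short-circuit the sax events instead of materializing the octet stream):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Sketch of the serialize-parse roundtrip the pipeline transformer
// implies: xml is serialized to an octet stream (what would cross the
// writable cocoon: protocol) and parsed back on the other side.
public class SerializeParseRoundtrip {

    public static String roundtrip(String xml) throws Exception {
        // parse the incoming xml (stands in for the transformer's sax input)
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        // serialize to octets, as the pipeline transformer would do
        // before handing the stream to the writable source
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(octets));
        // parse the octet stream back, as the receiving side would
        Document reparsed = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(octets.toByteArray()));
        return reparsed.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundtrip("<xdoc><p>hi</p></xdoc>"));
    }
}
```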
By further generalizing the cocoon: protocol so that it allows non-xml
output (and input) it can be used for the pipeline serializer that
Sylvain proposed as well. For the pipeline generator the cocoon:
protocol can be used as is.
>
> It seems that what you propose Cocoon already mostly has, but it's more
> the use-case and some minor additions that have to be put forward.
>
>> Conclusion
>> ----------
>>
>> The ability to handle structured input (e.g. xml) in a convenient way,
>> will probably be an important requirement on webapp frameworks in the
>> near future.
>>
>> By removing the asymmetry between generators and serializers, by letting
>> the input of a generator be set by the context and the output of a
>> serializer be set from the sitemap, Cocoon could IMO be as good in
>> handling input as it is today in producing output.
>
>
> Cocoon already does this, no?
> Can't we use the cocoon:// protocol to get the results of a pipeline
> from another one? What would change?
As said above, the cocoon protocol should be writable as well as
readable and allow for non-xml input and output. The block protocol
could use the same ideas and thus give a good way of exporting
functionality.
To realize the above ideas we would need to implement the inputstream
protocol that in turn would require that the Request interface is
extended with a getInputStream() method. The cocoon protocol should be
extended as described. The proposed extension of the serializer for the
use in input pipelines would require serializers to implement
SitemapModelComponent.
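A minimal sketch of what such an extension might look like (the interface and class names here are illustrative, not Cocoon's actual ones): the Request grows a getInputStream() method, and an inputstream: source simply resolves to the request body.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the proposed extension: the Request interface grows a
// getInputStream() method, and an "inputstream:" source hands the
// request body back to whatever generator asks for it. Names are
// illustrative, not Cocoon's actual interfaces.
public class InputStreamSourceSketch {

    interface Request {
        InputStream getInputStream() throws IOException;
    }

    // Resolves "inputstream:..." URIs to the body of the current request.
    static InputStream resolve(String uri, Request request) throws IOException {
        if (uri.startsWith("inputstream:")) {
            return request.getInputStream();
        }
        throw new IOException("unsupported protocol: " + uri);
    }

    // Demo helper: what a generator with src="inputstream:..." would see.
    public static String readBody(String posted) throws IOException {
        Request req = () -> new ByteArrayInputStream(posted.getBytes("UTF-8"));
        InputStream in = resolve("inputstream:xdoc.scm", req);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readBody("<xdoc/>"));
    }
}
```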
Thank you for your comments.
/Daniel Fagerstrom
<snip/>
>> References
>> ----------
>>
>> [1] [RT] Using pipeline as sitemap components (long)
>> http://marc.theaimsgroup.com/?t=103787330400001&r=1&w=2
>>
>> [2] [RT] reconsidering pipeline semantics
>> http://marc.theaimsgroup.com/?t=102562575200001&r=2&w=2
>>
>> [3] [Contribution] Pipe-aware selection
>> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2
>
>
>
Re: [RT] Input Pipelines (long)
Posted by Nicola Ken Barozzi <ni...@apache.org>.
Daniel Fagerstrom wrote:
[...]
Cocoon is symmetric, if you see it as what it really is: a system that
transforms a Request into a Response.
The problem arises in the way we have defined the request and the
response: the Request is a URL, the Response is a Stream.
So actually Cocoon transforms URIs into a stream.
The sitemap is the system that demultiplexes URIs by associating them
with the actual source of the data. This makes Cocoon richer than a
system that is just handed an entity to transform: Cocoon uses indirect
references (URLs) instead.
The Stream as an input is a specialization, so I can say in the request
to get stuff from the stream.
More on this later.
> In a sitemap an input pipeline could be used e.g. for implementing a
> web service:
>
> <match pattern="myservice">
>   <generate type="xml">
>     <parameter name="scheme" value="myInputFormat.scm"/>
>   </generate>
>   <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
>   <serialize type="dom-session" non-terminating="true">
>     <parameter name="dom-name" value="input"/>
>   </serialize>
>   <select type="pipeline-state">
>     <when test="success">
>       <act type="my-business-logic"/>
>       <generate type="xsp" src="collectTheResult.xsp"/>
>       <serialize type="xml"/>
>     </when>
>     <when test="non-valid">
>       <!-- produce an error document -->
>     </when>
>   </select>
> </match>
What you correctly point out is that the Generation phase should not
get the source itself, but just transform it to SAX.
But IMHO this has the deficiency of fixing the source to the input.
Think about having good Source Protocols.
We could write:
<match pattern="myservice">
  <generate type="xml" src="inputstream:myInputFormat.scm"/>
  ...
</match>
This would easily make all existing Generators able to work with the
new system right away.
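As a rough illustration of the routing idea behind such an inputstream: protocol, here is a self-contained Java sketch. Request and InputStreamSourceResolver are hypothetical stand-ins, not Cocoon's real Source/SourceFactory machinery.

```java
import java.io.InputStream;

// Hypothetical stand-in for a Request extended with getInputStream().
interface Request {
    InputStream getInputStream();
}

// Sketch of a resolver that maps an "inputstream:" URI onto the body
// of the current request, so a generator's src attribute can point at
// the POSTed document. Whatever follows the colon (e.g. a schema name)
// is left for the generator itself to interpret.
class InputStreamSourceResolver {
    static InputStream resolve(String uri, Request request) {
        if (uri.startsWith("inputstream:")) {
            return request.getInputStream();
        }
        throw new IllegalArgumentException("unsupported scheme: " + uri);
    }
}
```

With such a resolver in place, `src="inputstream:myInputFormat.scm"` would hand the generator the request body without the generator knowing where it came from.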
> Here we have first an input pipeline that reads and validates xml
> input, transforms it to some appropriate format and stores the result
> as a dom-tree in a session attribute. A serializer normally means that
> the pipeline is executed and the sitemap is then exited. I used the
> attribute non-terminating="true" to mark that the input pipeline
> should be executed but that there is more to do in the sitemap
> afterwards.
Pipelines can already call one another.
We add the serializer at the end, but it's basically skipped, which
gives your pipeline example.
I would think that with the blocks discussion there has been some
advancement on the definition of pipeline fragments.
I didn't follow it closely though, anyone care to comment?
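To make the dom-session idea quoted above concrete, here is a minimal self-contained Java sketch. The Map used as a session and the DomSessionSerializer class are hypothetical stand-ins for the servlet session and the real serializer contract; the point is that a serializer need not write to the response stream at all.

```java
import java.io.InputStream;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch of what a "dom-session" serializer could do: instead of
// writing bytes to the response, build a DOM from the pipeline's
// output and store it under a configured name in the session.
class DomSessionSerializer {
    static void serialize(InputStream pipelineOutput,
                          Map<String, Object> session,
                          String domName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(pipelineOutput);
        session.put(domName, doc);
    }
}
```

A later output pipeline (the success branch in the example) could then pick the stored DOM back up from the session attribute.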
> After the input pipeline there is a selector that selects the output
> pipeline depending on whether the input pipeline succeeded or not.
> This use of selection has some relation to the discussion about
> pipe-aware selection (see [3] and the references therein). It would
> solve at least my main use cases for pipe-aware selection, without
> having its drawbacks: Stefano considered pipe-aware selection a mix
> of concerns; selection should be based on meta data (pipeline state)
> rather than on data (pipeline content). There were also some people
> who didn't like my use of buffering of all input to the pipe-aware
> selector. IMO the use of selectors above solves both of these issues.
I don't see this. Can you please expand here?
[...]
> In Flowscripts
> --------------
>
> IIRC the discussion and examples of input for flowscripts so far have
> mainly dealt with request-parameter-based input. If we want to use
> flowscripts for describing e.g. web service flow, more advanced input
> handling is needed. IMO it would be an excellent SOC to use output
> pipelines for the presentation of the data used in the system, input
> pipelines for going from input to system data, java objects (or some
> other programming language) for describing business logic working on
> the data within the system, and flowscripts for connecting all this in
> an appropriate temporal order.
Hmmm, this seems like a compelling use case.
Could you please add a concrete use-case/example for this?
Thanks :-)
> For Reuseability Between Blocks
> -------------------------------
>
> There have been some discussions about how to reuse functionality
> between blocks in Cocoon (see the threads [1] and [2] for
> background). IMO (cf. my post in the thread [1]), a natural way of
> exporting pipeline functionality is by extending the cocoon pseudo
> protocol, so that it accepts input as well as produces output. The
> protocol should also be extended so that input as well as output can
> be any octet stream, not just xml.
>
> If we extend generators so that their input can be set by the
> environment (as proposed in the discussion about input pipelines), we
> have what is needed for creating a writable cocoon protocol. The web
> service example in the section "In Sitemaps" could also be used as an
> internal service, exported from a block.
>
> Both input and output for the extended cocoon protocol can be both
> xml and non-xml, which gives us 4 cases:
>
> xml input, xml output: could be used from a "pipeline"-transformer,
> the input to the transformer is redirected to the protocol and the
> output from the protocol is redirected to the output of the
> transformer.
>
> non-xml input, xml output: could be used from a generator.
>
> xml input, non-xml output: could be used from a serializer.
>
> non-xml input, non-xml output: could be used from a reader if the
> input is ignored, from a "writer" if the output is ignored, and from
> a "reader-writer" if both are used.
>
> Generators that accept xml should of course also accept sax-events
> for efficiency reasons, and serializers that produce xml should, for
> the same reason, also be able to produce sax-events.
Also this seems interesting.
Please add concrete examples here too, possibly applied to blocks.
I know it's hard, but it would really help.
It seems that what you propose Cocoon already mostly has, but it's more
the use-case and some minor additions that have to be put forward.
> Conclusion
> ----------
>
> The ability to handle structured input (e.g. xml) in a convenient way,
> will probably be an important requirement on webapp frameworks in the
> near future.
>
> By removing the asymmetry between generators and serializers, by letting
> the input of a generator be set by the context and the output of a
> serializer be set from the sitemap, Cocoon could IMO be as good in
> handling input as it is today in producing output.
Cocoon already does this, no?
Can't we use the cocoon:// protocol to get the results of a pipeline
from another one? What would change?
> This would also make it possible to introduce a writable as well as
> readable Cocoon pseudo protocol, that would be a good way to export
> functionality from blocks.
Please expand on this.
> There are of course many open questions, e.g. how to implement those
> ideas without introducing too much backward incompatibility.
If we see the use cases, it would be much easier.
Your ideas are interesting, and I too see this asymmetry.
If you expand on the above areas, it would really help me.
Thanks :-)
>
> References
> ----------
>
> [1] [RT] Using pipeline as sitemap components (long)
> http://marc.theaimsgroup.com/?t=103787330400001&r=1&w=2
>
> [2] [RT] reconsidering pipeline semantics
> http://marc.theaimsgroup.com/?t=102562575200001&r=2&w=2
>
> [3] [Contribution] Pipe-aware selection
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101735848009654&w=2
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)