Posted to dev@cocoon.apache.org by Daniel Fagerstrom <da...@nada.kth.se> on 2003/01/09 19:24:56 UTC

Re: [RT] Better Environment Abstraction

Nicola Ken Barozzi wrote:

>
> I have some problems I need to solve:
>
> 1) I want to use Cocoon as a bean in other programs, as an evolved xml 
> processing bean. I want to create it, set input (stream), set output, 
> and execute.
>
> Problem: I cannot do it with the current Cocoon, without creating my 
> specialized environment to put the input and a Generator that gets 
> from that and generates.
>
> 2) I want to execute transformations of mails in a James mailet using 
> Cocoon.
>
> Problem: I have a similar problem to (1).
> Cocoon doesn't get input directly from the request, unless from a 
> webapp HttpServletRequest, which is not really feasible here.
>
> 3) The StreamGenerator depends on the servlet package, so Cocoon 
> cannot compile without it. So much for blocks and reduced dependencies 
> on core. Cocoon should be able to run in a smaller space, without 
> depending on servlets.
>
> Problem: As we use Cocoon in a container != to a servlet container 
> (Avalon Phoenix, Mod-Cocoon) we lose the StreamGenerator 
> functionality, or portability of the sitemap, unnecessarily.
>
>
> So I cannot get a stream from the Request in a standard way, though 
> it's IMHO a reasonably common operation, that should thus be abstracted.
>
> With Vadim we have discussed about it a bit and we found that:
>
> 1) not all env. have the need (CLI), but most do 

I don't know much about CLI, so I might be completely off track, but I 
believe that being able to connect the input stream to standard input or 
to a file from the command line could be useful for:

* Command line testing of web services.
* Writing tools that convert between different file formats.
* Populating your db from e.g. xml documents.

> 2) some env.s (mail) have multiple inputs
> 3) IMHO all env.s have a main input (mail content)
> 4) it's possible to get these from the Request as attributes instead 
> of as a stream. This makes it more flexible because I can pass 
> objects, and more than one. Standard entries can be added. 

I would prefer to put it this way:

* Most (all?) environments (servlet, mail, possibly cli, don't know much 
about jms) have an input stream.

* In some cases the input stream has multipart content. There are 
several different sub types of MIME multipart:

 - multipart/form-data, used in html forms and xforms
 - multipart/mixed, used for email
 - multipart/related, used for SOAP over mail. The w3c working document 
"SOAP 1.2 Attachment feature" even talks about DIME multipart messages 
as a possible format (DIME is, IIRC, like MIME, but with some kind of 
part/size table at the beginning, so that parts can be extracted 
without parsing the whole message).
 - application/x-www-form-urlencoded, is used in html forms and xforms. 
It is not a sub type of multipart, but it is used for transmitting 
key/value pairs.

Most of these multipart formats describe unordered key/value pairs, but 
multipart/related is a little bit more complicated. It consists of a root 
document with references to the other parts, and the references can be 
both absolute and relative addresses.

* In the current implementation of the servlet environment, the input 
stream is parsed if it is of type application/x-www-form-urlencoded or 
multipart/form-data, and the key/value pairs are put in the request 
attributes.

* IMO getting the input stream and parsing its (possibly) multipart 
content are different concerns. So let's make a getInputStream() method 
available in all environments, and let it be implemented as 
mimeMessage.getInputStream() for mail, and request.getInputStream() for 
servlet. The multipart parsing could then be done in some source sub 
protocols, multipartinput:related://foo/bar.gif, or in specialized 
modules or maybe in a generator. Of course the current handling of 
multipart/form-data should be kept in the servlet environment, but I 
think it is too html specific to be used as a model for all environments.
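
To make the separation of concerns concrete, here is a minimal sketch of 
the parsing step on its own: it takes any input stream containing 
application/x-www-form-urlencoded data and returns the key/value pairs. 
The class name FormBodyParser is hypothetical and this is not Cocoon's 
actual code; it assumes Java 10 or later and keeps only the last value 
when a key is repeated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cocoon: parsing is kept separate
// from whichever environment supplied the stream.
class FormBodyParser {

    /** Reads the whole stream and splits it into decoded key/value pairs. */
    static Map<String, String> parse(InputStream in) {
        String body;
        try {
            // A urlencoded body is plain ASCII before decoding.
            body = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing '&'
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Assumption: percent-escapes encode UTF-8; later keys overwrite earlier ones.
            attributes.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                           URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return attributes;
    }

    public static void main(String[] args) {
        byte[] body = "name=Daniel+F&lang=sv%2Fen".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parse(new ByteArrayInputStream(body)));
        // prints {name=Daniel F, lang=sv/en}
    }
}
```

Because parse() only needs an InputStream, the same code would work 
whether the stream came from a servlet request, a mail message or 
standard input.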

> 5) getting stuff from a Request attribute means that I need to parse 
> all the request. This increases memory usage, but sometimes is 
> inevitable, because of the protocol used (mail attachments) 

Yes, not much to do about it. The multipart DIME format mentioned above 
is designed for random access without having to parse all of the input.

> 6) getting from a stream can make it easy to make it more efficient on 
> input->output transformations, typical of web services. 

Yes, this will most likely be the dominating use case, so let's focus on 
that. Even if the handling of multipart messages in the general case 
(i.e. outside the servlet environment or using other sub types than 
form-data) also might be important, it is IMO a different concern and can 
be handled later when somebody needs it.

> So, from these, I seem to think that we could
>
> 1) add a getInputStream() to the Request

+1.

Today, the Request interface contains the methods getContentLength(), 
and getContentType(), so that you can ask about the length of and the 
type of the input stream but not the content of it. Strange IMHO.

Furthermore I think that getInputStream() and the set and get methods 
for its length and content type should be in the Environment interface. 
From point 1) in the beginning of your mail I guess that you have the 
same opinion.
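
As a rough illustration of what I mean, the sketch below puts the stream 
next to its length and content type in one interface. All names here 
(StreamEnvironment, BufferedEnvironment, CliEnvironment) are hypothetical 
and are not the real Cocoon Environment or Request interfaces; setters 
are left out for brevity, and the in-memory implementation only stands in 
for a mail or servlet environment that would delegate to 
mimeMessage.getInputStream() or request.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal view of an environment: its main input and metadata. */
interface StreamEnvironment {
    InputStream getInputStream();
    String getContentType();
    int getContentLength(); // -1 when the length is unknown
}

/** Stand-in for a mail or servlet environment whose body is already in memory. */
class BufferedEnvironment implements StreamEnvironment {
    private final byte[] body;
    private final String contentType;

    BufferedEnvironment(byte[] body, String contentType) {
        this.body = body;
        this.contentType = contentType;
    }

    public InputStream getInputStream() { return new ByteArrayInputStream(body); }
    public String getContentType() { return contentType; }
    public int getContentLength() { return body.length; }
}

/** Command-line environment: the main input is standard input. */
class CliEnvironment implements StreamEnvironment {
    public InputStream getInputStream() { return System.in; }
    public String getContentType() { return "text/xml"; } // assumption for this sketch
    public int getContentLength() { return -1; }
}

class EnvironmentDemo {
    /** A consumer that depends only on the interface, not on servlets or mail. */
    static String readAll(StreamEnvironment env) {
        try (InputStream in = env.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamEnvironment env = new BufferedEnvironment(
                "<doc/>".getBytes(StandardCharsets.UTF_8), "text/xml");
        System.out.println(env.getContentType() + " " + env.getContentLength()
                + " " + readAll(env));
    }
}
```

A generator written against such an interface would not need the servlet 
package at compile time.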

> 2) make other input features available through request attributes

This is more a question about how to implement the interface, as the 
Request interface already contains the necessary methods. As I said 
above, IMO multipart parsing should be a responsibility of sources, 
modules or generators, and not something that is automatically done by 
the environment.

> In case of mail
>
> 1) the mail content goes through getInputStream()
> 2) the attachments go through Request attributes 

getInputStream() returns the mime multipart stream.
 - "input://" is connected to a source that also returns the mime 
multipart stream.
 - "multipartinput:mixed://" is a TraversableSource and makes it 
possible to list the content of the multipart.
 - "multipartinput:mixed://1" is the stream of the first attachment.
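
The addressing scheme above could be sketched roughly as follows. The 
class MultipartInputResolver is hypothetical and does no MIME parsing: it 
assumes the message has already been split into parts, and it assumes 
that index 1 addresses the first stored part. It only shows how the three 
kinds of URI would map to streams and to a listing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolver, not an existing Cocoon source implementation.
class MultipartInputResolver {
    private static final String INPUT = "input://";
    private static final String MIXED = "multipartinput:mixed://";

    private final byte[] whole;       // the raw multipart stream
    private final List<byte[]> parts; // parts, assumed to be split out already

    MultipartInputResolver(byte[] whole, List<byte[]> parts) {
        this.whole = whole;
        this.parts = parts;
    }

    /** Maps a URI to the raw stream or to the stream of a single part. */
    InputStream resolve(String uri) {
        if (uri.equals(INPUT)) {
            return new ByteArrayInputStream(whole);
        }
        if (uri.startsWith(MIXED) && uri.length() > MIXED.length()) {
            // Assumption: part numbers are 1-based decimal integers.
            int index = Integer.parseInt(uri.substring(MIXED.length()));
            if (index < 1 || index > parts.size()) {
                throw new IllegalArgumentException("No such part: " + uri);
            }
            return new ByteArrayInputStream(parts.get(index - 1));
        }
        throw new IllegalArgumentException("Unsupported URI: " + uri);
    }

    /** Child URIs of "multipartinput:mixed://", as a traversable source might list them. */
    List<String> list() {
        List<String> children = new ArrayList<>();
        for (int i = 1; i <= parts.size(); i++) {
            children.add(MIXED + i);
        }
        return children;
    }

    public static void main(String[] args) {
        MultipartInputResolver resolver = new MultipartInputResolver(
                "raw message".getBytes(),
                List.of("first".getBytes(), "second".getBytes()));
        System.out.println(resolver.list());
    }
}
```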

/Daniel Fagerstrom


