Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2003/03/16 14:39:16 UTC

[proposal] fixing the encoding problems

Cocoon is heavily internationalized, but we fail to do one thing: signal
the proper encoding to the user-agent through HTTP headers, which is the
most reliable way of doing it.

The current *hack* is to use <meta> tags in the HTML stream; these are
interpreted by the HTTP server stack and transferred as HTTP headers. But
this creates many problems and mixes concerns.

Vadim suggested setting the headers from the serializers, but I think
there is a better alternative.

So I propose to add the method

  getEncoding()

to the interface

  org.apache.cocoon.sitemap

Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
> 
> <snip/>
> 
>> I restate:
>>
>> 1) I want a way for serializers to indicate to the pipeline what is 
>> the encoding they will be using, so that the pipeline can set the 
>> right HTTP header for it.
>>
>> 2) Also, I want a way to override the sitemap-wide behavior of every
>> single serializer, locally, such as
>>
>>  <map:serialize encoding="UTF-8"/>
>>
>> when the global serializer configurations state they will be using 
>> something else.
>>
>> Is the proposal clear enough?
> 
> 
> 
> Sure ;-)
> 
> However, we also have to consider that serializers basically produce 
> binary data (e.g. svg2png), for which the encoding has no meaning. So
> should there be a new kind of serializer (TextSerializer?) that gets a
> Writer instead of an OutputStream?
> 
> This would allow for the encoding to be handled directly and totally by 
> the pipeline engine, which would use the proper encoding to build the 
> TextSerializer's Writer.

This is a good point.

This also raises another architectural point: why do we have an empty
Serializer interface that inherits from the SitemapOutputComponent interface?

Stefano.



Re: [proposal] fixing the encoding problems

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:

<snip/>

> I restate:
>
> 1) I want a way for serializers to indicate to the pipeline what is 
> the encoding they will be using, so that the pipeline can set the 
> right HTTP header for it.
>
> 2) Also, I want a way to override the sitemap-wide behavior of every
> single serializer, locally, such as
>
>  <map:serialize encoding="UTF-8"/>
>
> when the global serializer configurations state they will be using 
> something else.
>
> Is the proposal clear enough?


Sure ;-)

However, we also have to consider that serializers basically produce 
binary data (e.g. svg2png), for which the encoding has no meaning. So
should there be a new kind of serializer (TextSerializer?) that gets a
Writer instead of an OutputStream?

This would allow for the encoding to be handled directly and totally by 
the pipeline engine, which would use the proper encoding to build the 
TextSerializer's Writer.
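Sylvain's idea could look roughly like the sketch below. Everything in it is an assumption for illustration: `BinarySerializer`, `TextSerializer` and `PipelineSketch` are simplified, hypothetical stand-ins, not Cocoon's actual interfaces. The point is the division of labour: binary serializers keep writing raw bytes, while for text serializers the pipeline alone picks the charset, builds the Writer, and derives the Content-Type header from that same value.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical, simplified serializer contracts (not Cocoon's real interfaces).
interface BinarySerializer {
    String getMimeType();
    void serialize(OutputStream out) throws IOException; // raw bytes, e.g. svg2png
}

interface TextSerializer {
    String getMimeType();
    void serialize(Writer out) throws IOException; // characters only, no charset knowledge
}

// Hypothetical pipeline engine: the only place where a charset is chosen.
final class PipelineSketch {

    /** Serializes through a Writer and returns the Content-Type value to send. */
    static String serializeText(TextSerializer serializer, Charset charset, OutputStream out)
            throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        serializer.serialize(writer);
        writer.flush(); // push buffered characters through the encoder
        // Header and Writer come from the same Charset, so they cannot disagree.
        return serializer.getMimeType() + "; charset=" + charset.name();
    }

    /** Binary output: no Writer and no charset parameter in the header. */
    static String serializeBinary(BinarySerializer serializer, OutputStream out)
            throws IOException {
        serializer.serialize(out);
        return serializer.getMimeType();
    }
}
```

A text serializer written against this contract never needs to know which charset it is producing.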

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: [VOTE] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Stefano Mazzocchi wrote:

> Vadim Gritsenko wrote:
>
>> Pier Fumagalli wrote:
>>
>>> On 17/3/03 20:15, "Stefano Mazzocchi" <st...@apache.org> wrote: 
>>
>> <snip/>
>>
>>>> What about adding a 'content encoding' attribute to the 'pipeline' 
>>>> instead?   
>>>
>>
>> 'transfer encoding' attribute would make sense, but content encoding...
>
>
> wait wait wait
>
> I'm using Pier's terminology here: for 'content encoding' I'm thinking 
> about 'gzip' stuff, not 'UTF-8'. 


Ah, Ok, my bad; encoding="gzip" is ok.

Vadim



Re: [VOTE] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Vadim Gritsenko wrote:
> Pier Fumagalli wrote:
> 
>> On 17/3/03 20:15, "Stefano Mazzocchi" <st...@apache.org> wrote:
>>  
>>
> <snip/>
> 
>>> What about adding a 'content encoding' attribute to the 'pipeline' 
>>> instead?
>>>   
> 
> 
> 'transfer encoding' attribute would make sense, but content encoding... 

wait wait wait

I'm using Pier's terminology here: for 'content encoding' I'm thinking 
about 'gzip' stuff, not 'UTF-8'.

> Currently pipeline does not alter content, does not affect content.

Right.

> Addition of this attribute will bring content aspect to the pipeline 
> concern... Do we really want it?

No, in fact, I think you got my terminology wrong.

I'm envisioning something like

  <map:pipeline encoding="gzip">
   ...
  </map:pipeline>

but I don't like "encoding"; any ideas for a better attribute name?

>> Right now, to make things work in 99.9% of the cases, I'd say that we add
>> this to the pipeline and, for now, enforce it on the client. So, no
>> matter what the client asked for in the various "Accept*" headers, we
>> deliver what _we_ want...
>>
> 
> Same could be done now with serializer configuration, right?
> 
> 
>> As negotiating the encoding/type/language is something that will also
>> widely affect our cache, I propose to defer this to Cocoon 2.2 (or later),
>> when the problem can be analyzed thoroughly and in more detail...
>>  
>>
> 
> +1
> 
> 
>> Now, does anyone have suggestions on how to retrieve the "charset"
>> value out of <map:pipeline> ??? Where would be a nice place in our API?
>>  
>>
> 
> PipelineNodeBuilder. But I would postpone such a thing to Cocoon 2.2 too.

Right, we already have enough irons in the fire.

Stefano.


Re: [VOTE] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Pier Fumagalli wrote:

>On 17/3/03 20:15, "Stefano Mazzocchi" <st...@apache.org> wrote:
>  
>
<snip/>

>>What about adding a 'content encoding' attribute to the 'pipeline' instead?
>>    
>>

'transfer encoding' attribute would make sense, but content encoding... 
Currently pipeline does not alter content, does not affect content. 
Addition of this attribute will bring content aspect to the pipeline 
concern... Do we really want it?


>Right now, to make things work in 99.9% of the cases, I'd say that we add
>this to the pipeline and, for now, enforce it on the client. So, no matter
>what the client asked for in the various "Accept*" headers, we deliver
>what _we_ want...
>

Same could be done now with serializer configuration, right?


>As negotiating the encoding/type/language is something that will also
>widely affect our cache, I propose to defer this to Cocoon 2.2 (or later),
>when the problem can be analyzed thoroughly and in more detail...
>  
>

+1


>Now, does anyone have suggestions on how to retrieve the "charset" value out
>of <map:pipeline> ??? Where would be a nice place in our API?
>  
>

PipelineNodeBuilder. But I would postpone such a thing to Cocoon 2.2 too.

Vadim

>On a side note: currently, "AbstractTextSerializer" relies on Xalan as its
>way to generate content, but, IMO, there could be other text serializers not
>needing it (ok, it's JAXP! :-)
>
>    Pier
>



Re: [VOTE] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
Sorry for the subject... I had intended to write a different "kind" of email,
but when I realized that I still had a couple of gray areas that I wanted to
address, I rewrote it and forgot to change the title...

    Pier




[VOTE] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 17/3/03 20:15, "Stefano Mazzocchi" <st...@apache.org> wrote:

>> On another thought... The cache should store Unicode characters "as is", not
>> bytes, as those might change for the same request URL depending on the
>> different headers in the request...
> 
> Uh, another good point.

OK, in light of what Dirk said regarding "Accept-Encoding", I'd
propose an interim solution that is not fully HTTP-compliant, but at least
is better than what we've got at the moment...

I believe that we should go with what you said in another reply to this
thread:

> What about adding a 'content encoding' attribute to the 'pipeline' instead?

Right now, to make things work in 99.9% of the cases, I'd say that we add
this to the pipeline and, for now, enforce it on the client. So, no matter
what the client asked for in the various "Accept*" headers, we deliver
what _we_ want...

As negotiating the encoding/type/language is something that will also
widely affect our cache, I propose to defer this to Cocoon 2.2 (or later),
when the problem can be analyzed thoroughly and in more detail...

Now, does anyone have suggestions on how to retrieve the "charset" value out
of <map:pipeline> ??? Where would be a nice place in our API?

On a side note: currently, "AbstractTextSerializer" relies on Xalan as its
way to generate content, but, IMO, there could be other text serializers not
needing it (ok, it's JAXP! :-)

    Pier


Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Pier Fumagalli wrote:
> On 16/3/03 23:38, "Vadim Gritsenko" <va...@verizon.net> wrote:
> 
>>>True, but you can't have Chinese text in US-ASCII, right?
>>
>>Even if you could, it's not like anybody would be able to read it ;-)
>>So yes, right.
> 
> 
> Unicode specifies (somewhere) that any character not representable in the
> current charset encoding should be replaced with a "?" (\u003f), which exists
> in all representations...
> 
> 
>>>>But I am not convinced that it's the sitemap's responsibility to worry
>>>>about encoding (from SoC POV).
>>>
>>>I restate:
>>>
>>>1) I want a way for serializers to indicate to the pipeline what is
>>>the encoding they will be using, so that the pipeline can set the
>>>right HTTP header for it.
>>
>>+-0, I'm not sure (yet) on this one...
> 
> 
> I am almost sure that it should be made the other way around: the client can
> request a specific encoding from the server: see RFC 2616, section 14.2, page
> 102: the Accept-Charset header.
>
> I believe that the TextSerializer should use what the client asked for in its
> request through the "Accept-Charset" header, if this is present.
>
> If it isn't, it should default to what has been specified in the pipeline
> (if we use <map:serialize charset="xxxx"/>) or default to the "cocoon
> global" configuration...

Oh, that's right, I forgot about the client 'forcing' a charset. Great 
point.

>>>2) Also, I want a way to override the sitemap-wide behavior of every
>>>single serializer, locally, such as
>>>
>>> <map:serialize encoding="UTF-8"/>
>>>
>>>when the global serializer configurations state they will be using
>>>something else.
>>
>>But this one is OK with me and, moreover, in line with an earlier decision:
>>http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2
> 
> 
> I'd say this should be used only if the client didn't request a particular
> encoding...
>
> On another thought... The cache should store Unicode characters "as is", not
> bytes, as those might change for the same request URL depending on the
> different headers in the request...

Uh, another good point.

Stefano.


Re: [proposal] fixing the encoding problems

Posted by Niclas Hedhman <ni...@internuscorp.com>.
On Tuesday 18 March 2003 05:25, Pier Fumagalli wrote:
> If, for example, in my corporation there are two guys, one using Windows in
> jp and one using Linux in en_US, if the first guy requests
> "http://www.vnunet.com/", I'll deliver the page the first time in jp,
> encoded in shift_jis (let's not track content-type for a sec).
>
> Now, when the second guy requests the same page, I'd have to send it in
> en_US maybe encoded in iso-8859-1...
>
> But my corporation proxy (or the cocoon cache), will cache the first
> version it hits, so, to both of them, I'll end up serving the same Japanese
> shift_jis content...

That's why almost no one uses the Accept headers to determine the served
language or encoding, but use different URL spaces. HTTP's biggest mess-up 
IMHO.

To make matters worse, many ISPs (at least in Asia) have transparent caches
that don't handle the Expires headers very well either, and your content
becomes static for (among Malaysian ISPs) 24 hours.

Niclas

Re: [proposal] fixing the encoding problems

Posted by Dirk-Willem van Gulik <di...@webweaving.org>.
> Vary: *
> which effectively disables any caching...

You bet :-) Though no one said that 'source IP' was a valid vary ;-)

Dw


Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Pier Fumagalli wrote:
<snip/>

>But there is a problem... Proxies and caches...
>  
>

AFAIK (took a look at spec too ;):

>If, for example, in my corporation there are two guys, one using Windows in
>jp and one using Linux in en_US, if the first guy requests
>"http://www.vnunet.com/", I'll deliver the page the first time in jp,
>encoded in shift_jis (let's not track content-type for a sec).
>

Vary: Accept-Language, Accept-Charset


>Now, when the second guy requests the same page, I'd have to send it in
>en_US maybe encoded in iso-8859-1...
>

Vary: Accept-Language, Accept-Charset


>But my corporation proxy (or the cocoon cache), will cache the first version
>it hits, so, to both of them, I'll end up serving the same Japanese
>shift_jis content...
>

Now the proxy will store two objects, as it knows that the responses differ.
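What the proxy does can be sketched as a two-level lookup: the URL selects an entry, and the values of the request headers named in Vary select the variant within it. Vary lists request header names (RFC 2616, section 14.44), so in Pier's scenario the relevant ones are Accept-Language and Accept-Charset. The class below is a hypothetical illustration, not how any particular proxy is implemented; it ignores expiry and does no normalization of header values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical shared cache (e.g. a corporate proxy) that honours Vary.
// Request header maps are assumed to use lower-case header names.
final class VaryingCache {

    private static final class Entry {
        final List<String> varyHeaders; // lower-case names listed in Vary
        final Map<List<String>, String> variants = new HashMap<>();

        Entry(List<String> varyHeaders) {
            this.varyHeaders = varyHeaders;
        }
    }

    private final Map<String, Entry> byUrl = new HashMap<>();

    // The values of the Vary'd request headers select the variant.
    private static List<String> variantKey(List<String> varyHeaders, Map<String, String> request) {
        List<String> key = new ArrayList<>();
        for (String name : varyHeaders) {
            key.add(request.getOrDefault(name, "")); // absent header counts as empty
        }
        return key;
    }

    void store(String url, String varyHeader, Map<String, String> request, String body) {
        List<String> names = new ArrayList<>();
        if (varyHeader != null) {
            for (String part : varyHeader.split(",")) {
                String name = part.trim().toLowerCase(Locale.ROOT);
                if (name.equals("*")) {
                    byUrl.remove(url); // Vary: * can never match a later request
                    return;
                }
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        Entry entry = byUrl.get(url);
        if (entry == null || !entry.varyHeaders.equals(names)) {
            entry = new Entry(names); // Vary list changed: start over for this URL
            byUrl.put(url, entry);
        }
        entry.variants.put(variantKey(names, request), body);
    }

    /** Cached body for this request, or null if no stored variant matches. */
    String lookup(String url, Map<String, String> request) {
        Entry entry = byUrl.get(url);
        return (entry == null) ? null : entry.variants.get(variantKey(entry.varyHeaders, request));
    }
}
```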


>Not good... Needs more thinking indeed...
>

:)

PS: The worst-case scenario is
Vary: *
which effectively disables any caching...


Vadim

>    Pier
>



Re: [proposal] fixing the encoding problems

Posted by Dirk-Willem van Gulik <di...@webweaving.org>.
> It gets quite complicated, because for the same URL the client might request
> a Japanese, shift_jis, text/html view, while another might request a simple
> image/jpeg...

> It basically implies that the URL is a resource _for_real_ and that the

Resource -> 'semantics' or 'the bit of info'.

> client can decide the way in which he wants to receive it...

Yes, i.e. which 'rendition'.

But it gets worse... (or more complex)... some browsers say that they are
fine with */*, i.e. just everything, which in actual practice makes a
real-life implementation a bit more complex than needed.

> But there is a problem... Proxies and caches...
..
> jp and one using Linux in en_US, if the first guy requests
> "http://www.vnunet.com/", I'll deliver the page the first time in jp,
> encoded in shift_jis (let's not track content-type for a sec).
..
> it hits, so, to both of them, I'll end up serving the same Japanese
> shift_jis content...
..
> Not good... Needs more thinking indeed...

Correct; this is (or was) a common issue with proxies. Most proxies and
caches now get this right.

Dw


Re: [proposal] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 17/3/03 18:23, "Dirk-Willem van Gulik" <di...@webweaving.org> wrote:

>> I am almost sure that it should be made all-the-way around: the client can
>> request a specific encoding to the server: See RFC 2616 section 14.2 page
>> 102: the Accept-Charset header.
> 
> Or an _ordered_list_ of those as input. See also the languages while you
> are at it, and the Accept: type as well - they are all dimensions of the
> same problem. And they are not orthogonal; i.e. there is an easy semantic
> coupling between languages and charset - and the Accept list may prompt
> you to send a gif or pdf in some cases.

Yes... You're absolutely right... I was re-reading that part of HTTP on the
tube today, and it gets pretty nasty at that point...

Basically, correct me if I'm wrong, from what I understand the client sends
a list of "preferred" encodings, while the application should "negotiate"
charset, language and type...

It gets quite complicated, because for the same URL the client might request
a Japanese, shift_jis, text/html view, while another might request a simple
image/jpeg...

It basically implies that the URL is a resource _for_real_ and that the
client can decide the way in which he wants to receive it...

>> On another thought... The cache should store Unicode characters "as is", not
>> bytes, as those might change for the same request URL depending on the
>> different headers in the request...
> 
> You'd have to track which Accept, Accept-Language and Accept-Charset you
> negotiated on, as applications may (also) do i18n and localization
> optimizations, such as swapping ',' for '.', or abusing charsets and doing
> locale-specific normalizations of the Unicode cast.

Yes yes yes...

But there is a problem... Proxies and caches...

If, for example, in my corporation there are two guys, one using Windows in
jp and one using Linux in en_US, if the first guy requests
"http://www.vnunet.com/", I'll deliver the page the first time in jp,
encoded in shift_jis (let's not track content-type for a sec).

Now, when the second guy requests the same page, I'd have to send it in
en_US maybe encoded in iso-8859-1...

But my corporation proxy (or the cocoon cache), will cache the first version
it hits, so, to both of them, I'll end up serving the same Japanese
shift_jis content...

Not good... Needs more thinking indeed...

    Pier


Re: [proposal] fixing the encoding problems

Posted by Dirk-Willem van Gulik <di...@webweaving.org>.
> I am almost sure that it should be made all-the-way around: the client can
> request a specific encoding to the server: See RFC 2616 section 14.2 page
> 102: the Accept-Charset header.

Or an _ordered_list_ of those as input. See also the languages while you
are at it, and the Accept: type as well - they are all dimensions of the
same problem. And they are not orthogonal; i.e. there is an easy semantic
coupling between languages and charset - and the Accept list may prompt
you to send a gif or pdf in some cases.

> On another thought... The cache should store Unicode characters "as is", not
> bytes, as those might change for the same request URL depending on the
> different headers in the request...

You'd have to track which Accept, Accept-Language and Accept-Charset you
negotiated on, as applications may (also) do i18n and localization
optimizations, such as swapping ',' for '.', or abusing charsets and doing
locale-specific normalizations of the Unicode cast.

Dw.


Re: [proposal] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 16/3/03 23:38, "Vadim Gritsenko" <va...@verizon.net> wrote:
> 
>> True, but you can't have Chinese text in US-ASCII, right?
> 
> Even if you could, it's not like anybody would be able to read it ;-)
> So yes, right.

Unicode specifies (somewhere) that any character not representable in the
current charset encoding should be replaced with a "?" (\u003f), which exists
in all representations...
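In Java, at least, this matches what `String.getBytes(Charset)` does: characters the target charset cannot represent are replaced by that charset's default replacement, which for US-ASCII is the single byte `?`. This demonstrates JDK behaviour only, not a quotation of the Unicode standard. `CharsetEncoder.canEncode` offers a way to check up front; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: how the JDK treats characters that the target
// charset cannot represent.
final class UnmappableDemo {

    // Encodes and decodes again, so the effect of replacement is visible.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) never throws on unmappable input; it substitutes
        // the charset's default replacement bytes instead.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // True if every character of text can be encoded in charset.
    static boolean representable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // two CJK ideographs
        System.out.println(roundTrip(chinese, StandardCharsets.US_ASCII));      // ??
        System.out.println(representable(chinese, StandardCharsets.US_ASCII)); // false
        System.out.println(representable(chinese, StandardCharsets.UTF_8));    // true
    }
}
```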

>>> But I am not convinced that it's the sitemap's responsibility to worry
>>> about encoding (from SoC POV).
>> 
>> I restate:
>> 
>> 1) I want a way for serializers to indicate to the pipeline what is
>> the encoding they will be using, so that the pipeline can set the
>> right HTTP header for it.
> 
> +-0, I'm not sure (yet) on this one...

I am almost sure that it should be made the other way around: the client can
request a specific encoding from the server: see RFC 2616, section 14.2, page
102: the Accept-Charset header.

I believe that the TextSerializer should use what the client asked for in its
request through the "Accept-Charset" header, if this is present.

If it isn't, it should default to what has been specified in the pipeline
(if we use <map:serialize charset="xxxx"/>) or default to the "cocoon
global" configuration...
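The fallback order described here can be sketched as follows: honour the client's Accept-Charset list (ordered by q-value, as Dirk notes in his reply), else use the charset set on <map:serialize>, else the global default. This is a hypothetical helper, not Cocoon code, and it simplifies RFC 2616: `*` simply means "use our own default", and the special default treatment of ISO-8859-1 is ignored.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Cocoon): picks the response charset using
// the order Accept-Charset -> <map:serialize charset> -> global default.
final class CharsetNegotiator {

    private static final class Candidate {
        final String name;
        final double q;
        final int position;

        Candidate(String name, double q, int position) {
            this.name = name;
            this.q = q;
            this.position = position;
        }
    }

    static String choose(String acceptCharset, String sitemapCharset, String globalCharset) {
        String fallback = (sitemapCharset != null) ? sitemapCharset : globalCharset;
        if (acceptCharset == null || acceptCharset.trim().isEmpty()) {
            return fallback; // client expressed no preference
        }
        List<Candidate> candidates = new ArrayList<>();
        String[] parts = acceptCharset.split(",");
        for (int i = 0; i < parts.length; i++) {
            String[] fields = parts[i].trim().split(";");
            String name = fields[0].trim();
            double q = 1.0;
            for (int j = 1; j < fields.length; j++) {
                String param = fields[j].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // malformed weight: treat as not acceptable
                    }
                }
            }
            if (!name.isEmpty() && q > 0.0) {
                candidates.add(new Candidate(name, q, i));
            }
        }
        // Highest weight first; equal weights keep the client's order.
        candidates.sort(Comparator.comparingDouble((Candidate c) -> c.q).reversed()
                .thenComparingInt(c -> c.position));
        for (Candidate c : candidates) {
            if (c.name.equals("*")) {
                return fallback; // "anything goes": use our own choice
            }
            try {
                if (Charset.isSupported(c.name)) {
                    return Charset.forName(c.name).name(); // canonical name for the header
                }
            } catch (IllegalCharsetNameException e) {
                // skip names that are not even syntactically valid
            }
        }
        return fallback; // nothing acceptable that we can produce
    }
}
```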

>> 2) Also, I want a way to override the sitemap-wide behavior of every
>> single serializer, locally, such as
>> 
>>  <map:serialize encoding="UTF-8"/>
>> 
>> when the global serializer configurations state they will be using
>> something else.
> 
> But this one is OK with me and, moreover, in line with an earlier decision:
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2

I'd say this should be used only if the client didn't request a particular
encoding...

On another thought... The cache should store Unicode characters "as is", not
bytes, as those might change for the same request URL depending on the
different headers in the request...
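Caching characters rather than bytes could look like the hypothetical class below, which ignores invalidation entirely. Entries are `String`s keyed by URL, and the charset is applied only when a response is written, so clients that negotiate different charsets share one rendering. As Dirk points out in his reply, anything else that varies per client (language, content type) would still have to be part of the key.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: cache character data per URL, encode per response.
final class CharacterCache {

    private final Map<String, String> textByUrl = new ConcurrentHashMap<>();

    /**
     * Returns the body for url encoded in charset; renderer produces the
     * charset-independent text and is only called on a cache miss.
     */
    byte[] respond(String url, Charset charset, Supplier<String> renderer) {
        String text = textByUrl.computeIfAbsent(url, key -> renderer.get());
        return text.getBytes(charset); // encoding happens after the cache lookup
    }
}
```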

    Pier


Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Stefano Mazzocchi wrote:
<snip/>

>> And any of these are totally independent of
>> internationalization. Internationalization affects the language used to
>> produce output, but not how the text in that language is encoded
>> (UTF-8, UTF-16, ISO-8859-1, what-have-you).
>
>
> True, but you can't have Chinese text in US-ASCII, right?


Even if you could, it's not like anybody would be able to read it ;-)
So yes, right.


> My point is that having globally-balanced hooks for encoding will
> allow Cocoon to be even more friendly to non-Latin-charset needs. I
> used i18n out of context here, sorry.


<snip/>


> I didn't propose that, where did you get that impression? 


Your original email was kind of short:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104782195022862&w=2
So I filled in the gaps a bit.


>> But I am not convinced that it's the sitemap's responsibility to worry
>> about encoding (from SoC POV).
>
>
> I restate:
>
> 1) I want a way for serializers to indicate to the pipeline what is 
> the encoding they will be using, so that the pipeline can set the 
> right HTTP header for it.


+-0, I'm not sure (yet) on this one...


> 2) Also, I want a way to override the sitemap-wide behavior of every
> single serializer, locally, such as
>
>  <map:serialize encoding="UTF-8"/>
>
> when the global serializer configurations state they will be using 
> something else.


But this one is OK with me and, moreover, in line with an earlier decision:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2


> Is the proposal clear enough?


Clearer, yes :)

Vadim


> Stefano.




Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Pier Fumagalli wrote:
> On 16/3/03 20:04, "Stefano Mazzocchi" <st...@apache.org> wrote:
> 
>>>So, if you want to put encoding into the sitemap... you will have to disable
>>>serializer configuration and request configuration and force the sitemap
>>>encoding onto the request/response. Is this what you are proposing?
>>
>>nonononononooo
>>
>>Please read my proposal again; I think it's pretty clear.
> 
> 
> Stefano, I believe your proposal got to the list chopped up big time,
> because what Vadim quoted is _ALL_ I've got as well, and really I don't
> understand what you want to do.

Uh, then sorry.

[big snip on well detailed encoding things]

> To rewrite what he said with the above mentioned three-layer encoding in
> mind:
> 
> - the servlet container/mail engine/whatever will take care of the "Transfer
>   Encoding" (Cocoon as an application should not care nor interfere with
>   it).

Right.

> - ALL serializers should have the ability to deal with "Content Encoding",
>   unless we would rather "recommend" the use of "servlet filters" to do
>   things such as GZIP encoding of the content (that would be my preferred
>   option, as 90% of the time we think about deploying things over servlets).

In the past, I've been suggesting that people go down the servlet filter
path, but I'm more and more coming to think that servlet filters are
totally useless crap that can possibly work only for a few things and
are overdesigned for what they can do.

So, I'm all in favor to provide internal alternatives.

You suggest adding a property to the serializer, but I think this is
*NOT* a serializer's concern, but a higher-level concern.

What about adding a 'content encoding' attribute to the 'pipeline' instead?

A pipeline provides a context of processing behavior. I think it fits 
perfectly with what we need and we don't even have to modify the 
serializers because all the stuff will be done by the pipeline engine 
that assembles the pipelines and creates the final response.
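A minimal sketch of this "pipeline engine does it" approach for gzip, assuming a hypothetical `ContentEncoding.wrap` helper that the engine calls once per response before handing the stream to the serializer. If the Accept-Encoding request header allows gzip, the stream is wrapped in `java.util.zip.GZIPOutputStream` and the engine would send `Content-Encoding: gzip`. Parsing is naive: a bare `*` is not treated as accepting gzip, and how a <map:pipeline> attribute would switch this on is left open.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper used by the pipeline engine; serializers never see it.
final class ContentEncoding {

    /** The stream to hand to the serializer and the Content-Encoding value (null = none). */
    static final class Wrapped {
        final OutputStream stream;
        final String contentEncoding;

        Wrapped(OutputStream stream, String contentEncoding) {
            this.stream = stream;
            this.contentEncoding = contentEncoding;
        }
    }

    /** True if Accept-Encoding lists gzip with a non-zero q-value. */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String part : acceptEncoding.split(",")) {
            String[] fields = part.trim().split(";");
            if (!fields[0].trim().equalsIgnoreCase("gzip")) {
                continue;
            }
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        return Double.parseDouble(param.substring(2)) > 0.0;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                }
            }
            return true; // no q parameter means q=1
        }
        return false;
    }

    /** Wraps the raw response stream if the client accepts gzip. */
    static Wrapped wrap(String acceptEncoding, OutputStream raw) throws IOException {
        if (acceptsGzip(acceptEncoding)) {
            // The caller must close() (or finish()) this stream to write the gzip trailer.
            return new Wrapped(new GZIPOutputStream(raw), "gzip");
        }
        return new Wrapped(raw, null);
    }
}
```

Because the wrapping happens around the stream the serializer receives, none of the existing serializers would need to change.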

> - TEXT-based serializers should think about "charset encoding" and are the
>   only ones which should do that.

Right.

> So, in my opinion, the "best" way to tackle the charset-encoding problem is
> to have the org.apache.cocoon.serialization.AbstractTextSerializer
> receive an OutputStream from its implementation of the
> SitemapOutputComponent interface, but expose to its concrete implementations
> another couple of methods, instead of "getOutputStream":
> 
> - String getCharsetEncoding() [or getCharacterEncoding]:
>     
>     Returns the default character encoding configured for the specified
>     AbstractTextSerializer (or the default one for the sitemap if none
>     was specified).
>     This can be useful (for example) in the HtmlSerializer so that a new
>     <meta http-equiv="Content-Type" content="text/html; charset=???"/>
>     tag can be added automagically to the output, or to the "XMLSerializer"
>     so that the "<?xml version="1.0" encoding="???"?>" initial processing
>     instruction can be constructed appropriately.
> 
> - Writer getWriter():
> 
>     Returns a java.io.Writer encoding character data to the response output
>     stream according to whatever is returned by getCharsetEncoding

Sounds good to me.
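Pier's two methods, plus the configure() hook mentioned in this message, might look roughly like this. `TextSerializerBase` is a hypothetical, heavily simplified stand-in for AbstractTextSerializer (no SAX handling, no Avalon configuration). The `setOutputStream` method, the UTF-8 default, reading the value from a parameter named `charset` (the name Pier suggests) and the `SimpleXmlSerializer` subclass are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Map;

// Hypothetical, much-simplified stand-in for AbstractTextSerializer.
abstract class TextSerializerBase {

    private String charset = "UTF-8"; // assumed default when nothing is configured
    private OutputStream output;
    private Writer writer;

    /** Reads a "charset" parameter, e.g. from <map:serialize charset="..."/>. */
    void configure(Map<String, String> parameters) {
        String value = parameters.get("charset");
        try {
            if (value != null && Charset.isSupported(value)) {
                charset = Charset.forName(value).name(); // canonical name
            }
        } catch (IllegalCharsetNameException e) {
            // keep the default for syntactically invalid names
        }
    }

    /** Receives the raw response stream from the pipeline. */
    void setOutputStream(OutputStream out) {
        this.output = out;
        this.writer = null;
    }

    /** The character encoding this serializer will use. */
    String getCharsetEncoding() {
        return charset;
    }

    /** A Writer encoding characters to the response stream with getCharsetEncoding(). */
    Writer getWriter() {
        if (writer == null) {
            writer = new OutputStreamWriter(output, Charset.forName(charset));
        }
        return writer;
    }
}

// Example concrete subclass: the XML declaration follows the configured charset.
final class SimpleXmlSerializer extends TextSerializerBase {
    void writeDocument(String body) throws IOException {
        Writer w = getWriter();
        w.write("<?xml version=\"1.0\" encoding=\"" + getCharsetEncoding() + "\"?>");
        w.write(body);
        w.flush();
    }
}
```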

> Those two should be controlled from the sitemap by (as you, Stefano, said):
> 
> 
>>2) Also, I want a way to override the sitemap-wide behavior of every
>>single serializer, locally, such as
>>
>> <map:serialize encoding="UTF-8"/>
> 
> 
> The only "nitpick" I have is that since "encoding" means a lot of things,
> this should be called "charset" (which is way more specific)...

very good point, I agree.

> This can be easily picked up by the AbstractTextSerializer.configure()
> method and returned by the two methods added above...

Right.

> I can work on a patch if you guys want... It's pretty trivial indeed...

Cool.

Stefano.



Re: [proposal] fixing the encoding problems

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Stefano Mazzocchi wrote, On 19/03/2003 14.41:
> Nicola Ken Barozzi wrote:
> 
>> Stefano Mazzocchi wrote, On 18/03/2003 13.05:
>>
>>> Nicola Ken Barozzi wrote:
>> ...
>>>> So, how would you tackle the above real-world problem?
>>>
>>> I would not write a transformer but a serializer. 
...
>>> What's wrong with this?
>>
>> I cannot configure it.
> 
> ???? serializers are configurable just like any other component.

Ok, I'm not clear, let's see if this is better :-)

- Local redefinition of serializer parameters:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104281144016447&w=2
http://marc.theaimsgroup.com/?t=101806758100001&r=1&w=2

example:
"
  <map:match pattern="doc/utf/*.xml">
    <map:generate  src="docs/*.xml"/>
    <map:transform  src="doc-page2html.xsl"/>
    <map:serialize type="html">
     <encoding>UTF-8</encoding>
    </map:serialize>
  </map:match>
"

- Access to ComponentManager

http://www.mail-archive.com/cocoon-dev@xml.apache.org/msg14047.html


- Access to SourceResolver to use external configuration files

- Access to values in the Environment/objectModel


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)


Re: [proposal] fixing the encoding problems

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Gianugo Rabellino wrote:

> Stefano Mazzocchi wrote:
>
>>
>> If you start adding the environment, this is not true anymore and we 
>> must cache *BOTH* the pipeline output (as xml) and the serializer 
>> output (as binary) because their ergodicity can be different.
>>
>> This is the only concern I'm having.
>>
>> If enough people believe this is a small price to pay, well, I'll 
>> turn my -1 into a -0 for giving Environment access to the Serializers.
>>
>
> Given that caching seems to be an almost-non-issue and that we seem to 
> be pretty convinced that this would not break the overall design, 
> here's +1 to make Serializer extend
> SitemapModelComponent. Of course we have to take into account a
> migration path for all the existing serializers (though it should be 
> enough to add empty impls to AbstractSerializer: AFAIK every concrete 
> Serializer in Cocoon inherits from it).


Mmmh... are you sure every Serializer extends AbstractSerializer? Maybe
ours do, but aren't there some custom serializers out in the wild world
that don't?

It would be better for the pipeline to handle the optional case where a 
Serializer also implements SitemapModelComponent.
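This alternative keeps the capability optional: the pipeline checks whether a serializer also implements the model-component interface and only then passes it the environment. The sketch uses hypothetical stand-ins (`Serializer`, `ModelAware`, `PipelineEngine`, `LanguageTaggingSerializer`); the real SitemapModelComponent.setup() takes more arguments, such as a SourceResolver and the sitemap Parameters.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins; Cocoon's real interfaces differ.
interface Serializer {
    String serialize(String content);
}

interface ModelAware {
    // Called by the pipeline with environment data before serialization.
    void setup(Map<String, Object> objectModel);
}

final class PipelineEngine {
    static String process(Serializer serializer, Map<String, Object> objectModel, String content) {
        // Optional capability: only serializers that opt in get the environment,
        // so existing serializers keep working without any change.
        if (serializer instanceof ModelAware) {
            ((ModelAware) serializer).setup(objectModel);
        }
        return serializer.serialize(content);
    }
}

// Example context-aware serializer that opts in.
final class LanguageTaggingSerializer implements Serializer, ModelAware {
    private Object language = "unknown";

    @Override
    public void setup(Map<String, Object> objectModel) {
        language = objectModel.getOrDefault("lang", "unknown");
    }

    @Override
    public String serialize(String content) {
        return "[" + language + "] " + content;
    }
}
```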

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: [proposal] fixing the encoding problems

Posted by Gianugo Rabellino <gi...@apache.org>.
Stefano Mazzocchi wrote:

> 
> If you start adding the environment, this is not true anymore and we 
> must cache *BOTH* the pipeline output (as xml) and the serializer output 
> (as binary) because their ergodicity can be different.
> 
> This is the only concern I'm having.
> 
> If enough people believe this is a small price to pay, well, I'll turn 
> my -1 into a -0 for giving Environment access to the Serializers.
> 

Given that caching seems to be an almost-non-issue and that we seem to 
be pretty convinced that this would not break the overall design, here's 
+1 to make Serializer extend
SitemapModelComponent. Of course we have to take into account a
migration path for all the existing serializers (though it should be 
enough to add empty impls to AbstractSerializer: AFAIK every concrete 
Serializer in Cocoon inherits from it).

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l.
http://www.pro-netics.com


Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Gianugo Rabellino wrote:
> Stefano Mazzocchi wrote:
> 
>> Paul Duffin wrote:
>>
>>> A problem that I ran into was that Serializers do not have access to 
>>> the environment (Request / Response). This means that it is very hard 
>>> to write sophisticated Serializers.
>>
>>
>>
>> For example? (I think FOP and Batik are both pretty sophisticated
>> serializers)
> 
> 
> Well... not that much actually: they are both ~200 lines of Java code 
> and all they do is pass the XML events on to the underlying framework,
> along with an OutputStream to write to. Agreed, SLOC are a sloppy metric
> but then again...
> 
> The real keyword, however, is not "sophisticated" in an algorithmic
> sense but "context aware": if a Serializer is not a
> SitemapModelComponent then there is no way it can decide based on
> context, be it a Request or any other information. We've been through
> this many times, and I waver from one side to the other, so I have no
> clear decision in mind...
> 
>>> We worked around this by using it in conjunction with a Transformer 
>>> that was given the environment and simply passed it on to the 
>>> Serializer.
>>
>>
>>
>> The problem with having an environment-dependent serializer is that 
>> the cache needs access to it because it might change its behavior 
>> depending on environment parameters.
> 
> 
> Can't parse this. Care to explain?

The problem I'm having with Serializers having access to the environment
is that they stop being non-ergodic.

Right now, serializers don't depend on run-time parameters. This means
that if I save the output of the serializer, I can avoid saving (in cache)
the output of the previous stage, because the two will be co-ergodic (if
one changes, the other changes).

If you start adding the environment, this is not true anymore and we 
must cache *BOTH* the pipeline output (as xml) and the serializer output 
(as binary) because their ergodicity can be different.
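The extra cost can be pictured as a hypothetical two-level cache (names are illustrative, not Cocoon's cache API): the XML produced by the generator/transformer stages is cached under the pipeline key alone, while the serializer's bytes are cached under the pipeline key plus whatever environment values the serializer reads, here simply a charset name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical two-level cache: once a serializer reads the environment, the
// pipeline's XML and the serializer's bytes no longer vary together.
final class TwoLevelCache {

    private final Map<String, String> xmlByPipeline = new HashMap<>();
    private final Map<List<String>, byte[]> bytesByPipelineAndEnv = new HashMap<>();
    int pipelineRuns = 0;   // how often the XML had to be produced
    int serializerRuns = 0; // how often the XML had to be serialized

    /**
     * @param pipelineKey non-null key of the generator/transformer stages
     * @param envKey      non-null environment values the serializer reads
     */
    byte[] get(String pipelineKey, String envKey,
               Supplier<String> pipeline, BiFunction<String, String, byte[]> serializer) {
        List<String> outputKey = List.of(pipelineKey, envKey);
        byte[] bytes = bytesByPipelineAndEnv.get(outputKey);
        if (bytes != null) {
            return bytes; // both stages cached for this environment
        }
        String xml = xmlByPipeline.get(pipelineKey);
        if (xml == null) {
            xml = pipeline.get();
            pipelineRuns++;
            xmlByPipeline.put(pipelineKey, xml);
        }
        bytes = serializer.apply(xml, envKey);
        serializerRuns++;
        bytesByPipelineAndEnv.put(outputKey, bytes);
        return bytes;
    }
}
```

A request with a new environment value reuses the cached XML and only re-runs the serializer; this second map is the additional storage being weighed here.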

This is the only concern I'm having.

If enough people believe this is a small price to pay, well, I'll turn 
my -1 into a -0 for giving Environment access to the Serializers.

Thoughts?



Re: [proposal] fixing the encoding problems

Posted by Paul Duffin <pd...@volantis.com>.
Gianugo Rabellino wrote:
> Stefano Mazzocchi wrote:
> 
>> Paul Duffin wrote:
>>
>>> A problem that I ran into was that Serializers do not have access to 
>>> the environment (Request / Response). This means that it is very hard 
>>> to write sophisticated Serializers.
>>
>>
>>
>> For example? (I think FOP and batik are both pretty sophisticated 
>> serializers)
> 
> 
> Well... not that much actually: they are both ~200 lines of Java code 
> and all they do is delegate to the underlying framework the XML events 
> grabbing an OutputStream to write to. Agreed, SLOC are a sloppy metric 
> but then again...
> 
> The real keyword, however, is not "sophisticated" in an algorithmic 
> sense but "context aware": if a Serializer is not a 
> SitemapModelComponent then there is no way it can decide based on 
> context, be it a Request or any other information. We've been through 
> this many times, and I wander from one side to the other so I have no 
> clear decision in mind...
> 

You are correct, I meant context-aware.

>>> We worked around this by using it in conjunction with a Transformer 
>>> that was given the environment and simply passed it on to the 
>>> Serializer.
>>
>>
>>
>> The problem with having an environment-dependent serializer is that 
>> the cache needs access to it because it might change its behavior 
>> depending on environment parameters.
> 
> 
> Can't parse this. Care to explain?
> 

I guess the caching mechanism needs to decide whether the output from 
this request will be the same as the output from a previous request. If 
the output is context sensitive then the caching mechanism needs to use 
those parts of the request that affect the output as part of its key.

However, this problem applies to any sitemap component that is context 
sensitive, not just serializers. I presume (I have not used the cache) 
that this problem has been solved already.
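To make the "use those parts of the request that affect the output as part of its key" idea concrete, here is a minimal sketch. The CacheKeys.keyFor helper is hypothetical and is not Cocoon's cache API; it assumes the component can declare which request parameters actually influence its output, so that unrelated parameters (a session id, say) do not fragment the cache.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not Cocoon's cache API: builds a cache key from the URI
// plus only the request parameters known to influence the output.
final class CacheKeys {
    private CacheKeys() {}

    static String keyFor(String uri, Map<String, String> requestParams, Set<String> relevant) {
        // TreeMap keeps a stable, sorted order so equal inputs give equal keys
        Map<String, String> used = new TreeMap<>();
        for (String name : relevant) {
            // Record missing parameters explicitly so "absent" differs from "empty"
            used.put(name, requestParams.getOrDefault(name, "<absent>"));
        }
        return uri + "?" + used;
    }
}
```

Two requests that differ only in an irrelevant parameter map to the same entry, while a change in a relevant one (here a charset) produces a new key.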

So while this possibly impacts the cost of making the changes it is not 
really an argument against allowing serializers to be context aware.


Re: [proposal] fixing the encoding problems

Posted by Gianugo Rabellino <gi...@apache.org>.
Stefano Mazzocchi wrote:
> Paul Duffin wrote:
> 
>> A problem that I ran into was that Serializers do not have access to 
>> the environment (Request / Response). This means that it is very hard 
>> to write sophisticated Serializers.
> 
> 
> For example? (I think FOP and batik are both pretty sophisticated 
> serializers)

Well... not that much actually: they are both ~200 lines of Java code 
and all they do is delegate to the underlying framework the XML events 
grabbing an OutputStream to write to. Agreed, SLOC are a sloppy metric 
but then again...

The real keyword, however, is not "sophisticated" in an algorithmic 
sense but "context aware": if a Serializer is not a 
SitemapModelComponent then there is no way it can decide based on 
context, be it a Request or any other information. We've been through 
this many times, and I wander from one side to the other so I have no 
clear decision in mind...

>> We worked around this by using it in conjunction with a Transformer 
>> that was given the environment and simply passed it on to the Serializer.
> 
> 
> The problem with having an environment-dependent serializer is that the 
> cache needs access to it because it might change its behavior depending 
> on environment parameters.

Can't parse this. Care to explain?

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l.
http://www.pro-netics.com


Re: [proposal] fixing the encoding problems

Posted by Santiago Gala <sg...@hisitech.com>.
Stefano Mazzocchi wrote:

> 
> The problem with having an environment-dependent serializer is that the 
> cache needs access to it because it might change its behavior depending 
> on environment parameters.
> 

IIRC, the reason why serializers are "special" WRT cache stuff is 
exactly this (i.e. circular reasoning alarm sounding): they do not 
depend on the environment, and they act as black boxes.

OTOH, the environment-dependent serializer will need to provide the 
right Validity for the cache and that's it. Am I wrong?

It looks to me that the issues are mostly potential abuse/Separation of 
Concerns, and I think they are obsolete by now:

* People could write pipelines of just generation -> 
specialSerializerForMyUncleBrowser, subverting the whole architecture 
(but they should not be using cocoon for this, there are simpler ways to 
shoot yourself in the foot) ;-)
* Also, overoptimization in the cache layer (you wanted to assume 
serializers are ergodic, to save the dual caching and avoid potential 
performance problems, since some serializers *are* expensive)

> Stefano.
> 

-- 
Santiago Gala
High Sierra Technology, S.L. (http://hisitech.com)
http://memojo.com?page=SantiagoGalaBlog



Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Carsten Ziegeler wrote:
> Vadim Gritsenko wrote:
> 
>>One word: CacheableProcessingComponent. IIRC, cache was aware of 
>>cacheable serializers some time ago. The only missing piece is to add 
>>SitemapModelComponent support for Serializers.
>>
> 
> Yes, the caching algorithm has been asking serializers whether they 
> support caching for more than two years now ;)

Uh, you never stop learning.

:/



RE: [proposal] fixing the encoding problems

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Vadim Gritsenko wrote:
> 
> One word: CacheableProcessingComponent. IIRC, cache was aware of 
> cacheable serializers some time ago. The only missing piece is to add 
> SitemapModelComponent support for Serializers.
> 
Yes, the caching algorithm has been asking serializers whether they 
support caching for more than two years now ;)

Carsten

Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Stefano Mazzocchi wrote:

> Paul Duffin wrote:
>
>> A problem that I ran into was that Serializers do not have access to 
>> the environment (Request / Response). This means that it is very hard 
>> to write sophisticated Serializers.
>
>
> For example? (I think FOP and batik are both pretty sophisticated 
> serializers)
>
>> We worked around this by using it in conjunction with a Transformer 
>> that was given the environment and simply passed it on to the 
>> Serializer.
>
>
> The problem with having an environment-dependent serializer is that 
> the cache needs access to it because it might change its behavior 
> depending on environment parameters.


One word: CacheableProcessingComponent. IIRC, cache was aware of 
cacheable serializers some time ago. The only missing piece is to add 
SitemapModelComponent support for Serializers.
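A minimal sketch of what Vadim describes, a serializer that receives local parameters and reports them in its cache key, might look like this. The two interfaces are simplified stand-ins invented for the example; they only mimic the roles of SitemapModelComponent (a per-use setup() call) and CacheableProcessingComponent (a key contribution) and are not Cocoon's real signatures.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Map;

// Simplified stand-in for a per-use setup() call (role of SitemapModelComponent)
interface LocallyConfigurable {
    void setup(Map<String, String> sitemapParams);
}

// Simplified stand-in for a cache-key contribution (role of CacheableProcessingComponent)
interface CacheKeyContributor {
    Serializable getKey();
}

// Hypothetical text serializer whose output bytes depend on a per-pipeline charset
class CharsetAwareSerializer implements LocallyConfigurable, CacheKeyContributor {
    private final String defaultCharset;
    private String charset;

    CharsetAwareSerializer(String defaultCharset) {
        this.defaultCharset = defaultCharset;
        this.charset = defaultCharset;
    }

    @Override
    public void setup(Map<String, String> sitemapParams) {
        // A local value wins; otherwise fall back to the globally configured default
        this.charset = sitemapParams.getOrDefault("charset", defaultCharset);
    }

    @Override
    public Serializable getKey() {
        // The charset changes the bytes produced, so it must be part of the key
        return "charset=" + charset;
    }

    void serialize(String text, OutputStream out) throws IOException {
        out.write(text.getBytes(charset));
    }
}
```

With this shape the cache no longer has to assume the serializer is a pure function: whatever environment data changes the output is surfaced through the key, which is essentially Santiago's "provide the right Validity" point.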

And Stefano, you already agreed on this change.

Vadim


> Stefano.




Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Paul Duffin wrote:
> A problem that I ran into was that Serializers do not have access to the 
> environment (Request / Response). This means that it is very hard to 
> write sophisticated Serializers.

For example? (I think FOP and batik are both pretty sophisticated 
serializers)

> We worked around this by using it in conjunction with a Transformer that 
> was given the environment and simply passed it on to the Serializer.

The problem with having an environment-dependent serializer is that the 
cache needs access to it because it might change its behavior depending 
on environment parameters.

Stefano.



Re: [proposal] fixing the encoding problems

Posted by Paul Duffin <pd...@volantis.com>.
A problem that I ran into was that Serializers do not have access to 
the environment (Request / Response). This means that it is very hard to 
write sophisticated Serializers.

We worked around this by using it in conjunction with a Transformer that 
was given the environment and simply passed it on to the Serializer.

Stefano Mazzocchi wrote:
> Nicola Ken Barozzi wrote:
> 
>>
>>
>> Stefano Mazzocchi wrote, On 18/03/2003 13.05:
>>
>>> Nicola Ken Barozzi wrote:
>>
>>
>> ...
>>
>>>> So, how would you tackle the above real-world problem?
>>>
>>>
>>>
>>> I would not write a transformer but a serializer. In fact, a chart 
>>> package image renderer *is* a serializer, since the output of a chart 
>>> transformer will not need to be further processed anyway.
>>
>>
>>
>> Hmmm, this seems different from what you have said till now AFAIK.
>> Wasn't a serializer just an "adaptor"?
> 
> 
> Yes, an adaptor between an xml-driven inside and a binary-driven 
> outside. It receives xml and produces binary. It receives an xml 
> description of a chart and produces the raster image.
> 
>>> What's wrong with this?
>>
>>
>>
>> I cannot configure it.
> 
> 
> ???? serializers are configurable just like any other component.
> 
> Stefano.
> 
> 



Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> 
> 
> Stefano Mazzocchi wrote, On 18/03/2003 13.05:
> 
>> Nicola Ken Barozzi wrote:
> 
> ...
> 
>>> So, how would you tackle the above real-world problem?
>>
>>
>> I would not write a transformer but a serializer. In fact, a chart 
>> package image renderer *is* a serializer, since the output of a chart 
>> transformer will not need to be further processed anyway.
> 
> 
> Hmmm, this seems different from what you have said till now AFAIK.
> Wasn't a serializer just an "adaptor"?

Yes, an adaptor between an xml-driven inside and a binary-driven 
outside. It receives xml and produces binary. It receives an xml 
description of a chart and produces the raster image.
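As a rough illustration of such an adaptor, here is a minimal sketch of a "chart serializer" that consumes SAX events and writes a PNG directly, with no SVG/Batik stage in between. The <chart><bar value="..."/></chart> vocabulary and the ChartSerializer class are invented for this example; a real Cocoon serializer would receive its SAX events from the pipeline rather than from the small render() driver used here.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical chart serializer: XML events in, PNG bytes out.
class ChartSerializer extends DefaultHandler {
    private static final int BAR_WIDTH = 20;
    private static final int HEIGHT = 100;

    private final List<Integer> values = new ArrayList<>();
    private final OutputStream out;

    ChartSerializer(OutputStream out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Collect one value per <bar/>; other elements are ignored
        if ("bar".equals(qName)) {
            values.add(Integer.parseInt(atts.getValue("value")));
        }
    }

    @Override
    public void endDocument() throws SAXException {
        int width = Math.max(1, values.size() * BAR_WIDTH);
        BufferedImage img = new BufferedImage(width, HEIGHT, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, HEIGHT);
        g.setColor(Color.BLUE);
        for (int i = 0; i < values.size(); i++) {
            // Clamp each bar to the image height and draw it from the bottom up
            int h = Math.min(HEIGHT, Math.max(0, values.get(i)));
            g.fillRect(i * BAR_WIDTH + 2, HEIGHT - h, BAR_WIDTH - 4, h);
        }
        g.dispose();
        try {
            // The raster goes straight to the output stream as PNG
            ImageIO.write(img, "png", out);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Convenience driver standing in for the pipeline that would feed the SAX events
    static void render(String xml, OutputStream out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new ChartSerializer(out));
    }
}
```

Skipping the intermediate SVG document and its rasterization is one plausible reason a direct serializer could be faster than going through Batik, though the sketch says nothing about the measured figures quoted in this thread.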

>> What's wrong with this?
> 
> 
> I cannot configure it.

???? serializers are configurable just like any other component.

Stefano.



Re: [proposal] fixing the encoding problems

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Stefano Mazzocchi wrote, On 18/03/2003 13.05:
> Nicola Ken Barozzi wrote:
...
>> So, how would you tackle the above real-world problem?
> 
> I would not write a transformer but a serializer. In fact, a chart 
> package image renderer *is* a serializer, since the output of a chart 
> transformer will not need to be further processed anyway.

Hmmm, this seems different from what you have said till now AFAIK.
Wasn't a serializer just an "adaptor"?

> What's wrong with this?

I cannot configure it.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:

> When my personal need comes, I surely will, although now I have other 
> things to do. If others want to write a more detailed proposal (Luca for 
> example) please do.
> 
> The *real* fact is that if I do:
> 
>   xml data -> chart transformer -> batik -> png
> 
> It's 10x SLOWER than
> 
>   xml data -> chart serializer -> png
> 
> This is not peanuts, this is a real need. And the serializer should be 
> configurable as the chart transformer (same functionality).
> 
> Of course, we could make a transformer that creates a png image, puts it 
> in the context, and then a serializer serializes it, but it seems to 
> really stink.
> 
> So, how would you tackle the above real-world problem?

I would not write a transformer but a serializer. In fact, a chart 
package image renderer *is* a serializer, since the output of a chart 
transformer will not need to be further processed anyway.

What's wrong with this?

Stefano.


Re: [proposal] fixing the encoding problems

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote, On 18/03/2003 10.37:
> Nicola Ken Barozzi wrote:
...
>> Serializers, in the real world I mean, not in theoretical abstractions, 
>> are effectively first class components, not just adapters. IMO they 
>> should be treated as such, because there is no real concrete reason 
>> IMHO why this lack of configurability continues till now, given that 
>> we have real needs for it that are not effectively solvable by other 
>> means.
> 
> 
> Every time a real need emerges, the architecture must change to reflect 
> the need.

of course

> I'm against changes that *DO* *NOT* reflect real needs but just lack of 
> symmetry.

Again this symmetry thing...

> If you think this has changed, write a proposal.

When my personal need comes, I surely will, although now I have other 
things to do. If others want to write a more detailed proposal (Luca for 
example) please do.

The *real* fact is that if I do:

   xml data -> chart transformer -> batik -> png

It's 10x SLOWER than

   xml data -> chart serializer -> png

This is not peanuts, this is a real need. And the serializer should be 
configurable as the chart transformer (same functionality).

Of course, we could make a transformer that creates a png image, puts it 
in the context, and then a serializer serializes it, but it seems to 
really stink.

So, how would you tackle the above real-world problem?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> 
> 
> Vadim Gritsenko wrote, On 18/03/2003 3.52:
> 
>> Pier Fumagalli wrote:
> 
> ...
> 
>>> That gets the value out of the serializer configuration itself... So, 
>>> per se, we cannot have a per-pipeline text serializer with different 
>>> encodings per different sitemaps...
>>
>>
>> That's why we have to allow local serializer configuration :)
> 
> 
> This thing that serializers cannot be configured is driving us mad.
> Luca Morandini needs it because the serializer is doing the charts, and 
> this is 10 times faster than going through batik in the pipeline.
> 
> I second this, because my earlier experiments showed the same thing.
> 
> Serializers, in the real world I mean, not in theoretical abstractions, 
> are effectively first class components, not just adapters. IMO they 
> should be treated as such, because there is no real concrete reason IMHO 
> why this lack of configurability continues till now, given that we have 
> real needs for it that are not effectively solvable by other means.

Every time a real need emerges, the architecture must change to reflect 
the need.

I'm against changes that *DO* *NOT* reflect real needs but just lack of 
symmetry.

If you think this has changed, write a proposal.

Stefano.



Re: Lookup of PortalManager failed

Posted by JD Daniels <jd...@datatrio.com>.
Cocoon 2.1-dev, March 16 CVS checkout

oops

Lookup of PortalManager failed

Posted by JD Daniels <jd...@datatrio.com>.
I apologize if this is a brain-dead newbie problem.

I copied the portal-fw folder up to the cocoon root. After adding the portal
pipelines to the root sitemap, I get the error below when I try to save a
profile change (i.e., personalize(guest) --> Customize --> Save).

My root sitemap is here:
http://dev.datatrio.com/tmp-junk/sitemap.xmap

Uri:
http://localhost/portal-fw/sunspotdemo-portal?portalprofile=uprofile:portalhandler|sunspotdemo:user_5_guest_guest&portalcmd=save


Error.log entry:

ERROR   (2003-03-18) 01:14.54:703   [access] (/portal-fw/sunspotdemo-portlets) HttpProcessor[80][10]/CocoonServlet: Internal Cocoon Problem
org.apache.cocoon.ProcessingException: Lookup of PortalManager failed.: org.apache.avalon.framework.component.ComponentException: Could not find component (role [org.apache.cocoon.webapps.portal.components.PortalManager])
 at org.apache.cocoon.webapps.portal.generation.PortalGenerator.generate(PortalGenerator.java:86)
 at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:545)
 at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:214)
 at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:489)
 at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:145)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:164)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:158)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:164)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:108)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:153)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:108)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:143)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:317)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:299)
 at org.apache.cocoon.Cocoon.process(Cocoon.java:639)
 at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1074)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:509)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.connector.http.HttpProcessor.process(HttpProcessor.java:1040)
 at org.apache.catalina.connector.http.HttpProcessor.run(HttpProcessor.java:1151)
 at java.lang.Thread.run(Thread.java:536)
Caused by: org.apache.avalon.framework.component.ComponentException: Could not find component (role [org.apache.cocoon.webapps.portal.components.PortalManager])
 at org.apache.avalon.excalibur.component.ExcaliburComponentManager.lookup(ExcaliburComponentManager.java:255)
 at org.apache.cocoon.components.CocoonComponentManager.lookup(CocoonComponentManager.java:296)
 at org.apache.avalon.excalibur.component.DefaultComponentFactory$ComponentManagerProxy.lookup(DefaultComponentFactory.java:393)
 at org.apache.avalon.excalibur.component.DefaultComponentFactory$ComponentManagerProxy.lookup(DefaultComponentFactory.java:393)
 at org.apache.cocoon.webapps.portal.generation.PortalGenerator.generate(PortalGenerator.java:77)
 ... 47 more
org.apache.avalon.framework.component.ComponentException: Could not find component (role [org.apache.cocoon.webapps.portal.components.PortalManager])
 at org.apache.avalon.excalibur.component.ExcaliburComponentManager.lookup(ExcaliburComponentManager.java:255)
 at org.apache.cocoon.components.CocoonComponentManager.lookup(CocoonComponentManager.java:296)
 at org.apache.avalon.excalibur.component.DefaultComponentFactory$ComponentManagerProxy.lookup(DefaultComponentFactory.java:393)
 at org.apache.avalon.excalibur.component.DefaultComponentFactory$ComponentManagerProxy.lookup(DefaultComponentFactory.java:393)
 at org.apache.cocoon.webapps.portal.generation.PortalGenerator.generate(PortalGenerator.java:77)
 at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:545)
 at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:214)
 at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:489)
 at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:145)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:164)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:158)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:164)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:108)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:153)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:108)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:143)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:317)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:299)
 at org.apache.cocoon.Cocoon.process(Cocoon.java:639)
 at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1074)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:509)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.connector.http.HttpProcessor.process(HttpProcessor.java:1040)
 at org.apache.catalina.connector.http.HttpProcessor.run(HttpProcessor.java:1151)
 at java.lang.Thread.run(Thread.java:536)


Re: [proposal] fixing the encoding problems

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Vadim Gritsenko wrote, On 18/03/2003 3.52:
> Pier Fumagalli wrote:
...
>> That gets the value out of the serializer configuration itself... So, per
>> se, we cannot have a per-pipeline text serializer with different 
>> encodings per different sitemaps...
> 
> That's why we have to allow local serializer configuration :)

This thing that serializers cannot be configured is driving us mad.
Luca Morandini needs it because the serializer is doing the charts, and 
this is 10 times faster than going through batik in the pipeline.

I second this, because my earlier experiments showed the same thing.

Serializers, in the real world I mean, not in theoretical abstractions, 
are effectively first class components, not just adapters. IMO they 
should be treated as such, because there is no real concrete reason IMHO 
why this lack of configurability continues till now, given that we have 
real needs for it that are not effectively solvable by other means.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Pier Fumagalli wrote:

>On 17/3/03 0:16, "Vadim Gritsenko" <va...@verizon.net> wrote:
>
<snip/>

>>It (the <?xml?> instruction) is done via
>>format.put(OutputKeys.ENCODING,encoding.getValue()) in abstract
>>serializer itself.
>>    
>>
>
>That gets the value out of the serializer configuration itself... So, per
>se, we cannot have a per-pipeline text serializer with different encodings
>per different sitemaps...
>  
>

That's why we have to allow local serializer configuration :)

Vadim


>Hmm... I don't know why but it doesn't feel right.
>
>    Pier
>



Re: [proposal] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 17/3/03 0:16, "Vadim Gritsenko" <va...@verizon.net> wrote:

>> - String getCharsetEncoding() [or getCharacterEncoding]:
>>    
>>    Returns the default character encoding configured for the specified
>>    AbstractTextSerializer (or the default one for the sitemap if none
>>    was specified).
>>    This can be useful (for example) in the HtmlSerializer so that a new
>>    <meta http-equiv="Content-Type" content="text/html; charset=???"/>
>>    tag can be added automagically to the output, or to the "XMLSerializer"
>>    so that the "<?xml version="1.0" encoding="???"?>" initial processing
>>    instruction can be constructed appropriately.
>> 
> 
> It (the <?xml?> instruction) is done via
> format.put(OutputKeys.ENCODING,encoding.getValue()) in abstract
> serializer itself.

That gets the value out of the serializer configuration itself... So, per
se, we cannot have a per-pipeline text serializer with different encodings
per different sitemaps...

Hmm... I don't know why but it doesn't feel right.

    Pier


Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Pier Fumagalli wrote:

<snip/>

>So, in my opinion, the "best" way to tackle the charset-encoding problem is
>to have the org.apache.cocoon.serialization.AbstractTextSerializer
>receive an OutputStream from its implementation of the
>SitemapOutputComponent interface, but to expose to its solid implementations
>another couple of methods, instead of "getOutputStream":
>
>- String getCharsetEncoding() [or getCharacterEncoding]:
>    
>    Returns the default character encoding configured for the specified
>    AbstractTextSerializer (or the default one for the sitemap if none
>    was specified).
>    This can be useful (for example) in the HtmlSerializer so that a new
>    <meta http-equiv="Content-Type" content="text/html; charset=???"/>
>    tag can be added automagically to the output, or to the "XMLSerializer"
>    so that the "<?xml version="1.0" encoding="???"?>" initial processing
>    instruction can be constructed appropriately.
>

It (the <?xml?> instruction) is done via 
format.put(OutputKeys.ENCODING,encoding.getValue()) in abstract 
serializer itself.
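The mechanism Vadim refers to is the standard JAXP output property. A standalone sketch (not Cocoon's code) shows that setting OutputKeys.ENCODING on an identity Transformer is what puts the encoding name into the <?xml?> declaration. Note that the value comes from whoever sets the property, which is exactly why a globally configured serializer cannot vary it per pipeline.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EncodingDeclaration {
    private EncodingDeclaration() {}

    static String serialize(String xml, String encoding) throws TransformerException {
        // Identity transformer: copies the input, applying only the output properties
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // The encoding name ends up in the XML declaration; when the result is a byte
        // stream it would also drive the actual character encoding
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        StringWriter sw = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(sw));
        return sw.toString();
    }
}
```

Since a StringWriter is used here, only the declaration reflects the chosen encoding; the characters themselves are not turned into bytes in this sketch.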


>- Writer getWriter():
>
>    Returns a java.io.Writer encoding character data to the response output
>    stream according to whatever is returned by getCharsetEncoding
>
>Those two should be controlled from the sitemap by (as you, Stefano, said):
>  
>

Sounds good.
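Pier's two methods could be sketched roughly as follows: the base class owns the OutputStream, and subclasses only see the charset name and a Writer that already encodes with it. TextSerializerSketch and HtmlSerializerSketch are illustrative names, not Cocoon's actual classes, and the charset is passed to the constructor here instead of coming from the sitemap.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Illustrative base class, not Cocoon's AbstractTextSerializer.
abstract class TextSerializerSketch {
    private final Charset charset;
    private final Writer writer;

    protected TextSerializerSketch(OutputStream out, String charsetName) {
        // The base class owns the raw OutputStream; subclasses never see it
        this.charset = Charset.forName(charsetName);
        this.writer = new OutputStreamWriter(out, charset);
    }

    // Canonical name of the configured charset, e.g. "ISO-8859-1"
    protected String getCharsetEncoding() {
        return charset.name();
    }

    // A Writer that encodes characters with getCharsetEncoding()
    protected Writer getWriter() {
        return writer;
    }
}

// Hypothetical HTML subclass that announces its charset automatically.
class HtmlSerializerSketch extends TextSerializerSketch {
    HtmlSerializerSketch(OutputStream out, String charsetName) {
        super(out, charsetName);
    }

    void writePage(String title) throws IOException {
        Writer w = getWriter();
        // The meta tag and the actual byte encoding come from the same value,
        // so they cannot disagree (no HTML escaping of the title in this sketch)
        w.write("<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + getCharsetEncoding() + "\"/><title>" + title + "</title></head></html>");
        w.flush();
    }
}
```

The same getCharsetEncoding() value is what a pipeline could use to set the charset parameter of the HTTP Content-Type header.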


>>2) also, I want a way to override the sitemap-wide behavior of every
>>single serializer, locally, such as
>>
>> <map:serialize encoding="UTF-8"/>
>>    
>>
>
>The only "nitpick" I have is that since "encoding" means a lot of things,
>this should be called "charset" (which is way more specific)...
>
>This can be easily picked up by the AbstractTextSerializer.configure()
>method and returned by the two methods added above...
>
>I can work on a patch if you guys want... It's pretty trivial indeed...
>

Not that trivial. Configure works only globally. Right now serializers 
do not have local configuration - because they don't implement 
SitemapModelComponent and its setup() method.

The sitemap implementation has to be changed to test serializers for the 
SitemapModelComponent interface, and the caching pipeline should test for it 
too (IIRC, the old implementation did that; not sure about the new one).
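On the pipeline side, the change Vadim describes amounts to an interface test before serialization. The sketch below uses invented stand-in types (SimpleSerializer, SetupCapable) rather than Cocoon's real ones: serializers that opt in receive the local attributes of something like <map:serialize charset="UTF-8"/>, while all others keep their global configuration.

```java
import java.util.Map;

// Hypothetical stand-ins, NOT Cocoon's real interfaces.
interface SimpleSerializer {
    String serialize(String content);
}

// Mimics the role of SitemapModelComponent.setup(): per-use, local parameters
interface SetupCapable {
    void setup(Map<String, String> localParams);
}

final class PipelineSketch {
    private PipelineSketch() {}

    static String run(SimpleSerializer serializer, Map<String, String> localParams, String content) {
        // The sitemap has to test for the interface; a caching pipeline would make
        // the same check before asking the serializer for a cache key
        if (serializer instanceof SetupCapable) {
            ((SetupCapable) serializer).setup(localParams);
        }
        return serializer.serialize(content);
    }
}

// Only knows its global configuration, i.e. what configure() would have set
class GlobalOnlySerializer implements SimpleSerializer {
    public String serialize(String content) {
        return "[ISO-8859-1] " + content;
    }
}

class LocallyConfigurableSerializer implements SimpleSerializer, SetupCapable {
    private final String globalCharset = "ISO-8859-1"; // value from the global configure()
    private String charset = globalCharset;

    public void setup(Map<String, String> localParams) {
        // Reset on every use so one pipeline's override never leaks into the next
        charset = localParams.getOrDefault("charset", globalCharset);
    }

    public String serialize(String content) {
        return "[" + charset + "] " + content;
    }
}
```

Resetting to the global value inside setup() matters if serializer instances are pooled and reused across requests.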

Vadim


>    Pier
>



Re: [proposal] fixing the encoding problems

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 16/3/03 20:04, "Stefano Mazzocchi" <st...@apache.org> wrote:
>
>> So, if you want to put encoding into sitemap... You will have to disable
>> serializer configuration and request configuration and force sitemap
>> encoding onto request / response. Is this what you are proposing?
> 
> nonononononooo
> 
> please read my proposal again, i think it's pretty clear.

Stefano, I believe your proposal got to the list chopped up big time,
because what Vadim quoted is _ALL_ I've got as well, and really I don't
understand what you want to do.

Also, a little "nitpick" (naming conventions): MIME, and the protocols
derived from it (one of which is HTTP), specify various "kinds" of encoding:

- a charset encoding (UTF-8, ISO-8859-1, US-ASCII, name your own)
- a content encoding (gzip, compress)
- a (content?) transfer encoding (chunked, base64, 8-bit...)
  (see RFC-2616 section 3.6)

In MIME, the charset encoding is usually called simply "charset" and is a
parameter of the Content-Type header, used only when the content type starts
with "text/"...
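A minimal sketch of that header rule in Java. ContentTypes.contentType is a hypothetical helper, and it follows the simplified "text/ only" rule stated above; in practice some non-text types (application/xml, for example) also define a charset parameter.

```java
import java.util.Locale;

class ContentTypes {
    // Builds a Content-Type header value, appending the charset parameter
    // only for "text/" media types (the simplified rule described above).
    static String contentType(String mimeType, String charset) {
        boolean isText = mimeType.toLowerCase(Locale.ROOT).startsWith("text/");
        if (isText && charset != null && !charset.isEmpty()) {
            return mimeType + "; charset=" + charset;
        }
        return mimeType; // binary types: a charset would be meaningless
    }
}
```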

Content encoding specifies how the content is represented in a binary
stream, and therefore can be applied to both binary and text resources.

The transfer-encoding, instead, depends on the protocol used (it's
called Content-Transfer-Encoding in MIME/mail, it's called Transfer-Encoding
in HTTP) and of course the values vary quite a lot.

When we think about i18n, one must think about the first encoding (charset
encoding), when we think about passing the content to a client in some way
we have to think about the second kind (content encoding), when thinking
about the protocol, we have to think about the third one (transfer
encoding).

Let's say that transfer encoding is handled by the protocol handler itself
(mail engine, servlet container, whatever), we still have to deal with the
other two.

Content encoding (outer layer) encodes content from a (In|Out)putStream into
another stream of the same kind, while charset encoding (inner layer) can be
applied only to text resources and encodes content from a (Writer|Reader)
into a (In|Out)putStream.
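The two layers can be illustrated with nothing but java.io and java.util.zip. This is a sketch (EncodingLayers is a made-up name): the OutputStreamWriter does the charset encoding (characters to bytes), the GZIPOutputStream does the content encoding (bytes to bytes), and decoding unwinds them in the opposite order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class EncodingLayers {
    // Inner layer: charset encoding (Writer -> bytes).
    // Outer layer: content encoding (bytes -> gzip'ed bytes).
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream transport = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(transport), charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer also finished the gzip trailer.
        return transport.toByteArray();
    }

    // Reverses both layers: gunzip first, then decode characters.
    static String decode(byte[] body, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(body)), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```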

[ two hours roughly pass, dinner + "The Bourne identity" on DVD ]

I just read what Sylvain said and he is absolutely right.

On 16/3/03 21:09, "Sylvain Wallez" <sy...@anyware-tech.com> wrote:
>
> However, we also have to consider that serializers basically produce
> binary data (e.g svg2png) for which the encoding has no meaning. So
> should there be a new kind of serializers (TextSerializer ?) that gets a
> Writer instead of an OutputStream ?
> 
> This would allow for the encoding to be handled directly and totally by
> the pipeline engine, which would use the proper encoding to build the
> TextSerializer's Writer.

To restate what he said with the above-mentioned three-layer encoding in
mind:

- the servlet container/mail engine/whatever will take care of the "Transfer
  Encoding" (Cocoon as an application should not care nor interfere with
  it).

- ALL serializers should have the ability to deal with "Content Encoding",
  unless we want to "recommend" the use of "servlet filters" to do things
  such as GZIP encoding of the content (that would be my preferred option,
  as 90% of the time we think about deploying things over servlets).

- TEXT-based serializers should think about "charset encoding" and are the
  only ones which should do that.
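Sylvain's idea, restated this way, could be sketched as below. BinarySerializer, TextSerializer and SketchPipeline are hypothetical names (TextSerializer is only the name floated in the thread, not an existing Cocoon interface). The pipeline alone decides the charset and builds the Writer; binary serializers keep receiving raw bytes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical: binary serializers (e.g. svg2png) get the raw stream.
interface BinarySerializer {
    void serialize(OutputStream out) throws IOException;
}

// Hypothetical: text serializers only ever see characters.
interface TextSerializer {
    void serialize(Writer out) throws IOException;
}

class SketchPipeline {
    // The pipeline owns the charset decision and wraps the response stream.
    static void run(Object serializer, OutputStream response, Charset charset) {
        try {
            if (serializer instanceof TextSerializer) {
                Writer writer = new OutputStreamWriter(response, charset);
                ((TextSerializer) serializer).serialize(writer);
                writer.flush(); // flush, but leave closing the response to the container
            } else if (serializer instanceof BinarySerializer) {
                ((BinarySerializer) serializer).serialize(response);
            } else {
                throw new IllegalArgumentException("unknown serializer type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same TextSerializer produces different bytes depending only on the charset the pipeline chooses, which is exactly what lets the pipeline also emit the matching HTTP header.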

So, in my opinion, the "best" way to tackle the charset-encoding problem is
to have the org.apache.cocoon.serialization.AbstractTextSerializer
receive an OutputStream through its implementation of the
SitemapOutputComponent interface, but expose to its concrete implementations
another couple of methods, instead of "getOutputStream":

- String getCharsetEncoding() [or getCharacterEncoding]:
    
    Returns the default character encoding configured for the specified
    AbstractTextSerializer (or the default one for the sitemap if none
    was specified).
    This can be useful (for example) in the HtmlSerializer so that a new
    <meta http-equiv="Content-Type" content="text/html; charset=???"/>
    tag can be added automagically to the output, or to the "XMLSerializer"
    so that the "<?xml version="1.0" encoding="???"?>" initial processing
    instruction can be constructed appropriately.

- Writer getWriter():

    Returns a java.io.Writer encoding character data to the response output
    stream according to whatever is returned by getCharsetEncoding

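A sketch of how those two methods might look in a base class. This is not Cocoon's actual AbstractTextSerializer: SketchTextSerializer, SketchXmlSerializer, setCharsetEncoding() and the UTF-8 fallback are all assumptions for illustration. Because the subclass writes the declaration from getCharsetEncoding() and the bytes through getWriter(), the two cannot disagree.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical base class: receives an OutputStream from the pipeline,
// but subclasses only see a Writer plus the charset name.
abstract class SketchTextSerializer {
    private OutputStream output;
    private Writer writer;
    private String charset = "UTF-8"; // assumed fallback when nothing is configured

    // Mirrors the OutputStream hand-off from the pipeline.
    public void setOutputStream(OutputStream out) {
        this.output = out;
        this.writer = null;
    }

    // Hypothetical setter, fed from configure() or a local charset="..." attribute.
    // Must be called before serialization starts.
    public void setCharsetEncoding(String charset) {
        this.charset = charset;
        this.writer = null;
    }

    protected String getCharsetEncoding() {
        return charset;
    }

    protected Writer getWriter() {
        if (writer == null) {
            // Charset.forName throws an unchecked exception for unknown names.
            writer = new OutputStreamWriter(output, Charset.forName(charset));
        }
        return writer;
    }
}

// Example subclass: emits the <?xml?> declaration from getCharsetEncoding().
class SketchXmlSerializer extends SketchTextSerializer {
    void serialize(String body) {
        try {
            Writer w = getWriter();
            w.write("<?xml version=\"1.0\" encoding=\"" + getCharsetEncoding() + "\"?>");
            w.write(body);
            w.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```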
Those two should be controlled from the sitemap by (as you, Stefano, said):

> 2) also, i want a way to override the sitemap-wide behavior of every
> single serializer, locally, such as
>
>  <map:serialize encoding="UTF-8"/>

The only "nitpick" I have is that since "encoding" means a lot of things,
this should be called "charset" (which is way more specific)...

This can be easily picked up by the AbstractTextSerializer.configure()
method and returned by the two methods added above...

I can work on a patch if you guys want... It's pretty trivial indeed...

    Pier


Re: [proposal] fixing the encoding problems

Posted by Stefano Mazzocchi <st...@apache.org>.
Vadim Gritsenko wrote:
> Stefano Mazzocchi wrote:
> 
>> Cocoon is heavily internationalized but we fail to do one thing: 
>> signal the proper encoding to the user-agent thru HTTP headers, which 
>> is the most reliable way of doing it.
>>
>> the current *hack* is to use <meta> tags in the HTML stream,
> 
> Ew!

I know.

>> these are interpreted by the HTTP server stack and transferred as HTTP 
>> headers. but this creates many problems and concern mixes.
>>
>> Vadim suggested to set the headers from the serializers, but I think 
>> there is a better alternative.
>>
>> So I propose to add the method
>>
>>  getEncoding()
>>
>> to the interface
>>
>>  org.apache.cocoon.sitemap 
> 
> 
> 
> Why would the sitemap ever know anything about encoding?

Ok, good question.

> There are two parts to the encoding problem: decoding incoming request 
> and encoding outgoing response. 

Right.

> Request encoding can be set by 
> SetCharacterEncodingAction or by anything else via 
> request.setCharacterEncoding() method. Or, every request parameter can 
> be decoded independently. Response encoding directly depends on the 
> encoding parameter set to the serializer from the sitemap.

yes.

> And all of these are totally independent of internationalization. 
> Internationalization affects the language used to produce output, but not 
> how the text in that language is encoded (UTF-8, UTF-16, ISO-8859-1, 
> what-have-you).

true. but you can't have Chinese text in US-ASCII, right? my point is 
that having globally-balanced hooks for encoding will allow cocoon to be 
even more friendly for non-latin-charset needs. I used i18n out of 
context here, sorry.
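The "no Chinese in US-ASCII" point can be checked directly with java.nio.charset.CharsetEncoder.canEncode(); CharsetCoverage is just an illustrative class name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverage {
    // True if every character of the text is representable in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two CJK ideographs
        System.out.println("US-ASCII: " + canRepresent(chinese, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + canRepresent(chinese, StandardCharsets.UTF_8));
    }
}
```

So the charset a serializer is configured with limits which languages its output can carry, even though the two are separate concerns.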

> So, if you want to put encoding into the sitemap... You will have to disable 
> serializer configuration and request configuration and force the sitemap 
> encoding onto the request / response. Is this what you are proposing?

nonononononooo

please read my proposal again, i think it's pretty clear.

> If 
> yes... IMHO, it makes more sense to make this a parameter of the pipeline 
> rather than of the whole sitemap. 

I didn't propose that, where did you get that impression?

> But I am not convinced that it's sitemap's 
> responsibility to worry about encoding (from SoC POV).

I restate:

1) I want a way for serializers to indicate to the pipeline what is the 
encoding they will be using, so that the pipeline can set the right HTTP 
header for it.

2) also, i want a way to override the sitemap-wide behavior of every 
single serializer, locally, such as

  <map:serialize encoding="UTF-8"/>

when the global serializer configurations state they will be using 
something else.
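Point 1 could look roughly like this. All names are hypothetical: EncodingAwareOutput pairs a getMimeType() (as output components already provide) with the proposed getEncoding(), and FakeResponse stands in for the HTTP response. The pipeline, not a <meta> tag, sets the header, and it must do so before any body bytes are written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: MIME type plus the proposed getEncoding().
interface EncodingAwareOutput {
    String getMimeType();
    String getEncoding(); // null for binary output with no charset
}

// Hypothetical stand-in for the HTTP response headers.
class FakeResponse {
    final Map<String, String> headers = new LinkedHashMap<>();

    void setHeader(String name, String value) {
        headers.put(name, value);
    }
}

class HeaderSettingPipeline {
    // Called before serialization starts, while headers can still be set.
    static void prepareResponse(EncodingAwareOutput serializer, FakeResponse response) {
        String type = serializer.getMimeType();
        String encoding = serializer.getEncoding();
        if (type != null) {
            response.setHeader("Content-Type",
                encoding == null ? type : type + "; charset=" + encoding);
        }
    }
}
```

A local override such as <map:serialize encoding="UTF-8"/> would only need to change what getEncoding() returns for that use; the header logic stays in one place.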

Is the proposal clear enough?

Stefano.


Re: [proposal] fixing the encoding problems

Posted by Vadim Gritsenko <va...@verizon.net>.
Stefano Mazzocchi wrote:

> Cocoon is heavily internationalized but we fail to do one thing: 
> signal the proper encoding to the user-agent thru HTTP headers, which 
> is the most reliable way of doing it.
>
> the current *hack* is to use <meta> tags in the HTML stream,


Ew!


> these are interpreted by the HTTP server stack and transferred as HTTP 
> headers. but this creates many problems and concern mixes.
>
> Vadim suggested to set the headers from the serializers, but I think 
> there is a better alternative.
>
> So I propose to add the method
>
>  getEncoding()
>
> to the interface
>
>  org.apache.cocoon.sitemap 


Why would the sitemap ever know anything about encoding?

There are two parts to the encoding problem: decoding incoming request 
and encoding outgoing response. Request encoding can be set by 
SetCharacterEncodingAction or by anything else via 
request.setCharacterEncoding() method. Or, every request parameter can 
be decoded independently. Response encoding directly depends on the 
encoding parameter set to the serializer from the sitemap.
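Note that ServletRequest.setCharacterEncoding() only takes effect if it is called before any parameter is read. The "decode each parameter independently" route is commonly done as sketched below, assuming the container already decoded the raw bytes as ISO-8859-1 (the servlet default when no request encoding is set); ParameterRedecoder is a made-up name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParameterRedecoder {
    // Assumes the container decoded the parameter bytes as ISO-8859-1.
    // ISO-8859-1 maps each byte to exactly one char, so getBytes() recovers
    // the original bytes, which are then decoded with the real charset.
    static String redecode(String containerValue, Charset actual) {
        byte[] raw = containerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }
}
```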

And all of these are totally independent of internationalization. 
Internationalization affects the language used to produce output, but not 
how the text in that language is encoded (UTF-8, UTF-16, ISO-8859-1, 
what-have-you).
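To make the distinction concrete: the same French word encodes to different byte sequences depending only on the charset. UTF-16BE is used to avoid the byte-order mark that Java's plain "UTF-16" encoder prepends; SameTextDifferentBytes is an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SameTextDifferentBytes {
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String word = "na\u00efve"; // "naive" with a diaeresis, same language every time
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE};
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + byteLength(word, cs) + " bytes");
        }
    }
}
```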

So, if you want to put encoding into the sitemap... You will have to disable 
serializer configuration and request configuration and force the sitemap 
encoding onto the request / response. Is this what you are proposing? If 
yes... IMHO, it makes more sense to make this a parameter of the pipeline 
rather than of the whole sitemap. But I am not convinced that it's the sitemap's 
responsibility to worry about encoding (from an SoC POV).


Vadim