Posted to users@cocoon.apache.org by roy huang <li...@hotmail.com> on 2004/08/12 13:45:43 UTC
[Help]How can I use non-ascii file name?
Hi all,
Using a reader to display a JPG or GIF is quite simple, like:
<map:match pattern="*.jpg">
  <map:read mime-type="image/jpeg" src="jpg/{1}.jpg" />
</map:match>
But if the file name is not ASCII, e.g. 花.jpg (Simplified Chinese, in UTF-8 or another encoding), the resolver doesn't resolve the name correctly and an error occurs:
org.apache.cocoon.ResourceNotFoundException: Error during resolving of the input stream: org.apache.excalibur.source.SourceNotFoundException: file:/C:/My Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg doesn't exist.
How can I use non-ASCII file names in Cocoon? I can't find any description or help in the wiki or the archived mailing list.
Roy Huang
Re: [Help]How can I use non-ascii file name?
Posted by Marc Portier <mp...@outerthought.org>.
Pier Fumagalli wrote:
> On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
>
>> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
>> either a fix, or an explanation and workaround... For now, brrrr, I
>> think that the hack is the only way to go...
>
>
> I don't know about Tomcat, but if you're not on the jetty developers
> list, here's the outcome:
>
I'm not, thx for copying over...
> Jetty defaults (for compatibility to all the other broken containers,
> and because there's no "official standard" about UTF-8 URIs) to
> ISO-8859-1. And this ain't great.
>
> Now, the good thing is that if you start your jetty specifying the
> "org.mortbay.util.URI.charset" system property, it will use that one as
> the charset used for decoding URLs.
>
> So, by putting in "-Dorg.mortbay.util.URI.charset=UTF-8" we get the
> expected behavior.
>
cool
> How about setting it up as the default behavior for Cocoon's internal
> Jetty distro?
>
makes sense, but: (wishing all this brokenness wasn't there, but alas)
- it shouldn't keep us from actually going about solving it for all
containers? (my guess is that just a fraction of cocoon deployments
actually run on the internal jetty distro, i.e. using the cocoon.sh or
.bat?)
- learning about this org.mortbay.util.URI.charset property we should
probably use it to override (or at least log-warn deployers if it's
different to) the container-encoding setting in the web.xml
(assuming that the mentioned property will also be in effect when
decoding the request parameters, and taking into account that current
cocoon code assumes ISO-8859-1 as the default there)
- once we've run that far, we might even consider making a scan of other
servlet containers and how they possibly allow setting the
container-encoding?
wdyt?
while typing I started rethinking why we ended up with this
container-encoding init-param in web.xml?
IIRC we did that because of required compliance to servlet spec versions
prior to 2.3? So first question is are we still on servlet 2.2?
If not: Since 2.3 there exists a setCharacterEncoding()
<quote from="servlet 2.3 javadoc"
href="http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)">
Overrides the name of the character encoding used in the body of this
request. This method must be called prior to reading request
parameters or reading input using getReader().
</quote>
- I assume the cocoon servlet could easily arrange for calling the
method before anything else
- I'm a bit unsure here if the javadoc mentioning of 'in the body of
this request' is going to be interpreted by implementations as a
limiting scope, and if so if they include the URI (and the request
params using get vs post) as part of it or not
(talk about possible confusion when writing specs like this, yuk!)
regards,
-marc=
(sorry for just popping up the questions, lacking the time to
investigate deeper myself ATM)
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo@outerthought.org mpo@apache.org
Re: [Help]How can I use non-ascii file name?
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On 17 Aug 2004, at 11:03, Pier Fumagalli wrote:
> On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
>
>> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
>> either a fix, or an explanation and workaround... For now, brrrr, I
>> think that the hack is the only way to go...
>
> I don't know about Tomcat, but if you're not on the jetty developers
> list, here's the outcome:
>
> Jetty defaults (for compatibility to all the other broken containers,
> and because there's no "official standard" about UTF-8 URIs) to
> ISO-8859-1. And this ain't great.
>
> Now, the good thing is that if you start your jetty specifying the
> "org.mortbay.util.URI.charset" system property, it will use that one
> as the charset used for decoding URLs.
>
> So, by putting in "-Dorg.mortbay.util.URI.charset=UTF-8" we get the
> expected behavior.
>
> How about setting it up as the default behavior for Cocoon's internal
> Jetty distro?
you got my +1
regards Jeremy
--------------------------------------------------------
If email from this address is not signed
IT IS NOT FROM ME
Always check the label, folks !!!!!
--------------------------------------------------------
Re: [Help]How can I use non-ascii file name?
Posted by Marc Portier <mp...@outerthought.org>.
Pier Fumagalli wrote:
> On 17 Aug 2004, at 16:20, Marc Portier wrote:
>
>>> How about setting it up as the default behavior for Cocoon's
>>> internal Jetty distro?
>>
>>
>> makes sense, but: (wishing all this brokenness wasn't there, but alas)
>
>
> It's not really "brokenness" but more along the lines of an inversion
> of the Robustness Principle, as outlined by J. Postel in RFC-791
> (http://www.rfc-editor.org/rfc/rfc791.txt section 3.2) and later
> dogmatized by R. Braden in RFC-1122
> (http://www.rfc-editor.org/rfc/rfc1122.txt Section 1.2.2).
>
> "Be liberal in what you accept, and conservative in what you send."
>
> In this case browsers are liberal in what they send (URL-Encoded UTF-8)
> and servlet containers are conservative in what they accept
> (URL-Encoded ISO-8859-1).
>
indeed
>> - it shouldn't keep us from actually going about solving it for all
>> containers? (my guess is that just a fraction of cocoon deployments
>> actually run on the internal jetty distro, i.e. using the cocoon.sh or
>> .bat?)
>
>
> Well, we found that Jetty in production was much better than anyone
> else. So, in our production environment we have Jetty (not the Cocoon
> distro one, a full blown copy)... Works pretty neatly! :-P
>
>> - learning about this org.mortbay.util.URI.charset property we should
>> probably use it to override (or at least log-warn deployers if it's
>> different to) the container-encoding setting in the web.xml
>> (assuming that the mentioned property will also be in effect when
>> decoding the request parameters, and taking into account that current
>> cocoon code assumes ISO-8859-1 as the default there)
>
>
> I agree, but as I said, my world revolves around the best container in
> the world (whoops, Jetty), so I already have "my" fix to the problem:
> switch! :-P
>
>> - once we've run that far, we might even consider making a scan of other
>> servlet containers and how they possibly allow setting the
>> container-encoding?
>
>
> The "container-encoding" servlet initialization parameter simply
> applies to request parameters (form data), and I suppose it only
> affects the way in which we read full-blown characters from
> ServletRequest.getInputStream() and parse forms.
>
I'd need to check, but I assume the request params are included regardless
of the GET or POST method.
Of course the URI part before the ? would already need to have been used
internally in the servlet container, at least to point to the correct JSP
or servlet...
hm, I'd need to try out some JSP/servlet with a euro sign in the
file name or so and check whether the path indication in the web.xml is
able to find it...
>> while typing I started rethinking why we ended up with this
>> container-encoding init-param in web.xml?
>>
>> IIRC we did that because of required compliance to servlet spec versions
>> prior to 2.3? So first question is are we still on servlet 2.2?
>>
Just found the thread that answers the question:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=108858029423811&w=2
>> If not: Since 2.3 there exists a setCharacterEncoding()
>> <quote from="servlet 2.3 javadoc"
>> href="http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/
>> ServletRequest.html#setCharacterEncoding(java.lang.String)">
>> Overrides the name of the character encoding used in the body of this
>> request. This method must be called prior to reading request
>> parameters or reading input using getReader().
>> </quote>
>
>
> Indeed, the problem here is that it's nowhere specified how the request
> BODY (not the URL, source of this problem) should be encoded.
>
yep, but as stated above: I suppose that the border case 'request params
in GET mode' is included (even if those are, strictly speaking, not in
the body?).
This seems to suggest that the current use of the en-re-decoding trick
in cocoon's request-wrapper could be cleaned out (since we voted to go
with 2.3 from now on)
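For readers who have not met the en-re-decoding trick, here is a minimal sketch of the idea in isolation. It is not Cocoon's request-wrapper code: the class and method names (ReDecodeDemo, reDecode) are made up for illustration, and it assumes the container decoded UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Cocoon.
final class ReDecodeDemo {

    // Undo a wrong ISO-8859-1 decoding of bytes that were really UTF-8.
    static String reDecode(String misdecoded) {
        // ISO-8859-1 maps every byte to exactly one char, so this recovers the original bytes.
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the container: the UTF-8 bytes of the euro sign read as ISO-8859-1.
        String wrong = new String("\u20AC".getBytes(StandardCharsets.UTF_8),
                                  StandardCharsets.ISO_8859_1);
        System.out.println("chars after wrong decoding: " + wrong.length());
        System.out.println("repaired: " + reDecode(wrong).equals("\u20AC"));
    }
}
```

The round trip is only lossless because ISO-8859-1 assigns a character to every byte value; with other container charsets the same trick may lose data.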
> Normally, from browser behaviour, I can see that usually browsers tend
> to post application/x-www-form-urlencoded in the same charset they used
> interpreting the form. So given an HTTP request like this:
>
> C: GET /myForm HTTP/1.1
> C: Host: localhost:80
> C:
> S: HTTP/1.1 200 OK
> S: Date: Wed, 18 Aug 2004 08:30:28 GMT
> S: Server: Apache/2.0.49 (Unix) DAV/2 SVN/1.0.2
> S: Content-Type: text/html; charset=utf-8
>
> When the form included in /myForm is posted back to its action, the
> UTF-8 charset will be used to encode the form data...
>
> That's normally a rule of thumb, and that's why (IMVHO) UTF-8 should be
> used for all forms, and should always be used as the default encoding
> for writing and reading.
>
yep,
we have wiki info already indicating that to our users:
http://wiki.apache.org/cocoon/RequestParameterEncoding
(hm, more interesting stuff out there, and probably some of the new
viewpoints from this thread could be added there)
>> - I assume the cocoon servlet could easily arrange for calling the
>> method before anything else
>
>
> Yes, hoping that it actually works. But cocoon should call the method
> with the encoding used to send the form from where data is read...
yep, they should be consistent.
fact is there was a patch on the serializers to do so by default
(but the other way around: by default they are taking the setting of
form_encoding init param for doing the serialization)
fixcommit here:
http://cvs.apache.org/viewcvs.cgi/cocoon/trunk/src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java?r1=24666&r2=26246&p1=cocoon/trunk/src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java&p2=cocoon/trunk/src/java/org/apache/cocoon/serialization/AbstractTextSerializer.java&diff_format=h&root=Apache-SVN
archived discussion here:
http://marc.theaimsgroup.com/?t=106760662600010&r=1&w=2
> should be easy for continuations, but in most of the cases, I'd say
> that it's a good principle to choose one encoding for your entire
> application and stick to it...
>
agree, just running through the (above mentioned) wiki page however I
noticed some paragraph on wanting to 'locally' override the
form-encoding for certain pipelines (use case being support for
different clients than only the classic browsers which might behave
differently)
the suggested setCharacterEncodingAction seems to be a good match to
that issue and it somewhat suggests we should keep some form of possible
en-re-decoding scheme in our request-wrapper (looks like the 2.3 switch
should not make us jump to hasty conclusions on that part)
(boy this issue seems to be a rose with many thorns, and it seems to
blossom every year or so :-))
>> - I'm a bit unsure here if the javadoc mentioning of 'in the body of
>> this request' is going to be interpreted by implementations as a
>> limiting scope, and if so if they include the URI (and the request
>> params using get vs post) as part of it or not
>
>
> The point you mentioned in the spec _DOES_NOT_ include the request URI.
> We've talked quite extensively over it while writing Servlet 2.4, which
> (in theory) should expand more on the concepts of charset and i18n.
>
thx for the clarification and inside info
>> (talk about possible confusion when writing specs like this, yuk!)
>
>
> Well, it's a big gray area... Most of my knowledge is based on my
> girlfriend's PC. She's japanese, and although I don't understand what's
> all that gibberish on her screen, I can still test out few bits and
> bobs...
>
> For all our MacOS/X folks, if you want to try out playing with
> different encodings and internationalization settings, close your
> Safari, Mozilla, Firefox, and so on, go into the System Preferences and
> drag the three "bookcase, christmas tree, lotsa-lines block"
> (ni-hon-go) sequence of three characters right up to the top. Start
> your browser, and then restore english (french, italian, german) up on
> top where it was in the preferences.
>
> Your browser will now think it's working on a Japanese PC and will do
> everything like you were living in Tokyo.
>
> On Windows, sorry, your best bet is to actually GO to Tokyo, and buy a
> copy of WindowsXP in Japanese. :-(
>
yeah, testing isn't obvious, as one also needs to rely on having an
as-unicode-complete-as-they-come font so you are sure you are seeing
what you think you are seeing...
in any case: my personal testing candidate for these cases is just using
the euro sign (\u20AC, utf-8: %E2%82%AC) in pathnames, filenames,
classnames, request params and whatnot.
most european systems (even windows) would have a native encoding
supporting the euro sign (while iso-8859-1 obviously doesn't)
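The euro-sign facts above can be checked with a few lines of standard-library Java. This is only a sketch; the class name EuroSignProbe and its methods are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical probe class for the euro-sign test character.
final class EuroSignProbe {
    static final String EURO = "\u20AC";

    // True if the named charset has a byte representation for the euro sign.
    static boolean canEncode(String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(EURO);
    }

    // Percent-encoded form of the euro sign when the bytes are UTF-8.
    static String percentEncodedUtf8() {
        try {
            return URLEncoder.encode(EURO, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 percent-encoding: " + percentEncodedUtf8());
        System.out.println("ISO-8859-1 can encode:  " + canEncode("ISO-8859-1"));
        System.out.println("UTF-8 can encode:       " + canEncode("UTF-8"));
    }
}
```

Because ISO-8859-1 cannot represent the character at all, any code path that silently falls back to it tends to show up immediately with this one test character.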
geek detail: you can even use it in your Java source code:
public class \u20ACToBEF
{
...
}
(in fact java's compiler is completely unicode-aware towards the source
code: if you're sick enough you might even go about writing the keywords
like 'public' and 'class' in their escaped unicode variants :-)
note that you will need to be able to specify a euro sign in the
file name of that source though)
regards,
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo@outerthought.org mpo@apache.org
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
On 17 Aug 2004, at 16:20, Marc Portier wrote:
>> How about setting it up as the default behavior for Cocoon's internal
>> Jetty distro?
>
> makes sense, but: (wishing all this brokenness wasn't there, but alas)
It's not really "brokenness" but more along the lines of an inversion
of the Robustness Principle, as outlined by J. Postel in RFC-791
(http://www.rfc-editor.org/rfc/rfc791.txt section 3.2) and later
dogmatized by R. Braden in RFC-1122
(http://www.rfc-editor.org/rfc/rfc1122.txt Section 1.2.2).
"Be liberal in what you accept, and conservative in what you send."
In this case browsers are liberal in what they send (URL-Encoded UTF-8)
and servlet containers are conservative in what they accept
(URL-Encoded ISO-8859-1).
> - it shouldn't keep us from actually going about solving it for all
> containers? (my guess is that just a fraction of cocoon deployments
> actually run on the internal jetty distro, i.e. using the cocoon.sh or
> .bat?)
Well, we found that Jetty in production was much better than anyone
else. So, in our production environment we have Jetty (not the Cocoon
distro one, a full blown copy)... Works pretty neatly! :-P
> - learning about this org.mortbay.util.URI.charset property we should
> probably use it to override (or at least log-warn deployers if it's
> different to) the container-encoding setting in the web.xml
> (assuming that the mentioned property will also be in effect when
> decoding the request parameters, and taking into account that current
> cocoon code assumes ISO-8859-1 as the default there)
I agree, but as I said, my world revolves around the best container in
the world (whoops, Jetty), so I already have "my" fix to the problem:
switch! :-P
> - once we've run that far, we might even consider making a scan of
> other
> servlet containers and how they possibly allow setting the
> container-encoding?
The "container-encoding" servlet initialization parameter simply
applies to request parameters (form data), and I suppose it only
affects the way in which we read full-blown characters from
ServletRequest.getInputStream() and parse forms.
> while typing I started rethinking why we ended up with this
> container-encoding init-param in web.xml?
>
> IIRC we did that because of required compliance to servlet spec
> versions
> prior to 2.3? So first question is are we still on servlet 2.2?
>
> If not: Since 2.3 there exists a setCharacterEncoding()
> <quote from="servlet 2.3 javadoc"
> href="http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/
> ServletRequest.html#setCharacterEncoding(java.lang.String)">
> Overrides the name of the character encoding used in the body of this
> request. This method must be called prior to reading request
> parameters or reading input using getReader().
> </quote>
Indeed, the problem here is that it's nowhere specified how the request
BODY (not the URL, source of this problem) should be encoded.
Normally, from browser behaviour, I can see that usually browsers tend
to post application/x-www-form-urlencoded in the same charset they used
interpreting the form. So given an HTTP request like this:
C: GET /myForm HTTP/1.1
C: Host: localhost:80
C:
S: HTTP/1.1 200 OK
S: Date: Wed, 18 Aug 2004 08:30:28 GMT
S: Server: Apache/2.0.49 (Unix) DAV/2 SVN/1.0.2
S: Content-Type: text/html; charset=utf-8
When the form included in /myForm is posted back to its action, the
UTF-8 charset will be used to encode the form data...
That's normally a rule of thumb, and that's why (IMVHO) UTF-8 should be
used for all forms, and should always be used as the default encoding
for writing and reading.
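A small sketch of why form charset and container charset have to agree: the value is encoded the way a browser would for a page in one charset, then decoded the way a container would with another. The class name FormCharsetDemo, the roundTrip helper and the sample value are assumptions made for this example; java.net.URLEncoder and URLDecoder merely stand in for browser and container.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; browser and container are simulated with the JDK codecs.
final class FormCharsetDemo {

    // Encode a form value with the page's charset, then decode it with the container's.
    static String roundTrip(String value, String formCharset, String containerCharset) {
        try {
            String onTheWire = URLEncoder.encode(value, formCharset);
            return URLDecoder.decode(onTheWire, containerCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00E9"; // sample value containing one accented letter
        // Matching charsets: the value survives unchanged.
        System.out.println(roundTrip(value, "UTF-8", "UTF-8").equals(value));
        // UTF-8 form read as ISO-8859-1: the accented letter becomes two characters.
        System.out.println(roundTrip(value, "UTF-8", "ISO-8859-1").length());
    }
}
```

Sticking to one charset for pages, forms and the container setting, as suggested above, keeps the first case the only one that can occur.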
> - I assume the cocoon servlet could easily arrange for calling the
> method before anything else
Yes, hoping that it actually works. But cocoon should call the method
with the encoding used to send the form from where data is read...
should be easy for continuations, but in most of the cases, I'd say
that it's a good principle to choose one encoding for your entire
application and stick to it...
> - I'm a bit unsure here if the javadoc mentioning of 'in the body of
> this request' is going to be interpreted by implementations as a
> limiting scope, and if so if they include the URI (and the request
> params using get vs post) as part of it or not
The point you mentioned in the spec _DOES_NOT_ include the request URI.
We've talked quite extensively over it while writing Servlet 2.4, which
(in theory) should expand more on the concepts of charset and i18n.
> (talk about possible confusion when writing specs like this, yuk!)
Well, it's a big gray area... Most of my knowledge is based on my
girlfriend's PC. She's Japanese, and although I don't understand what's
all that gibberish on her screen, I can still test out a few bits and
bobs...
For all our MacOS/X folks, if you want to try out playing with
different encodings and internationalization settings, close your
Safari, Mozilla, Firefox, and so on, go into the System Preferences and
drag the three "bookcase, christmas tree, lotsa-lines block"
(ni-hon-go) sequence of three characters right up to the top. Start
your browser, and then restore English (French, Italian, German) up on
top where it was in the preferences.
Your browser will now think it's working on a Japanese PC and will do
everything like you were living in Tokyo.
On Windows, sorry, your best bet is to actually GO to Tokyo, and buy a
copy of WindowsXP in Japanese. :-(
Pier
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
> either a fix, or an explanation and workaround... For now, brrrr, I
> think that the hack is the only way to go...
I don't know about Tomcat, but if you're not on the jetty developers
list, here's the outcome:
Jetty defaults (for compatibility to all the other broken containers,
and because there's no "official standard" about UTF-8 URIs) to
ISO-8859-1. And this ain't great.
Now, the good thing is that if you start your jetty specifying the
"org.mortbay.util.URI.charset" system property, it will use that one as
the charset used for decoding URLs.
So, by putting in "-Dorg.mortbay.util.URI.charset=UTF-8" we get the
expected behavior.
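To see what the charset choice changes in practice, here is a minimal sketch (not Jetty code) that percent-decodes the path from the access log quoted elsewhere in this thread. The class name UriCharsetDemo is made up, reading the system property here only imitates the switch described above, and java.net.URLDecoder is a stand-in for the container's own decoder (it also turns '+' into a space, which a real path decoder would not do).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only imitates a container's URL decoding step.
final class UriCharsetDemo {

    // Percent-decode a request path, interpreting the resulting bytes in the given charset.
    static String decodePath(String rawPath, String charset) {
        try {
            return URLDecoder.decode(rawPath, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "/%e8%b0%b7%e7%90%86%e5%ad%90"; // request path as logged in this thread
        // Fall back to ISO-8859-1 when the property is not set, as described above.
        String charset = System.getProperty("org.mortbay.util.URI.charset", "ISO-8859-1");
        String decoded = decodePath(raw, charset);
        System.out.println(charset + " gives " + decoded.length() + " characters");
    }
}
```

Run with and without -Dorg.mortbay.util.URI.charset=UTF-8 to compare: the UTF-8 run yields the slash plus three characters, the default run yields the slash plus nine.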
How about setting it up as the default behavior for Cocoon's internal
Jetty distro?
Pier
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
References to non-hack:
http://www.w3.org/International/O-URL-and-ident
Pier
On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
> Ok, I tracked the sucker down... It's the servlet container... They
> all decode the stupid URL using ISO-8859-1... And therefore, utterly
> incompatible with 3/4 of the non-english-speaking world...
>
> At best, I was able to _HACK_ the whole thing through, by getting the
> path info in this way:
>
> <WARNING note="shit-code-follows">
>
> new String(request.getPathInfo().getBytes("ISO-8859-1"), "UTF-8");
>
> </WARNING>
>
> Therefore, I get the BYTES of the path-info string as if they were in
> ISO-8859-1, and re-create a new string by taking those bytes and
> forcing them to be in UTF-8...
>
> Niiiiiiiiiiiiiiiiiiice!
>
> Note that this stupidity also happens with accented letters (that for
> us Italians is a big p-i-t-a).
>
> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
> either a fix, or an explanation and workaround... For now, brrrr, I
> think that the hack is the only way to go...
>
> Oh, I checked it also on Tomcat. Same problem there as well...
>
> Pier
>
>
>
> On 16 Aug 2004, at 12:05, Marc Portier wrote:
>
>> Pier,
>>
>>
>> As a coincidence we recently (last week) had a similar post on
>> xreporter-list (which uses cocoon)
>>
>> Bad news is that I didn't track it down to the bottom yet, just some
>> findings below:
>> (in fact the odd-char-in-filename for map:read and map:mount was one
>> of the first things I was going to test, seems I'm already presented
>> with the results)
>>
>>
>> what I did find already was this:
>>
>> Cocoon's Request.getSitemapURI() will return an assembly of
>> javax.servlet.http.HttpServletRequest.getServletPath()
>> + javax.servlet.http.HttpServletRequest.getPathInfo()
>>
>> Servlet spec on those states they will be (url-) decoded
>> Thus 3 char sequences of the kind "%BYTE_HEX" will have been
>> translated into single bytes. The obtained byte-sequence is then
>> decoded using SOME_DECODING (my guess would be using ISO-8859-1, but
>> haven't found yet if this is container specific, modifiable or hard
>> noted in some spec. Only thing I found is this:
>> http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars, but
>> I'm yet unsure on how this influences servlet specs, or actual
>> container and even browser implementations for that matter)
>>
>>
>> Alternatively there is:
>> Cocoon's Request.getRequestURI() which maps onto the
>> javax.servlet.http.HttpServletRequest.getRequestURI()
>>
>> This one resembles the URI as transferred over the wire: ie. not
>> (url-)decoded, or in other words still holding the %XX sequences
>>
>>
>> As an extra clarification on all these the servlet spec explicitly
>> states: (2.3 version, page 34, section SRV4.4 Request Path Elements)
>> <quote>
>> It is important to note that, *except for URL encoding differences*
>> between the request URI and the path parts, the following equation is
>> always true:
>>
>> requestURI = contextPath + servletPath + pathInfo
>> </quote>
>>
>>
>> I (for now) assume that this is the same encoding we expect
>> cocoon-deploy people to specify in the 'container-encoding'
>> init-parameter in the web.xml (allowing to correctly en-re-decode
>> request-parameter-values in case of mismatching form and container
>> encodings)
>>
>>
>>
>>
>> Ok, above is dull data, and not much into a direction of any solution
>> yet. My current feeling (long shot, needs time to test and try, and
>> based on above assumption) is that we should
>>
>> In terms of backwards compatibility I'm unsure if we could just go
>> about changing the semantics (historically implied use of iso-8859-1
>> encoding) of getSitemapURI() or rather should deprecate and/or have a
>> different method next to it?
>>
>> In any case this new implementation should then probably apply the
>> same kind of dirty en-re-decoding-trick
>>
>> return new String(getSitemapURI().getBytes(container_encoding), form_encoding)
>>
>> as we do today with the request param values?
>>
>> (see
>> http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/
>> cocoon/environment/http/HttpRequest.java?annotate=1.11#391
>> sorry for the old cvs-style link, the svn version of viewcvs doesn't
>> seem to support 'annotate' ?)
>>
>>
>> For the record: the fast hack/workaround in the xreporter case was
>> exactly to apply this.
>>
>>
>>
>>
>> Attached to this I'm also seeing the trouble of mount-points in
>> cocoon. I've seen a number of installments needing (well, 'using'
>> at least) some insertion of that
>> part-of-the-URL-that-maps-to-the-mounted-sitemap to be able to have
>> links in source xml.files refer to other resources managed by the
>> same mounted sitemap without the need to explicitly mention that
>> part (but have it dynamically inserted by some xsl instead).
>>
>> In those occasions I've seen people mostly subtract siteMapURI from
>> requestURI to obtain that prefix part. Regarding the above
>> observations this algorithm will however fail due to encoding
>> differences.
>>
>> My proposal would be to not only add a method for decoding the
>> sitemapURI properly, but in the mean time adding the convenience
>> method to return the mounted-sitemap-part as well on the level of
>> cocoon's request.
>>
>>
>>
>> Above are early observations that need some backing, so comments
>> welcome. (and hoping someone beats me to this since I'm lacking the
>> time to pursue myself)
>> -marc=
>>
>>
>> Pier Fumagalli wrote:
>>> On 12 Aug 2004, at 12:45, roy huang wrote:
>>>> Hi,all:
>>>> Use reader to display jpg or gif is quite simple,like:
>>>> <map:match pattern="*.jpg">
>>>> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
>>>> </map:match>
>>>> But if the file name is not ASCII but utf-8 or other encoding
>>>> like 花.jpg (simplified Chinese),the resolver didn't resolve the
>>>> name correctly,error occur:
>>>> org.apache.cocoon.ResourceNotFoundException: Error during resolving
>>>> of the input stream:
>>>> org.apache.excalibur.source.SourceNotFoundException: file:/C:/My
>>>> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg
>>>> doesn't exist.
>>>>
>>>> How can I use non-ASCII file name in cocoon?I can't find any
>>>> description or help in wiki or archived mail list.
>>>>
>>>> Roy Huang
>>> It appears indeed as a bug...
>>> I have this sitemap snippet:
>>> <map:match pattern="谷*">
>>> <map:generate src="谷{1}.xml"/>
>>> <map:transform src="welcome.xslt">
>>> <map:parameter name="contextPath"
>>> value="{request:contextPath}"/>
>>> </map:transform>
>>> <map:serialize type="xhtml"/>
>>> </map:match>
>>> and a file on the disk called "谷理子.xml". Somewhere, when I make a
>>> request for "http://localhost:8888/谷理子", the whole thing goes
>>> berserk...
>>> Now, the URL is passed correctly, as I see that in the access log:
>>> INFO (2004-08-16) 10:26.36:538 [access]
>>> (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????'
>>> Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
>>> The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0
>>> B7 E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow
>>> it gets lost in the process.
>>> Now, if I modify my sitemap to
>>> <map:match pattern="tanisatoko">
>>> <map:generate src="谷理子.xml"/>
>>> <map:transform src="welcome.xslt">
>>> <map:parameter name="contextPath"
>>> value="{request:contextPath}"/>
>>> </map:transform>
>>> <map:serialize type="xhtml"/>
>>> </map:match>
>>> And I make a request to "http://localhost:8888/tanisatoko", the
>>> thing works perfectly. We can safely exclude the fact that it's the
>>> generation process.
>>> Now, the _odd_ thing I noticed is that in those cases, I get an
>>> error of "PipelineNotFound", not a "ResourceNotFound", which means
>>> that the matcher seriously doesn't see that request.
>>> Changing over the matcher to a 'regexp' matcher doesn't change, so,
>>> I bet it's the data we feed to the matcher.
>>> Now, changing that matcher to
>>> "谷理子", the
>>> encoding, and running it again, I get my nice page correctly.
>>> I bet that somewhere (I don't know where, but surely somewhere), the
>>> UTF-8 encoded URL converted into a string using the current locale
>>> (MacRoman on my system), or a default of "ISO-8859-1", before the
>>> string is actually given to the sitemap.
>>> Not having the sources at hand at the moment, I can't do a quick
>>> build to put out some debugging instruction, but you get the idea.
>>> Pier
>>
>> --
>> Marc Portier http://outerthought.org/
>> Outerthought - Open Source, Java & XML Competence Support Center
>> Read my weblog at http://blogs.cocoondev.org/mpo/
>> mpo@outerthought.org mpo@apache.org
>>
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
> either a fix, or an explaination and workaround... For now, brrrr, I
> think that the hack is the only way to go...
I don't know about Tomcat, but if you're not on the jetty developers
list, here's the outcome:
Jetty defaults (for compatibility with all the other broken containers,
and because there's no "official standard" about UTF-8 URIs) to
ISO-8859-1. And this ain't great.
Now, the good thing is that if you start your jetty specifying the
"org.mortbay.util.URI.charset" system property, it will use that one as
the charset used for decoding URLs.
So, by putting in "-Dorg.mortbay.util.URI.charset=UTF-8" we get the
expected behavior.
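To illustrate what the charset choice does to a decoded path, here is a small standalone sketch. It does not touch Jetty at all: it only reads the same system property name and, as an assumption of the sketch, falls back to ISO-8859-1 when the property is unset. The class and method names (UriCharsetSketch, decodePath, configuredCharset) are made up for the example. Note that java.net.URLDecoder also turns "+" into a space, which is form-decoding behavior and not something a container should do to a path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriCharsetSketch {
    // Property name taken from the message above.
    static final String PROPERTY = "org.mortbay.util.URI.charset";

    // Hypothetical helper: percent-decode rawPath, interpreting the bytes in the given charset.
    static String decodePath(String rawPath, String charset) {
        try {
            return URLDecoder.decode(rawPath, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Assumption for this sketch: fall back to ISO-8859-1 when the property is unset.
    static String configuredCharset() {
        return System.getProperty(PROPERTY, "ISO-8859-1");
    }

    public static void main(String[] args) {
        String raw = "/%e8%b0%b7%e7%90%86%e5%ad%90";
        // Run with and without -Dorg.mortbay.util.URI.charset=UTF-8 to compare.
        String decoded = decodePath(raw, configuredCharset());
        System.out.println(configuredCharset() + " -> " + decoded.length() + " chars");
    }
}
```

With ISO-8859-1 the nine escaped bytes become nine separate characters; with UTF-8 they collapse into the three characters of the file name.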
How about setting it up as the default behavior for Cocoon's internal
Jetty distro?
Pier
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
References to non-hack:
http://www.w3.org/International/O-URL-and-ident
Pier
On 16 Aug 2004, at 14:02, Pier Fumagalli wrote:
> Ok, I tracked the sucker down... It's the servlet container... They
> all decode the stupid URL using ISO-8859-1... And therefore, utterly
> incompatible with 3/4 of the non-english-speaking world...
>
> At best, I was able to _HACK_ the whole thing through, by getting the
> path info in this way:
>
> <WARNING note="shit-code-follows">
>
> new String(request.getPathInfo().getBytes("ISO-8859-1"),"UTF-8"));
>
> </WARNING>
>
> Therefore, I get the BYTES of the path-info string as if they were in
> ISO-8859-1, and re-create a new string by taking those bytes and
> forcing them to be in UTF-8...
>
> Niiiiiiiiiiiiiiiiiiice!
>
> Note that this stupidity also happens with accented letters (that for
> us Italians is a big p-i-t-a).
>
> I'll see why this happens in Jetty, I'll poke Jen and Greg to have
> either a fix, or an explaination and workaround... For now, brrrr, I
> think that the hack is the only way to go...
>
> Oh, I checked it also on Tomcat. Same problem there as well...
>
> Pier
>
>
>
> On 16 Aug 2004, at 12:05, Marc Portier wrote:
>
>> Pier,
>>
>>
>> As a coincidence we recently (last week) had a similar post on
>> xreporter-list (which uses cocoon)
>>
>> Bad news is that I didn't track it down to the bottom yet, just some
>> findings below:
>> (in fact the odd-char-in-filename for map:read and map:mount was one
>> of the first things I was going to test, seems I'm already presented
>> with the results)
>>
>>
>> what I did find already was this:
>>
>> Cocoon's Request.getSitemapURI() will return an assembly of
>> javax.servlet.http.HttpServletRequest.getServletPath()
>> + javax.servlet.http.HttpServletRequest.getPathInfo()
>>
>> Servlet spec on those states they will be (url-) decoded
>> Thus 3 char sequences of the kind "%BYTE_HEX" will have been
>> translated into single bytes. The obtained byte-sequence is then
>> decoded using SOME_DECODING (my guess would be using ISO-8859-1, but
>> haven't found yet if this is container specific, modifiable or hard
>> noted in some spec. Only thing I found is this:
>> http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars, but
>> I'm yet unsure on how this influences servlet specs, or actual
>> container and even browser implementations for that matter)
>>
>>
>> Alternatively there is:
>> Cocoon's Request.getRequestURI() which maps onto the
>> javax.servlet.http.HttpServletRequest.getRequestURI()
>>
>> This one resembles the URI as transferred over the wire: ie. not
>> (url-)decoded, or in other words still holding the %XX sequences
>>
>>
>> As an extra clarification on all these the servlet spec explicitely
>> states: (2.3 version, page 34, section SRV4.4 Request Path Elements)
>> <quote>
>> It is important to note that, *except for URL encoding differences*
>> between the request URI and the path parts, the following equation is
>> always true:
>>
>> requestURI = contextPath + servletPath + pathInfo
>> </quote>
>>
>>
>> I (for now) assume that this is the same encoding we expect
>> cocoon-deploy people to specify in the 'container-encoding'
>> init-parameter in the web.xml (allowing to correctly en-re-decode
>> request-paramater-values in case of mismatching form and container
>> encodings)
>>
>>
>>
>>
>> Ok, above is dull data, and not much into a direction of any solution
>> yet. My current feeling (long shot, needs time to test and try, and
>> based on above assumption) is that we should
>>
>> In terms of backwards compatibility I'm unsure if we could just go
>> about changing the semantics (histrocally implied use of iso-8859-1
>> encoding) of getSitemapURI() or rather should deprecate and/or have a
>> different method next to it?
>>
>> In any case this new implementation should then probably apply the
>> same kind of dirty en-re-decoding-trick
>>
>> new return(getSitemapURI().getBytes(container_encoding),form_encoding)
>>
>> as we do today with the request param values?
>>
>> (see
>> http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/
>> cocoon/environment/http/HttpRequest.java?annotate=1.11#391
>> sorry for the old cvs-style link, the svn version of viewcvs doesn't
>> seem to support 'annotate' ?)
>>
>>
>> For the record: the fast hack/workaround in the xreporter case was
>> exactly to apply this.
>>
>>
>>
>>
>> Attached to this I'm also seeing the trouble of mount-points in
>> cocoon. I've seen a number of installments needing (well, 'using'
>> at least) some insertion of that
>> part-of-the-URL-that-maps-to-the-mounted-sitemap to be able to have
>> links in source xml.files refer to other resources managed by the
>> same mounted sitemap without the need to explicitely mention that
>> part (but have it dynamically inserted by some xsl in stead).
>>
>> In those occasions I've seen people mostly subtract siteMapURI from
>> requestURI to obtain that prefix part. Regarding the above
>> observations this algorithm will however fail due to encoding
>> differences.
>>
>> My proposal would be to not only add a method for decoding the
>> sitemapURI properly, but in the mean time adding the convenience
>> method to return the mounted-sitemap-part as well on the level of
>> cocoon's request.
>>
>>
>>
>> Above are early observations that need some backing, so comments
>> welcome. (and hoping someone beats me to this since I'm lacking the
>> time to pursue myself)
>> -marc=
>>
>>
>> Pier Fumagalli wrote:
>>> On 12 Aug 2004, at 12:45, roy huang wrote:
>>>> Hi,all:
>>>> Use reader to display jpg or gif is quite simple,like:
>>>> <map:match pattern="*.jpg">
>>>> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
>>>> </map:match>
>>>> But if the file name is not ASCII but utf-8 or other encoding
>>>> like 花.jpg (simplified Chinese),the resolver didn't resolve the
>>>> name correctly,error occur:
>>>> org.apache.cocoon.ResourceNotFoundException: Error during resolving
>>>> of the input stream:
>>>> org.apache.excalibur.source.SourceNotFoundException: file:/C:/My
>>>> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg
>>>> doesn't exist.
>>>>
>>>> How can I use non-ASCII file name in cocoon?I can't find any
>>>> description or help in wiki or archived mail list.
>>>>
>>>> Roy Huang
>>> It appears indeed as a bug...
>>> I have this sitemap snippet:
>>> <map:match pattern="谷*">
>>> <map:generate src="谷{1}.xml"/>
>>> <map:transform src="welcome.xslt">
>>> <map:parameter name="contextPath"
>>> value="{request:contextPath}"/>
>>> </map:transform>
>>> <map:serialize type="xhtml"/>
>>> </map:match>
>>> and a file on the disk called "谷理子.xml". Somewhere, when I make a
>>> request for "http://localhost:8888/谷理子", the whole thing goes
>>> berserk...
>>> Now, the URL is passed correctly, as I see that in the access log:
>>> INFO (2004-08-16) 10:26.36:538 [access]
>>> (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????'
>>> Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
>>> The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0
>>> B7 E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow
>>> it gets lost in the process.
>>> Now, if I modify my itemap to
>>> <map:match pattern="tanisatoko">
>>> <map:generate src="谷理子.xml"/>
>>> <map:transform src="welcome.xslt">
>>> <map:parameter name="contextPath"
>>> value="{request:contextPath}"/>
>>> </map:transform>
>>> <map:serialize type="xhtml"/>
>>> </map:match>
>>> And I make a request to "http://localhost:8888/tanisatoko", the
>>> thing works perfectly. We can safely exclude the fact that it's the
>>> generation process.
>>> Now, the _odd_ thing I noticed is that in those cases, I get an
>>> error of "PipelineNotFound", not a "ResourceNotFound", which means
>>> that the matcher seriously doesn't see that request.
>>> Changing over the matcher to a 'regexp' matcher doesn't change, so,
>>> I bet it's the data we feed to the matcher.
>>> Now, changing that matcher to
>>> "谷理子", the
>>> encoding, and running it again, I get my nice page correctly.
>>> I bet that somewhere (I don't know where, but surely somewhere), the
>>> UTF-8 encoded URL converted into a string using the current locale
>>> (MacRoman on my system), or a default of "ISO-8859-1", before the
>>> string is actually given to the sitemap.
>>> Not having the sources at hand at the moment, I can't do a quick
>>> build to put out some debugging instruction, but you get the idea.
>>> Pier
>>
>> --
>> Marc Portier http://outerthought.org/
>> Outerthought - Open Source, Java & XML Competence Support Center
>> Read my weblog at http://blogs.cocoondev.org/mpo/
>> mpo@outerthought.org mpo@apache.org
>>
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
Ok, I tracked the sucker down... It's the servlet container... They all
decode the stupid URL using ISO-8859-1... and are therefore utterly
incompatible with 3/4 of the non-English-speaking world...
At best, I was able to _HACK_ the whole thing through, by getting the
path info in this way:
<WARNING note="shit-code-follows">
new String(request.getPathInfo().getBytes("ISO-8859-1"), "UTF-8");
</WARNING>
Therefore, I get the BYTES of the path-info string as if they were in
ISO-8859-1, and re-create a new string by taking those bytes and
forcing them to be in UTF-8...
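A minimal, self-contained sketch of that hack, without the servlet API. The class name PathInfoRecode and the method recode are made up for the example, and the hard-coded input stands in for what getPathInfo() would return from a container that decoded the request as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class PathInfoRecode {
    // Re-interpret a string that was wrongly decoded as ISO-8859-1 as UTF-8.
    static String recode(String misdecoded) {
        // ISO-8859-1 maps chars 0..255 one-to-one onto bytes, so this recovers the wire bytes.
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for getPathInfo(): bytes E8 B0 B7 E7 90 86 E5 AD 90 read as ISO-8859-1.
        String pathInfo = "/\u00e8\u00b0\u00b7\u00e7\u0090\u0086\u00e5\u00ad\u0090";
        String fixed = recode(pathInfo);
        System.out.println(fixed.equals("/\u8c37\u7406\u5b50")); // true
    }
}
```

This only works when the container really used ISO-8859-1 and the client really sent UTF-8. Pure ASCII paths pass through unchanged, but a path that was already decoded correctly would be mangled by the round trip.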
Niiiiiiiiiiiiiiiiiiice!
Note that this stupidity also happens with accented letters (which for
us Italians is a big p-i-t-a).
I'll see why this happens in Jetty, I'll poke Jen and Greg to have
either a fix, or an explanation and workaround... For now, brrrr, I
think that the hack is the only way to go...
Oh, I checked it also on Tomcat. Same problem there as well...
Pier
On 16 Aug 2004, at 12:05, Marc Portier wrote:
> Pier,
>
>
> As a coincidence we recently (last week) had a similar post on
> xreporter-list (which uses cocoon)
>
> Bad news is that I didn't track it down to the bottom yet, just some
> findings below:
> (in fact the odd-char-in-filename for map:read and map:mount was one
> of the first things I was going to test, seems I'm already presented
> with the results)
>
>
> what I did find already was this:
>
> Cocoon's Request.getSitemapURI() will return an assembly of
> javax.servlet.http.HttpServletRequest.getServletPath()
> + javax.servlet.http.HttpServletRequest.getPathInfo()
>
> Servlet spec on those states they will be (url-) decoded
> Thus 3 char sequences of the kind "%BYTE_HEX" will have been
> translated into single bytes. The obtained byte-sequence is then
> decoded using SOME_DECODING (my guess would be using ISO-8859-1, but
> haven't found yet if this is container specific, modifiable or hard
> noted in some spec. Only thing I found is this:
> http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars, but
> I'm yet unsure on how this influences servlet specs, or actual
> container and even browser implementations for that matter)
>
>
> Alternatively there is:
> Cocoon's Request.getRequestURI() which maps onto the
> javax.servlet.http.HttpServletRequest.getRequestURI()
>
> This one resembles the URI as transferred over the wire: ie. not
> (url-)decoded, or in other words still holding the %XX sequences
>
>
> As an extra clarification on all these the servlet spec explicitely
> states: (2.3 version, page 34, section SRV4.4 Request Path Elements)
> <quote>
> It is important to note that, *except for URL encoding differences*
> between the request URI and the path parts, the following equation is
> always true:
>
> requestURI = contextPath + servletPath + pathInfo
> </quote>
>
>
> I (for now) assume that this is the same encoding we expect
> cocoon-deploy people to specify in the 'container-encoding'
> init-parameter in the web.xml (allowing to correctly en-re-decode
> request-paramater-values in case of mismatching form and container
> encodings)
>
>
>
>
> Ok, above is dull data, and not much into a direction of any solution
> yet. My current feeling (long shot, needs time to test and try, and
> based on above assumption) is that we should
>
> In terms of backwards compatibility I'm unsure if we could just go
> about changing the semantics (histrocally implied use of iso-8859-1
> encoding) of getSitemapURI() or rather should deprecate and/or have a
> different method next to it?
>
> In any case this new implementation should then probably apply the
> same kind of dirty en-re-decoding-trick
>
> new return(getSitemapURI().getBytes(container_encoding),form_encoding)
>
> as we do today with the request param values?
>
> (see
> http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/
> cocoon/environment/http/HttpRequest.java?annotate=1.11#391
> sorry for the old cvs-style link, the svn version of viewcvs doesn't
> seem to support 'annotate' ?)
>
>
> For the record: the fast hack/workaround in the xreporter case was
> exactly to apply this.
>
>
>
>
> Attached to this I'm also seeing the trouble of mount-points in
> cocoon. I've seen a number of installments needing (well, 'using' at
> least) some insertion of that
> part-of-the-URL-that-maps-to-the-mounted-sitemap to be able to have
> links in source xml.files refer to other resources managed by the same
> mounted sitemap without the need to explicitely mention that part (but
> have it dynamically inserted by some xsl in stead).
>
> In those occasions I've seen people mostly subtract siteMapURI from
> requestURI to obtain that prefix part. Regarding the above
> observations this algorithm will however fail due to encoding
> differences.
>
> My proposal would be to not only add a method for decoding the
> sitemapURI properly, but in the mean time adding the convenience
> method to return the mounted-sitemap-part as well on the level of
> cocoon's request.
>
>
>
> Above are early observations that need some backing, so comments
> welcome. (and hoping someone beats me to this since I'm lacking the
> time to pursue myself)
> -marc=
>
>
> Pier Fumagalli wrote:
>> On 12 Aug 2004, at 12:45, roy huang wrote:
>>> Hi,all:
>>> Use reader to display jpg or gif is quite simple,like:
>>> <map:match pattern="*.jpg">
>>> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
>>> </map:match>
>>> But if the file name is not ASCII but utf-8 or other encoding
>>> like 花.jpg (simplified Chinese),the resolver didn't resolve the name
>>> correctly,error occur:
>>> org.apache.cocoon.ResourceNotFoundException: Error during resolving
>>> of the input stream:
>>> org.apache.excalibur.source.SourceNotFoundException: file:/C:/My
>>> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg
>>> doesn't exist.
>>>
>>> How can I use non-ASCII file name in cocoon?I can't find any
>>> description or help in wiki or archived mail list.
>>>
>>> Roy Huang
>> It appears indeed as a bug...
>> I have this sitemap snippet:
>> <map:match pattern="谷*">
>> <map:generate src="谷{1}.xml"/>
>> <map:transform src="welcome.xslt">
>> <map:parameter name="contextPath"
>> value="{request:contextPath}"/>
>> </map:transform>
>> <map:serialize type="xhtml"/>
>> </map:match>
>> and a file on the disk called "谷理子.xml". Somewhere, when I make a
>> request for "http://localhost:8888/谷理子", the whole thing goes
>> berserk...
>> Now, the URL is passed correctly, as I see that in the access log:
>> INFO (2004-08-16) 10:26.36:538 [access]
>> (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????'
>> Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
>> The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0 B7
>> E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow it
>> gets lost in the process.
>> Now, if I modify my itemap to
>> <map:match pattern="tanisatoko">
>> <map:generate src="谷理子.xml"/>
>> <map:transform src="welcome.xslt">
>> <map:parameter name="contextPath"
>> value="{request:contextPath}"/>
>> </map:transform>
>> <map:serialize type="xhtml"/>
>> </map:match>
>> And I make a request to "http://localhost:8888/tanisatoko", the thing
>> works perfectly. We can safely exclude the fact that it's the
>> generation process.
>> Now, the _odd_ thing I noticed is that in those cases, I get an error
>> of "PipelineNotFound", not a "ResourceNotFound", which means that the
>> matcher seriously doesn't see that request.
>> Changing over the matcher to a 'regexp' matcher doesn't change, so, I
>> bet it's the data we feed to the matcher.
>> Now, changing that matcher to
>> "谷理子", the
>> encoding, and running it again, I get my nice page correctly.
>> I bet that somewhere (I don't know where, but surely somewhere), the
>> UTF-8 encoded URL converted into a string using the current locale
>> (MacRoman on my system), or a default of "ISO-8859-1", before the
>> string is actually given to the sitemap.
>> Not having the sources at hand at the moment, I can't do a quick
>> build to put out some debugging instruction, but you get the idea.
>> Pier
>
> --
> Marc Portier http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> Read my weblog at http://blogs.cocoondev.org/mpo/
> mpo@outerthought.org mpo@apache.org
>
Re: [Help]How can I use non-ascii file name?
Posted by Marc Portier <mp...@outerthought.org>.
Pier,
By coincidence, we recently (last week) had a similar post on the
xreporter list (which uses Cocoon)
Bad news is that I didn't track it down to the bottom yet, just some
findings below:
(in fact the odd-char-in-filename for map:read and map:mount was one of
the first things I was going to test, seems I'm already presented with
the results)
what I did find already was this:
Cocoon's Request.getSitemapURI() will return an assembly of
javax.servlet.http.HttpServletRequest.getServletPath()
+ javax.servlet.http.HttpServletRequest.getPathInfo()
The servlet spec states that those will be (url-)decoded.
Thus 3-char sequences of the kind "%BYTE_HEX" will have been translated
into single bytes. The obtained byte sequence is then decoded using
SOME_DECODING (my guess would be ISO-8859-1, but I haven't found yet
whether this is container-specific, modifiable or fixed by some spec.
The only thing I found is this:
http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars, but I'm
as yet unsure how this influences servlet specs, or actual container and
even browser implementations for that matter)
Alternatively there is:
Cocoon's Request.getRequestURI() which maps onto the
javax.servlet.http.HttpServletRequest.getRequestURI()
This one resembles the URI as transferred over the wire: i.e. not
(url-)decoded, or in other words still holding the %XX sequences
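The two steps described above (percent sequences to bytes, then bytes to characters in some charset) can be sketched on their own. This is only an illustration and not how any particular container implements it; TwoStepDecode, percentToBytes and decode are hypothetical names, and malformed escapes such as "%zz" are not handled (they would throw a NumberFormatException).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TwoStepDecode {
    // Step 1: turn each %XX triplet into one byte; other (ASCII) characters pass through.
    static byte[] percentToBytes(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Step 2: interpret the bytes in some charset (the SOME_DECODING of the text above).
    static String decode(String raw, Charset charset) {
        return new String(percentToBytes(raw), charset);
    }

    public static void main(String[] args) {
        String requestUri = "/%e8%b0%b7%e7%90%86%e5%ad%90"; // as sent over the wire
        System.out.println(percentToBytes(requestUri).length);                        // 10
        System.out.println(decode(requestUri, StandardCharsets.ISO_8859_1).length()); // 10
        System.out.println(decode(requestUri, StandardCharsets.UTF_8).length());      // 4
    }
}
```

Step 1 is unambiguous; all of the trouble in this thread comes from the charset picked in step 2.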
As an extra clarification on all these, the servlet spec explicitly
states: (2.3 version, page 34, section SRV4.4 Request Path Elements)
<quote>
It is important to note that, *except for URL encoding differences*
between the request URI and the path parts, the following equation is
always true:
requestURI = contextPath + servletPath + pathInfo
</quote>
I (for now) assume that this is the same encoding we expect people
deploying Cocoon to specify in the 'container-encoding'
init-parameter in the web.xml (allowing us to correctly en-re-decode
request-parameter values in case of mismatching form and container
encodings)
Ok, above is dull data, and not much into a direction of any solution
yet. My current feeling (long shot, needs time to test and try, and
based on above assumption) is that we should
In terms of backwards compatibility I'm unsure if we could just go about
changing the semantics (historically implied use of iso-8859-1 encoding)
of getSitemapURI() or rather should deprecate and/or have a different
method next to it?
In any case this new implementation should then probably apply the same
kind of dirty en-re-decoding-trick
return new String(getSitemapURI().getBytes(container_encoding), form_encoding);
as we do today with the request param values?
(see
http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/cocoon/environment/http/HttpRequest.java?annotate=1.11#391
sorry for the old cvs-style link, the svn version of viewcvs doesn't
seem to support 'annotate' ?)
For the record: the fast hack/workaround in the xreporter case was
exactly to apply this.
Attached to this I'm also seeing the trouble of mount-points in Cocoon.
I've seen a number of installations needing (well, 'using' at least)
some insertion of that part-of-the-URL-that-maps-to-the-mounted-sitemap
to be able to have links in source XML files refer to other resources
managed by the same mounted sitemap without the need to explicitly
mention that part (but have it dynamically inserted by some XSL instead).
On those occasions I've seen people mostly subtract sitemapURI from
requestURI to obtain that prefix part. Given the above observations,
this algorithm will however fail due to encoding differences.
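A rough sketch of why the subtraction fails, and of one way around it. Everything here is hypothetical: the class and method names, the "/cocoon/mounted/" mount point, and the assumption that the sitemap URI was already decoded correctly as UTF-8. It uses plain strings and not Cocoon's Request object.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class MountPrefixSketch {
    // Naive approach: strip the (decoded) sitemap URI from the end of the (raw) request URI.
    static String naivePrefix(String requestUri, String sitemapUri) {
        return requestUri.endsWith(sitemapUri)
                ? requestUri.substring(0, requestUri.length() - sitemapUri.length())
                : null; // no match, e.g. because one side is still percent-encoded
    }

    // Variant that decodes the request URI first, so both sides are compared in decoded form.
    static String decodedPrefix(String requestUri, String sitemapUri, String charset) {
        try {
            return naivePrefix(URLDecoder.decode(requestUri, charset), sitemapUri);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String requestUri = "/cocoon/mounted/%e8%b0%b7%e7%90%86%e5%ad%90"; // hypothetical mount point
        String sitemapUri = "\u8c37\u7406\u5b50"; // assumed to be decoded correctly as UTF-8
        System.out.println(naivePrefix(requestUri, sitemapUri));            // null
        System.out.println(decodedPrefix(requestUri, sitemapUri, "UTF-8")); // /cocoon/mounted/
    }
}
```

The prefix returned by the second variant is itself decoded, so it would need re-encoding before being written back into links if the mount point contained escaped characters.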
My proposal would be to not only add a method for decoding the
sitemapURI properly, but at the same time add a convenience method
to return the mounted-sitemap part as well, on the level of Cocoon's request.
Above are early observations that need some backing, so comments
welcome. (and hoping someone beats me to this since I'm lacking the time
to pursue it myself)
-marc=
Pier Fumagalli wrote:
> On 12 Aug 2004, at 12:45, roy huang wrote:
>
>> Hi,all:
>> Use reader to display jpg or gif is quite simple,like:
>> <map:match pattern="*.jpg">
>> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
>> </map:match>
>> But if the file name is not ASCII but utf-8 or other encoding like
>> 花.jpg (simplified Chinese),the resolver didn't resolve the name
>> correctly,error occur:
>> org.apache.cocoon.ResourceNotFoundException: Error during resolving of
>> the input stream: org.apache.excalibur.source.SourceNotFoundException:
>> file:/C:/My
>> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg
>> doesn't exist.
>>
>> How can I use non-ASCII file name in cocoon?I can't find any
>> description or help in wiki or archived mail list.
>>
>> Roy Huang
>
>
> It appears indeed as a bug...
>
> I have this sitemap snippet:
>
> <map:match pattern="谷*">
> <map:generate src="谷{1}.xml"/>
> <map:transform src="welcome.xslt">
> <map:parameter name="contextPath" value="{request:contextPath}"/>
> </map:transform>
> <map:serialize type="xhtml"/>
> </map:match>
>
> and a file on the disk called "谷理子.xml". Somewhere, when I make a
> request for "http://localhost:8888/谷理子", the whole thing goes berserk...
>
> Now, the URL is passed correctly, as I see that in the access log:
>
> INFO (2004-08-16) 10:26.36:538 [access]
> (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????'
> Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
>
> The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0 B7 E7
> 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow it gets
> lost in the process.
>
> Now, if I modify my sitemap to
>
> <map:match pattern="tanisatoko">
> <map:generate src="谷理子.xml"/>
> <map:transform src="welcome.xslt">
> <map:parameter name="contextPath" value="{request:contextPath}"/>
> </map:transform>
> <map:serialize type="xhtml"/>
> </map:match>
>
> And I make a request to "http://localhost:8888/tanisatoko", the thing
> works perfectly. We can safely exclude the fact that it's the generation
> process.
>
> Now, the _odd_ thing I noticed is that in those cases, I get an error of
> "PipelineNotFound", not a "ResourceNotFound", which means that the
> matcher seriously doesn't see that request.
>
> Changing over the matcher to a 'regexp' matcher doesn't change, so, I
> bet it's the data we feed to the matcher.
>
> Now, changing that matcher to
> "谷理子", the encoding,
> and running it again, I get my nice page correctly.
>
> I bet that somewhere (I don't know where, but surely somewhere), the
> UTF-8 encoded URL is converted into a string using the current locale
> (MacRoman on my system), or a default of "ISO-8859-1", before the string
> is actually given to the sitemap.
>
> Not having the sources at hand at the moment, I can't do a quick build
> to put out some debugging instruction, but you get the idea.
>
> Pier
>
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo@outerthought.org mpo@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: [Help]How can I use non-ascii file name?
Posted by Pier Fumagalli <pi...@betaversion.org>.
On 12 Aug 2004, at 12:45, roy huang wrote:
> Hi,all:
> Use reader to display jpg or gif is quite simple,like:
> <map:match pattern="*.jpg">
> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
> </map:match>
> But if the file name is not ASCII but utf-8 or other encoding like
> 花.jpg (simplified Chinese),the resolver didn't resolve the name
> correctly,error occur:
> org.apache.cocoon.ResourceNotFoundException: Error during resolving of
> the input stream: org.apache.excalibur.source.SourceNotFoundException:
> file:/C:/My
> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg
> doesn't exist.
>
> How can I use non-ASCII file name in cocoon?I can't find any
> description or help in wiki or archived mail list.
>
> Roy Huang
It does indeed appear to be a bug...
I have this sitemap snippet:
<map:match pattern="谷*">
<map:generate src="谷{1}.xml"/>
<map:transform src="welcome.xslt">
<map:parameter name="contextPath"
value="{request:contextPath}"/>
</map:transform>
<map:serialize type="xhtml"/>
</map:match>
and a file on the disk called "谷理子.xml". Somewhere, when I make a
request for "http://localhost:8888/谷理子", the whole thing goes
berserk...
Now, the URL is passed correctly, as I see that in the access log:
INFO (2004-08-16) 10:26.36:538 [access]
(/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????'
Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0 B7
E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow it
gets lost in the process.
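The byte sequence is easy to double-check with a few lines of Java. This is just a sketch: PercentEncodeSketch and encodeUtf8 are made-up names, and java.net.URLEncoder is really meant for form data, though for these three characters it produces the same %XX sequence a browser puts in the path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeSketch {

    // Percent-encodes the UTF-8 bytes of the given text.
    public static String encodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // U+8C37 U+7406 U+5B50 are the three characters of the file name.
        String name = "\u8c37\u7406\u5b50";
        System.out.println(encodeUtf8(name)); // %E8%B0%B7%E7%90%86%E5%AD%90
    }
}
```

The hex digits come out in upper case here while the access log shows lower case. Both forms are equivalent in a URL.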
Now, if I modify my sitemap to
<map:match pattern="tanisatoko">
<map:generate src="谷理子.xml"/>
<map:transform src="welcome.xslt">
<map:parameter name="contextPath"
value="{request:contextPath}"/>
</map:transform>
<map:serialize type="xhtml"/>
</map:match>
And I make a request to "http://localhost:8888/tanisatoko", the thing
works perfectly. We can safely rule out the generation process as the
culprit.
Now, the _odd_ thing I noticed is that in those cases, I get an error
of "PipelineNotFound", not a "ResourceNotFound", which means that the
matcher seriously doesn't see that request.
Switching to a 'regexp' matcher doesn't change anything, so I
bet it's the data we feed to the matcher.
Now, changing that matcher to
"谷理子", the encoding,
and running it again, I get my nice page correctly.
I bet that somewhere (I don't know where, but surely somewhere), the
UTF-8 encoded URL is converted into a string using the current locale
(MacRoman on my system), or a default of "ISO-8859-1", before the
string is actually given to the sitemap.
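That hypothesis can be reproduced outside Cocoon. The following standalone Java sketch is only an illustration under the assumption that the container decodes the path bytes as ISO-8859-1. MisdecodeSketch, wireBytes and matches are invented names, and the plain equals() stands in for the sitemap matcher.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeSketch {

    // The nine bytes that arrive for the request path (see the access log above).
    public static byte[] wireBytes() {
        return new byte[] {
            (byte) 0xE8, (byte) 0xB0, (byte) 0xB7,
            (byte) 0xE7, (byte) 0x90, (byte) 0x86,
            (byte) 0xE5, (byte) 0xAD, (byte) 0x90
        };
    }

    // Stand-in for the matcher: decode the bytes, then compare with the pattern.
    public static boolean matches(byte[] wire, Charset decoding, String pattern) {
        return new String(wire, decoding).equals(pattern);
    }

    public static void main(String[] args) {
        String pattern = "\u8c37\u7406\u5b50"; // the three-character pattern from the sitemap

        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(new String(wireBytes(), StandardCharsets.ISO_8859_1).length()); // 9
        System.out.println(matches(wireBytes(), StandardCharsets.ISO_8859_1, pattern));    // false
        // Decoded as UTF-8, the bytes give back the three expected characters.
        System.out.println(matches(wireBytes(), StandardCharsets.UTF_8, pattern));         // true
    }
}
```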
Not having the sources at hand at the moment, I can't do a quick build
to put out some debugging instruction, but you get the idea.
Pier
Re: [Help]How can I use non-ascii file name?
Posted by roy huang <li...@hotmail.com>.
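Sorry, it should be:
function display(){
name=cocoon.request.getParameter("name");
name1=new Packages.java.lang.String(name);
// turn the ISO-8859-1 characters back into bytes, then decode with the platform default charset
name2=new Packages.java.lang.String(name1.getBytes("ISO-8859-1"));
cocoon.sendPage(name2);
}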
----- Original Message -----
From: "roy huang" <li...@hotmail.com>
To: <de...@cocoon.apache.org>
Sent: Thursday, September 02, 2004 7:13 PM
Subject: Re: [Help]How can I use non-ascii file name?
> After reading all the mail in this thread, I finally used flowscript to solve this problem (though I don't like this approach).
> sitemap:
> <map:match pattern="images">
> <map:call function="display" >
> </map:call>
> </map:match>
> <map:match pattern="*.jpg">
> <map:read mime-type="image/jpg" src="jpg/花.jpg" />
> </map:match>
> flowscript:
> function display(){
> name=cocoon.request.getParameter("name");
> name1=new Packages.java.lang.String(name);
> cocoon.sendPage(name1);
> }
>
> It works. If you want to decode the name, you can also do:
> name2=new Packages.java.lang.String(name1.getBytes("ISO-8859-1"));
>
> Though I don't like this approach, I'm posting it in the hope that it helps somebody.
>
> Roy Huang
> ----- Original Message -----
> From: "roy huang" <li...@hotmail.com>
> To: <us...@cocoon.apache.org>; <de...@cocoon.apache.org>
> Sent: Thursday, August 12, 2004 7:45 PM
> Subject: [Help]How can I use non-ascii file name?
>
>
> > Hi,all:
> > Use reader to display jpg or gif is quite simple,like:
> > <map:match pattern="*.jpg">
> > <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
> > </map:match>
> > But if the file name is not ASCII but utf-8 or other encoding like 花.jpg (simplified Chinese),the resolver didn't resolve the name correctly,error occur:
> > org.apache.cocoon.ResourceNotFoundException: Error during resolving of the input stream: org.apache.excalibur.source.SourceNotFoundException: file:/C:/My Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg doesn't exist.
> >
> > How can I use non-ASCII file name in cocoon?I can't find any description or help in wiki or archived mail list.
> >
> > Roy Huang
Re: [Help]How can I use non-ascii file name?
Posted by roy huang <li...@hotmail.com>.
After reading all the mail in this thread, I finally used flowscript to solve this problem (though I don't like this approach).
sitemap:
<map:match pattern="images">
<map:call function="display" >
</map:call>
</map:match>
<map:match pattern="*.jpg">
<map:read mime-type="image/jpg" src="jpg/花.jpg" />
</map:match>
flowscript:
function display(){
name=cocoon.request.getParameter("name");
name1=new Packages.java.lang.String(name);
cocoon.sendPage(name1);
}
It works. If you want to decode the name, you can also do:
name2=new Packages.java.lang.String(name1.getBytes("ISO-8859-1"));
Though I don't like this approach, I'm posting it in the hope that it helps somebody.
Roy Huang
----- Original Message -----
From: "roy huang" <li...@hotmail.com>
To: <us...@cocoon.apache.org>; <de...@cocoon.apache.org>
Sent: Thursday, August 12, 2004 7:45 PM
Subject: [Help]How can I use non-ascii file name?
> Hi,all:
> Use reader to display jpg or gif is quite simple,like:
> <map:match pattern="*.jpg">
> <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
> </map:match>
> But if the file name is not ASCII but utf-8 or other encoding like 花.jpg (simplified Chinese),the resolver didn't resolve the name correctly,error occur:
> org.apache.cocoon.ResourceNotFoundException: Error during resolving of the input stream: org.apache.excalibur.source.SourceNotFoundException: file:/C:/My Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/è±.jpg doesn't exist.
>
> How can I use non-ASCII file name in cocoon?I can't find any description or help in wiki or archived mail list.
>
> Roy Huang
Re: [Help]How can I use non-ascii file name?
Posted by "Volkm@r" <pl...@arcor.de>.
roy huang wrote:
> [...]
> How can I use non-ASCII file name in cocoon?I can't find any description or help in wiki or archived mail list.
Not yet tested. But maybe the SetCharacterEncodingAction described in
<http://wiki.apache.org/cocoon/RequestParameterEncoding> would help.
--
Volkmar W. Pogatzki