Posted to dev@cocoon.apache.org by Gunnar Brand <g....@interface-business.de> on 2003/11/04 16:41:43 UTC

Double URLdecoding problem with request.getSitemapURI()

Hello!

While creating an application that maps a complete path to a resource reader 
(to retrieve documents from a storage transparently), I noticed a bug(?). 
Whenever the name/path of the file contained a '+' (properly encoded as 
%2b, of course), the file was not found. The storage server echoes the 
path/file it looked up, and instead of the '+' there was a ' ' (in URL 
encoding, '+' is a placeholder for ' ').

Since the %2b does work if I get parameters directly from the request, I 
deduced that there must be some double url decoding going on. After some 
investigation it was clear that the incorrect url was fed into the reader 
and wildcard matcher so it had to happen a bit earlier already.

After a quick modification in the samples sitemap (adding ** in front of 
the match) and the RequestGenerator, I could use any path I wanted. The 
generator displayed not only the request.getRequestURI() but also the 
request.getSitemapURI().

RequestGenerator.java:
this.attribute(attr,"target", request.getRequestURI());
this.attribute(attr,"sitemaptarget", request.getSitemapURI());   // <-- added

With a url like
http://rei:8080/samples/a%20test%20dir%20with%20a%20plus%20at%20the%20end%2B/request.html?test=%20%2bx+y
it prints (shortened a bit):

<h:request 
target="/samples/a%20test%20dir%20with%20a%20plus%20at%20the%20end%2B/request.html"
   sitemaptarget="a test dir with a plus at the end /request.html" source="">
<h:requestParameters>
   <h:parameter name="test">
     <h:value> +x y</h:value>
   </h:parameter>
</h:requestParameters>
</h:request>

So the sitemap URI was obviously decoded twice. The culprit seems to be 
CocoonServlet.java, so I added a small debug output (the code below is from 
cvs HEAD):

   public void service(HttpServletRequest req, HttpServletResponse res)
     throws ServletException, IOException {

         // We got it... Process the request
         String uri = request.getServletPath();
         System.out.println("request.getServletPath():" + uri);  // added
         if (uri == null) {
             uri = "";
         }
         String pathInfo = request.getPathInfo();

         .....

         Environment env;
         try{
             if (uri.charAt(0) == '/') {
                 uri = uri.substring(1);
             }
 >>> line 1087:
             env = getEnvironment(URLDecoder.decode(uri), request, res);
         } catch (Exception e) {
         ...

The debug output is:
request.getServletPath():/samples/a test dir with a plus at the 
end+/request.html

So the request.getServletPath() method returns a "url" that is already 
properly decoded, and that string is then decoded a second time in line 
1087. This is true for both Jetty and Tomcat 4.1.
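
The effect can be reproduced outside of Cocoon. The following is only a minimal sketch: the class name DoubleDecodeDemo and its decode helper are made up for illustration, and UTF-8 is an assumed encoding. It applies java.net.URLDecoder once and then a second time to the path from the example URL above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Cocoon.
public class DoubleDecodeDemo {

    // Decodes one level of URL encoding (assumes UTF-8).
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "a%20test%20dir%20with%20a%20plus%20at%20the%20end%2B/request.html";

        // First pass: %20 becomes ' ' and %2B becomes '+'.
        String once = decode(raw);
        // Second pass: the literal '+' is now treated as an encoded space.
        String twice = decode(once);

        System.out.println(once);   // a test dir with a plus at the end+/request.html
        System.out.println(twice);  // a test dir with a plus at the end /request.html
    }
}
```

The first result matches the debug output of getServletPath(), the second matches the sitemaptarget attribute printed by the generator.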

Unfortunately a look into the Servlet API does not indicate if 
getServletPath is supposed to return a decoded or still URLencoded string.


getServletPath()
public java.lang.String getServletPath()
Returns the part of this request's URL that calls the servlet. This includes
either the servlet name or a path to the servlet, but does not include any
extra path information or a query string.
Same as the value of the CGI variable SCRIPT_NAME.

Returns: a String containing the name or path of the servlet being called,
          as specified in the request URL


The big question now is: is this a bug, or are there cases where this 
method returns encoded strings?
(To me it does look like a bug, and I need to remove the second decoding 
to get my application working ;)
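
If the container does always hand over a decoded servlet path (an assumption, and exactly the open question above), the change would amount to dropping the second decode and only stripping the leading slash. A rough sketch of that idea follows. SitemapUriSketch and toSitemapUri are hypothetical names and are not taken from CocoonServlet:

```java
// Hypothetical helper, not the actual CocoonServlet code.
public class SitemapUriSketch {

    // Turns the (assumed already decoded) servlet path into a sitemap URI
    // by removing the leading '/', without calling URLDecoder again.
    static String toSitemapUri(String servletPath) {
        if (servletPath == null) {
            return "";
        }
        return servletPath.startsWith("/") ? servletPath.substring(1) : servletPath;
    }

    public static void main(String[] args) {
        String servletPath = "/samples/a test dir with a plus at the end+/request.html";
        // The '+' survives because no second decoding takes place.
        System.out.println(toSitemapUri(servletPath));
    }
}
```

Using startsWith instead of charAt(0) also avoids an exception when the path is the empty string.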

Gunnar.




-- 
G. Brand - interface:projects GmbH
Tolkewitzer Strasse 49
D-01277 Dresden


Re: Double URLdecoding problem with request.getSitemapURI()

Posted by Joerg Heinicke <jh...@virbus.de>.
Please add this to bugzilla. Thanks for your effort for finding the 
origin of the bug.

Joerg
