Posted to dev@cocoon.apache.org by Alexander Klimetschek <al...@mindquarry.com> on 2007/02/04 23:52:32 UTC

URL decoding in blocks-fw

Hi Daniel,

I had some problems with the URL decoding in blocks-fw-impl, which I have fixed locally. As I am still working with the code as of November/December and have not yet looked at the new service/servlet 
code, I am sending you the problem description and the fixed code snippets in this mail. Your new implementation may have the same issue, or the problem may no longer exist. In that case please disregard this email ;-)

The problem was that URLs whose query parameters contain special characters such as = or &, which also delimit parameters inside the query string, did not work. For example, a parameter value 
containing a & was decoded incorrectly. The cause is that the URL is decoded in BlockConnection.parseBlockURI() and then re-encoded. Decoding was applied to the entire query string before the 
parameters were split. As a result, an encoded & inside a parameter value becomes indistinguishable from a parameter delimiter when the parameters are split afterwards. So in 
theory one encodes like this (the browser or the other code I am using does this automatically):

encode('a') + '=' + encode('value with &') + '&' + encode('b') + '=' + encode('value with =')

And during decoding you first split the query string into parameters, then split each parameter into key and value, and only then decode each part separately.
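
To illustrate the ordering, here is a minimal sketch. It is not the Cocoon code: it uses java.net.URLEncoder and java.net.URLDecoder as stand-ins for NetUtils, assumes UTF-8 and Java 10 or later (for the Charset overloads), and the class and method names are made up for this example.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryOrderDemo {

    // Encode key and value separately, then join them with a raw '='.
    static String encodePair(String key, String value) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Correct order: split on '&', split on the first '=', then decode each part.
    static Map<String, String> splitThenDecode(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : rawQuery.split("&")) {
            int pos = pair.indexOf('=');
            if (pos != -1) {
                params.put(
                        URLDecoder.decode(pair.substring(0, pos), StandardCharsets.UTF_8),
                        URLDecoder.decode(pair.substring(pos + 1), StandardCharsets.UTF_8));
            }
        }
        return params;
    }

    // Wrong order: decoding the whole string first turns the encoded '&'
    // and '=' back into delimiters before the split happens.
    static String[] decodeThenSplit(String rawQuery) {
        return URLDecoder.decode(rawQuery, StandardCharsets.UTF_8).split("&");
    }

    public static void main(String[] args) {
        String raw = encodePair("a", "value with &") + "&" + encodePair("b", "value with =");
        System.out.println(raw);
        System.out.println(splitThenDecode(raw));
        System.out.println(decodeThenSplit(raw).length + " pieces after decode-then-split");
    }
}
```

With split-then-decode the two parameters come back unchanged; with decode-then-split the & inside the first value produces an extra, bogus split.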

The java.net.URI class is not very helpful in this situation, since it encodes and decodes the query string only as a whole. That's why I removed the use of the multi-argument URI constructors for 
re-constructing a changed URL: they always encode the complete query string. If the query is already encoded (containing %xy), it gets encoded a second time, which corrupts it. If you pass the decoded 
query string instead, it may contain decoded & or = characters inside the parameters that won't get encoded again. Instead I build the URL manually to avoid this URI behaviour. See the code at the end.
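
The constructor behaviour described above can be reproduced with a small sketch. The host example.org, the path /p and the class and method names are placeholders for this illustration only.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQuotingDemo {

    // Multi-argument constructor: the query is quoted as one string,
    // and a '%' in the input is itself quoted as "%25".
    static String rawQueryViaComponents(String query) {
        try {
            return new URI("http", "example.org", "/p", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-argument parsing: the input is taken as already encoded
    // and is not re-encoded.
    static String rawQueryViaParse(String uri) {
        return URI.create(uri).getRawQuery();
    }

    public static void main(String[] args) {
        // Already encoded input gets encoded a second time.
        System.out.println(rawQueryViaComponents("a=x%26y"));
        // Decoded input keeps its '&', which now looks like a delimiter.
        System.out.println(rawQueryViaComponents("a=x&y"));
        // Parsing an existing URI string leaves the raw query untouched.
        System.out.println(rawQueryViaParse("http://example.org/p?a=x%26y"));
    }
}
```

Neither input form survives the multi-argument constructor intact, which is why only the raw parts of an already parsed URI are concatenated by hand.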

The other changes needed were creating the RequestParameters helper class from the non-decoded raw query string in BlockCallHttpServletRequest:

         this.parameters = new RequestParameters(this.uri.getRawQuery());

and using NetUtils.decode inside the RequestParameters class (the encoding should be taken from the appropriate Cocoon system property instead of being hard-coded):

     public RequestParameters(String queryString) {
         this.names = new HashMap(5);
         if (queryString != null) {
             StringTokenizer st = new StringTokenizer(queryString, "&");
             while (st.hasMoreTokens()) {
                 String pair = st.nextToken();
                 int pos = pair.indexOf('=');
                 if (pos != -1) {
                     try {
                         this.setParameter(
                                 NetUtils.decode(pair.substring(0, pos), "utf-8"),
                                 NetUtils.decode(pair.substring(pos+1, pair.length()), "utf-8")
                         );
                     } catch (UnsupportedEncodingException e) {
                         throw new IllegalArgumentException(e);
                     }
                 }
             }
         }
     }


Finally the rewritten BlockConnection.parseBlockURI():

     private URI parseBlockURI(URI uri) throws URISyntaxException {
         // Can't happen
         if (!uri.isAbsolute()) {
             throw new URISyntaxException(uri.toString(),
                                          "Only absolute URIs are allowed for the block protocol.");
         }
         this.logger.debug("BlockSource: resolving " + uri.toString() + " with scheme " +
                 uri.getScheme() + " and ssp " + uri.getRawSchemeSpecificPart());

         URI subURI = new URI(uri.getRawSchemeSpecificPart());

         this.logger.debug("BlockSource: resolved to " + subURI.toString());

         this.blockName = subURI.getScheme();

         // All URIs, also relative are resolved and processed from the block manager
         // FIXME: This will not be a system global id, as the blockName is block local.

         // Manually build the URI because decoding and then re-encoding the
         // query does not work at the query-string level: decoding must happen
         // after each parameter pair (e.g. 'a=b') has been extracted and split
         // into key and value. Then both parts have to be decoded (they might
         // contain not only umlauts but also the delimiters '=' and '&'). This
         // is not possible with the java.net.URI constructors that take one
         // argument per part (e.g. scheme, path, query) because the query is
         // always encoded as an entire string. If you pass something that is
         // already encoded, e.g. has lots of %xy inside, it will mess up those
         // existing encodings. Thus you can only use the URI(String) constructor
         // for parsing *existing*, i.e. already encoded, URIs or build the URI
         // by hand like below:
         // getRawQuery() returns null when there is no query; avoid appending "?null"
         String rawQuery = subURI.getRawQuery();
         String querySuffix = (rawQuery != null) ? "?" + rawQuery : "";
         String ssp = this.blockName + ":" + subURI.getRawPath() + querySuffix;

         // build a new URI that has the previous one only as scheme-specific part
         this.systemId = (new URI(uri.getScheme(), ssp, null)).toString();

         // again, build the URI manually so that the query part does not get mixed up
         return new URI(uri.getScheme() + ":" + subURI.getRawPath() + querySuffix);
     }
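
As a quick check of the manual approach, the following standalone sketch rebuilds a block URI from its raw parts only. It is a simplified stand-in for parseBlockURI(), not the method itself: the class name, the method name and the sample URI block:myblock:/test are hypothetical, and a URI without a query is handled by leaving out the '?'.

```java
import java.net.URI;

class BlockUriDemo {

    // Strip the block name from "block:<blockName>:<path>?<query>" while
    // keeping path and query in their raw (still encoded) form.
    static URI toBlockRelative(URI uri) {
        URI subURI = URI.create(uri.getRawSchemeSpecificPart());
        String rawQuery = subURI.getRawQuery();
        String querySuffix = (rawQuery != null) ? "?" + rawQuery : "";
        // URI.create only parses the string; nothing is encoded again.
        return URI.create(uri.getScheme() + ":" + subURI.getRawPath() + querySuffix);
    }

    public static void main(String[] args) {
        URI in = URI.create("block:myblock:/test?a=x%26y&b=1");
        URI out = toBlockRelative(in);
        System.out.println(out);
        System.out.println(out.getRawQuery());
    }
}
```

The encoded %26 in the first parameter passes through unchanged, so a later split-then-decode step still sees two parameters.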


Alex


-- 
Alexander Klimetschek
http://www.mindquarry.com