Posted to dev@maven.apache.org by Benjamin Bentmann <be...@udo.edu> on 2008/04/06 12:26:38 UTC

Re: Common Bugs

Hi,

6) URLs and Filesystem Paths

URLs and filesystem paths are really two different beasts and converting 
between them is not trivial. The main source of problems is that different 
encoding rules apply for the strings that make up a URL or filesystem path.

For example, consider the following code snippet:

  File file = new File( "foo bar+foo" );
  URL url = file.toURI().toURL();
  System.out.println( file.toURL() );
  System.out.println( url );
  System.out.println( url.getPath() );
  System.out.println( URLDecoder.decode( url.getPath(), "UTF-8" ) );

which outputs something like

  file:/M:/scratch-pad/foo bar+foo
  file:/M:/scratch-pad/foo%20bar+foo
  /M:/scratch-pad/foo%20bar+foo
  /M:/scratch-pad/foo bar foo

First of all, please note that File.toURL() [1] does not escape the space 
character. This yields an invalid URL, as per RFC 2396 [0], section 2.4.3 
"Excluded US-ASCII Characters". The class java.net.URL silently accepts 
such invalid URLs, whereas java.net.URI does not (see also URL.toURI() 
[2]). For this reason, File.toURL() has been deprecated and should be 
replaced with File.toURI().toURL().
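
To make the difference between the lenient URL class and the strict URI 
class concrete, here is a minimal sketch. The class and method names are 
made up for illustration, and the path is simply the one from the output 
above:

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

class UnescapedUrlDemo {

    // Returns true if the URL's string form also parses as a java.net.URI.
    static boolean isStrictlyValid(String spec) {
        try {
            URL url = new URL(spec); // lenient: a raw space is accepted here
            url.toURI();             // strict: excluded characters are rejected
            return true;
        } catch (URISyntaxException e) {
            return false;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Unescaped space, as produced by File.toURL()
        System.out.println(isStrictlyValid("file:/M:/scratch-pad/foo bar+foo"));   // false
        // Escaped space, as produced by File.toURI().toURL()
        System.out.println(isStrictlyValid("file:/M:/scratch-pad/foo%20bar+foo")); // true
    }
}
```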

Next, URL.getPath() does not, in general, return a string that can be used 
as a filesystem path. It returns a substring of the URL and as such can 
contain escape sequences. The prominent example is the space character, 
which shows up as "%20". People sometimes hack around this by means of 
replace("%20", " "), but that simply does not cover all cases. It is worth 
mentioning that the related method URI.getPath() [3], on the other hand, 
does decode escapes, but the result is still not a filesystem path (compare 
the source of the constructor File(URI)).
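
The three views of the same path can be put side by side as follows. This 
is only a sketch: the helper names are hypothetical and the URL is the 
escaped one from the example output above.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

class GetPathDemo {

    // URL.getPath(): a raw substring of the URL, escapes left in place.
    static String rawUrlPath(URI uri) {
        try {
            return uri.toURL().getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // URI.getPath(): escapes decoded, but still a URI path, not a File path.
    static String decodedUriPath(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = URI.create("file:/M:/scratch-pad/foo%20bar+foo");
        System.out.println(rawUrlPath(uri));     // /M:/scratch-pad/foo%20bar+foo
        System.out.println(decodedUriPath(uri)); // /M:/scratch-pad/foo bar+foo
        // File(URI) additionally maps the path to the platform's conventions,
        // so this line prints different results on Windows and Unix.
        System.out.println(new File(uri).getPath());
    }
}
```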

To decode a URL, people sometimes also choose java.net.URLDecoder [4]. The 
pitfall with this class is that it actually performs HTML form decoding, 
which is yet another encoding and not the same as URL encoding (compare the 
last paragraph of the class javadoc for java.net.URL). For instance, 
URLDecoder will erroneously convert the character "+" into a space, as 
illustrated by the last sysout in the example above.
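
A small sketch contrasting the two decoders on the same raw path might 
look like this. The class and method names are hypothetical; decoding via 
java.net.URI is used here only as one way to get URI-style unescaping, and 
it assumes the input is a syntactically valid URI path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class PlusDecodingDemo {

    // HTML form decoding: "%20" becomes a space, and so does "+".
    static String formDecode(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI decoding: "%20" becomes a space, "+" stays a plus sign.
    static String uriDecode(String rawPath) {
        return URI.create(rawPath).getPath();
    }

    public static void main(String[] args) {
        String raw = "/M:/scratch-pad/foo%20bar+foo";
        System.out.println(formDecode(raw)); // /M:/scratch-pad/foo bar foo
        System.out.println(uriDecode(raw));  // /M:/scratch-pad/foo bar+foo
    }
}
```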

Code targeting JRE 1.4+ can easily avoid these problems by using

  new File( new URI( url.toString() ) )

when converting a URL to a filesystem path and

  file.toURI().toURL()

when converting back.
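
Put together, the round trip could be sketched like this. The class and 
method names are made up for illustration, and the sketch assumes the URL 
is properly escaped; an unescaped URL makes new URI(...) throw a 
URISyntaxException.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class RoundTripDemo {

    // File -> URL, going through URI so that special characters get escaped.
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // URL -> File, going through URI so that escapes get decoded correctly.
    static File toFile(URL url) {
        try {
            return new File(new URI(url.toString()));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("URL is not properly escaped: " + url, e);
        }
    }

    public static void main(String[] args) {
        File original = new File("foo bar+foo").getAbsoluteFile();
        URL url = toUrl(original);
        File back = toFile(url);
        System.out.println(url);
        System.out.println(back.equals(original)); // true
    }
}
```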

Regards,


Benjamin Bentmann


[0] http://www.faqs.org/rfcs/rfc2396.html
[1] http://java.sun.com/javase/6/docs/api/java/io/File.html#toURL()
[2] http://java.sun.com/javase/6/docs/api/java/net/URL.html#toURI()
[3] http://java.sun.com/javase/6/docs/api/java/net/URI.html#getPath()
[4] http://java.sun.com/javase/6/docs/api/java/net/URLDecoder.html 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Common Bugs

Posted by Vincent Siveton <vi...@gmail.com>.
2008/4/7, Benjamin Bentmann <be...@udo.edu>:
> > I wonder if it's worth posting these as a series under the developers
> > section of the Maven site?
> >
>
>  Vincent and I had already put parts of this stuff onto [0] in a section
>  named "Some Pitfalls", together with a link to this mail thread. But I
>  agree, having all of this in a nicely formatted APT doc on the site is a
>  good idea.
>
>  I suggest we move the "Some Pitfalls" section out into a standalone
> document such that we can list it on the documentation index. If nobody else
> goes for this, it will need to wait some days until I get the next free time
> slice to merge and clean it up for proper presentation.
>

+1

Vincent

>
>  Benjamin
>
>
>  [0]
> http://maven.apache.org/guides/plugin/guide-java-plugin-development.html
>


Re: Common Bugs

Posted by Benjamin Bentmann <be...@udo.edu>.
> I wonder if it's worth posting these as a series under the developers
> section of the Maven site?

Vincent and I had already put parts of this stuff onto [0] in a section
named "Some Pitfalls", together with a link to this mail thread. But I
agree, having all of this in a nicely formatted APT doc on the site is a
good idea.

I suggest we move the "Some Pitfalls" section out into a standalone document 
such that we can list it on the documentation index. If nobody else goes for 
this, it will need to wait some days until I get the next free time slice to 
merge and clean it up for proper presentation.


Benjamin


[0] http://maven.apache.org/guides/plugin/guide-java-plugin-development.html



Re: Common Bugs

Posted by Barrie Treloar <ba...@gmail.com>.
On Mon, Apr 7, 2008 at 6:21 AM, Brett Porter <br...@apache.org> wrote:
> Hey Benjamin,
>
>  I wonder if it's worth posting these as a series under the developers
> section of the Maven site?

And building custom Checkstyle/PMD rules (if built-in ones don't exist)
to flag these as errors?


Re: Common Bugs

Posted by Brett Porter <br...@apache.org>.
Hey Benjamin,

I wonder if it's worth posting these as a series under the developers  
section of the Maven site?

- Brett

On 06/04/2008, at 9:46 PM, Benjamin Bentmann wrote:

>> new File( new URI( url.toString() ) )
>
> Correction:
> JRE 1.4 is happily returning invalid/unescaped URLs from  
> ClassLoader.getResource(), making the above suggestion fail with a  
> URISyntaxException.
>
> The new suggestion is to use FileUtils.toFile(URL) [0] from Commons  
> IO. A similar method exists in Plexus Utils but it currently does not  
> decode escape sequences.
>
>
> Benjamin
>
>
> [0] http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL)

--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/



Re: Common Bugs

Posted by Benjamin Bentmann <be...@udo.edu>.
>  new File( new URI( url.toString() ) )

Correction:
JRE 1.4 is happily returning invalid/unescaped URLs from 
ClassLoader.getResource(), making the above suggestion fail with a 
URISyntaxException.

The new suggestion is to use FileUtils.toFile(URL) [0] from Commons IO. A 
similar method exists in Plexus Utils but it currently does not decode 
escape sequences.
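
For illustration only, a lenient conversion that tolerates unescaped URLs 
could be sketched with the JDK alone as shown below. This is not the 
Commons IO implementation; the class and method names are hypothetical, 
and the sketch assumes escapes encode UTF-8 bytes and uses 
StandardCharsets, so it needs Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class LenientUrlToFile {

    // Decodes %XX escapes as UTF-8 bytes. Every other character, including
    // '+' and a stray '%', is copied through unchanged.
    static String decodePercent(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Copy one full code point so surrogate pairs stay intact.
                int cp = s.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns null for anything that is not a file: URL.
    static File toFile(URL url) {
        if (url == null || !"file".equals(url.getProtocol())) {
            return null;
        }
        String path = decodePercent(url.getPath());
        return new File(path.replace('/', File.separatorChar));
    }

    public static void main(String[] args) throws Exception {
        // Works for escaped and unescaped input alike.
        System.out.println(toFile(new URL("file:/M:/scratch-pad/foo%20bar+foo")));
        System.out.println(toFile(new URL("file:/M:/scratch-pad/foo bar+foo")));
    }
}
```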


Benjamin


[0] http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL)

