Posted to user@commons.apache.org by Philippe Poulard <Ph...@sophia.inria.fr> on 2005/08/10 14:36:26 UTC

[VFS] URI normalization

Should VFS normalize URIs before parsing a file name?

--- URI normalization ---

URI references require encoding and escaping of certain characters. The 
disallowed characters include all non-ASCII characters, plus the 
excluded characters listed in Section 2.4 of [RFC 2396], except for the 
number sign (#) and percent sign (%) characters and the square bracket 
characters re-allowed in [RFC 2732].
The set of excluded US-ASCII characters is:
  [00-20]    [22] [3C] [3E] [5C] [5E] [60] [7B-7D] [7F]
   C0  SPACE   "    <    >    \    ^    `   { | }   DEL

Escaping disallowed characters is performed as follows:
1. Each disallowed character is converted to UTF-8 [RFC 2279] as one or 
more bytes.
2. Any octets corresponding to a disallowed character are escaped with 
the URI escaping mechanism (that is, converted to %HH, where HH is the 
hexadecimal notation of the octet value). If escaping must be performed, 
uppercase hexadecimal characters should be used.
3. The original character is replaced by the resulting character sequence.
Note that this normalization process is idempotent: repeated 
normalization does not change a normalized URI reference.
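
As an illustration only (this is not VFS code), the three steps above 
could be sketched in Java roughly as follows. The names UriNormalizer, 
isDisallowed and normalize are made up for this example. The sketch 
assumes the input is either unescaped or already normalized, since "%" 
is deliberately left untouched.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of VFS: escapes the disallowed characters
// listed above (non-ASCII plus the excluded US-ASCII set).
final class UriNormalizer {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // True for non-ASCII characters and for the excluded US-ASCII characters:
    // [00-20] [22] [3C] [3E] [5C] [5E] [60] [7B-7D] [7F].
    // '#', '%', '[' and ']' are allowed and therefore not listed here.
    static boolean isDisallowed(int cp) {
        if (cp > 0x7F || cp <= 0x20 || cp == 0x7F) {
            return true;
        }
        switch (cp) {
            case 0x22: case 0x3C: case 0x3E: case 0x5C: case 0x5E:
            case 0x60: case 0x7B: case 0x7C: case 0x7D:
                return true;
            default:
                return false;
        }
    }

    static String normalize(String uri) {
        StringBuilder out = new StringBuilder(uri.length());
        for (int i = 0; i < uri.length(); ) {
            int cp = uri.codePointAt(i);
            i += Character.charCount(cp);
            if (!isDisallowed(cp)) {
                out.appendCodePoint(cp);
                continue;
            }
            // Step 1: convert the character to UTF-8 octets.
            byte[] utf8 = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            // Steps 2 and 3: replace it with %HH per octet, uppercase hex.
            for (byte b : utf8) {
                out.append('%')
                   .append(HEX[(b >> 4) & 0xF])
                   .append(HEX[b & 0xF]);
            }
        }
        return out.toString();
    }
}
```

With this sketch, normalize("à la pêche.xml") gives 
"%C3%A0%20la%20p%C3%AAche.xml", and normalizing that result again 
returns it unchanged, because "%" is not in the disallowed set.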

-- 
Cordialement,

            ///
           (. .)
  -----ooO--(_)--Ooo-----
|   Philippe Poulard    |
  -----------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [VFS] URI normalization

Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi Philippe!
>> Do you have a problem with the current behaviour?
> not yet ;)
>
> As a French person, I'll try VFS with a file named "à la pêche.xml" 
> and tell you if I encounter any problem.
I am looking forward to hearing from you. :-)
A warning in advance: charsets, encodings, printers and file transfers 
are a pain in our business ;-) ...

Ciao,
Mario




Re: [VFS] URI normalization

Posted by Philippe Poulard <Ph...@sophia.inria.fr>.
Mario Ivankovits wrote:
> Hi!
> 
>> Should VFS normalize URIs before parsing a file name?
> 
> Do you have a problem with the current behaviour?

not yet ;)

I'm dealing with XML resources, and some XML standards require the use 
of normalized URIs.

As a French person, I'll try VFS with a file named "à la pêche.xml" and 
tell you if I encounter any problem.

> VFS does not encode special characters other than "%", and sometimes 
> "?" (URL-based fs) and sometimes "!" (layers/zip fs).
> 
> Before any parsing, the filename is DEcoded. VFS needs a consistent 
> view of the filename, whether it was passed in encoded or decoded.
> 
> If I encode/normalize it, every visual representation of the filename 
> looks a little bit strange.
> 
> Even if the VFS filename looks like a URI, I think we could still treat 
> it simply as a "VFS filename": human readable, with minimum encoding.
> The filesystem implementation is responsible for encoding it as needed 
> (e.g. taking the session charset into account).
> 
> Of course, you could do it the other way around ... but what would be 
> the advantage?
> 
> The BIG disadvantage is that VFS core would then have to deal with 
> charsets. If the filename is encoded, we have to know which charset 
> was used.
> 
> 
> Ciao,
> Mario
> 
> 


-- 
Cordialement,

            ///
           (. .)
  -----ooO--(_)--Ooo-----
|   Philippe Poulard    |
  -----------------------



Re: [VFS] URI normalization

Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi!
> Should VFS normalize URIs before parsing a file name?
Do you have a problem with the current behaviour?
VFS does not encode special characters other than "%", and sometimes 
"?" (URL-based fs) and sometimes "!" (layers/zip fs).

Before any parsing, the filename is DEcoded. VFS needs a consistent 
view of the filename, whether it was passed in encoded or decoded.

If I encode/normalize it, every visual representation of the filename 
looks a little bit strange.

Even if the VFS filename looks like a URI, I think we could still treat 
it simply as a "VFS filename": human readable, with minimum encoding.
The filesystem implementation is responsible for encoding it as needed 
(e.g. taking the session charset into account).

Of course, you could do it the other way around ... but what would be 
the advantage?

The BIG disadvantage is that VFS core would then have to deal with 
charsets. If the filename is encoded, we have to know which charset 
was used.
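
To make the charset point concrete, here is a small Java sketch. It is 
an illustration only, not VFS code, and the class CharsetAmbiguity and 
its two helpers are hypothetical. It relies on java.net.URLEncoder and 
URLDecoder, which implement HTML form encoding rather than RFC 2396 
escaping, but that is enough to show that the same name escapes to 
different octets depending on the charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: the escaped form of a name depends on the charset.
final class CharsetAmbiguity {

    // Percent-encodes the name using the given charset.
    static String encode(String name, String charset) {
        try {
            return URLEncoder.encode(name, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decodes %HH sequences, interpreting the octets in the given charset.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }
}
```

For "pêche.xml", encode(..., "UTF-8") gives "p%C3%AAche.xml" while 
encode(..., "ISO-8859-1") gives "p%EAche.xml". Decoding the UTF-8 form 
as ISO-8859-1 yields "pÃªche.xml", so a decoder that guesses the wrong 
charset silently produces a different filename.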


Ciao,
Mario

