Posted to users@httpd.apache.org by Hapax <we...@hapax.qc.ca> on 2007/05/21 08:55:24 UTC

[users@httpd] Automatic rewrite of Latin-1 and Accentless URL to UTF-8 IRI

Hello,

    I was wondering whether anyone knows how to automatically support 
the rewriting (aliasing?) of URLs across a whole site to do the following.

    Support IRIs of this form:

        http://hapax.qc.ca/Recettes/No%C3%ABl.html (viz. Noël.html in UTF-8)

    while also supporting Latin-1 encoded URLs that automatically 
resolve to the resource above:

        http://hapax.qc.ca/Recettes/No%EBl.html

    and, better still, also support plain ASCII ones for the diacritically impaired:

        http://hapax.qc.ca/Recettes/Noel.html

    I would like this to be automatically done across all URLs and not 
have to list all the URLs in their three forms...
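A minimal sketch of the mapping for the first two forms, assuming the decision is made per request on the raw path bytes: if the %-decoded bytes are valid UTF-8 they are kept, otherwise they are assumed to be Latin-1. The function name `canonical_utf8_path` is hypothetical and not part of Apache; its result could serve as the target of a redirect or internal rewrite.

```python
from urllib.parse import quote, unquote_to_bytes


def canonical_utf8_path(path: str) -> str:
    """Return the UTF-8 %-encoded form of a request path.

    Hypothetical helper: the %-decoded bytes are tried as UTF-8 first;
    if they are not valid UTF-8 they are assumed to be Latin-1.
    """
    raw = unquote_to_bytes(path)
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        # Every byte value is a valid Latin-1 character, so this cannot fail.
        text = raw.decode('latin-1')
    # quote() encodes as UTF-8 by default and leaves '/' unescaped.
    return quote(text)


print(canonical_utf8_path('/Recettes/No%EBl.html'))     # /Recettes/No%C3%ABl.html
print(canonical_utf8_path('/Recettes/No%C3%ABl.html'))  # /Recettes/No%C3%ABl.html
```

Plain-ASCII paths such as /Recettes/Noel.html pass through unchanged, so the accentless form needs a separate lookup. Trying UTF-8 first is a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be misread, though that is unusual for real French names.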

    The idea here is to migrate gracefully to RFC 3987 (with UTF-8 
encoded resource names on the server) while supporting various 
constituencies of users: some will use the old file names (no 
accents), others will use, for instance, Firefox in its default setting, 
which sends Latin-1 %-encoded URLs. I suspect this is a common 
problem for Europeans moving to UTF-8 encoded URLs.
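For the accentless form, one option (an assumption, not something Apache provides) is to build an index from the unaccented spelling of each UTF-8 file name to the real name, so that a request for Noel.html can be mapped to Noël.html. The sketch below strips accents by Unicode canonical decomposition; the directory listing and the function names are hypothetical.

```python
import unicodedata


def strip_accents(name: str) -> str:
    # NFD splits 'ë' into 'e' + U+0308 COMBINING DIAERESIS; dropping the
    # combining marks leaves only the unaccented base letters.
    decomposed = unicodedata.normalize('NFD', name)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accentless_index(filenames):
    """Map the accentless spelling of each file name to the real name."""
    index = {}
    for real in filenames:
        # On a collision, the first file seen wins.
        index.setdefault(strip_accents(real), real)
    return index


# Hypothetical directory listing for /Recettes/
index = build_accentless_index(['Noël.html', 'Crêpes.html', 'Tourtière.html'])
print(index['Noel.html'])  # Noël.html
```

If two files differ only by accents, `setdefault` keeps the first one seen, so such collisions need a policy of their own. Ligatures such as œ have no canonical decomposition and are left unchanged.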

    Has anyone already worked on such a problem? Is there a standard 
solution to it?

    Many thanks in advance for any insights,

P. A.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Automatic rewrite of Latin-1 and Accentless URL to UTF-8 IRI

Posted by Patrick Andries <we...@hapax.qc.ca>.
Hapax wrote:

> Nick Kew wrote:
>
>> On Mon, 21 May 2007 02:55:24 -0400
>> Hapax <we...@hapax.qc.ca> wrote:
>>
>>  
>>
>>> Hello,
>>>
>>>    I was wondering if someone would know how to automatically
>>> support the rewriting (aliasing ?) of URL on a whole site to do the
>>> following.
>>>   
>>
>>
>> [chop]
>>
>> Are you talking about HTTP headers or document bodies?
>> In the former case, I wonder if you could devise an
>> all-in-one regexp to use with "Header edit"?
>>  
>>
> Only the headers. I don't want three different pages or three pages 
> whose content would be rewritten automatically.
>
> I just want users to be able to get the same file (Noël.html) whether 
> they address the page as No%C3%ABl.html (viz. Noël.html in UTF-8), 
> No%EBl.html (Latin-1) or Noel.html.
>
> Handling only No%C3%ABl.html (viz. Noël.html in UTF-8) and 
> No%EBl.html (Latin-1) would also be fine.
>
> I was wondering whether this was feasible through some URL rewrite or alias 
> (or maybe some parameter, though there I think I'm asking too much) 
> and whether someone had already done this.
>
> P. A.



I think this is what 
http://dev.w3.org/cvsweb/apache-modules/mod_fileiri/mod_fileiri.c 
does (see also http://www.w3.org/2003/Talks/0904-IUC-IRI/slide19-0.html).

Is it available in Apache 2.2? If not, why not?

P. A.





Re: [users@httpd] Automatic rewrite of Latin-1 and Accentless URL to UTF-8 IRI

Posted by Nick Kew <ni...@webthing.com>.
On Mon, 21 May 2007 02:55:24 -0400
Hapax <we...@hapax.qc.ca> wrote:

> Hello,
> 
>     I was wondering if someone would know how to automatically
> support the rewriting (aliasing ?) of URL on a whole site to do the
> following.

[chop]

Are you talking about HTTP headers or document bodies?
In the former case, I wonder if you could devise an
all-in-one regexp to use with "Header edit"?
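A minimal Python sketch of the transformation such a regexp would have to perform on a header value (for example a Location header): each run of non-ASCII %XX escapes that is not valid UTF-8 is assumed to be Latin-1 and re-encoded as UTF-8. Apache's `Header edit` takes a regular expression and a replacement string rather than a callback, so this only illustrates the mapping and is not a drop-in configuration; the name `latin1_escapes_to_utf8` is hypothetical.

```python
import re

# A run of consecutive %XX escapes whose bytes are all >= 0x80 (non-ASCII).
_HIGH_ESCAPES = re.compile(r'(?:%[89A-Fa-f][0-9A-Fa-f])+')


def latin1_escapes_to_utf8(value: str) -> str:
    """Re-encode Latin-1 %-escapes in a header value as UTF-8 %-escapes.

    Hypothetical helper: runs that already decode as UTF-8 are left alone.
    """
    def fix(match):
        run = match.group(0)
        raw = bytes.fromhex(run.replace('%', ''))
        try:
            raw.decode('utf-8')
            return run  # already UTF-8, keep as-is
        except UnicodeDecodeError:
            # Not UTF-8: assume each byte is one Latin-1 character.
            utf8 = raw.decode('latin-1').encode('utf-8')
            return ''.join('%{:02X}'.format(b) for b in utf8)

    return _HIGH_ESCAPES.sub(fix, value)


print(latin1_escapes_to_utf8('http://hapax.qc.ca/Recettes/No%EBl.html'))
# http://hapax.qc.ca/Recettes/No%C3%ABl.html
```

A Latin-1 byte sequence that happens to be valid UTF-8 (for example %C3%AB, which in Latin-1 would spell "Ã«") is left untouched, which is the desired behaviour for URLs that are already UTF-8.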

In the latter case, you need to parse the output to 
know which parts are URLs. You could use mod_proxy_html
with a regexp rule as above, or, with version 3,
just select ASCII output and the issue goes away.


-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
