Posted to users@tomcat.apache.org by Edward Toro <ed...@RocketSoftware.com> on 2004/03/10 21:37:41 UTC

Tomcat 5.0.19 international filenames inaccessible

Short version:

Does Tomcat 5 no longer serve files with international characters in their filenames?


Long version:

Environment:  Tomcat 5.0.19 on WinXP Pro

I have a file located in: <tomcat-home>/<webapps>/MyWebApp/.  The filename contains international characters:  0x305f 0x3079 0x304f (a.k.a. E3-81-9F E3-81-B9 E3-81-8F in UTF-8).

When I navigate to the directory via http://<server>:8080/<webappname>/ I get a directory listing of the files in that directory.  I can access every file on that list except those that contain international characters.

When I click on a filename that contains international characters, I'm sent to http://<server>:8080/<webappname>%E3%81%9F%E3%81%B9%E3%81%8F.xml.  This is the correct result of putting the filename through a URLEncoder with the UTF-8 character set, which is what I assume the server is doing behind the scenes.  But the file doesn't appear; I get a 404 error.
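
A minimal sketch of that encoding step, for anyone who wants to reproduce it outside the server. The class and method names here are made up for illustration, and the Charset overloads of URLEncoder/URLDecoder assume Java 10 or later. It also decodes the same string as ISO-8859-1 purely for comparison; whether Tomcat actually decodes the request path that way is an assumption that nothing in this message confirms.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat.
public class FilenameEncodingCheck {

    // Percent-encode a name using its UTF-8 bytes.
    public static String encodeUtf8(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // Decode a percent-encoded string, reading the bytes in the given charset.
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // The three characters from the filename: 0x305f 0x3079 0x304f
        String name = "\u305f\u3079\u304f";
        String encoded = encodeUtf8(name);

        System.out.println(encoded); // %E3%81%9F%E3%81%B9%E3%81%8F

        // Decoding as UTF-8 gives back the original three characters.
        System.out.println(decode(encoded, StandardCharsets.UTF_8).equals(name)); // true

        // Decoding as ISO-8859-1 turns each of the nine bytes into its own character.
        System.out.println(decode(encoded, StandardCharsets.ISO_8859_1).length()); // 9
    }
}
```

Note that URLEncoder implements form encoding (a space becomes "+"), so it only matches path encoding for names without spaces, like this one.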

So I made some Java testing code:

try {
    URL url = new URL("http://<server>:8080/<webapp>/%E3%81%9F%E3%81%B9%E3%81%8F.xml");
    HttpURLConnection conn = (HttpURLConnection)url.openConnection();

    // checking the headers
    String header;
    String key;
    int i = 0;
    while ((header = conn.getHeaderField(i)) != null) {
        key = conn.getHeaderFieldKey(i);
        System.out.println(key + " = " + header);
        i++;
    }

    // checking the content
    InputStream is = url.openConnection().getInputStream();
    InputStreamReader isr = new InputStreamReader(is);
    int chr;
    while ((chr = isr.read()) != -1) {
        System.out.print((char)chr);
    }
    System.out.println("success");
} catch (Throwable t) { t.printStackTrace(); }

The headers I get back are:
HTTP/1.1 404 /<webapp>/%E3%81%9F%E3%81%B9%E3%81%8F.scene.xml
Content-Type = text/html;charset=ISO-8859-1
Content-Language = en-US
Content-Length = 1091
Date = Wed, 10 Mar 2004 18:02:01 GMT
Server = Apache-Coyote/1.1

No help there because I get those same headers when I try to access a file that doesn't exist at all:

HTTP/1.1 404 /<webapp>/inexistent.xml
Content-Type = text/html;charset=ISO-8859-1
Content-Language = en-US
Content-Length = 1040
Date = Wed, 10 Mar 2004 18:03:22 GMT
Server = Apache-Coyote/1.1

When I try to access the input stream to read for content, I get a FileNotFoundException.

I'm pretty confident that this problem does not exist in Tomcat 4.

I'm also pretty confident that this problem is not related to the characters being 3-byte UTF-8.  I've tested using 2-byte UTF-8 (D0-9F, D1-80) and the result is the same.
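
For reference, the byte sequences quoted in this message can be checked with a few lines of Java. This is only a sketch; the class name Utf8Bytes and the method hex are invented for the example. The code points U+041F and U+0440 are the characters whose UTF-8 forms are D0-9F and D1-80.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting UTF-8 byte sequences.
public class Utf8Bytes {

    // Return the UTF-8 bytes of a string as dash-separated uppercase hex.
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append('-');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u041f")); // D0-9F (2-byte sequence)
        System.out.println(hex("\u0440")); // D1-80 (2-byte sequence)
        System.out.println(hex("\u305f")); // E3-81-9F (3-byte sequence)
    }
}
```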

Is this a bug?

-Ed Toro


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org


RE: Tomcat 5.0.19 international filenames inaccessible

Posted by Yansheng Lin <ya...@silvacom.com>.
Hi, I got the same error when I tried to view
"http://localhost:8080/j2e/jsp/%E5%AE%9B%E5%90%8D.jsp".  I've been
researching this for a few days now, whenever I've had some free time.  The
only approach that seems to have worked for others is setting
	D:\java\bin\java.exe -Dfile.encoding=UTF-8 ... 
in your catalina config file.  But I don't like that solution myself :_(.  Not
entirely sure why, either :).
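
If you do try that flag, a quick way to see what the JVM actually picked up is to print the property and the default charset. This is just a sketch, and the class name DefaultEncodingCheck is made up; it would need to run in a JVM started with the same options as Tomcat for the result to mean anything.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic, not part of Tomcat.
public class DefaultEncodingCheck {

    // Summarize the encoding-related settings of the running JVM.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run with and without -Dfile.encoding=UTF-8 to compare.
        System.out.println(describe());
    }
}
```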

Let us know if you get it working some other way.

Thanks!

-Yan