Posted to bugs@httpd.apache.org by bu...@apache.org on 2007/04/12 19:12:56 UTC

DO NOT REPLY [Bug 42105] New: - Patch for mod_autoindex to set the character set

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=42105>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=42105

           Summary: Patch for mod_autoindex to set the character set
           Product: Apache httpd-2
           Version: 2.3-HEAD
          Platform: All
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_autoindex
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: poeml@suse.de


[Summarizing from the dev list here. See
http://marc.info/?l=apache-httpd-dev&m=117027634505806&w=2 and following posts.]

Users have a problem with directory listings generated by mod_autoindex:
it is not possible to control the character set with which the
response is marked. The server cannot know what the real encoding on
disk is, so it makes a very rough guess based on the OS it is running
on: APR_HAS_UNICODE_FS, which is, as far (as little) as I looked, 1 on
Windows and 0 on Linux. Depending on it, mod_autoindex decides whether
to add a (fixed) charset to the content type:

#if APR_HAS_UNICODE_FS
    ap_set_content_type(r, "text/html;charset=utf-8");
#else
    ap_set_content_type(r, "text/html");
#endif

The thing is that Linux has had filesystems that can store UTF-8
filenames for ages, and since a system-wide UTF-8 locale is becoming
more and more widespread, filenames encoded that way occur much more
frequently. This means that on many servers the content type needs to
be set appropriately, so the browser can display things correctly.

My first thought was to define APR_HAS_UNICODE_FS to 1, but that could
be just as wrong; it only means that the filesystem is Unicode-capable,
not that the actual filenames happen to be encoded like that.
Rather, the right value depends on site-specific needs.
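
To illustrate why "Unicode-capable" says nothing about the bytes actually
stored, here is a minimal C sketch of a structural UTF-8 check. The name
looks_like_utf8 is made up for this example; it is not part of httpd or APR,
and it is only a heuristic (it checks lead and continuation byte patterns and
does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the byte string has the
 * lead/continuation structure of UTF-8. Simplified on purpose:
 * overlong encodings and surrogates are not rejected. */
static bool looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        size_t need;                        /* continuation bytes expected */

        if (s[i] < 0x80)                need = 0;   /* plain ASCII */
        else if ((s[i] & 0xE0) == 0xC0) need = 1;   /* 110xxxxx */
        else if ((s[i] & 0xF0) == 0xE0) need = 2;   /* 1110xxxx */
        else if ((s[i] & 0xF8) == 0xF0) need = 3;   /* 11110xxx */
        else return false;      /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return false;       /* sequence is cut off at end of name */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* expected 10xxxxxx */
        }
        i += need + 1;
    }
    return true;
}
```

With this check, the bytes 'c' 'a' 'f' 0xC3 0xA9 ("café" in UTF-8) pass, while
'c' 'a' 'f' 0xE9 (the same name in ISO-8859-1) fail, although both can sit side
by side on the same filesystem. Some ISO-8859-1 byte sequences also happen to
form valid UTF-8, so a check like this can only ever be a guess.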

Thus, I think the right way is to make the character set configurable.
I am attaching a patch which adds an "AddDirectoryIndexCharset" directive
to the mod_autoindex configuration.

The patch actually removes the dependency on APR_HAS_UNICODE_FS. My
train of thought here is that utf-8 can (and should) be the default,
unless configured otherwise. This fits Windows (it has always been like
that), and it (largely) fits Linux. But I don't know about other
platforms.
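
The idea of "configured charset, else utf-8" can be sketched in plain C as
follows. This is not the attached patch: index_content_type and its parameters
are hypothetical names chosen for illustration, and a real module would pass
the resulting string to ap_set_content_type() instead of a caller-supplied
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the attached patch.
 * Builds the Content-Type value for a directory listing.
 * configured_charset is the value an admin would set through a
 * directive; NULL or "" means "not configured", in which case the
 * sketch falls back to utf-8 as proposed in the report.
 * Returns 0 on success, -1 if the buffer is too small. */
static int index_content_type(const char *configured_charset,
                              char *out, size_t out_len)
{
    const char *charset = configured_charset;
    int n;

    assert(out != NULL && out_len > 0);

    if (charset == NULL || charset[0] == '\0')
        charset = "utf-8";

    n = snprintf(out, out_len, "text/html;charset=%s", charset);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Called with NULL, the sketch yields "text/html;charset=utf-8"; called with
"ISO-8859-1", it yields "text/html;charset=ISO-8859-1". The compile-time
APR_HAS_UNICODE_FS switch no longer plays a role.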

On Thu, Feb 01, 2007 at 11:13:38AM -0600, William A. Rowe, Jr. wrote:
> Dr. Peter Poeml wrote:
> > On Thu, Feb 01, 2007 at 10:59:46 +0000, Joe Orton wrote:
> >> On Wed, Jan 31, 2007 at 09:45:12PM +0100, Dr. Peter Poeml wrote:
> >>> Users have a problem with directory listings generated by mod_autoindex:
> >>> It is not possible to control the character setting with which the
> >>> response is marked.
> >> AddDefaultCharset does allow this already as you mention in the bug.
> >> Can't users who insist on using filenames using one encoding and file
> >> content using another simply use:
> >>
> >> AddDefaultCharset UTF-8
> >> AddCharset ISO-8859-1 .html
> >>
> >> or similar?
> >
> > I don't think so, because it means
> >  1) that all .html files would need to be ISO-8859-1
> >  2) you cannot have files with charset=somethingelse anymore
> >  3) all non-html files would need to be UTF-8 then, unless you add
> >     AddCharset directives for all of them...
>
> And you can't match by name.  I'm reviewing the patch, but I'll already
> offer a +1 on the concept.

On Thu, Feb 01, 2007 at 10:01:52PM +0100, Ruediger Pluem wrote:
> In the general case I agree with Joe that if things can be done with existing
> directives / code, no new directives / code should be added, but this case here
> is different.
>
> I think it is the ultimate duty of the content generator to set the correct
> content type / encoding. So in this case this would be mod_autoindex. Whether
> mod_autoindex detects this automatically or has a directive to set this is another
> story. Currently I would be in favour of a directive provided that there is
> no reliable and performant autodetection mechanism.
>
> From my point of view AddDefaultCharset and AddCharset should be used to
>
> - configure the "core content generator" of httpd (serving static files)
> - help fixing broken content generators who cannot set the encoding correctly
>   by themselves
>
> So +1 on the general concept.

Cool.

Here is the patch against trunk, with documentation added.

I hope I got the way of patching the documentation right. A review would
be very much appreciated.

Thanks,
Peter

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 42105] - Patch for mod_autoindex to set the character set

Posted by bu...@apache.org.

wrowe@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED






DO NOT REPLY [Bug 42105] - Patch for mod_autoindex to set the character set

Posted by bu...@apache.org.

------- Additional Comments From poeml@suse.de  2007-04-12 10:15 -------
Created an attachment (id=19938)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=19938&action=view)
Patch for trunk adding AddDirectoryIndexCharset directive




DO NOT REPLY [Bug 42105] - Patch for mod_autoindex to set the character set

Posted by bu...@apache.org.

------- Additional Comments From wrowe@apache.org  2007-08-30 14:45 -------
Something similar was created to add IndexOptions Type=content/type Charset=foo
and will be available in the next 2.0 and 2.2 releases of httpd.

It is a bit premature for us to presume UTF-8 on unix-ish systems, because by
definition their filenames are bytestreams.  But that said, Mac OS X made it
explicit that filenames are UTF-8, so we follow your suggestion on at least
one 'unix' :)

Thank you for your report!



DO NOT REPLY [Bug 42105] - Patch for mod_autoindex to set the character set

Posted by bu...@apache.org.

poeml@suse.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable



