Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2005/04/16 15:17:25 UTC

svn.collab.net httpd configuration: default charset

If you visit a URL like

  http://svn.collab.net/repos/svn/tags/1.2.0-rc1/CHANGES

you will see some UTF-8 text misrendered as ISO-8859-1, because the
response has a header of

  Content-Type: text/plain; charset=ISO-8859-1

Since CHANGES, like most text files in Subversion, has no
svn:mime-type, the mod_dav_svn default of "text/plain" is used.  httpd
automatically adds the value of AddDefaultCharset to this mime type,
which on svn.collab.net is evidently set to the default value of
ISO-8859-1.
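
The effect of the wrong charset label can be reproduced in a few lines of Python. This is only an illustrative sketch: the sample string is made up (it is not taken from CHANGES), and the two decode calls stand in for what a browser would do when it trusts the charset parameter in the Content-Type header.

```python
# Sample text with one non-ASCII character (hypothetical, not from CHANGES).
original = "café"

# The bytes the server sends: the file is stored as UTF-8.
wire_bytes = original.encode("utf-8")

# What a client shows if it believes "charset=ISO-8859-1".
misrendered = wire_bytes.decode("iso-8859-1")

# What a client shows if the header says "charset=UTF-8".
correct = wire_bytes.decode("utf-8")

print(misrendered)
print(correct)
```

Each non-ASCII character occupies two or more bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, so one character turns into several.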

I suggest that AddDefaultCharset on svn.collab.net be set to UTF-8.
That could presumably be scoped to the Subversion project if desired.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn.collab.net httpd configuration: default charset

Posted by "Øyvind A. Holm" <su...@sunbase.org>.
On 2005-04-16 21:43:09 Greg Hudson wrote:
> On Sat, 2005-04-16 at 18:23, Marcus Rueckert wrote:
> > i would remove the default charset entirely. otherwise scripts which
> > are in latin1 might look weird.
>
> We have scripts in latin1?

These files in trunk contain non-UTF-8 sequences:

trunk$ grep -r . . | nosvn | find_inv_utf8 | cut -f 1 -d : | sort | uniq
./INSTALL
./doc/translations/french/appendices.texi
./doc/translations/french/client.texi
./doc/translations/french/getting_started.texi
./doc/translations/french/repos_admin.texi
./doc/translations/french/svn-handbook-french.texi
./doc/translations/russian/misc-docs/quick_walkthrough.xml
./notes/fs-improvements.txt
./packages/windows-innosetup/Readme.txt
./packages/windows-innosetup/svn.iss
./packages/windows-innosetup/tools/svnpath/svnpath.rc
./www/httpd-win32.patch.txt
trunk$

All of them are text files, except the windows-innosetup stuff, which
probably has to be that way.
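
For anyone without the nosvn and find_inv_utf8 helpers used in the pipeline above (they are not shown in this thread), a rough Python equivalent might look like the sketch below. It is an assumption about what those helpers do, not a copy of them: it skips .svn directories and reports every file whose bytes do not decode as UTF-8. The function names is_valid_utf8 and find_invalid_utf8 are made up for this example. Unlike the grep pipeline, it reads whole files rather than lines, so binary files will usually be reported too.

```python
import os


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_utf8(root: str) -> list:
    """Return a sorted list of paths under root that are not valid UTF-8."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune working-copy admin directories (assumed role of "nosvn").
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if not is_valid_utf8(fh.read()):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    for path in find_invalid_utf8("."):
        print(path)
```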

AddDefaultCharset UTF-8 — yesthankyou.

-- sunny256


detecting charset wishlist enhancement

Posted by Karl Chen <qu...@NOSPAM.quarl.org>.
I imagine that the text-encoding of a file in Subversion might be
useful to other applications than just mod_dav_svn.  In any case,
once the method for specifying charset in properties is ironed
out, I suggest 'svn add' detect the charset of a text/* document
when possible.

For example, UTF-16 can be inferred from the presence of an
initial zero-width non-breaking space (the byte order mark, U+FEFF);
charset can be specified by a "-*- coding: foo -*-" line on the first
or second line (the Emacs format, which at least Python also supports).
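
A minimal sketch of that kind of detection, in Python. Nothing here is an existing Subversion feature: guess_charset is a hypothetical helper, and the regular expression is a loose approximation of the Emacs-style declaration (it accepts any variable name ending in "coding", so both "coding: foo" and "encoding: foo" match). UTF-32 byte order marks are checked before UTF-16 ones because the little-endian UTF-32 mark begins with the same two bytes as the UTF-16 one.

```python
import codecs
import re

# Emacs-style "-*- ... coding: NAME ... -*-" declaration, matched on raw bytes.
CODING_RE = re.compile(rb"-\*-.*?coding[:=]\s*([-\w.]+).*?-\*-")


def guess_charset(data: bytes):
    """Guess a charset from a byte order mark or a coding line, else None."""
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Only the first two lines may carry the declaration.
    for line in data.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None
```

Returning None when nothing is found leaves the decision to the caller, which matches the "when possible" wording above: a file with no mark and no declaration simply gets no charset.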


-- 
Karl 2005-04-17 01:05



Re: svn.collab.net httpd configuration: default charset

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2005-04-16 at 18:23, Marcus Rueckert wrote:
> i would remove the default charset entirely. otherwise scripts which are
> in latin1 might look weird.

We have scripts in latin1?



Re: svn.collab.net httpd configuration: default charset

Posted by Marcus Rueckert <da...@web.de>.
On 2005-04-16 11:17:25 -0400, Greg Hudson wrote:
> I suggest that AddDefaultCharset on svn.collab.net be set to UTF-8.
> That could presumably be scoped to the Subversion project if desired.

i would remove the default charset entirely. otherwise scripts which are
in latin1 might look weird.

just my 2 cents

darix

-- 
irssi - the client of the smart and beautiful people

              http://www.irssi.de/



Re: svn.collab.net httpd configuration: default charset

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Greg Hudson <gh...@MIT.EDU> writes:

> Perhaps.  But we can't change the svn:mime-type for historical
> and tagged versions of text documents in Subversion.

Ah, that's a good reason for using AddDefaultCharset, though it
still seems useful to set svn:mime-type as well.

> It doesn't?  Why would we want to use any other encoding for
> text files in the Subversion repository?

I said "feel"; the implication is that i don't have a solid
reason to offer.

--  
Eric Gillespie <*> epg@pretzelnet.org


Re: svn.collab.net httpd configuration: default charset

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2005-04-16 at 14:47, Eric Gillespie wrote:
> Greg Hudson <gh...@MIT.EDU> writes:
> 
> > Why?
> 
> On the assumption that more than just web browsers might be
> interested in the encoding of a document.

Perhaps.  But we can't change the svn:mime-type for historical and
tagged versions of text documents in Subversion.

>   Less importantly, it
> just doesn't "feel right" to slap UTF-8 on all documents.

It doesn't?  Why would we want to use any other encoding for text files
in the Subversion repository?



Re: svn.collab.net httpd configuration: default charset

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Greg Hudson <gh...@MIT.EDU> writes:

> Why?

On the assumption that more than just web browsers might be
interested in the encoding of a document.  Less importantly, it
just doesn't "feel right" to slap UTF-8 on all documents.

--  
Eric Gillespie <*> epg@pretzelnet.org


Re: svn.collab.net httpd configuration: default charset

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2005-04-17 at 03:07, Justin Erenkrantz wrote:
> FWIW, httpd 2.0.53 removed AddDefaultCharset from the default httpd.conf
> configuration because of all the brokenness default charsets introduce.

As far as I can tell, all of the arguments presented in the issue you
cite make sense when considering a piece of software imposing a default
charset on an unsuspecting server administrator, but do not make sense
when considering a server administrator specifying a default charset.

> So, I believe keeping that directive present is really a bad idea.  See:

So I don't see how this follows.

> So, +1 to Eric's idea as it is the best one.  (Historical versions of CHANGES
> not showing up in UTF-8 aren't that big of a deal, IMHO.)

Nonetheless, I don't see what the problem is in saying "our text
documents are almost all UTF-8 documents".

(I saw Oyvind's list of files containing non-UTF-8 sequences, but as far
as I can tell they're a combination of rare exceptions and mistakes.)



Re: svn.collab.net httpd configuration: default charset

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sat, Apr 16, 2005 at 01:56:10PM -0400, Greg Hudson wrote:
> On Sat, 2005-04-16 at 13:50, Eric Gillespie wrote:
> > I'd prefer
> > AddDefaultCharset off and set svn:mime-type on CHANGES to
> > 'text/plain; charset=UTF-8'.
> 
> Why?

FWIW, httpd 2.0.53 removed AddDefaultCharset from the default httpd.conf
configuration because of all the brokenness default charsets introduce.
So, I believe keeping that directive present is really a bad idea.  See:

http://issues.apache.org/bugzilla/show_bug.cgi?id=23421

for more information as to why we removed this.

So, +1 to Eric's idea as it is the best one.  (Historical versions of CHANGES
not showing up in UTF-8 aren't that big of a deal, IMHO.)  -- justin


Re: svn.collab.net httpd configuration: default charset

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2005-04-16 at 13:50, Eric Gillespie wrote:
> I'd prefer
> AddDefaultCharset off and set svn:mime-type on CHANGES to
> 'text/plain; charset=UTF-8'.

Why?



Re: svn.collab.net httpd configuration: default charset

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Ben Collins-Sussman <su...@collab.net> writes:

> As a followup to this question -- what was the latest thinking
> on charsets in general?  mod_dav_svn already looks for
> svn:mime-type when sending a file, and sets the Content-type:
> header appropriately.  Didn't somebody once propose creating an
> svn:charset property that mod_dav_svn could also notice?

MIME types have parameters; see RFC 2045.  I'd prefer
AddDefaultCharset off and set svn:mime-type on CHANGES to
'text/plain; charset=UTF-8'.  mod_dav_svn already supports this.
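
To illustrate the point about parameters: any client that reads such a property value can separate the type from the charset with an ordinary MIME parser. The sketch below uses Python's standard email package; split_mime_type is a hypothetical helper name, and this says nothing about how mod_dav_svn itself handles the value.

```python
from email.message import Message


def split_mime_type(value: str):
    """Split a MIME type string into (type/subtype, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the charset parameter if present.
    return msg.get_content_type(), msg.get_content_charset()


print(split_mime_type("text/plain; charset=UTF-8"))
print(split_mime_type("text/plain"))
```

A value without a charset parameter yields None for the charset, so a consumer can tell "no encoding declared" apart from an explicit declaration.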

--  
Eric Gillespie <*> epg@pretzelnet.org


Re: svn.collab.net httpd configuration: default charset

Posted by Ben Collins-Sussman <su...@collab.net>.
On Apr 16, 2005, at 10:17 AM, Greg Hudson wrote:

> If you visit a URL like
>
>   http://svn.collab.net/repos/svn/tags/1.2.0-rc1/CHANGES
>
> you will see some UTF-8 text misrendered as ISO-8859-1, because the
> response has a header of
>
>   Content-Type: text/plain; charset=ISO-8859-1
>
> Since CHANGES, like most text files in Subversion, has no
> svn:mime-type, the mod_dav_svn default of "text/plain" is used.  httpd
> automatically adds the value of AddDefaultCharset to this mime type,
> which on svn.collab.net is evidently set to the default value of
> ISO-8859-1.
>
> I suggest that AddDefaultCharset on svn.collab.net be set to UTF-8.
> That could presumably be scoped to the Subversion project if desired.
>


As a followup to this question -- what was the latest thinking on 
charsets in general?  mod_dav_svn already looks for svn:mime-type when 
sending a file, and sets the Content-type: header appropriately.  
Didn't somebody once propose creating an svn:charset property that 
mod_dav_svn could also notice?

