Posted to dev@subversion.apache.org by Dag-Erling Smørgrav <de...@des.no> on 2008/06/24 14:21:39 UTC

svn:charset

The svn:charset property is widely used, and has been for years.  There
are even clients (e.g. SmartSVN) that rely on it when diffing files or
revisions with different encodings.

Is there a good reason why it should not be officially sanctioned and
documented?

DES
-- 
Dag-Erling Smørgrav - des@des.no

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: svn:charset

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, Jun 24, 2008 at 7:21 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
> The svn:charset property is widely used, and has been for years.  There
> are even clients (e.g. SmartSVN) that rely on it when diffing files or
> revisions with different encodings.
>
> Is there a good reason why it should not be officially sanctioned and
> documented?

I've been on this list since before 1.0 and have never even heard of
this before.  Other than a couple of examples of people proposing that
a property with this name ought to be included in the product, there
are not many mailing list hits in the archive either.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/



Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-13 10:42:37 +0200, Dag-Erling Smørgrav wrote:
> One thing you *can't* do with svn:mime-type is specify the encoding
> for a file that doesn't have a media type.

If the file has an encoding (charset), then it is some form of text,
in which case you can still set the media type to text/plain. It may
not be the best media type, but this is better than no media type at
all.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
Karl Fogel <kf...@red-bean.com> writes:
> I don't remember the thread from a year ago, but I think it's not true
> that no one's interested.  It's just that most needs are being met via
> svn:mime-type (although there are some problems with doing it that
> way).

One thing you *can't* do with svn:mime-type is specify the encoding for
a file that doesn't have a media type.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Karl Fogel <kf...@red-bean.com>.
Alexander Kitaev <Al...@svnkit.com> writes:
> We'll change its namespace to "svnkit", so that it would be
> svnkit:charset and svn:charset will never get into release version of
> SVNKit.

Thanks.

> I would also like to say that I see a certain positive side in this sort
> of unintentional "hostage-taking" - about a year ago, when I asked
> whether there were any plans to provide charset conversion support in
> Subversion, using either a new svn:charset or the existing svn:mime-type
> property, I was told that no one was interested in that and that most
> probably there would be no such feature in Subversion.

Well, we'd certainly listen to your experiences with the svnkit:charset
property.  In other words, custom properties can get promoted to "svn:"
properties (and their names changed accordingly) as we learn how they're
useful.

I don't remember the thread from a year ago, but I think it's not true
that no one's interested.  It's just that most needs are being met via
svn:mime-type (although there are some problems with doing it that way).

Re: svn:charset

Posted by Alexander Kitaev <Al...@svnkit.com>.
Hello,

 > I don't think that the Subversion project needs to negotiate with
 > hostage-takers.
SVNKit uses the svn:charset property, but I'd like to clarify that it has
not been "used for years" and is only available in the latest beta version
of SVNKit.

We'll change its namespace to "svnkit", so that it will be
svnkit:charset, and svn:charset will never get into a release version of
SVNKit.

I would also like to say that I see a certain positive side in this sort
of unintentional "hostage-taking" - about a year ago, when I asked
whether there were any plans to provide charset conversion support in
Subversion, using either a new svn:charset or the existing svn:mime-type
property, I was told that no one was interested in that and that most
probably there would be no such feature in Subversion.

Alexander Kitaev,
TMate Software,
http://svnkit.com/ - Java [Sub]Versioning Library!

David Glasser wrote:
> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
> 
>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>   path of least resistance (though I understand your annoyance at the
>>   encroachment on what I understand is a reserved namespace)
> 
> I don't think that the Subversion project needs to negotiate with
> hostage-takers.
> 
> --dave
> 


Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-13 07:38:45 -0700, Kevin Grover wrote:
> Sorry. I must have been too tired when I was reading through the
> messages.

In fact, I mentioned "charmap" because this is the standard term in
POSIX, and the standard way to get the charset/charmap/encoding from
a POSIX shell is "locale charmap".

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Sun, Jul 13, 2008 at 1:39 AM, Dag-Erling Smørgrav <de...@des.no> wrote:

> "Kevin Grover" <ke...@kevingrover.net> writes:
> > From what little I found while looking around, charmap is taken to
> > refer to the Microsoft utility for picking characters (that's what
> > Wikipedia brings up, and just that).  It seems that 'encoding'
> > (meaning character encoding) or 'charset' would indeed be the most
> > commonly used terms.
>
> I'm not sure why you dragged charmap into this - it was never an
> option.
>
> Character sets and encodings are not the same thing, but "charset" has
> historically been used to mean a particular encoding of a particular
> character set.
>

Sorry.  I must have been too tired when I was reading through the messages.

- Kevin

Re: svn:charset

Posted by "C. Michael Pilato" <cm...@collab.net>.
Kevin Grover wrote:
> I don't really know enough to have an opinion on what the correct
> approach is, but I do agree that more thinking needs to occur.
> 
> This svn:charset discussion seems similar to (or at least related
> to) the svn:mime-type discussion that went on not too long ago about
> identifying binary files (and a possible property to help _really_
> know if the file is textual or binary regardless of the intended use).
> I ranted in that thread about re-purposing svn:mime-type for unintended
> uses.
> 
> As in that case, it seems to me that it's cleaner to have the actual
> use cases separated out (charset, binary, base-mime-type, whatever)
> and then use whatever info is needed to construct the Content-Type on
> the fly as needed.

+1.

> And, whichever way this leads, I think the autoprops could definitely
> use some revamping.  Is that something that could happen with 1.6?  By
> this I mean, if I (or someone) beat on it and found something people
> like, would it be a candidate for inclusion in 1.6 or 1.7, or would it
> have to wait for a 2.x release?

That depends almost entirely on the end result and, specifically, how 
gracefully it allows previous incarnations of the feature to continue to work.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Tue, Jul 8, 2008 at 4:15 PM, C. Michael Pilato <cm...@collab.net> wrote:
> David Glasser wrote:
>>
>> On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
>>>
>>> "David Glasser" <gl...@davidglasser.net> writes:
>>>>
>>>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>>>
>>>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>>>  path of least resistance (though I understand your annoyance at the
>>>>>  encroachment on what I understand is a reserved namespace)
>>>>
>>>> I don't think that the Subversion project needs to negotiate with
>>>> hostage-takers.
>>>
>>> Heh :-).
>>>
>>> But, as much as I enjoyed David's comment, the OP's sentiment is right:
>>> however we got here, here is where we are.  (Besides, he wasn't
>>> proposing negotiation.)
>>
>> In all seriousness, though, I highly suspect that any group of
>> projects that are so lax that they don't even bother to understand
>> that svn:* are the only properties they *aren't* allowed to use would
>> also have failed to resolve all the issues discussed in this thread.
>> In that case, using svn:charset for a potential new property would be
>> the absolute *worst* thing to do, since it would conflict with
>> poorly-specified prior use.
>>
>> Anyway, if the main problem with the svn:mime-type solution is that
>> it's annoying to parse (which is true), then why don't we just add an
>> svn API that extracts the charset= from a given mime-type string?  (I
>> guess there's still the autoprops issue, but perhaps there is a more
>> general fix to autoprops that could deal with this.)
>
> So, I don't think this has been noted so far in this thread, but we've been
> here before:
>
>   http://svn.haxx.se/dev/archive-2002-08/0674.shtml
>   http://svn.haxx.se/dev/archive-2006-03/1182.shtml
>   (more?)
>
> I'm personally in favor of using a distinct property for this stuff, and am
> not terribly concerned about the sorts of compatibility issues that doing so
> would cause for folks (including us) who today use svn:mime-type -- which
> was intended to be a MIME media type and subtype, not a full-blown
> Content-type HTTP header value -- as the latter.
>
> That said, I don't think "character set" is the right nomenclature. Wouldn't
> svn:encoding be the more accurate description?  (And let's just ignore for a
> minute the "other" interpretation of "encoding" proposed in
> http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)
>
> --
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
>

I don't really know enough to have an opinion on what the correct
approach is, but I do agree that more thinking needs to occur.

This svn:charset discussion seems similar to (or at least related
to) the svn:mime-type discussion that went on not too long ago about
identifying binary files (and a possible property to help _really_
know if the file is textual or binary regardless of the intended use).
I ranted in that thread about re-purposing svn:mime-type for unintended
uses.

As in that case, it seems to me that it's cleaner to have the actual
use cases separated out (charset, binary, base-mime-type, whatever)
and then use whatever info is needed to construct the Content-Type on
the fly as needed.

And, whichever way this leads, I think the autoprops could definitely
use some revamping.  Is that something that could happen with 1.6?  By
this I mean, if I (or someone) beat on it and found something people
like, would it be a candidate for inclusion in 1.6 or 1.7, or would it
have to wait for a 2.x release?

- Kevin



Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-09 00:15:59 +0100, C. Michael Pilato wrote:
> That said, I don't think "character set" is the right nomenclature.  
> Wouldn't svn:encoding be the more accurate description?  (And let's just  
> ignore for a minute the "other" interpretation of "encoding" proposed in  
> http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)

"encoding" can mean too many things, and I think svn:encoding would be
a bad idea concerning the consistency with other standards (precisely
because of this other interpretation). I'd say that svn:charmap would
be a more accurate description.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
"Kevin Grover" <ke...@kevingrover.net> writes:
> From what little I found while looking around, charmap is taken to
> refer to the Microsoft utility for picking characters (that's what
> Wikipedia brings up, and just that).  It seems that 'encoding'
> (meaning character encoding) or 'charset' would indeed be the most
> commonly used terms.

I'm not sure why you dragged charmap into this - it was never an
option.

Character sets and encodings are not the same thing, but "charset" has
historically been used to mean a particular encoding of a particular
character set.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Wed, Jul 9, 2008 at 1:37 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
> "C. Michael Pilato" <cm...@collab.net> writes:
>> That said, I don't think "character set" is the right
>> nomenclature. Wouldn't svn:encoding be the more accurate description?
>
> I've covered this already.  Neither "charset" nor "encoding" is entirely
> correct, but MIME uses "charset" in this sense, while HTTP uses
> "encoding" to mean something completely different, so "charset" is the
> closest you'll get without inventing a new term.
>
> DES
> --
> Dag-Erling Smørgrav - des@des.no
>
>
>

Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
"C. Michael Pilato" <cm...@collab.net> writes:
> That said, I don't think "character set" is the right
> nomenclature. Wouldn't svn:encoding be the more accurate description?

I've covered this already.  Neither "charset" nor "encoding" is entirely
correct, but MIME uses "charset" in this sense, while HTTP uses
"encoding" to mean something completely different, so "charset" is the
closest you'll get without inventing a new term.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by "C. Michael Pilato" <cm...@collab.net>.
David Glasser wrote:
> On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
>> "David Glasser" <gl...@davidglasser.net> writes:
>>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>>
>>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>>   path of least resistance (though I understand your annoyance at the
>>>>   encroachment on what I understand is a reserved namespace)
>>> I don't think that the Subversion project needs to negotiate with
>>> hostage-takers.
>> Heh :-).
>>
>> But, as much as I enjoyed David's comment, the OP's sentiment is right:
>> however we got here, here is where we are.  (Besides, he wasn't
>> proposing negotiation.)
> 
> In all seriousness, though, I highly suspect that any group of
> projects that are so lax that they don't even bother to understand
> that svn:* are the only properties they *aren't* allowed to use would
> also have failed to resolve all the issues discussed in this thread.
> In that case, using svn:charset for a potential new property would be
> the absolute *worst* thing to do, since it would conflict with
> poorly-specified prior use.
> 
> Anyway, if the main problem with the svn:mime-type solution is that
> it's annoying to parse (which is true), then why don't we just add an
> svn API that extracts the charset= from a given mime-type string?  (I
> guess there's still the autoprops issue, but perhaps there is a more
> general fix to autoprops that could deal with this.)

So, I don't think this has been noted so far in this thread, but we've been 
here before:

    http://svn.haxx.se/dev/archive-2002-08/0674.shtml
    http://svn.haxx.se/dev/archive-2006-03/1182.shtml
    (more?)

I'm personally in favor of using a distinct property for this stuff, and am 
not terribly concerned about the sorts of compatibility issues that doing so 
would cause for folks (including us) who today use svn:mime-type -- which 
was intended to be a MIME media type and subtype, not a full-blown 
Content-type HTTP header value -- as the latter.

That said, I don't think "character set" is the right nomenclature. 
Wouldn't svn:encoding be the more accurate description?  (And let's just 
ignore for a minute the "other" interpretation of "encoding" proposed in 
http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn:charset

Posted by David Glasser <gl...@davidglasser.net>.
On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
> "David Glasser" <gl...@davidglasser.net> writes:
>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>
>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>   path of least resistance (though I understand your annoyance at the
>>>   encroachment on what I understand is a reserved namespace)
>>
>> I don't think that the Subversion project needs to negotiate with
>> hostage-takers.
>
> Heh :-).
>
> But, as much as I enjoyed David's comment, the OP's sentiment is right:
> however we got here, here is where we are.  (Besides, he wasn't
> proposing negotiation.)

In all seriousness, though, I highly suspect that any group of
projects that are so lax that they don't even bother to understand
that svn:* are the only properties they *aren't* allowed to use would
also have failed to resolve all the issues discussed in this thread.
In that case, using svn:charset for a potential new property would be
the absolute *worst* thing to do, since it would conflict with
poorly-specified prior use.

Anyway, if the main problem with the svn:mime-type solution is that
it's annoying to parse (which is true), then why don't we just add an
svn API that extracts the charset= from a given mime-type string?  (I
guess there's still the autoprops issue, but perhaps there is a more
general fix to autoprops that could deal with this.)

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/



Re: svn:charset

Posted by Karl Fogel <kf...@red-bean.com>.
"David Glasser" <gl...@davidglasser.net> writes:
> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>
>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>   path of least resistance (though I understand your annoyance at the
>>   encroachment on what I understand is a reserved namespace)
>
> I don't think that the Subversion project needs to negotiate with
> hostage-takers.

Heh :-).

But, as much as I enjoyed David's comment, the OP's sentiment is right:
however we got here, here is where we are.  (Besides, he wasn't
proposing negotiation.)

The right thing to do from this point forward is "whatever's best for
users".  That implies identifying the advantages/disadvantages of
svn:charset vs the svn:mime-type way.  If we determine that the balance
of advantages makes svn:charset desirable, then we need to specify
precisely what effects svn:charset will have; how svn:charset will
interact with legacy charset data appended to svn:mime-type; and
research all current users of svn:charset to determine whether they're
using the property consistently themselves (and if not, figure out what
to do about it).

I've re-read through this thread, and I don't think the above has really
been done yet.  (The only mail that even comes close is
http://svn.haxx.se/dev/archive-2008-06/0948.shtml.)  I don't personally
have the time to do this research; it's a non-trivial task, though
certainly feasible.

I understand that this is a high hurdle, and that I'm sort of putting
"stop energy" on the momentum here.  That's not really my goal.  I just
think Subversion shouldn't adopt a new svn: property without having a
really good reason.  We've done maybe 50% of the thinking we need to do
for this particular one.  I'm merely proposing we not implement it until
we've done that other 50% (which means someone has to supply it).

-Karl



Re: svn:charset

Posted by David Glasser <gl...@davidglasser.net>.
On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:

>  - Like it or not, svn:charset is already in use; formalizing it is the
>   path of least resistance (though I understand your annoyance at the
>   encroachment on what I understand is a reserved namespace)

I don't think that the Subversion project needs to negotiate with
hostage-takers.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/



Re: svn:charset

Posted by Alan Barrett <ap...@cequrux.com>.
On Fri, 04 Jul 2008, Dag-Erling Smørgrav wrote:
> Sorry about the late reply - I'm used to being Cc:ed on replies to
> mailing list threads in which I participate, so I didn't notice your
> followup right away.

OK, I'll CC you.  My preference is that I not be CC'd on replies to mailing
lists.

> Alan Barrett <ap...@cequrux.com> writes:
> > I really don't like the idea of having two conflicting ways of
> > specifying the same information.  One way (in svn:mime-type) should be
> > enough.
> 
>  - svn:charset is easier to handle for applications that need just the
>    charset, not the media type (incorrectly referred to as mime type)

Yes, it would be easier, if this were being designed from scratch.  But
it's not being designed from scratch.

You say that writing an application to parse a new svn:charset property
is easier than writing an application to extract charset information
from svn:mime-type, and I agree, but I claim that that's not a fair
comparison.  A fair comparison would be between the status quo, where
charset information can appear only in svn:mime-type, and a future in
which it's unknown whether charset information appears in svn:mime-type
or in svn:charset or both, so a future application would have to parse
them both, check for consistency, and raise an error if they are
inconsistent.
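
To make that cost concrete, here is a minimal Python sketch of the double
check such an application would need.  It is only an illustration: the
function name effective_charset and the dict-of-properties input are
assumptions made for this example, not part of any Subversion API, and the
svn:mime-type parsing is deliberately simplistic (no comments, no quoted
semicolons).

```python
def effective_charset(props):
    """Return the charset for a file given its properties, or None.

    `props` is a hypothetical dict mapping property names to values.
    Raises ValueError if svn:mime-type and svn:charset disagree.
    """
    # Look for a charset parameter appended to svn:mime-type.
    from_mime = None
    mime = props.get("svn:mime-type", "")
    for param in mime.split(";")[1:]:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            from_mime = val.strip().strip('"').lower()

    # Look for the separate svn:charset property.
    explicit = props.get("svn:charset")
    if explicit is not None:
        explicit = explicit.strip().lower()

    # Both present and different: the application has to decide what to do.
    if from_mime and explicit and from_mime != explicit:
        raise ValueError(
            "svn:mime-type says %r but svn:charset says %r" % (from_mime, explicit)
        )
    return explicit or from_mime
```

The comparison here is a plain case-insensitive string match, so two
different names for the same charset would still be reported as a conflict.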

>  - The full MIME Content-Type parameter syntax is unwieldy.  It allows a
>    variety of quoting and comment styles. [...]
>    How far do you want to go to support the complete syntax?

That's up to the application author.

I don't want to get into a long debate about this, and I am not a svn
developer, but I did want to correct what I saw as a misunderstanding of
my position.

--apb (Alan Barrett)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
Sorry about the late reply - I'm used to being Cc:ed on replies to
mailing list threads in which I participate, so I didn't notice your
followup right away.

Alan Barrett <ap...@cequrux.com> writes:
> I really don't like the idea of having two conflicting ways of
> specifying the same information.  One way (in svn:mime-type) should be
> enough.

Well,

 - svn:charset is easier to handle for applications that need just the
   charset, not the media type (incorrectly referred to as mime type)

 - The charset is strictly speaking not part of the MIME media type, it
   is an optional parameter for the Content-Type header used in both
   MIME and HTTP.  The media type is the first (unnamed) field, and the
   only non-optional one.

 - The full MIME Content-Type parameter syntax is unwieldy.  It allows a
   variety of quoting and comment styles.  For instance, these two:

     Content-type: text/plain; charset=us-ascii (Plain text)
     Content-type: text/plain; charset="us-ascii"

   are equivalent, and there can be additional semicolon-separated
   parameters, such as the MIME version or anything else you feel like
   adding (as long as it starts with "x-"), which are of no use to
   Subversion.  How far do you want to go to support the complete
   syntax?

 - Like it or not, svn:charset is already in use; formalizing it is the
   path of least resistance (though I understand your annoyance at the
   encroachment on what I understand is a reserved namespace)

 - My patch does not change existing behaviour for people who don't use
   svn:charset, but it improves functionality for those who do (since
   mod_dav_svn now knows about it and uses it)
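
For illustration, here is a rough Python sketch of what pulling the charset
out of a Content-Type style value involves, covering only the two forms
quoted above (a trailing parenthesised comment and a quoted value).  The
function name charset_from_mime_type is made up for this example, and the
parser is intentionally incomplete: it ignores nested comments, escaped
characters and semicolons inside quoted strings, which the full syntax
allows.

```python
import re

# Matches a single, non-nested parenthesised comment such as "(Plain text)".
_COMMENT = re.compile(r"\([^()]*\)")


def charset_from_mime_type(value):
    """Return the lowercased charset parameter of `value`, or None."""
    # Drop simple comments, then split the media type from its parameters.
    value = _COMMENT.sub("", value)
    for param in value.split(";")[1:]:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            val = val.strip()
            # Remove one pair of surrounding double quotes, if present.
            if len(val) >= 2 and val[0] == val[-1] == '"':
                val = val[1:-1]
            return val.lower() or None
    return None


# The two equivalent header values from the list above:
a = charset_from_mime_type("text/plain; charset=us-ascii (Plain text)")
b = charset_from_mime_type('text/plain; charset="us-ascii"')
```

Both calls yield the same value, but only because the sketch special-cases
exactly these two styles; anything beyond them needs a real header parser.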

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Alan Barrett <ap...@cequrux.com>.
On Wed, 25 Jun 2008, Dag-Erling Smørgrav wrote:
> Karl Fogel <kf...@red-bean.com> writes:
> > One advantage of appending to svn:mime-type is that when we serve out
> > the mime-type, those consumers that are prepared to handle a charset
> > addendum get it for free.
> 
> Yes, but it doesn't work with auto-props.

Then the auto-props stuff should grow some kind of quoting or escaping
mechanism to allow them to express properties that contain spaces and
punctuation.
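
As a rough illustration of why a quoting mechanism is needed, the Python
sketch below models an auto-props value as a semicolon-separated list of
name=value pairs.  This is a simplified model written for this example, not
Subversion's actual config parser, and split_auto_props is a made-up name.

```python
def split_auto_props(value):
    """Naively split an auto-props value into a {name: value} dict."""
    props = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        props[name.strip()] = val.strip()
    return props


# The charset parameter is cut off and becomes a bogus separate property.
broken = split_auto_props("svn:mime-type=text/plain; charset=utf-8")

# With a separate property there is no semicolon inside any value.
clean = split_auto_props("svn:mime-type=text/plain;svn:charset=utf-8")
```

In the first case the media type loses its charset parameter, which is the
failure an escaping mechanism would have to prevent.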

I really don't like the idea of having two conflicting ways of
specifying the same information.  One way (in svn:mime-type) should be
enough.

--apb (Alan Barrett)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
Karl Fogel <kf...@red-bean.com> writes:
> One advantage of appending to svn:mime-type is that when we serve out
> the mime-type, those consumers that are prepared to handle a charset
> addendum get it for free.

Yes, but it doesn't work with auto-props.

>  If that information were in svn:charset
> instead, then when we serve out the mime-type (say, over HTTP through
> mod_dav_svn), would we want to append "; " plus the value of svn:charset?

Yes, this is what my patch does.

> If
> so, what do we do when svn:mime-type already specifies a charset, and
> it's not the same as what svn:charset specifies?

AFAIK, the client will ignore the first one.

> (Or is the solution to
> that to check at propset time, and try to avoid ever letting them
> conflict on the same file?)

I'd rather check at propset time that the contents of svn:mime-type
match /^[[:alnum:]-]+/[[:alnum:]-]+$/...
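
A propset-time check along those lines might look like the Python sketch
below.  It assumes the POSIX [[:alnum:]] class can be approximated by ASCII
letters and digits, and is_bare_media_type is a name invented for this
example rather than an existing Subversion function.

```python
import re

# ASCII approximation of /^[[:alnum:]-]+/[[:alnum:]-]+$/
_BARE_MEDIA_TYPE = re.compile(r"[A-Za-z0-9-]+/[A-Za-z0-9-]+")


def is_bare_media_type(value):
    """True if `value` is just type/subtype with no parameters."""
    return _BARE_MEDIA_TYPE.fullmatch(value) is not None


ok = is_bare_media_type("text/plain")
with_param = is_bare_media_type("text/plain; charset=utf-8")
with_plus = is_bare_media_type("image/svg+xml")
```

Note that the pattern as written also rejects registered media types whose
subtype contains "+" or ".", such as image/svg+xml, so a real check would
probably need a wider character class.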

> > I've attached a patch relative to trunk that:
> >
> >  - adds svn:charset to svn_props.h;
> >  - adds it to the help text for propset;
> >  - updates the French and Norwegian translations accordingly (this
> >    doesn't seem to work, but they didn't work before I changed them, nor
> >    do they work in any other language I've tried);
> >  - modifies libsvn_wc to disallow svn:charset on non-file nodes, like it
> >    does for svn:mime-type;
> >  - modifies mod_dav_svn to take svn:charset into account when generating
> >    the Content-Type header.
> Thank you for doing this work.  It always helps to post a log message
> along with the patch.

I expect there will be changes before it's committed, anyway.

> Your patch uses TAB for indentation sometimes, and SPACE other times.
> The TAB chars make the indentation be off due to quoting levels when
> expanded inline in an email (such as this reply).  It's no big deal, I
> just mention it in case it's easy for you to use SPACE everywhere.

It's just a matter of telling Emacs to DTRT.

> This doesn't handle the case where mime_type already has an appended
> charset.

See above; but it should be trivial to strip everything after the
semicolon in svn:mime-type (if and only if svn:charset is present)
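
The rule described here can be sketched in a few lines of Python.  This is
not the mod_dav_svn patch itself (which is C); content_type_header is a
hypothetical helper that only shows the intended behaviour: strip the
parameters from svn:mime-type if and only if a separate charset is given.

```python
def content_type_header(mime_type, charset=None):
    """Build a Content-Type value from svn:mime-type and optional svn:charset."""
    if charset:
        # Keep only the media type, dropping any existing parameters,
        # so the explicit charset is the only one sent.
        base = mime_type.split(";", 1)[0].strip()
        return "%s; charset=%s" % (base, charset)
    # No svn:charset: pass svn:mime-type through unchanged.
    return mime_type


overridden = content_type_header("text/plain; charset=iso-8859-1", "utf-8")
untouched = content_type_header("text/plain; charset=iso-8859-1")
```

Files without svn:charset keep whatever svn:mime-type already says, so the
existing behaviour is unchanged for them.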

> Need to specify the namespace for encodings (i.e., whatever the official
> way to refer to the IANA list is).

http://www.iana.org/assignments/character-sets

In practice, most systems only know a subset of these; in particular,
most systems don't know all the names for each character set.  The
correct name for iso-8859-1, for instance, is ISO_8859-1:1987, but the
former is the preferred name for MIME.
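
As an example of the alias problem, Python's own codec registry (which is
not the IANA list, only one system's subset of it) accepts several spellings
of the same charset.  The helper same_charset below is a made-up name for
this sketch; it treats any name the local system does not know as a
mismatch.

```python
import codecs


def same_charset(a, b):
    """True if both names resolve to the same codec on this system."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # At least one name is unknown here, so we cannot call them equal.
        return False


alias_match = same_charset("iso-8859-1", "latin1")
different = same_charset("utf-8", "iso-8859-1")
unknown = same_charset("no-such-charset", "utf-8")
```

Whether a formal IANA name such as ISO_8859-1:1987 resolves at all depends
on the alias table of the system doing the lookup.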

BTW, I suspect the reason why translation doesn't work is that the
message IDs are too long.  If I were you, I'd use short symbolic message
IDs (e.g. "SVN_HELP_PROPSET_LONG") and place the full text in an en.po
file.

> By the way, is "charset" the standard word for this?  I know we use it
> informally this way, but as character set and encoding can sometimes be
> different, there might be a more formally correct term.  Thoughts?

Strictly speaking, Unicode is a character set while iso-8859-1 is a
character encoding of a specific subset of Unicode, but historically,
iso-8859-1 and the like have been called character sets, and "charset" is
what MIME uses.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Karl Fogel <kf...@red-bean.com>.
Dag-Erling Smørgrav <de...@des.no> writes:
> All I'm asking for is official sanction for the use of the svn:charset
> property to store the IANA character set name that corresponds to the
> file's encoding, independently of its media type.  This is how it's
> being used today by several Open Source projects (a quick search
> uncovers Adium, Mono, Tartarus, Growl and Maccode, not counting projects
> I'm involved in) and by at least one third-party Subversion client
> (SmartSVN, based on SVNKit).
>
> One argument in favor of svn:charset, independently of the above, is
> that unlike the current trick of appending "; charset=%s" to the MIME
> type, it works with auto-props.

One advantage of appending to svn:mime-type is that when we serve out
the mime-type, those consumers that are prepared to handle a charset
addendum get it for free.  If that information were in svn:charset
instead, then when we serve out the mime-type (say, over HTTP through
mod_dav_svn), would we want to append "; charset=" plus the value of
svn:charset?  If so, what do we do when svn:mime-type already specifies
a charset, and it's not the same as what svn:charset specifies?  (Or is
the solution to that to check at propset time, and try to avoid ever
letting them conflict on the same file?)
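
A propset-time check along those lines might look like the sketch
below.  It is plain C, not written against the Subversion libraries;
ieq_n(), find_charset_param() and charsets_conflict() are hypothetical
names, and the parameter parsing is deliberately simple (it does not
cope with a ';' inside another quoted parameter).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitively compare the first N bytes of A against B, where B
   is known to be at least N bytes long.  A may be shorter. */
static int
ieq_n(const char *a, const char *b, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
      return 0;
  return 1;
}

/* Return a pointer to the value of the charset parameter inside
   MIME_TYPE and store its length in *LEN, or return NULL if absent. */
static const char *
find_charset_param(const char *mime_type, size_t *len)
{
  const char *p = strchr(mime_type, ';');

  assert(len != NULL);

  while (p != NULL)
    {
      p++;                                   /* skip the ';' */
      while (isspace((unsigned char)*p))
        p++;
      if (ieq_n(p, "charset=", 8))
        {
          p += 8;
          if (*p == '"')                     /* quoted value */
            {
              const char *end;
              p++;
              end = strchr(p, '"');
              *len = end ? (size_t)(end - p) : strlen(p);
            }
          else
            *len = strcspn(p, "; \t");
          return p;
        }
      p = strchr(p, ';');                    /* try the next parameter */
    }
  return NULL;
}

/* Return 1 if MIME_TYPE carries a charset parameter that differs from
   CHARSET (ignoring case), 0 otherwise. */
static int
charsets_conflict(const char *mime_type, const char *charset)
{
  size_t len;
  const char *val;

  if (mime_type == NULL || charset == NULL)
    return 0;

  val = find_charset_param(mime_type, &len);
  if (val == NULL)
    return 0;

  return !(strlen(charset) == len && ieq_n(val, charset, len));
}
```

The comparison is purely textual, so two aliases of the same encoding
(say latin1 and iso-8859-1) would be reported as a conflict unless the
names are normalized first.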

I'm not asking to be difficult, and I'm not opposed in principle to your
proposal.  But there's more involved here than just giving "official
sanction".  When the Subversion project gives official sanction by
putting something in the svn: namespace, we like it to mean we've thought
through the edge cases and handled them :-).

I wish the current svn:charset producers had not used the "svn:"
namespace, because now there may be compatibility concerns that we never
bargained for.

> I've attached a patch relative to trunk that:
>
>  - adds svn:charset to svn_props.h;
>  - adds it to the help text for propset;
>  - updates the French and Norwegian translations accordingly (this
>    doesn't seem to work, but they didn't work before I changed them, nor
>    do they work in any other language I've tried);
>  - modifies libsvn_wc to disallow svn:charset on non-file nodes, like it
>    does for svn:mime-type;
>  - modifies mod_dav_svn to take svn:charset into account when generating
>    the Content-Type header.

Thank you for doing this work.  It always helps to post a log message
along with the patch.  I guess the above is the log message; we can
reformat it into the format specified by

   http://subversion.tigris.org/hacking.html#log-messages

(referenced from http://subversion.tigris.org/hacking.html#patches.)

Your patch uses TAB for indentation sometimes, and SPACE other times.
The TAB chars make the indentation be off due to quoting levels when
expanded inline in an email (such as this reply).  It's no big deal, I
just mention it in case it's easy for you to use SPACE everywhere.

On to the substance of the patch:

> Index: subversion/mod_dav_svn/liveprops.c
> ===================================================================
> --- subversion/mod_dav_svn/liveprops.c	(revision 31863)
> +++ subversion/mod_dav_svn/liveprops.c	(working copy)
> @@ -434,6 +434,7 @@
>             safe (and consistent) to assume the same on the server.  */
>          svn_string_t *pval;
>          const char *mime_type = NULL;
> +	const char *charset = NULL;
>  
>          if (resource->baselined && resource->type == DAV_RESOURCE_TYPE_VERSION)
>            return DAV_PROP_INSERT_NOTSUPP;
> @@ -478,9 +479,27 @@
>                  svn_error_clear(serr);
>                  return DAV_PROP_INSERT_NOTDEF;
>                }
> +
> +            if ((serr = svn_fs_node_prop(&pval, resource->info->root.root,
> +                                         resource->info->repos_path,
> +                                         SVN_PROP_CHARSET, p)))
> +              {
> +                svn_error_clear(serr);
> +                pval = NULL;
> +              }
> +
> +            if (pval)
> +              charset = pval->data;
>            }
>  
> -        value = mime_type;
> +	if (charset != NULL)
> +	  {
> +	    value = apr_psprintf(p, "%s; charset=%s", mime_type, charset);
> +	  }
> +	else
> +	  {
> +	    value = mime_type;
> +	  }
>          break;
>        }

This doesn't handle the case where mime_type already has an appended
charset.
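
For illustration, one way to cover that case is to let svn:charset win
and drop whatever parameters svn:mime-type already carries before
appending.  This is only a sketch under that assumption: it is plain C
with a caller-supplied buffer instead of apr_psprintf() and a pool, and
build_content_type() is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Subversion API.  Write the content type to
   serve into OUT.  When CHARSET is non-NULL, parameters already present
   on MIME_TYPE are dropped and "; charset=CHARSET" is appended;
   otherwise MIME_TYPE is copied as is.  Returns 0 on success, -1 if
   MIME_TYPE is NULL or OUT is too small. */
static int
build_content_type(const char *mime_type, const char *charset,
                   char *out, size_t out_size)
{
  size_t len;
  int n;

  assert(out != NULL && out_size > 0);

  if (mime_type == NULL)
    return -1;

  if (charset == NULL)
    n = snprintf(out, out_size, "%s", mime_type);
  else
    {
      /* Keep only the bare media type, i.e. the part before ';'. */
      len = strcspn(mime_type, ";");
      while (len > 0 && isspace((unsigned char)mime_type[len - 1]))
        len--;
      n = snprintf(out, out_size, "%.*s; charset=%s",
                   (int)len, mime_type, charset);
    }

  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Whether svn:charset should silently override a charset in
svn:mime-type, or whether the mismatch should be rejected earlier, is
exactly the open design question here.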

> Index: subversion/svn/main.c
> ===================================================================
> --- subversion/svn/main.c	(revision 31863)
> +++ subversion/svn/main.c	(working copy)
> @@ -747,6 +747,7 @@
>       "      whether to merge the file, and how to serve it from Apache.\n"
>       "      A mimetype beginning with 'text/' (or an absent mimetype) is\n"
>       "      treated as text.  Anything else is treated as binary.\n"
> +     "    svn:charset  - the character encoding of the file.\n"
>       "    svn:externals  - A newline separated list of module specifiers,\n"
>       "      each of which consists of a relative directory path, optional\n"
>       "      revision flags and an URL.  The ordering of the three elements\n"

Need to specify the namespace for encodings (i.e., whatever the official
way to refer to the IANA list is).

> Index: subversion/include/svn_props.h
> ===================================================================
> --- subversion/include/svn_props.h	(revision 31863)
> +++ subversion/include/svn_props.h	(working copy)
> @@ -235,6 +235,9 @@
>  /** The mime-type of a given file. */
>  #define SVN_PROP_MIME_TYPE  SVN_PROP_PREFIX "mime-type"
>  
> +/** The character encoding of a given file. */
> +#define SVN_PROP_CHARSET  SVN_PROP_PREFIX "charset"
> +
>  /** The ignore patterns for a given directory. */
>  #define SVN_PROP_IGNORE  SVN_PROP_PREFIX "ignore"

By the way, is "charset" the standard word for this?  I know we use it
informally this way, but as character set and encoding can sometimes be
different, there might be a more formally correct term.  Thoughts?

I haven't looked through the code to see if there are other spots that
would be affected by this.

Best,
-Karl



Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
Branko Čibej <br...@xbc.nu> writes:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > The svn:charset property is widely used, and has been for years.  There
> > are even clients (e.g. SmartSVN) that rely on it when diffing files or
> > revisions with different encodings.
> Do all the clients that use it expect it to conform to the same specs?
>
> Whoever intruded on the reserved svn: property name namespace should
> git off their behind and do the groundwork to make sure all uses are
> consistent, then post patches to this list.

All I'm asking for is official sanction for the use of the svn:charset
property to store the IANA character set name that corresponds to the
file's encoding, independently of its media type.  This is how it's
being used today by several Open Source projects (a quick search
uncovers Adium, Mono, Tartarus, Growl and Maccode, not counting projects
I'm involved in) and by at least one third-party Subversion client
(SmartSVN, based on SVNKit).

One argument in favor of svn:charset, independently of the above, is
that unlike the current trick of appending "; charset=%s" to the MIME
type, it works with auto-props.

I've attached a patch relative to trunk that:

 - adds svn:charset to svn_props.h;
 - adds it to the help text for propset;
 - updates the French and Norwegian translations accordingly (this
   doesn't seem to work, but they didn't work before I changed them, nor
   do they work in any other language I've tried);
 - modifies libsvn_wc to disallow svn:charset on non-file nodes, like it
   does for svn:mime-type;
 - modifies mod_dav_svn to take svn:charset into account when generating
   the Content-Type header.

DES
-- 
Dag-Erling Smørgrav - des@des.no


Re: svn:charset

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tue, Jun 24, 2008 at 4:20 PM, Branko Čibej <br...@xbc.nu> wrote:
> FWIW, I did a google search for svn:charset; the only exact match I found
> was this thread.
>
> Not saying that necessarily proves anything, but it isn't insignificant,
> either.

There is one conversation on svnkit's users list, but I'm not sure if
it was actually implemented (could be since SmartSVN uses SVNKit).
And that discussion occurred earlier this year:
  http://www.nabble.com/Codepage-conversion-td15091376.html

It looks like Trac will look for charset information in the mime-type,
as evidenced by this bug report:
  http://trac.edgewall.org/ticket/1518

Not sure it affects much... but I thought it might be useful to know.

-John

Re: svn:charset

Posted by Branko Čibej <br...@xbc.nu>.
FWIW, I did a google search for svn:charset; the only exact match I 
found was this thread.

Not saying that necessarily proves anything, but it isn't insignificant, 
either.

-- Brane


Ben Collins-Sussman wrote:
> On Tue, Jun 24, 2008 at 9:27 AM, Branko Čibej <br...@xbc.nu> wrote:
>   
>> Dag-Erling Smørgrav wrote:
>>     
>>> The svn:charset property is widely used, and has been for years.  There
>>> are even clients (e.g. SmartSVN) that rely on it when diffing files or
>>> revisions with different encodings.
>>>
>>> Is there a good reason why it should not be officially sanctioned and
>>> documented?
>>>
>>>       
>> Do all the clients that use it expect it to conform to the same specs? My
>> guess is -- no.
>>
>> Whoever intruded on the reserved svn: property name namespace should git off
>> their behind and do the groundwork to make sure all uses are consistent,
>> then post patches to this list.
>>     
>
> Agreed.  I've never heard of this property, ever, in the entire
> history of the Subversion project.  I know that some svn clients out
> there have started making use of 'bugtraq:*' properties for
> issue-tracker integration, and that's fine.  But if clients have
> started inventing new svn:* properties, that's a violation of the svn
> API -- only 'core' svn libraries are supposed to implement those
> properties.
>
> Is SmartSVN the only client using svn:charset?  Do they document it
> somewhere?  (Keep in mind that SmartSVN isn't using the svn libraries
> at all -- it's a complete third-party implementation of svn in java.
> But still, inventing new svn: properties is quite audacious.)
>   



Re: svn:charset

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On Tue, Jun 24, 2008 at 9:27 AM, Branko Čibej <br...@xbc.nu> wrote:
> Dag-Erling Smørgrav wrote:
>>
>> The svn:charset property is widely used, and has been for years.  There
>> are even clients (e.g. SmartSVN) that rely on it when diffing files or
>> revisions with different encodings.
>>
>> Is there a good reason why it should not be officially sanctioned and
>> documented?
>>
>
> Do all the clients that use it expect it to conform to the same specs? My
> guess is -- no.
>
> Whoever intruded on the reserved svn: property name namespace should git off
> their behind and do the groundwork to make sure all uses are consistent,
> then post patches to this list.

Agreed.  I've never heard of this property, ever, in the entire
history of the Subversion project.  I know that some svn clients out
there have started making use of 'bugtraq:*' properties for
issue-tracker integration, and that's fine.  But if clients have
started inventing new svn:* properties, that's a violation of the svn
API -- only 'core' svn libraries are supposed to implement those
properties.

Is SmartSVN the only client using svn:charset?  Do they document it
somewhere?  (Keep in mind that SmartSVN isn't using the svn libraries
at all -- it's a complete third-party implementation of svn in java.
But still, inventing new svn: properties is quite audacious.)

Re: svn:charset

Posted by Branko Čibej <br...@xbc.nu>.
Dag-Erling Smørgrav wrote:
> The svn:charset property is widely used, and has been for years.  There
> are even clients (e.g. SmartSVN) that rely on it when diffing files or
> revisions with different encodings.
>
> Is there a good reason why it should not be officially sanctioned and
> documented?
>   

Do all the clients that use it expect it to conform to the same specs? 
My guess is -- no.

Whoever intruded on the reserved svn: property name namespace should git 
off their behind and do the groundwork to make sure all uses are 
consistent, then post patches to this list.

-- Brane
