Posted to dev@subversion.apache.org by Dag-Erling Smørgrav <de...@des.no> on 2008/07/04 08:30:55 UTC

Re: svn:charset

Sorry about the late reply - I'm used to being Cc:ed on replies to
mailing list threads in which I participate, so I didn't notice your
followup right away.

Alan Barrett <ap...@cequrux.com> writes:
> I really don't like the idea of having two conflicting ways of
> specifying the same information.  One way (in svn:mime-type) should be
> enough.

Well,

 - svn:charset is easier to handle for applications that need just the
   charset, not the media type (incorrectly referred to as mime type)

 - The charset is strictly speaking not part of the MIME media type, it
   is an optional parameter for the Content-Type header used in both
   MIME and HTTP.  The media type is the first (unnamed) field, and the
   only non-optional one.

 - The full MIME Content-Type parameter syntax is unwieldy.  It allows a
   variety of quoting and comment styles.  For instance, these two:

     Content-type: text/plain; charset=us-ascii (Plain text)
     Content-type: text/plain; charset="us-ascii"

   are equivalent, and there can be additional semicolon-separated
   parameters, such as the MIME version or anything else you feel like
   adding (as long as it starts with "x-"), which are of no use to
   Subversion.  How far do you want to go to support the complete
   syntax?

 - Like it or not, svn:charset is already in use; formalizing it is the
   path of least resistance (though I understand your annoyance at the
   encroachment on what I understand is a reserved namespace)

 - My patch does not change existing behaviour for people who don't use
   svn:charset, but it improves functionality for those who do (since
   mod_dav_svn now knows about it and uses it)
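
To make the parsing burden concrete, here is a minimal sketch of pulling
the charset out of a Content-Type value.  It is an illustration only: the
function names (strip_comments, split_params, parse_content_type,
charset_of) are made up for this example and are not Subversion APIs, and
it handles just comments, quoted strings and semicolon-separated
parameters, not the full MIME grammar (no RFC 2231 continuations, for
instance).

```python
import re


def strip_comments(value):
    """Drop parenthesized comments that appear outside quoted strings."""
    out, depth, in_quotes, i = [], 0, False, 0
    while i < len(value):
        ch = value[i]
        if in_quotes:
            out.append(ch)
            if ch == "\\" and i + 1 < len(value):
                out.append(value[i + 1])  # keep the escaped character
                i += 1
            elif ch == '"':
                in_quotes = False
        elif depth > 0:
            if ch == "\\":
                i += 1  # skip the escaped character inside a comment
            elif ch == "(":
                depth += 1  # comments may nest
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
        else:
            if ch == '"':
                in_quotes = True
            out.append(ch)
        i += 1
    return "".join(out)


def split_params(value):
    """Split on semicolons that are not inside a quoted string."""
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if in_quotes and ch == "\\" and i + 1 < len(value):
            buf.extend([ch, value[i + 1]])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def unquote(text):
    """Remove surrounding double quotes and backslash escapes, if any."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return re.sub(r"\\(.)", r"\1", text[1:-1])
    return text


def parse_content_type(value):
    """Return (media_type, params) for a Content-Type header value."""
    parts = split_params(strip_comments(value))
    media_type = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            params[name.strip().lower()] = unquote(val)
    return media_type, params


def charset_of(value):
    """Return the charset parameter, or None if there is none."""
    return parse_content_type(value)[1].get("charset")


# The two forms quoted above yield the same charset.
for header_value in ("text/plain; charset=us-ascii (Plain text)",
                     'text/plain; charset="us-ascii"'):
    print(charset_of(header_value))
```

Even this reduced version needs three helpers before it can answer "what
is the charset?", which is the kind of overhead a dedicated property
would avoid.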

DES
-- 
Dag-Erling Smørgrav - des@des.no

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-13 10:42:37 +0200, Dag-Erling Smørgrav wrote:
> One thing you *can't* do with svn:mime-type is specify the encoding
> for a file that doesn't have a media type.

If the file has an encoding (charset), then it is some form of text,
in which case you can still set the media type to text/plain. It may
not be the best media type, but this is better than no media type at
all.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
Karl Fogel <kf...@red-bean.com> writes:
> I don't remember the thread from a year ago, but I think it's not true
> that no one's interested.  It's just that most needs are being met via
> svn:mime-type (although there are some problems with doing it that
> way).

One thing you *can't* do with svn:mime-type is specify the encoding for
a file that doesn't have a media type.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Karl Fogel <kf...@red-bean.com>.
Alexander Kitaev <Al...@svnkit.com> writes:
> We'll change its namespace to "svnkit", so that it becomes
> svnkit:charset, and svn:charset will never get into a release version
> of SVNKit.

Thanks.

> I would also like to say that I see a certain positive side in this
> sort of unintentional "hostage-taking" - about a year ago, when I asked
> whether there were any plans to provide charset conversion support in
> Subversion, using either a new svn:charset or the existing
> svn:mime-type property, I was told that no one was interested in that
> and that most probably there would be no such feature in Subversion.

Well, we'd certainly listen to your experiences with the svnkit:charset
property.  In other words, custom properties can get promoted to "svn:"
properties (and their names changed accordingly) as we learn how they're
useful.

I don't remember the thread from a year ago, but I think it's not true
that no one's interested.  It's just that most needs are being met via
svn:mime-type (although there are some problems with doing it that way).

Re: svn:charset

Posted by Alexander Kitaev <Al...@svnkit.com>.
Hello,

 > I don't think that the Subversion project needs to negotiate with
 > hostage-takers.
SVNKit uses the svn:charset property, but I'd like to clarify that it
has not been "used for years"; it is only available in the latest beta
version of SVNKit.

We'll change its namespace to "svnkit", so that it becomes
svnkit:charset, and svn:charset will never get into a release version
of SVNKit.

I would also like to say that I see a certain positive side in this
sort of unintentional "hostage-taking" - about a year ago, when I asked
whether there were any plans to provide charset conversion support in
Subversion, using either a new svn:charset or the existing
svn:mime-type property, I was told that no one was interested in that
and that most probably there would be no such feature in Subversion.

Alexander Kitaev,
TMate Software,
http://svnkit.com/ - Java [Sub]Versioning Library!

David Glasser wrote:
> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
> 
>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>   path of least resistance (though I understand your annoyance at the
>>   encroachment on what I understand is a reserved namespace)
> 
> I don't think that the Subversion project needs to negotiate with
> hostage-takers.
> 
> --dave
> 


Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-13 07:38:45 -0700, Kevin Grover wrote:
> Sorry. I must have been too tired when I was reading through the
> messages.

In fact, I mentioned "charmap" because this is the standard term in
POSIX, and the standard way to get the charset/charmap/encoding from
a POSIX shell is "locale charmap".
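
For readers who want the same information from a program rather than a
shell, here is a rough Python equivalent of "locale charmap".  It is a
sketch only: current_charmap is a name invented for this example, and the
fallback branch is an assumption for platforms that lack nl_langinfo.

```python
import locale


def current_charmap():
    """Return the codeset name of the current locale."""
    try:
        # Adopt the locale from the user's environment (LANG, LC_*).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale settings: stay in the default locale
    if hasattr(locale, "nl_langinfo"):
        # POSIX nl_langinfo(CODESET) reports the locale's codeset.
        return locale.nl_langinfo(locale.CODESET)
    # Fallback for platforms without nl_langinfo.
    return locale.getpreferredencoding(False)


print(current_charmap())
```

The name printed depends entirely on the environment the script runs in.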

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Sun, Jul 13, 2008 at 1:39 AM, Dag-Erling Smørgrav <de...@des.no> wrote:

> "Kevin Grover" <ke...@kevingrover.net> writes:
> > From what little I found while looking around, charmap is taken to
> > refer to the Microsoft utility for picking characters (that's what
> > Wikipedia brings up, and just that).  It seems that 'encoding'
> > (meaning character encoding) or 'charset' would indeed be the most
> > commonly used terms.
>
> I'm not sure why you dragged charmap into this - it was never an
> option.
>
> Character sets and encodings are not the same thing, but "charset" has
> historically been used to mean a particular encoding of a particular
> character set.
>

Sorry.  I must have been too tired when I was reading through the messages.

- Kevin

Re: svn:charset

Posted by "C. Michael Pilato" <cm...@collab.net>.
Kevin Grover wrote:
> I don't really know enough to have an opinion on what the correct
> approach is, but I do agree that more thinking needs to occur.
> 
> This svn:charset discussion seems similar to (or at least related
> to) the svn:mime-type discussion that went on not too long ago about
> identifying binary files (and a possible property to help _really_
> know if the file is textual or binary regardless of the intended use).
> I ranted in that thread about re-purposing svn:mime-type for
> unintended uses.
> 
> As in that case, it seems to me that it's cleaner to have the actual
> use cases separated out (charset, binary, base-mime-type, whatever)
> and then use whatever info is needed to construct the Content-Type on
> the fly as needed.

+1.

> And, whichever way this leads, I think the autoprops could definitely
> use some revamping.  Is that something that could happen with 1.6?  By
> this I mean, if I (or someone) beat on it and found something people
> like, would it be a candidate for inclusion in 1.6 or 1.7, or would it
> have to wait for a 2.x release?

That depends almost entirely on the end result and, specifically, how 
gracefully it allows previous incarnations of the feature to continue to work.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Tue, Jul 8, 2008 at 4:15 PM, C. Michael Pilato <cm...@collab.net> wrote:
> David Glasser wrote:
>>
>> On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
>>>
>>> "David Glasser" <gl...@davidglasser.net> writes:
>>>>
>>>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>>>
>>>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>>>  path of least resistance (though I understand your annoyance at the
>>>>>  encroachment on what I understand is a reserved namespace)
>>>>
>>>> I don't think that the Subversion project needs to negotiate with
>>>> hostage-takers.
>>>
>>> Heh :-).
>>>
>>> But, as much as I enjoyed David's comment, the OP's sentiment is right:
>>> however we got here, here is where we are.  (Besides, he wasn't
>>> proposing negotiation.)
>>
>> In all seriousness, though, I highly suspect that any group of
>> projects that are so lax that they don't even bother to understand
>> that svn:* are the only properties they *aren't* allowed to use would
>> also have failed to resolve all the issues discussed in this thread.
>> In that case, using svn:charset for a potential new property would be
>> the absolute *worst* thing to do, since it would conflict with
>> poorly-specified prior use.
>>
>> Anyway, if the main problem with the svn:mime-type solution is that
>> it's annoying to parse (which is true), then why don't we just add an
>> svn API that extracts the charset= from a given mime-type string?  (I
>> guess there's still the autoprops issue, but perhaps there is a more
>> general fix to autoprops that could deal with this.)
>
> So, I don't think this has been noted so far in this thread, but we've been
> here before:
>
>   http://svn.haxx.se/dev/archive-2002-08/0674.shtml
>   http://svn.haxx.se/dev/archive-2006-03/1182.shtml
>   (more?)
>
> I'm personally in favor of using a distinct property for this stuff, and am
> not terribly concerned about the sorts of compatibility issues that doing so
> would cause for folks (including us) who today use svn:mime-type -- which
> was intended to be a MIME media type and subtype, not a full-blown
> Content-type HTTP header value -- as the latter.
>
> That said, I don't think "character set" is the right nomenclature. Wouldn't
> svn:encoding be the more accurate description?  (And let's just ignore for a
> minute the "other" interpretation of "encoding" proposed in
> http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)
>
> --
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
>

I don't really know enough to have an opinion on what the correct
approach is, but I do agree that more thinking needs to occur.

This svn:charset discussion seems similar to (or at least related
to) the svn:mime-type discussion that went on not too long ago about
identifying binary files (and a possible property to help _really_
know if the file is textual or binary regardless of the intended use).
I ranted in that thread about re-purposing svn:mime-type for
unintended uses.

As in that case, it seems to me that it's cleaner to have the actual
use cases separated out (charset, binary, base-mime-type, whatever)
and then use whatever info is needed to construct the Content-Type on
the fly as needed.
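
A rough sketch of what "construct the Content-Type on the fly" could
look like follows.  Everything here is an assumption for illustration:
build_content_type is a made-up helper, svn:charset is the property
proposed in this thread rather than an existing one, and svn:mime-type
is assumed to hold only a bare type/subtype.  The text/plain fallback
follows the suggestion made elsewhere in this thread for files that have
a charset but no media type.

```python
def build_content_type(props):
    """Combine separate properties into one Content-Type value.

    props is a plain dict of property name -> value.  Returns None
    when neither a media type nor a charset is known.
    """
    media_type = props.get("svn:mime-type")  # assumed: bare type/subtype
    charset = props.get("svn:charset")       # hypothetical property
    if media_type is None:
        if charset is None:
            return None
        # A file with a charset is some form of text.
        media_type = "text/plain"
    if charset is None:
        return media_type
    return "{}; charset={}".format(media_type, charset)


print(build_content_type({"svn:mime-type": "text/html",
                          "svn:charset": "utf-8"}))
print(build_content_type({"svn:charset": "iso-8859-1"}))
```

Producing the header from separate pieces is a single format call,
whereas taking a combined value apart again requires a parser.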

And, whichever way this leads, I think the autoprops could definitely
use some revamping.  Is that something that could happen with 1.6?  By
this I mean, if I (or someone) beat on it and found something people
like, would it be a candidate for inclusion in 1.6 or 1.7, or would it
have to wait for a 2.x release?

- Kevin



Re: svn:charset

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2008-07-09 00:15:59 +0100, C. Michael Pilato wrote:
> That said, I don't think "character set" is the right nomenclature.  
> Wouldn't svn:encoding be the more accurate description?  (And let's just  
> ignore for a minute the "other" interpretation of "encoding" proposed in  
> http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)

"encoding" can mean too many things, and I think svn:encoding would be
a bad idea concerning the consistency with other standards (precisely
because of this other interpretation). I'd say that svn:charmap would
be a more accurate description.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
"Kevin Grover" <ke...@kevingrover.net> writes:
> From what little I found while looking around, charmap is taken to
> refer to the Microsoft utility for picking characters (that's what
> Wikipedia brings up, and just that).  It seems that 'encoding'
> (meaning character encoding) or 'charset' would indeed be the most
> commonly used terms.

I'm not sure why you dragged charmap into this - it was never an
option.

Character sets and encodings are not the same thing, but "charset" has
historically been used to mean a particular encoding of a particular
character set.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by Kevin Grover <ke...@kevingrover.net>.
On Wed, Jul 9, 2008 at 1:37 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
> "C. Michael Pilato" <cm...@collab.net> writes:
>> That said, I don't think "character set" is the right
>> nomenclature. Wouldn't svn:encoding be the more accurate description?
>
> I've covered this already.  Neither "charset" nor "encoding" is entirely
> correct, but MIME uses "charset" in this sense, while HTTP uses
> "encoding" to mean something completely different, so "charset" is the
> closest you'll get without inventing a new term.
>
> DES
> --
> Dag-Erling Smørgrav - des@des.no
>

Re: svn:charset

Posted by Dag-Erling Smørgrav <de...@des.no>.
"C. Michael Pilato" <cm...@collab.net> writes:
> That said, I don't think "character set" is the right
> nomenclature. Wouldn't svn:encoding be the more accurate description?

I've covered this already.  Neither "charset" nor "encoding" is entirely
correct, but MIME uses "charset" in this sense, while HTTP uses
"encoding" to mean something completely different, so "charset" is the
closest you'll get without inventing a new term.

DES
-- 
Dag-Erling Smørgrav - des@des.no



Re: svn:charset

Posted by "C. Michael Pilato" <cm...@collab.net>.
David Glasser wrote:
> On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
>> "David Glasser" <gl...@davidglasser.net> writes:
>>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>>
>>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>>   path of least resistance (though I understand your annoyance at the
>>>>   encroachment on what I understand is a reserved namespace)
>>> I don't think that the Subversion project needs to negotiate with
>>> hostage-takers.
>> Heh :-).
>>
>> But, as much as I enjoyed David's comment, the OP's sentiment is right:
>> however we got here, here is where we are.  (Besides, he wasn't
>> proposing negotiation.)
> 
> In all seriousness, though, I highly suspect that any group of
> projects that are so lax that they don't even bother to understand
> that svn:* are the only properties they *aren't* allowed to use would
> also have failed to resolve all the issues discussed in this thread.
> In that case, using svn:charset for a potential new property would be
> the absolute *worst* thing to do, since it would conflict with
> poorly-specified prior use.
> 
> Anyway, if the main problem with the svn:mime-type solution is that
> it's annoying to parse (which is true), then why don't we just add an
> svn API that extracts the charset= from a given mime-type string?  (I
> guess there's still the autoprops issue, but perhaps there is a more
> general fix to autoprops that could deal with this.)

So, I don't think this has been noted so far in this thread, but we've been 
here before:

    http://svn.haxx.se/dev/archive-2002-08/0674.shtml
    http://svn.haxx.se/dev/archive-2006-03/1182.shtml
    (more?)

I'm personally in favor of using a distinct property for this stuff, and am 
not terribly concerned about the sorts of compatibility issues that doing so 
would cause for folks (including us) who today use svn:mime-type -- which 
was intended to be a MIME media type and subtype, not a full-blown 
Content-type HTTP header value -- as the latter.

That said, I don't think "character set" is the right nomenclature. 
Wouldn't svn:encoding be the more accurate description?  (And let's just 
ignore for a minute the "other" interpretation of "encoding" proposed in 
http://svn.haxx.se/users/archive-2005-08/0139.shtml, shall we?)

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn:charset

Posted by David Glasser <gl...@davidglasser.net>.
On Tue, Jul 8, 2008 at 8:58 AM, Karl Fogel <kf...@red-bean.com> wrote:
> "David Glasser" <gl...@davidglasser.net> writes:
>> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>>
>>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>>   path of least resistance (though I understand your annoyance at the
>>>   encroachment on what I understand is a reserved namespace)
>>
>> I don't think that the Subversion project needs to negotiate with
>> hostage-takers.
>
> Heh :-).
>
> But, as much as I enjoyed David's comment, the OP's sentiment is right:
> however we got here, here is where we are.  (Besides, he wasn't
> proposing negotiation.)

In all seriousness, though, I highly suspect that any group of
projects that are so lax that they don't even bother to understand
that svn:* are the only properties they *aren't* allowed to use would
also have failed to resolve all the issues discussed in this thread.
In that case, using svn:charset for a potential new property would be
the absolute *worst* thing to do, since it would conflict with
poorly-specified prior use.

Anyway, if the main problem with the svn:mime-type solution is that
it's annoying to parse (which is true), then why don't we just add an
svn API that extracts the charset= from a given mime-type string?  (I
guess there's still the autoprops issue, but perhaps there is a more
general fix to autoprops that could deal with this.)

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/



Re: svn:charset

Posted by Karl Fogel <kf...@red-bean.com>.
"David Glasser" <gl...@davidglasser.net> writes:
> On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:
>
>>  - Like it or not, svn:charset is already in use; formalizing it is the
>>   path of least resistance (though I understand your annoyance at the
>>   encroachment on what I understand is a reserved namespace)
>
> I don't think that the Subversion project needs to negotiate with
> hostage-takers.

Heh :-).

But, as much as I enjoyed David's comment, the OP's sentiment is right:
however we got here, here is where we are.  (Besides, he wasn't
proposing negotiation.)

The right thing to do from this point forward is "whatever's best for
users".  That implies identifying the advantages/disadvantages of
svn:charset vs the svn:mime-type way.  If we determine that the balance
of advantages makes svn:charset desirable, then we need to specify
precisely what effects svn:charset will have; how svn:charset will
interact with legacy charset data appended to svn:mime-type; and
research all current users of svn:charset to determine whether they're
using the property consistently themselves (and if not, figure out what
to do about it).

I've re-read through this thread, and I don't think the above has really
been done yet.  (The only mail that even comes close is
http://svn.haxx.se/dev/archive-2008-06/0948.shtml.)  I don't personally
have the time to do this research; it's a non-trivial task, though
certainly feasible.

I understand that this is a high hurdle, and that I'm sort of putting
"stop energy" on the momentum here.  That's not really my goal.  I just
think Subversion shouldn't adopt a new svn: property without having a
really good reason.  We've done maybe 50% of the thinking we need to do
for this particular one.  I'm merely proposing we not implement it until
we've done that other 50% (which means someone has to supply it).

-Karl



Re: svn:charset

Posted by David Glasser <gl...@davidglasser.net>.
On Fri, Jul 4, 2008 at 1:30 AM, Dag-Erling Smørgrav <de...@des.no> wrote:

>  - Like it or not, svn:charset is already in use; formalizing it is the
>   path of least resistance (though I understand your annoyance at the
>   encroachment on what I understand is a reserved namespace)

I don't think that the Subversion project needs to negotiate with
hostage-takers.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/



Re: svn:charset

Posted by Alan Barrett <ap...@cequrux.com>.
On Fri, 04 Jul 2008, Dag-Erling Smørgrav wrote:
> Sorry about the late reply - I'm used to being Cc:ed on replies to
> mailing list threads in which I participate, so I didn't notice your
> followup right away.

OK, I'll CC you.  My preference is that I not be CC'd on replies to mailing
lists.

> Alan Barrett <ap...@cequrux.com> writes:
> > I really don't like the idea of having two conflicting ways of
> > specifying the same information.  One way (in svn:mime-type) should be
> > enough.
> 
>  - svn:charset is easier to handle for applications that need just the
>    charset, not the media type (incorrectly referred to as mime type)

Yes, it would be easier, if this were being designed from scratch.  But
it's not being designed from scratch.

You say that writing an application to parse a new svn:charset property
is easier than writing an application to extract charset information
from svn:mime-type, and I agree, but I claim that that's not a fair
comparison.  A fair comparison would be between the status quo where
charset information can appear only in svn:mime-type, and a future in
which it's unknown whether charset information appears in svn:mime-type
or in svn:charset or both, so a future application would have to parse
them both, check for consistency, and raise an error if they are
inconsistent.
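
As a sketch of that "parse both and check" burden, the following reads
the charset from both places and raises an error on disagreement.  The
names are hypothetical (effective_charset, CharsetConflict, and the
svn:charset property itself); only simple parameter forms in
svn:mime-type are handled, via Python's standard email package; and
treating aliases such as "utf8" and "UTF-8" as equal is a design choice
of this example, not something specified anywhere in the thread.

```python
import codecs
from email.message import Message


class CharsetConflict(ValueError):
    """svn:mime-type and svn:charset name different charsets."""


def canonical(name):
    """Normalize a charset name so that common aliases compare equal."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.strip().lower()  # unknown to Python: compare as text


def charset_from_mime_type(mime_type):
    """Return the charset parameter of a Content-Type style value."""
    msg = Message()
    msg["Content-Type"] = mime_type
    return msg.get_content_charset()  # None when there is no charset


def effective_charset(props):
    """Pick the charset from a dict of properties, or raise on conflict."""
    from_mime = None
    if "svn:mime-type" in props:
        from_mime = charset_from_mime_type(props["svn:mime-type"])
    explicit = props.get("svn:charset")  # hypothetical property
    if from_mime and explicit and canonical(from_mime) != canonical(explicit):
        raise CharsetConflict(
            "svn:mime-type says %r but svn:charset says %r"
            % (from_mime, explicit))
    return explicit or from_mime


print(effective_charset({"svn:mime-type": "text/plain; charset=UTF-8",
                         "svn:charset": "utf8"}))
```

Every consumer of the properties would need some version of this logic,
which is the cost being pointed out above.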

>  - The full MIME Content-Type parameter syntax is unwieldy.  It allows a
>    variety of quoting and comment styles. [...]
>    How far do you want to go to support the complete syntax?

That's up to the application author.

I don't want to get into a long debate about this, and I am not a svn
developer, but I did want to correct what I saw as a misunderstanding of
my position.

--apb (Alan Barrett)
