Posted to dev@subversion.apache.org by "Peter N. Lundblad" <pe...@famlundblad.se> on 2005/01/05 15:39:32 UTC
Re: [Issue 2194] Unicde UTF-16 files detected as binary
On Wed, 5 Jan 2005 maxb@tigris.org wrote:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2194
>
>
>
> User maxb changed the following:
>
> What |Old value |New value
> ================================================================================
> Status|NEW |RESOLVED
> --------------------------------------------------------------------------------
> Resolution| |INVALID
> --------------------------------------------------------------------------------
>
>
>
>
> ------- Additional comments from maxb@tigris.org Wed Jan 5 06:48:02 -0800 2005 -------
> There's some huge red text on the issue tracker front page.
> Please read it.
> Thanks.
>
But don't you agree this would be a good enhancement, i.e. better support
for other Unicode encodings than UTF8?
Regards,
//Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:
>On Thu, 6 Jan 2005, Branko Čibej wrote:
>
>>Peter N. Lundblad wrote:
>>
>>>On Thu, 6 Jan 2005, Branko Čibej wrote:
>>>
>>>>It is much more complicated than that. If we're to treat UTF-16 files as
>>>>text, we have to teach libsvn_diff to do diffs and merges correctly on
>>>>such files, and possibly enhance keyword expansion and newline
>>>>conversion, too.
>>>>
>>>Or convert to/from UTF8 as we do with other encodings.
>>>
>>We don't convert file /contents/ between encodings, we don't even know
>>which encoding they're in.
>>
>We don't know that *currently*.
>
Exactly. Bravo. And that change is where the can of worms is hidden,
because it's not only about writing, but also about parsing the files.
> That could change, however. Right now, we
>output UTF8 (I think, or is it native). Still, we just insert stuff in
>files without knowing the encoding.
>
Yes, we do broken things like that.
> So, I think we will want to add an
>encoding property (or support it in the svn:mime-type) someday.
>
>
Yup. We almost support it already, in the sense that we don't die if
it's there.
>Note that I don't say it is trivial, but it should be doable.
>
>
Neither did I say it wasn't doable, just that it frobs 90% of the
client-side code. But I admit this estimate might be a bit pessimistic;
it's probably closer to 85%. :-p
-- Brane
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 6 Jan 2005, Branko Čibej wrote:
> Peter N. Lundblad wrote:
>
> >On Thu, 6 Jan 2005, Branko Čibej wrote:
> >
> >>It is much more complicated than that. If we're to treat UTF-16 files as
> >>text, we have to teach libsvn_diff to do diffs and merges correctly on
> >>such files, and possibly enhance keyword expansion and newline
> >>conversion, too.
> >>
> >>
> >>
> >Or convert to/from UTF8 as we do with other encodings.
> >
> >
> We don't convert file /contents/ between encodings, we don't even know
> which encoding they're in.
>
We don't know that *currently*. That could change, however. Right now, we
output UTF8 (I think, or is it native). Still, we just insert stuff in
files without knowing the encoding. So, I think we will want to add an
encoding property (or support it in the svn:mime-type) someday.
Note that I don't say it is trivial, but it should be doable.
Regards,
//Peter
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:
>On Thu, 6 Jan 2005, Branko Čibej wrote:
>
>>Peter N. Lundblad wrote:
>>
>>>Yes, it is more complicated than that, since it is an encoding where a
>>>line break is not one or two bytes, and for some other reasons. Still, I
>>>think we really need to support other Unicode encodings than UTF8, like
>>>we support other 8-bit encodings.
>>>
>>It is much more complicated than that. If we're to treat UTF-16 files as
>>text, we have to teach libsvn_diff to do diffs and merges correctly on
>>such files, and possibly enhance keyword expansion and newline
>>conversion, too.
>>
>>
>>
>Or convert to/from UTF8 as we do with other encodings.
>
>
We don't convert file /contents/ between encodings, we don't even know
which encoding they're in.
>>In short, it's a whole can of worms that probably affects 90% of the
>>client-side code.
>>
>>
>I can't believe that.
>
>
You don't have to take my word for it, the code is right there.
-- Brane
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 6 Jan 2005, Branko Čibej wrote:
> Peter N. Lundblad wrote:
>
> >Yes, it is more complicated than that, since it is an encoding where a
> >line break is not one or two bytes, and for some other reasons. Still, I
> >think we really need to support other Unicode encodings than UTF8, like
> >we support other 8-bit encodings.
> >
> >
> It is much more complicated than that. If we're to treat UTF-16 files as
> text, we have to teach libsvn_diff to do diffs and merges correctly on
> such files, and possibly enhance keyword expansion and newline
> conversion, too.
>
Or convert to/from UTF8 as we do with other encodings.
> In short, it's a whole can of worms that probably affects 90% of the
> client-side code.
>
I can't believe that.
Regards,
//Peter
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Branko Čibej <br...@xbc.nu>.
Barry Scott wrote:
>
> On Jan 6, 2005, at 01:41, Branko Čibej wrote:
>
>> Peter N. Lundblad wrote:
>>
>>> On Wed, 5 Jan 2005, Max Bowsher wrote:
>>>
>>>
>>>> Peter N. Lundblad wrote:
>>>> I agree with what you are saying, but what 2194 was saying was "UTF-16
>>>> should be detected as textual".
>>>>
>>>>
>>> Yes, it is more complicated than that, since it is an encoding where a
>>> line break is not one or two bytes, and for some other reasons. Still, I
>>> think we really need to support other Unicode encodings than UTF8, like
>>> we support other 8-bit encodings.
>>>
>> It is much more complicated than that. If we're to treat UTF-16 files
>> as text, we have to teach libsvn_diff to do diffs and merges
>> correctly on such files, and possibly enhance keyword expansion and
>> newline conversion, too.
>>
>> In short, it's a whole can of worms that probably affects 90% of the
>> client-side code.
>
>
> When the rewrite of the client eventually happens, design wide char
> support in on day 1 then.
This won't help in general. You can only guarantee identical conversions
between the various Unicode encodings, but if the file is in some other
encoding, there's not always a valid way to convert the contents to
Unicode, operate on that, and convert back without changing some of the
original characters that shouldn't have changed. For example, the
various ISO-2022 encodings are notorious for not behaving nicely in this
context, and for that matter so is UTF-7.
The only universally correct way is to find the replaceable strings
*without* converting the file contents, then only convert the
replacements once from Unicode to the file's encoding.
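A minimal sketch of that approach, assuming the file is BOM-aware UTF-16 with a known byte order (so the codec name is "utf-16-le" or "utf-16-be", which do not emit a BOM of their own). The function name `expand_keyword` and its parameters are hypothetical, not anything in Subversion. Only the search string and the replacement are encoded; the file contents are never decoded. Stateful encodings such as the ISO-2022 family would need more care than this.

```python
def expand_keyword(raw, encoding, keyword, value, unit=2):
    """Replace $Keyword$ with $Keyword: value $ directly in the raw bytes.

    raw      -- file contents as bytes, never decoded here
    encoding -- BOM-less codec of the file, e.g. "utf-16-le" (assumed known)
    unit     -- code unit size in bytes; matches must start on a multiple of it
    """
    needle = ("$" + keyword + "$").encode(encoding)
    replacement = ("$" + keyword + ": " + value + " $").encode(encoding)
    out = bytearray()
    pos = 0
    while True:
        hit = raw.find(needle, pos)
        if hit < 0:
            out += raw[pos:]
            return bytes(out)
        if hit % unit:
            # The byte pattern straddles two characters; not a real keyword.
            out += raw[pos:hit + 1]
            pos = hit + 1
            continue
        out += raw[pos:hit] + replacement
        pos = hit + len(needle)
```

The alignment check matters: in UTF-16 the bytes of "$Rev$" can also appear at an odd offset inside a run of unrelated characters, and a plain byte-level replace would corrupt them.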
> I do not expect a quick fix, but this issue should be nagging at svn
> devos.
Not to worry, it's in the issue tracker. :-)
-- Brane
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Barry Scott <ba...@barrys-emacs.org>.
On Jan 6, 2005, at 01:41, Branko Čibej wrote:
> Peter N. Lundblad wrote:
>
>> On Wed, 5 Jan 2005, Max Bowsher wrote:
>>
>>
>>> Peter N. Lundblad wrote:
>>> I agree with what you are saying, but what 2194 was saying was
>>> "UTF-16
>>> should be detected as textual".
>>>
>>>
>> Yes, it is more complicated than that, since it is an encoding where a
>> line break is not one or two bytes, and for some other reasons. Still, I
>> think we really need to support other Unicode encodings than UTF8, like
>> we support other 8-bit encodings.
>>
> It is much more complicated than that. If we're to treat UTF-16 files
> as text, we have to teach libsvn_diff to do diffs and merges correctly
> on such files, and possibly enhance keyword expansion and newline
> conversion, too.
>
> In short, it's a whole can of worms that probably affects 90% of the
> client-side code.
When the rewrite of the client eventually happens, design wide char
support in on day 1 then.
I do not expect a quick fix, but this issue should be nagging at svn
devos.
Barry
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:
>On Wed, 5 Jan 2005, Max Bowsher wrote:
>
>
>
>>Peter N. Lundblad wrote:
>>I agree with what you are saying, but what 2194 was saying was "UTF-16
>>should be detected as textual".
>>
>>
>>
>Yes, it is more complicated than that, since it is an encoding where a
>line break is not one or two bytes, and for some other reasons. Still, I
>think we really need to support other Unicode encodings than UTF8, like
>we support other 8-bit encodings.
>
>
It is much more complicated than that. If we're to treat UTF-16 files as
text, we have to teach libsvn_diff to do diffs and merges correctly on
such files, and possibly enhance keyword expansion and newline
conversion, too.
In short, it's a whole can of worms that probably affects 90% of the
client-side code.
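To illustrate the diff problem, here is a small sketch (hypothetical helper names, little-endian UTF-16 assumed, not libsvn_diff code) comparing a byte-oriented line splitter with one that respects 16-bit code units:

```python
def naive_lines(raw):
    """Split on the single byte 0x0A, as byte-oriented code would."""
    return raw.split(b"\n")


def utf16le_lines(raw):
    """Split after each 16-bit LF code unit (0x0A 0x00), staying aligned."""
    lines = []
    start = 0
    for i in range(0, len(raw) - 1, 2):
        if raw[i:i + 2] == b"\n\x00":
            lines.append(raw[start:i + 2])
            start = i + 2
    if start < len(raw):
        lines.append(raw[start:])
    return lines
```

The naive splitter cuts each newline in half, leaving a stray 0x00 at the start of the next "line", and it also splits inside any character whose encoding happens to contain the byte 0x0A. Every line-based step (diff, merge, newline conversion) would need the aligned variant or an equivalent.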
-- Brane
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Wed, 5 Jan 2005, Max Bowsher wrote:
> Peter N. Lundblad wrote:
> I agree with what you are saying, but what 2194 was saying was "UTF-16
> should be detected as textual".
>
Yes, it is more complicated than that, since it is an encoding where a
line break is not one or two bytes, and for some other reasons. Still, I
think we really need to support other Unicode encodings than UTF8, like
we support other 8-bit encodings.
> IMO, given the current level of software support for UTF-16, it is more
> binary than text.
>
I don't agree. Will make a comment on this on users@ to give Barry some
support :-)
Regards,
//Peter
Re: [Issue 2194] Unicde UTF-16 files detected as binary
Posted by Max Bowsher <ma...@ukf.net>.
Peter N. Lundblad wrote:
> On Wed, 5 Jan 2005 maxb@tigris.org wrote:
>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2194
>>
>>
>>
>> User maxb changed the following:
>>
>> What |Old value |New value
>> ================================================================================
>> Status|NEW |RESOLVED
>> --------------------------------------------------------------------------------
>> Resolution| |INVALID
>> --------------------------------------------------------------------------------
>>
>>
>>
>>
>> ------- Additional comments from maxb@tigris.org Wed Jan 5 06:48:02 -0800 2005 -------
>> There's some huge red text on the issue tracker front page.
>> Please read it.
>> Thanks.
>>
> But don't you agree this would be a good enhancement, i.e. better support
> for other Unicode encodings than UTF8?
I agree with what you are saying, but what 2194 was saying was "UTF-16
should be detected as textual".
IMO, given the current level of software support for UTF-16, it is more
binary than text.
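As a rough illustration of why byte-oriented tools see UTF-16 as binary: the heuristic below is a hypothetical stand-in (it is not Subversion's actual detection code), but any check that treats NUL bytes as a sign of binary data will behave the same way on mostly-ASCII UTF-16 text.

```python
def looks_binary(raw):
    """Hypothetical heuristic: a NUL byte in the first 1024 bytes means binary."""
    return b"\x00" in raw[:1024]


utf8_sample = "hello\n".encode("utf-8")
utf16_sample = "hello\n".encode("utf-16")  # BOM plus two bytes per character
```

Here `looks_binary(utf8_sample)` is false and `looks_binary(utf16_sample)` is true, because every ASCII character in UTF-16 carries a zero byte.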
Ma.