Posted to dev@subversion.apache.org by Wang Jian <la...@linux.net.cn> on 2005/02/05 21:03:24 UTC

svn diff character set handling problem

Hi,

I am using svn 1.1.3 under LC_* = zh_CN, and

[lark@home net]$ svn diff baserequestgenerator.class.php 
Index: baserequestgenerator.class.php
===================================================================
--- baserequestgenerator.class.php      (修订版 956)
+++ baserequestgenerator.class.php      (工作拷贝)
@@ -494,7 +494,8 @@


[lark@home net]$ svn diff --diff-cmd=/usr/bin/diff --extensions "-u" baserequestgenerator.class.php 
Index: baserequestgenerator.class.php
===================================================================
--- baserequestgenerator.class.php      ���޶��� 956��
+++ baserequestgenerator.class.php      ������������
@@ -494,7 +494,8 @@


The difference is in the label. The /usr/bin/diff output is correct, while
the output of svn's builtin diff is not.

I haven't dug through the bug database, so this bug may already be known.

The version information

[lark@home net]$ LC_ALL=C svn --version
svn, version 1.1.3 (r12730)
   compiled Jan 21 2005, 17:40:13

Copyright (C) 2000-2004 CollabNet.
Subversion is open source software, see http://subversion.tigris.org/
This product includes software developed by CollabNet (http://www.Collab.Net/).

The following repository access (RA) modules are available:

* ra_dav : Module for accessing a repository via WebDAV (DeltaV) protocol.
  - handles 'http' schema
  - handles 'https' schema
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' schema
* ra_svn : Module for accessing a repository using the svn network protocol.
  - handles 'svn' schema




-- 
  lark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re[4]: svn diff character set handling problem

Posted by Wang Jian <la...@linux.net.cn>.
Hi kfogel,


On 05 Feb 2005 23:17:11 -0600, kfogel@collab.net wrote:

> Wang Jian <la...@linux.net.cn> writes:
> > Hi kfogel,
> > 
> 
> > Here is my test
> > 
> > $ cat ./mydiff
> > #!/bin/sh
> > 
> > echo "$@"
> > $ svn diff --diff-cmd=./mydiff baserequestgenerator.class.php 
> > Index: baserequestgenerator.class.php
> > ===================================================================
> > -u -L baserequestgenerator.class.php    (修订版 956) -L baserequestgenerator.class.php        (工作拷贝) .svn/text-base/baserequestgenerator.class.php.svn-base baserequestgenerator.class.php
> > 
> > 
> > If I deliberately set to another locale
> > 
> > $ LC_ALL=zh_TW svn diff --diff-cmd=./mydiff baserequestgenerator.class.php 
> > Index: baserequestgenerator.class.php
> > ===================================================================
> > svn: Can't recode string
> 
> Hmmm. First, I don't understand why your script works as a diff-cmd.
> All it does is echo its arguments, right?  It never actually runs a
> diff program.  So how is it producing the output shown above?

The first of the two tests above shows correct Simplified Chinese
characters. So the code path that calls an external diff command looks
correct: the conversion is made.

The builtin diff code path, as I described in my earlier mail, doesn't
seem to handle encoding conversion. I haven't looked at the code yet.
The Spring Festival is coming here, so I don't have much time.
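
One way to check which encoding a diff label actually arrives in is to look at its raw bytes. The following is a minimal sketch in Python, not code from Subversion; the helper name `guess_label_encoding` and the list of candidate encodings are assumptions made for illustration, and the check is only a heuristic.

```python
def guess_label_encoding(raw: bytes) -> str:
    """Guess whether raw label bytes are ASCII, UTF-8, or GBK (heuristic)."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    # Strict UTF-8 decoding usually rejects GBK-encoded Chinese text,
    # so UTF-8 is tried before GBK.
    for encoding in ("utf-8", "gbk"):
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "unknown"


label = "修订版 956"  # the "revision 956" label from the zh_CN catalog
print(guess_label_encoding(label.encode("utf-8")))  # utf-8
print(guess_label_encoding(label.encode("gbk")))    # gbk
```

Feeding such a helper the bytes captured from each code path would show whether the conversion to the locale encoding happened.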


> 
> In any case, I can see that the first output looks correctly encoded
> ("revision 956" versus "working copy"), and the second output shows a
> failure to encode -- which means encoding was at least attempted.  I
> don't know all the reasons encoding might fail (note that r12920 made
> that error much more informative, you might want to try with the very
> latest trunk svn), but anyway it seems like this error is completely
> different from the error you were reporting in your original mail.
> There, the problem was that the diff labels were simply not in the
> right encoding for your console (that is, it was not clear whether
> encoding had been attempted or not).
>

I ran several more tests later and found that the error in the second test
above is bogus. The rpm system didn't install the zh_TW version of
subversion.mo, so it falls back to zh and thus uses the zh_CN version of
subversion.mo instead.

So please forget the second test :)

> So I'm not sure how many different bugs are being reported now.
> Perhaps you could clarify, by stating exactly what output you
> expected, versus the output you received?  That would help us sort out
> the exact bug or bugs present here.  If it's something beyond what we
> already have in issue #1533, then that would be useful to know.
> 

Here it is:

When the zh_CN locale is used, 'svn diff' (builtin diff) is expected to
produce output in the locale encoding (gb2312/gbk/gb18030; each later one
is a superset of the one before it). However, it still produces output in
UTF-8.

The zh_CN version of subversion.mo is in UTF-8.
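
The mismatch described here can be reproduced outside Subversion. The sketch below, in Python, is only an illustration and assumes the terminal interprets every byte it receives as GBK. It shows that the same label has different byte sequences in UTF-8 and GBK, and that the UTF-8 bytes do not read back as the original text on such a terminal.

```python
label = "工作拷贝"  # the "working copy" label

utf8_bytes = label.encode("utf-8")  # what the builtin diff is reported to emit
gbk_bytes = label.encode("gbk")     # what a GBK terminal expects (assumption)

# A GBK terminal decodes whatever bytes it receives as GBK.
seen_from_gbk = gbk_bytes.decode("gbk")
seen_from_utf8 = utf8_bytes.decode("gbk", errors="replace")

print(len(utf8_bytes), len(gbk_bytes))  # 12 8
print(seen_from_gbk == label)           # True
print(seen_from_utf8 == label)          # False
```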


-- 
  lark



Re: Re[2]: svn diff character set handling problem

Posted by kf...@collab.net.
Wang Jian <la...@linux.net.cn> writes:
> Hi kfogel,
> 
> It looks like issue #1977 is the cause. In a comment on issue #1533, Max
> Bowsher referred to issue #1977.
> 
> It seems that you fixed this issue in r12687, but actually not :) There
> should be another code path that you didn't cover.

(Please keep the mailing list CC'd.)

I think you mean issue #1997, not #1977.  However, look at what Max
Bowsher said in issue #1533, about #1997:

   "Issue 1997 was a mixed bag of encoding problems - transcribing the
    diff parts here: (I will comment in issue 1997 that the diff parts
    are now tracked elsewhere)"

So, we moved the diff part of #1997 over to #1533.  After that, #1997
was only about the 'svnadmin dump' problem, which was fixed in r12687.

In other words, your original problem is indeed issue #1533, not #1997 :-).

> Here is my test
> 
> $ cat ./mydiff
> #!/bin/sh
> 
> echo "$@"
> $ svn diff --diff-cmd=./mydiff baserequestgenerator.class.php 
> Index: baserequestgenerator.class.php
> ===================================================================
> -u -L baserequestgenerator.class.php    (修订版 956) -L baserequestgenerator.class.php        (工作拷贝) .svn/text-base/baserequestgenerator.class.php.svn-base baserequestgenerator.class.php
> 
> 
> If I deliberately set to another locale
> 
> $ LC_ALL=zh_TW svn diff --diff-cmd=./mydiff baserequestgenerator.class.php 
> Index: baserequestgenerator.class.php
> ===================================================================
> svn: Can't recode string

Hmmm. First, I don't understand why your script works as a diff-cmd.
All it does is echo its arguments, right?  It never actually runs a
diff program.  So how is it producing the output shown above?

In any case, I can see that the first output looks correctly encoded
("revision 956" versus "working copy"), and the second output shows a
failure to encode -- which means encoding was at least attempted.  I
don't know all the reasons encoding might fail (note that r12920 made
that error much more informative, you might want to try with the very
latest trunk svn), but anyway it seems like this error is completely
different from the error you were reporting in your original mail.
There, the problem was that the diff labels were simply not in the
right encoding for your console (that is, it was not clear whether
encoding had been attempted or not).

So I'm not sure how many different bugs are being reported now.
Perhaps you could clarify, by stating exactly what output you
expected, versus the output you received?  That would help us sort out
the exact bug or bugs present here.  If it's something beyond what we
already have in issue #1533, then that would be useful to know.

Thanks,
-Karl

> On 05 Feb 2005 16:58:02 -0600, kfogel@collab.net wrote:
> 
> > > --- baserequestgenerator.class.php      (修订版 956)
> > > +++ baserequestgenerator.class.php      (工作拷贝)
> > > @@ -494,7 +494,8 @@
> > > 
> > > 
> > > The difference is in the label. /usr/bin/diff output is correct, while
> > > svn builtin diff is not.
> > >
> > > I haven't dug through the bug database, so this bug may already be known.
> > 
> > The closest we have is
> > 
> >    http://subversion.tigris.org/issues/show_bug.cgi?id=1533
> > 
> > ... but I'm not sure (I only had a moment to glance at it right now)
> > if that is your bug.  I think not, but am not positive about that.
> > 
> > Is Subversion's output correctly encoded in all other circumstances
> > for you, except diff headers?
> > 
> > -Karl



Re: svn diff character set handling problem

Posted by kf...@collab.net.
Wang Jian <la...@linux.net.cn> writes:
> I am using svn 1.1.3 under LC_* = zh_CN, and
> 
> [lark@home net]$ svn diff baserequestgenerator.class.php 
> Index: baserequestgenerator.class.php
> ===================================================================
> --- baserequestgenerator.class.php      锛