You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Nathan Hartman <ha...@gmail.com> on 2019/11/11 15:38:31 UTC

Issue tracker cleanup: SVN-807

For this week's issue, SVN-807 "gracefully degrade from failed charset
conversion":

This issue was created in 2002; I don't see any newer comments (the
only recent activity appears to be a property change made last year).

From the issue:

> Right now, if a log message contains characters that cannot be
> represented in the client's locale, that log message will simply show
> up as:
>
>    "[unconvertible log msg]"
>
> Graceful degradation would be nice here :-).

Questions:

Is this still the case?

If we're not sure, how can this be tested? (i.e., how to create and
commit a log message that will cause this to manifest?)

Speaking of testing, is this related to last week's issue, SVN-2079,
which is unresolved, "SVN-2079 "utf8_tests.py should be made non-
iso8859-1 specific" and if one is solved, would that (help) solve the
other?

Thanks,
Nathan

Re: Issue tracker cleanup: SVN-807

Posted by Branko Čibej <br...@apache.org>.
On 12.11.2019 16:13, Nathan Hartman wrote:
> I'm guessing
> there was a desire to do more than print the offending hex codes but
> I don't know what else you could do.

If it's invalid UTF-8, there's nothing you can do.

There's a lot you can do if the UTF-8 is valid, but the target locale
can't represent the characters. Transliteration is a thing, but if we go
down that path, it's hard to decide when to stop. Subversion is not a
linguistics tool, it's a version control tool.

-- Brane

P.S.: I think there's even code that removes diacritical marks from the
source Unicode to make conversion to the target locale easier, at least
for Latin-based locales. That's already a kind of transliteration, and
even that may be way too much.

Fwd: Issue tracker cleanup: SVN-807

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 12, 2019 at 1:08 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Nathan Hartman wrote on Tue, 12 Nov 2019 15:13 +00:00:
> > Daniel, thanks for testing this and documenting how. Please, could
> > you add that as a comment in the issue tracker? Or, if you'd like,
> > I'll be happy to do that and attribute it to you.
>
> Go ahead.
>

Done. SVN-807 is closed.

Re: Issue tracker cleanup: SVN-807

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Tue, 12 Nov 2019 15:13 +00:00:
> Daniel, thanks for testing this and documenting how. Please, could
> you add that as a comment in the issue tracker? Or, if you'd like,
> I'll be happy to do that and attribute it to you.

Go ahead.

> I agree that this issue should be closed.
> 
> From my reading, it looks like it was not closed as a reminder to move
> this to APR. (Though that might make sense from a refactoring
> standpoint, I think it would cause dependency headaches.)

It's not that bad; we've done it before.  We create an svn__foo()
function, implement it, ask APR to add it too, then we use #if
APR_VERSION_AT_LEAST() to have svn__foo() use the APR implementation in
preference to ours, and remove the conditional and our implementation
when we bump the minimum supported APR version.

Re: Issue tracker cleanup: SVN-807

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 12, 2019 at 4:07 AM Branko Čibej <br...@apache.org> wrote:

> On 11.11.2019 17:30, Daniel Shahaf wrote:
>
(snip test procedure)

> > So I think we can close it as "Fixed at some point"?
>
> Ah, yes, I added some code to at least print something readable (or
> let's say "analyzable") in such cases when I added utf8proc to our code.
> I can't think of anything better to do, so we can close that issue as
> far as I'm concerned.


Brane, thanks for your input.

Daniel, thanks for testing this and documenting how. Please, could
you add that as a comment in the issue tracker? Or, if you'd like,
I'll be happy to do that and attribute it to you.

I agree that this issue should be closed.

From my reading, it looks like it was not closed as a reminder to move
this to APR. (Though that might make sense from a refactoring
standpoint, I think it would cause dependency headaches.) Also there
was a patch from Ulrich Drepper but it looks incomplete. I'm guessing
there was a desire to do more than print the offending hex codes but
I don't know what else you could do.

If there are no objections, I'll go ahead and close SVN-807.

As a side note, I feel dumb because I went spelunking all over the
code and found r842879...

[[[

r842879 | kfogel | 2002-07-30 18:33:13 -0400 (Tue, 30 Jul 2002) | 12
lines

Start on issue #807.  Thanks to Justin Erenkrantz for his initial
patch to check_non_ascii().

* subversion/libsvn_subr/utf.c
  (check_non_ascii): Return the more appropriate APR_EINVAL.

* subversion/include/svn_utf.h, subversion/libsvn_subr/utf.c
  (svn_utf_cstring_from_utf8_fuzzy): New function.

* subversion/clients/cmdline/log-cmd.c
  (log_message_receiver): Degrade gracefully yet insistently.

]]]

and only AFTER that, I noticed that it was written in the issue
tracker all along:

[[[

Karl Fogel added a comment - 30/Jul/02 22:46
Okay, the worst has been resolved by revision 2805.

We still need to move the code over to APR, if appropriate.  And apply
Brane and Ulrich's patches, but that may be a separate issue, will
mull on that.

]]]

r2805 + 840074 = r842879... Which, by the way, is before 1.0.

The only other thing I found that deals with bad UTF-8 in log messages
is the logic that prevents bad UTF-8 and mixed line endings from
getting into the log and the automated test for it.

Nathan

Re: Issue tracker cleanup: SVN-807

Posted by Branko Čibej <br...@apache.org>.
On 11.11.2019 17:30, Daniel Shahaf wrote:
> Nathan Hartman wrote on Mon, Nov 11, 2019 at 10:38:31 -0500:
>> If we're not sure, how can this be tested? (i.e., how to create and
>> commit a log message that will cause this to manifest?)
> Well, I'm sure there are better ways, but I just did this:
> .
>     % svnadmin create r
>     % vim -b r/db/revprops/0/0
> .
> and manually added an svn:log property with a value that's invalid UTF-8 [svn:*
> properties must use UTF-8 with LF line endings]:
> .
>     % xxd r/db/revprops/0/0 | vipe
>     00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032  K 8.svn:date.V 2
>     00000010: 370a 3230 3139 2d31 312d 3131 5431 363a  7.2019-11-11T16:
>     00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20  08:01.344744Z.K 
>     00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff  7.svn:log.V 3...
>                                                  ^^^^                  
>     00000040: ff0a 454e 440a                           ..END.
>               ^^                                             
>     % 
>
> You can confirm it's invalid:
> .
>     % iconv -f utf8 < r/db/revprops/0/0 > /dev/null
>     iconv: illegal input sequence at position 62
>     zsh: exit 1     iconv -f utf8 < r/db/revprops/0/0 > /dev/null
>
> 'svn log' gives:
> .
>     % svn log file://$PWD/r 
>     ------------------------------------------------------------------------
>     r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
>     
>     ?\FF?\FF?\FF
>     ------------------------------------------------------------------------
>     % 
>
> So I think we can close it as "Fixed at some point"?


Ah, yes, I added some code to at least print something readable (or
let's say "analyzable") in such cases when I added utf8proc to our code.
I can't think of anything better to do, so we can close that issue as
far as I'm concerned.

-- Brane

Re: Issue tracker cleanup: SVN-807

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Mon, Nov 11, 2019 at 10:38:31 -0500:
> Speaking of testing, is this related to last week's issue, SVN-2079,
> which is unresolved, "SVN-2079 "utf8_tests.py should be made non-
> iso8859-1 specific" and if one is solved, would that (help) solve the
> other?

The two issues are related, yes: both of them are about conversion from the
internal encoding (UTF-8) to the encoding used by stdout and argv.  In #2079,
we could do something along the lines of:

1. Check if "en_US.ISO-8859-1" is a valid locale name.

2. From Python, create a file whose name uses some non-ASCII.

3. add/commit that file.

4. Verify that 'svnadmin dump', 'svn ls', etc give the expected results.

5. Do the same thing with another encoding.

> From the issue [#807]:
> 
> > Right now, if a log message contains characters that cannot be
> > represented in the client's locale, that log message will simply show
> > up as:
> >
> >    "[unconvertible log msg]"
> >
> > Graceful degradation would be nice here :-).
> 
> Questions:
> 
> Is this still the case?
> 
> If we're not sure, how can this be tested? (i.e., how to create and
> commit a log message that will cause this to manifest?)

Well, I'm sure there are better ways, but I just did this:
.
    % svnadmin create r
    % vim -b r/db/revprops/0/0
.
and manually added an svn:log property with a value that's invalid UTF-8 [svn:*
properties must use UTF-8 with LF line endings]:
.
    % xxd r/db/revprops/0/0 | vipe
    00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032  K 8.svn:date.V 2
    00000010: 370a 3230 3139 2d31 312d 3131 5431 363a  7.2019-11-11T16:
    00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20  08:01.344744Z.K 
    00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff  7.svn:log.V 3...
                                                 ^^^^                  
    00000040: ff0a 454e 440a                           ..END.
              ^^                                             
    % 

You can confirm it's invalid:
.
    % iconv -f utf8 < r/db/revprops/0/0 > /dev/null
    iconv: illegal input sequence at position 62
    zsh: exit 1     iconv -f utf8 < r/db/revprops/0/0 > /dev/null

'svn log' gives:
.
    % svn log file://$PWD/r 
    ------------------------------------------------------------------------
    r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
    
    ?\FF?\FF?\FF
    ------------------------------------------------------------------------
    % 

So I think we can close it as "Fixed at some point"?

Thanks for bringing it up!

Daniel
(who wonders how many years ago the issue was fixed…)