You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Nathan Hartman <ha...@gmail.com> on 2019/11/11 15:38:31 UTC
Issue tracker cleanup: SVN-807
For this week's issue, SVN-807 "gracefully degrade from failed charset
conversion":
This issue was created in 2002; I don't see any newer comments (the
only recent activity appears to be a property change made last year).
From the issue:
> Right now, if a log message contains characters that cannot be
> represented in the client's locale, that log message will simply show
> up as:
>
> "[unconvertible log msg]"
>
> Graceful degradation would be nice here :-).
Questions:
Is this still the case?
If we're not sure, how can this be tested? (i.e., how to create and
commit a log message that will cause this to manifest?)
Speaking of testing, is this related to last week's issue, SVN-2079,
which is unresolved, "SVN-2079 "utf8_tests.py should be made non-
iso8859-1 specific" and if one is solved, would that (help) solve the
other?
Thanks,
Nathan
Re: Issue tracker cleanup: SVN-807
Posted by Branko Čibej <br...@apache.org>.
On 12.11.2019 16:13, Nathan Hartman wrote:
> I'm guessing
> there was a desire to do more than print the offending hex codes but
> I don't know what else you could do.
If it's invalid UTF-8, there's nothing you can do.
There's a lot you can do if the UTF-8 is valid, but the target locale
can't represent the characters. Transliteration is a thing, but if we go
down that path, it's hard to decide when to stop. Subversion is not a
linguistics tool, it's a version control tool.
-- Brane
P.S.: I think there's even code that removes diacritical marks from the
source Unicode to make conversion to the target locale easier, at least
for Latin-based locales. That's already a kind of transliteration, and
even that may be way too much.
Fwd: Issue tracker cleanup: SVN-807
Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 12, 2019 at 1:08 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:
> Nathan Hartman wrote on Tue, 12 Nov 2019 15:13 +00:00:
> > Daniel, thanks for testing this and documenting how. Please, could
> > you add that as a comment in the issue tracker? Or, if you'd like,
> > I'll be happy to do that and attribute it to you.
>
> Go ahead.
>
Done. SVN-807 is closed.
Re: Issue tracker cleanup: SVN-807
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Tue, 12 Nov 2019 15:13 +00:00:
> Daniel, thanks for testing this and documenting how. Please, could
> you add that as a comment in the issue tracker? Or, if you'd like,
> I'll be happy to do that and attribute it to you.
Go ahead.
> I agree that this issue should be closed.
>
> From my reading, it looks like it was not closed as a reminder to move
> this to APR. (Though that might make sense from a refactoring
> standpoint, I think it would cause dependency headaches.)
It's not that bad; we've done it before. We create an svn__foo()
function, implement it, ask APR to add it too, then we use #if
APR_VERSION_AT_LEAST() to have svn__foo() use the APR implementation in
preference to ours, and remove the conditional and our implementation
when we bump the minimum supported APR version.
Re: Issue tracker cleanup: SVN-807
Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Nov 12, 2019 at 4:07 AM Branko Čibej <br...@apache.org> wrote:
> On 11.11.2019 17:30, Daniel Shahaf wrote:
>
(snip test procedure)
> > So I think we can close it as "Fixed at some point"?
>
> Ah, yes, I added some code to at least print something readable (or
> let's say "analyzable") in such cases when I added utf8proc to our code.
> I can't think of anything better to do, so we can close that issue as
> far as I'm concerned.
Brane, thanks for your input.
Daniel, thanks for testing this and documenting how. Please, could
you add that as a comment in the issue tracker? Or, if you'd like,
I'll be happy to do that and attribute it to you.
I agree that this issue should be closed.
From my reading, it looks like it was not closed as a reminder to move
this to APR. (Though that might make sense from a refactoring
standpoint, I think it would cause dependency headaches.) Also there
was a patch from Ulrich Drepper but it looks incomplete. I'm guessing
there was a desire to do more than print the offending hex codes but
I don't know what else you could do.
If there are no objections, I'll go ahead and close SVN-807.
As a side note, I feel dumb because I went spelunking all over the
code and found r842879...
[[[
r842879 | kfogel | 2002-07-30 18:33:13 -0400 (Tue, 30 Jul 2002) | 12
lines
Start on issue #807. Thanks to Justin Erenkrantz for his initial
patch to check_non_ascii().
* subversion/libsvn_subr/utf.c
(check_non_ascii): Return the more appropriate APR_EINVAL.
* subversion/include/svn_utf.h, subversion/libsvn_subr/utf.c
(svn_utf_cstring_from_utf8_fuzzy): New function.
* subversion/clients/cmdline/log-cmd.c
(log_message_receiver): Degrade gracefully yet insistently.
]]]
and only AFTER that, I noticed that it was written in the issue
tracker all along:
[[[
Karl Fogel added a comment - 30/Jul/02 22:46
Okay, the worst has been resolved by revision 2805.
We still need to move the code over to APR, if appropriate. And apply
Brane and Ulrich's patches, but that may be a separate issue, will
mull on that.
]]]
r2805 + 840074 = r842879... Which, by the way, is before 1.0.
The only other thing I found that deals with bad UTF-8 in log messages
is the logic that prevents bad UTF-8 and mixed line endings from
getting into the log and the automated test for it.
Nathan
Re: Issue tracker cleanup: SVN-807
Posted by Branko Čibej <br...@apache.org>.
On 11.11.2019 17:30, Daniel Shahaf wrote:
> Nathan Hartman wrote on Mon, Nov 11, 2019 at 10:38:31 -0500:
>> If we're not sure, how can this be tested? (i.e., how to create and
>> commit a log message that will cause this to manifest?)
> Well, I'm sure there are better ways, but I just did this:
> .
> % svnadmin create r
> % vim -b r/db/revprops/0/0
> .
> and manually added an svn:log property with a value that's invalid UTF-8 [svn:*
> properties must use UTF-8 with LF line endings]:
> .
> % xxd r/db/revprops/0/0 | vipe
> 00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032 K 8.svn:date.V 2
> 00000010: 370a 3230 3139 2d31 312d 3131 5431 363a 7.2019-11-11T16:
> 00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20 08:01.344744Z.K
> 00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff 7.svn:log.V 3...
> ^^^^
> 00000040: ff0a 454e 440a ..END.
> ^^
> %
>
> You can confirm it's invalid:
> .
> % iconv -f utf8 < r/db/revprops/0/0 > /dev/null
> iconv: illegal input sequence at position 62
> zsh: exit 1 iconv -f utf8 < r/db/revprops/0/0 > /dev/null
>
> 'svn log' gives:
> .
> % svn log file://$PWD/r
> ------------------------------------------------------------------------
> r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
>
> ?\FF?\FF?\FF
> ------------------------------------------------------------------------
> %
>
> So I think we can close it as "Fixed at some point"?
Ah, yes, I added some code to at least print something readable (or
let's say "analyzable") in such cases when I added utf8proc to our code.
I can't think of anything better to do, so we can close that issue as
far as I'm concerned.
-- Brane
Re: Issue tracker cleanup: SVN-807
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Nathan Hartman wrote on Mon, Nov 11, 2019 at 10:38:31 -0500:
> Speaking of testing, is this related to last week's issue, SVN-2079,
> which is unresolved, "SVN-2079 "utf8_tests.py should be made non-
> iso8859-1 specific" and if one is solved, would that (help) solve the
> other?
The two issues are related, yes: both of them are about conversion from the
internal encoding (UTF-8) to the encoding used by stdout and argv. In #2079,
we could do something along the lines of:
1. Check if "en_US.ISO-8859-1" is a valid locale name.
2. From Python, create a file whose name uses some non-ASCII.
3. add/commit that file.
4. Verify that 'svnadmin dump', 'svn ls', etc give the expected results.
5. Do the same thing with another encoding.
> From the issue [#807]:
>
> > Right now, if a log message contains characters that cannot be
> > represented in the client's locale, that log message will simply show
> > up as:
> >
> > "[unconvertible log msg]"
> >
> > Graceful degradation would be nice here :-).
>
> Questions:
>
> Is this still the case?
>
> If we're not sure, how can this be tested? (i.e., how to create and
> commit a log message that will cause this to manifest?)
Well, I'm sure there are better ways, but I just did this:
.
% svnadmin create r
% vim -b r/db/revprops/0/0
.
and manually added an svn:log property with a value that's invalid UTF-8 [svn:*
properties must use UTF-8 with LF line endings]:
.
% xxd r/db/revprops/0/0 | vipe
00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032 K 8.svn:date.V 2
00000010: 370a 3230 3139 2d31 312d 3131 5431 363a 7.2019-11-11T16:
00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20 08:01.344744Z.K
00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff 7.svn:log.V 3...
^^^^
00000040: ff0a 454e 440a ..END.
^^
%
You can confirm it's invalid:
.
% iconv -f utf8 < r/db/revprops/0/0 > /dev/null
iconv: illegal input sequence at position 62
zsh: exit 1 iconv -f utf8 < r/db/revprops/0/0 > /dev/null
'svn log' gives:
.
% svn log file://$PWD/r
------------------------------------------------------------------------
r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
?\FF?\FF?\FF
------------------------------------------------------------------------
%
So I think we can close it as "Fixed at some point"?
Thanks for bringing it up!
Daniel
(who wonders how many years ago the issue was fixed…)