Posted to users@subversion.apache.org by Dag-Erling Smørgrav <de...@des.no> on 2009/10/18 19:33:01 UTC
"200 OK" error caused by non-ASCII characters?
Repo URL: http://svn.des.no/svn/openpam
Server: FreeBSD 7.2, Apache 2.2.13, Subversion 1.6.5
Client: FreeBSD 7.2, Subversion 1.6.5 with ra_neon:
% svn log -r30 http://svn.des.no/svn/openpam | wc -l
7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: REPORT of '/svn/openpam/!svn/bc/31': 200 OK (http://svn.des.no)
0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
9
Client: Ubuntu Hardy, Subversion 1.5.1 with ra_neon:
% svn log -r30 http://svn.des.no/svn/openpam | wc -l
7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: REPORT of '/svn/openpam/!svn/bc/31': 200 OK (http://svn.des.no)
0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
9
Client: FreeBSD 9.0, Subversion 1.6.5 with ra_serf:
% svn log -r30 http://svn.des.no/svn/openpam | wc -l
7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: XML parsing failed: (200 OK)
0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
9
It so happens that the log message for revision 31 contains an ISO8859-1
character. It is not the only such revision in the repo, but it is the
first. In the other direction, a plain 'svn log' stops at revision 191,
because the log message for revision 190 also contains ISO8859-1
characters.
Obviously, filtering a dump through iconv is not going to work, since the
size of the log messages (and anything else that might contain non-ASCII
characters) will change. I could write a script that fixed both the log
messages and the lengths, but I'd rather not, unless there is no other
solution.
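
The length problem can be made concrete with a small Python sketch (not part of the original message). It assumes the dump file's property layout, in which each key and value is preceded by its byte length (`K <len>` / `V <len>`); the helper name `prop_record` and the sample value are made up for illustration.

```python
def prop_record(key: bytes, value: bytes) -> bytes:
    """Render one property in the dump file's length-prefixed K/V layout."""
    return b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(value), value)


# Sample value: "Garçon!" as ISO-8859-1 bytes (0xE7 is the c-cedilla).
latin1_value = b"Gar\xe7on!"
utf8_value = latin1_value.decode("iso-8859-1").encode("utf-8")

before = prop_record(b"svn:log", latin1_value)
after = prop_record(b"svn:log", utf8_value)

# The value grows by one byte, so the "V" length has to change with it.
print(len(latin1_value), len(utf8_value))
print(before)
print(after)
```

Running the dump through iconv alone would rewrite the value but leave the old `V` length in place, and the enclosing `Prop-content-length` and `Content-length` headers would be stale for the same reason.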
Any suggestions?
DES
--
Dag-Erling Smørgrav - des@des.no
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408704
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: "200 OK" error caused by non-ASCII characters?
Posted by Dag-Erling Smørgrav <de...@des.no>.
Fixed with the attached script. It expects two arguments: the character
set to convert from and the repo URL, so in this case:
% perl -w svn-log-iconv.pl iso-8859-1 http://svn.des.no/svn/openpam
(except it's not actually writable over http)
DES
--
Dag-Erling Smørgrav - des@des.no
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408744
Re: "200 OK" error caused by non-ASCII characters?
Posted by Dag-Erling Smørgrav <de...@des.no>.
Stefan Sperling <st...@elego.de> writes:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > Won't work - 'propget svn:log' doesn't return the real log message:
> Well, it is legal UTF-8 :)
For what it's worth...
> Can you try with file:// access?
No difference.
> Maybe the log message has not even been stored correctly?
It is definitely stored as ISO8859-1, I checked the db (fsfs).
> Are you sure it was correct latin1 when it exited your editor,
Yes.
> and that you were running with a latin1 locale so that the svn client
> would auto-convert the data from latin1 to UTF-8?
I'm 100% positive that I used an ISO8859-1 locale at the time (7+ years ago),
and 100% positive that it's stored as ISO8859-1 in the db.
I didn't switch to a UTF-8 locale until around 2006.
I've never committed over http:, always over file: or svn+ssh:.
DES
--
Dag-Erling Smørgrav - des@des.no
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408738
Re: "200 OK" error caused by non-ASCII characters?
Posted by Stefan Sperling <st...@elego.de>.
On Sun, Oct 18, 2009 at 11:03:32PM +0200, Dag-Erling Smørgrav wrote:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > Stefan Sperling <st...@elego.de> writes:
> > > If you need to script it you can use svn propget, iconv, and svn propset.
> > > There is no need to dump/filter/load at all to fix log messages, it can
> > > be done with an svn client.
> > I'll try that, thanks...
>
> Won't work - 'propget svn:log' doesn't return the real log message:
>
> % svn propget svn:log --revprop -r 31 http://svn.des.no/svn/openpam | hexdump -C
> 00000000  49 20 73 75 63 6b 2e 20  20 54 68 69 73 20 74 69  |I suck.  This ti|
> 00000010  6d 65 2c 20 74 65 73 74  20 62 65 66 6f 72 65 20  |me, test before |
> 00000020  63 6f 6d 6d 69 74 74 69  6e 67 2e 20 20 49 20 61  |committing.  I a|
> 00000030  70 6f 6c 6f 67 69 7a 65  20 66 6f 72 20 74 68 65  |pologize for the|
> 00000040  0a 61 63 75 74 65 20 65  6d 62 61 72 61 73 73 6d  |.acute embarassm|
> 00000050  65 6e 74 20 79 6f 75 20  6d 75 73 74 20 61 6c 6c  |ent you must all|
> 00000060  20 66 65 65 6c 20 66 6f  72 20 6b 6e 6f 77 69 6e  | feel for knowin|
> 00000070  67 20 6d 65 2e 20 20 49  20 73 68 61 6c 6c 0a 70  |g me.  I shall.p|
> 00000080  65 72 66 6f 72 6d 20 53  65 70 70 75 6b 75 20 61  |erform Seppuku a|
> 00000090  74 20 73 75 6e 64 6f 77  6e 20 74 6f 20 61 74 6f  |t sundown to ato|
> 000000a0  6e 65 20 66 6f 72 20 6d  79 20 63 72 69 6d 65 73  |ne for my crimes|
> 000000b0  2e 0a 0a 47 61 72 3f 5c  32 33 31 6f 6e 21 20 20  |...Gar?\231on!  |
> 000000c0  4d 6f 72 65 20 63 6f 66  66 65 65 21 0a 0a        |More coffee!..|
Well, it is legal UTF-8 :)
$ echo 0x3f 0x5c 0x32 0x33 0x31 | xxd -r-p | uniname
character  byte  UTF-32  encoded as  glyph  name
        0     0  00003F  3F          ?      QUESTION MARK
        1     1  00005C  5C          \      REVERSE SOLIDUS
        2     2  000032  32          2      DIGIT TWO
        3     3  000033  33          3      DIGIT THREE
        4     4  000031  31          1      DIGIT ONE
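
The five ASCII characters `?\231` look like an escaped rendering of one raw byte: 231 decimal is 0xE7, which is "ç" in ISO-8859-1. The Python sketch below (not from the thread) reverses that escaping under the assumption that every `?\NNN` sequence stands for the byte with decimal value NNN; the function name `unescape` is hypothetical.

```python
import re

# Matches a question mark, a backslash and exactly three decimal digits.
ESCAPE = re.compile(rb"\?\\(\d{3})")


def _restore(match):
    """Turn one ?\\NNN escape back into a byte, leaving values above 255 alone."""
    value = int(match.group(1))
    return bytes([value]) if value <= 255 else match.group(0)


def unescape(data: bytes) -> bytes:
    """Replace each ?\\NNN escape (assumed decimal) with the byte it stands for."""
    return ESCAPE.sub(_restore, data)


escaped = b"Gar?\\231on!  More coffee!\n"
raw = unescape(escaped)

# Interpret the recovered bytes as ISO-8859-1 and re-encode them as UTF-8.
fixed = raw.decode("iso-8859-1").encode("utf-8")
print(raw)
print(fixed)
```

A log message that legitimately contains the text `?\231` would be altered by this too, so the result is worth checking by eye before writing anything back.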
Can you try with file:// access?
Maybe the log message has not even been stored correctly?
Are you sure it was correct latin1 when it exited your editor,
and that you were running with a latin1 locale so that the svn client
would auto-convert the data from latin1 to UTF-8?
Also, Apache httpd modules are agnostic about locales.
However, the Subversion libraries will often try to convert data to UTF-8
from the native charset, which in the context of apache modules is always
just "C". So it's possible that charset conversion won't work reliably if
you're getting non-UTF-8 log messages via http, and might not even work
reliably if you're committing non-UTF-8 log messages (which, as of 1.6.0,
should always fail.)
Stefan
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408733
Re: "200 OK" error caused by non-ASCII characters?
Posted by Dag-Erling Smørgrav <de...@des.no>.
Dag-Erling Smørgrav <de...@des.no> writes:
> Stefan Sperling <st...@elego.de> writes:
> > If you need to script it you can use svn propget, iconv, and svn propset.
> > There is no need to dump/filter/load at all to fix log messages, it can
> > be done with an svn client.
> I'll try that, thanks...
Won't work - 'propget svn:log' doesn't return the real log message:
% svn propget svn:log --revprop -r 31 http://svn.des.no/svn/openpam | hexdump -C
00000000  49 20 73 75 63 6b 2e 20  20 54 68 69 73 20 74 69  |I suck.  This ti|
00000010  6d 65 2c 20 74 65 73 74  20 62 65 66 6f 72 65 20  |me, test before |
00000020  63 6f 6d 6d 69 74 74 69  6e 67 2e 20 20 49 20 61  |committing.  I a|
00000030  70 6f 6c 6f 67 69 7a 65  20 66 6f 72 20 74 68 65  |pologize for the|
00000040  0a 61 63 75 74 65 20 65  6d 62 61 72 61 73 73 6d  |.acute embarassm|
00000050  65 6e 74 20 79 6f 75 20  6d 75 73 74 20 61 6c 6c  |ent you must all|
00000060  20 66 65 65 6c 20 66 6f  72 20 6b 6e 6f 77 69 6e  | feel for knowin|
00000070  67 20 6d 65 2e 20 20 49  20 73 68 61 6c 6c 0a 70  |g me.  I shall.p|
00000080  65 72 66 6f 72 6d 20 53  65 70 70 75 6b 75 20 61  |erform Seppuku a|
00000090  74 20 73 75 6e 64 6f 77  6e 20 74 6f 20 61 74 6f  |t sundown to ato|
000000a0  6e 65 20 66 6f 72 20 6d  79 20 63 72 69 6d 65 73  |ne for my crimes|
000000b0  2e 0a 0a 47 61 72 3f 5c  32 33 31 6f 6e 21 20 20  |...Gar?\231on!  |
000000c0  4d 6f 72 65 20 63 6f 66  66 65 65 21 0a 0a        |More coffee!..|
DES
--
Dag-Erling Smørgrav - des@des.no
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408728
Re: "200 OK" error caused by non-ASCII characters?
Posted by Dag-Erling Smørgrav <de...@des.no>.
Stefan Sperling <st...@elego.de> writes:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > It so happens that the log message for revision 31 contains an ISO8859-1
> > character.
> Looks like it's something else:
> $ svn propget --revprop -r31 svn:log http://svn.des.no/svn/openpam | file -
> /dev/stdin: ASCII English text
file is wrong - IIRC, it just looks at the first 32 characters. The
complete log message is:
| I suck. This time, test before committing. I apologize for the
| acute embarassment you must all feel for knowing me. I shall
| perform Seppuku at sundown to atone for my crimes.
|
| Gar<E7>on! More coffee!
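
A more direct check than `file` is to try decoding the bytes as UTF-8 and report where that fails. This is a minimal Python sketch, not something used in the thread; the function name `first_invalid_utf8` and the shortened sample messages are made up for illustration.

```python
def first_invalid_utf8(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The last line of the log message with a raw ISO-8859-1 c-cedilla (0xE7).
stored = b"Gar\xe7on!  More coffee!\n"
# The same line as it would look if it had been stored as UTF-8.
expected = "Gar\u00e7on!  More coffee!\n".encode("utf-8")

print(first_invalid_utf8(stored))
print(first_invalid_utf8(expected))
```

Pure ASCII input passes this check as well, so it only distinguishes raw ISO-8859-1 bytes from valid UTF-8; it says nothing about text that was already escaped on the way out.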
> If you need to script it you can use svn propget, iconv, and svn propset.
> There is no need to dump/filter/load at all to fix log messages, it can
> be done with an svn client.
I'll try that, thanks...
DES
--
Dag-Erling Smørgrav - des@des.no
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408726
Re: "200 OK" error caused by non-ASCII characters?
Posted by Stefan Sperling <st...@elego.de>.
On Sun, Oct 18, 2009 at 09:33:01PM +0200, Dag-Erling Smørgrav wrote:
> It so happens that the log message for revision 31 contains an ISO8859-1
> character.
Looks like it's something else:
$ svn propget --revprop -r31 svn:log http://svn.des.no/svn/openpam | file -
/dev/stdin: ASCII English text
> It is not the only such revision in the repo, but it is the
> first. In the other direction, a plain 'svn log' stops at revision 191,
> because the log message for revision 190 also contains ISO8859-1
> characters.
>
> Obviously, filtering a dump through iconv is not going to work, since the
> size of the log messages (and anything else that might contain non-ASCII
> characters) will change. I could write a script that fixed both the log
> messages and the lengths, but I'd rather not, unless there is no other
> solution.
>
> Any suggestions?
I'd just edit the log messages using svn propedit:
svn propedit --revprop -r31 svn:log http://svn.des.no/svn/openpam
If you need to script it you can use svn propget, iconv, and svn propset.
There is no need to dump/filter/load at all to fix log messages, it can
be done with an svn client.
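
As a rough illustration of the propget / iconv / propset route, here is a Python sketch. It is not the script that was eventually attached to this thread, and the function names are hypothetical. It assumes that `svn propget` hands back the stored bytes unchanged (the follow-ups in this thread show that this may not hold), that the repository's pre-revprop-change hook permits changing `svn:log`, and that the target URL is writable.

```python
import subprocess
import tempfile


def recode(raw: bytes, source_charset: str) -> bytes:
    """Convert a log message from source_charset to UTF-8 (the iconv step)."""
    return raw.decode(source_charset).encode("utf-8")


def propget_command(url: str, rev: int) -> list:
    # --strict asks the client not to append an extra trailing newline.
    return ["svn", "propget", "svn:log", "--revprop", "-r", str(rev),
            "--strict", url]


def propset_command(url: str, rev: int, message_file: str) -> list:
    # --encoding tells the client that the file is already UTF-8.
    return ["svn", "propset", "svn:log", "--revprop", "-r", str(rev),
            "--encoding", "UTF-8", "-F", message_file, url]


def fix_log_message(url: str, rev: int, source_charset: str) -> None:
    """Fetch one log message, recode it, and write it back as a revprop."""
    raw = subprocess.run(propget_command(url, rev), check=True,
                         capture_output=True).stdout
    with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
        tmp.write(recode(raw, source_charset))
        tmp.flush()
        subprocess.run(propset_command(url, rev, tmp.name), check=True)
```

A call such as `fix_log_message(url, 31, "iso-8859-1")` would then handle a single revision; looping over the affected revisions is left out on purpose.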
Log messages in the repository should always be in UTF-8 but
unfortunately older servers didn't enforce this. As of 1.6.x the server
does enforce it.
I don't like the way 'svn log' is behaving here BTW. Subversion should at
least print an error saying what is wrong with the log message instead
of bailing out after writing the first few opening tags of XML.
(By the way, it's probably one of the darkest log messages I've ever
seen :)
Stefan
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408710