Posted to users@subversion.apache.org by Dag-Erling Smørgrav <de...@des.no> on 2009/10/18 19:33:01 UTC

"200 OK" error caused by non-ASCII characters?

Repo URL: http://svn.des.no/svn/openpam
Server: FreeBSD 7.2, Apache 2.2.13, Subversion 1.6.5

Client: FreeBSD 7.2, Subversion 1.6.5 with ra_neon:

% svn log -r30 http://svn.des.no/svn/openpam | wc -l
       7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: REPORT of '/svn/openpam/!svn/bc/31': 200 OK (http://svn.des.no)
       0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
       9

Client: Ubuntu Hardy, Subversion 1.5.1 with ra_neon:

% svn log -r30 http://svn.des.no/svn/openpam | wc -l
7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: REPORT of '/svn/openpam/!svn/bc/31': 200 OK (http://svn.des.no)
0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
9

Client: FreeBSD 9.0, Subversion 1.6.5 with ra_serf:

% svn log -r30 http://svn.des.no/svn/openpam | wc -l
       7
% svn log -r31 http://svn.des.no/svn/openpam | wc -l
svn: XML parsing failed: (200 OK)
       0
% svn log -r32 http://svn.des.no/svn/openpam | wc -l
       9

It so happens that the log message for revision 31 contains an ISO8859-1
character.  It is not the only such revision in the repo, but it is the
first.  In the other direction, a plain 'svn log' stops at revision 191,
because the log message for revision 190 also contains ISO8859-1
characters.

Obviously, filtering a dump through iconv is not going to work, since the
size of the log messages (and anything else that might contain non-ASCII
characters) will change.  I could write a script that fixed both the log
messages and the lengths, but I'd rather not, unless there is no other
solution.
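
The size change is easy to demonstrate. A minimal Python sketch, using the
last line of the revision 31 log message as sample data (the variable names
are only for illustration): re-encoding from ISO8859-1 to UTF-8 adds one
byte per accented character, so any length recorded alongside the message
would no longer match.

```python
# Sample text only; the byte 0xE7 is "c with cedilla" in ISO8859-1.
latin1_bytes = b"Gar\xe7on!  More coffee!\n"

# Re-encode the same text as UTF-8, which is what an
# ISO8859-1 to UTF-8 conversion with iconv would produce.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# The UTF-8 form is one byte longer, because 0xE7 becomes 0xC3 0xA7.
print(len(latin1_bytes), len(utf8_bytes))
```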

Any suggestions?

DES
-- 
Dag-Erling Smørgrav - des@des.no

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408704

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: "200 OK" error caused by non-ASCII characters?

Posted by Dag-Erling Smørgrav <de...@des.no>.
Fixed with the attached script.  It expects two arguments: the character
set to convert from and the repo URL, so in this case:

% perl -w svn-log-iconv.pl iso-8859-1 http://svn.des.no/svn/openpam

(except it's not actually writable over http)

DES
-- 
Dag-Erling Smørgrav - des@des.no

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408744

Re: "200 OK" error caused by non-ASCII characters?

Posted by Dag-Erling Smørgrav <de...@des.no>.
Stefan Sperling <st...@elego.de> writes:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > Won't work - 'propget svn:log' doesn't return the real log message:
> Well, it is legal UTF-8 :)

For what it's worth...

> Can you try with file:// access?

No difference.

> Maybe the log message has not even been stored correctly?

It is definitely stored as ISO8859-1, I checked the db (fsfs).

> Are you sure it was correct latin1 when it exited your editor,

Yes.

> and that you were running with a latin1 locale so that the svn client
> would auto-convert the data from latin1 to UTF-8?

I'm 100% positive that I used an ISO8859-1 locale at the time (7+ years ago),
and 100% positive that it's stored as ISO8859-1 in the db.

I didn't switch to a UTF-8 locale until around 2006.

I've never committed over http:, always over file: or svn+ssh:.

DES
-- 
Dag-Erling Smørgrav - des@des.no

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408738

Re: "200 OK" error caused by non-ASCII characters?

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Oct 18, 2009 at 11:03:32PM +0200, Dag-Erling Smørgrav wrote:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > Stefan Sperling <st...@elego.de> writes:
> > > If you need to script it you can use svn propget, iconv, and svn propset.
> > > There is no need to dump/filter/load at all to fix log messages, it can
> > > be done with an svn client.
> > I'll try that, thanks...
> 
> Won't work - 'propget svn:log' doesn't return the real log message:
> 
> % svn propget svn:log --revprop -r 31 http://svn.des.no/svn/openpam | hexdump -C
> 00000000  49 20 73 75 63 6b 2e 20  20 54 68 69 73 20 74 69  |I suck.  This ti|
> 00000010  6d 65 2c 20 74 65 73 74  20 62 65 66 6f 72 65 20  |me, test before |
> 00000020  63 6f 6d 6d 69 74 74 69  6e 67 2e 20 20 49 20 61  |committing.  I a|
> 00000030  70 6f 6c 6f 67 69 7a 65  20 66 6f 72 20 74 68 65  |pologize for the|
> 00000040  0a 61 63 75 74 65 20 65  6d 62 61 72 61 73 73 6d  |.acute embarassm|
> 00000050  65 6e 74 20 79 6f 75 20  6d 75 73 74 20 61 6c 6c  |ent you must all|
> 00000060  20 66 65 65 6c 20 66 6f  72 20 6b 6e 6f 77 69 6e  | feel for knowin|
> 00000070  67 20 6d 65 2e 20 20 49  20 73 68 61 6c 6c 0a 70  |g me.  I shall.p|
> 00000080  65 72 66 6f 72 6d 20 53  65 70 70 75 6b 75 20 61  |erform Seppuku a|
> 00000090  74 20 73 75 6e 64 6f 77  6e 20 74 6f 20 61 74 6f  |t sundown to ato|
> 000000a0  6e 65 20 66 6f 72 20 6d  79 20 63 72 69 6d 65 73  |ne for my crimes|
> 000000b0  2e 0a 0a 47 61 72 3f 5c  32 33 31 6f 6e 21 20 20  |...Gar?\231on!  |
> 000000c0  4d 6f 72 65 20 63 6f 66  66 65 65 21 0a 0a        |More coffee!..|

Well, it is legal UTF-8 :)
$ echo 0x3f 0x5c 0x32 0x33 0x31 | xxd -r -p | uniname
character  byte       UTF-32   encoded as     glyph   name
        0          0  00003F   3F             ?      QUESTION MARK
        1          1  00005C   5C             \      REVERSE SOLIDUS
        2          2  000032   32             2      DIGIT TWO
        3          3  000033   33             3      DIGIT THREE
        4          4  000031   31             1      DIGIT ONE

Can you try with file:// access?
Maybe the log message has not even been stored correctly?
Are you sure it was correct latin1 when it exited your editor,
and that you were running with a latin1 locale so that the svn client
would auto-convert the data from latin1 to UTF-8?

Also, Apache httpd modules are agnostic about locales.
However, the Subversion libraries will often try to convert data to UTF-8
from the native charset, which in the context of apache modules is always
just "C". So it's possible that charset conversion won't work reliably if
you're getting non-UTF-8 log messages via http, and might not even work
reliably if you're committing non-UTF-8 log messages (which, as of 1.6.0,
should always fail.)

Stefan

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408733

Re: "200 OK" error caused by non-ASCII characters?

Posted by Dag-Erling Smørgrav <de...@des.no>.
Dag-Erling Smørgrav <de...@des.no> writes:
> Stefan Sperling <st...@elego.de> writes:
> > If you need to script it you can use svn propget, iconv, and svn propset.
> > There is no need to dump/filter/load at all to fix log messages, it can
> > be done with an svn client.
> I'll try that, thanks...

Won't work - 'propget svn:log' doesn't return the real log message:

% svn propget svn:log --revprop -r 31 http://svn.des.no/svn/openpam | hexdump -C
00000000  49 20 73 75 63 6b 2e 20  20 54 68 69 73 20 74 69  |I suck.  This ti|
00000010  6d 65 2c 20 74 65 73 74  20 62 65 66 6f 72 65 20  |me, test before |
00000020  63 6f 6d 6d 69 74 74 69  6e 67 2e 20 20 49 20 61  |committing.  I a|
00000030  70 6f 6c 6f 67 69 7a 65  20 66 6f 72 20 74 68 65  |pologize for the|
00000040  0a 61 63 75 74 65 20 65  6d 62 61 72 61 73 73 6d  |.acute embarassm|
00000050  65 6e 74 20 79 6f 75 20  6d 75 73 74 20 61 6c 6c  |ent you must all|
00000060  20 66 65 65 6c 20 66 6f  72 20 6b 6e 6f 77 69 6e  | feel for knowin|
00000070  67 20 6d 65 2e 20 20 49  20 73 68 61 6c 6c 0a 70  |g me.  I shall.p|
00000080  65 72 66 6f 72 6d 20 53  65 70 70 75 6b 75 20 61  |erform Seppuku a|
00000090  74 20 73 75 6e 64 6f 77  6e 20 74 6f 20 61 74 6f  |t sundown to ato|
000000a0  6e 65 20 66 6f 72 20 6d  79 20 63 72 69 6d 65 73  |ne for my crimes|
000000b0  2e 0a 0a 47 61 72 3f 5c  32 33 31 6f 6e 21 20 20  |...Gar?\231on!  |
000000c0  4d 6f 72 65 20 63 6f 66  66 65 65 21 0a 0a        |More coffee!..|

DES
-- 
Dag-Erling Smørgrav - des@des.no

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408728

Re: "200 OK" error caused by non-ASCII characters?

Posted by Dag-Erling Smørgrav <de...@des.no>.
Stefan Sperling <st...@elego.de> writes:
> Dag-Erling Smørgrav <de...@des.no> writes:
> > It so happens that the log message for revision 31 contains an ISO8859-1
> > character.
> Looks like it's something else:
> $ svn propget --revprop -r31 svn:log http://svn.des.no/svn/openpam | file -
> /dev/stdin: ASCII English text

file is wrong - IIRC, it just looks at the first 32 characters.  The
complete log message is:

| I suck.  This time, test before committing.  I apologize for the
| acute embarassment you must all feel for knowing me.  I shall
| perform Seppuku at sundown to atone for my crimes.
|
| Gar<E7>on!  More coffee!
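
Whether a message is pure ASCII can be checked over every byte rather than
a prefix. A small Python sketch, with the helper name `classify` chosen for
illustration; it only tells ASCII, valid UTF-8, and everything else apart,
and does not identify which legacy charset was used:

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' after inspecting the whole input."""
    # Every byte below 0x80 means plain ASCII.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Otherwise see whether the bytes form a valid UTF-8 sequence.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"

# The lone 0xE7 byte is not valid UTF-8, so this is neither ASCII nor UTF-8.
print(classify(b"Gar\xe7on!  More coffee!\n"))
```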

> If you need to script it you can use svn propget, iconv, and svn propset.
> There is no need to dump/filter/load at all to fix log messages, it can
> be done with an svn client.

I'll try that, thanks...

DES
-- 
Dag-Erling Smørgrav - des@des.no

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408726

Re: "200 OK" error caused by non-ASCII characters?

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Oct 18, 2009 at 09:33:01PM +0200, Dag-Erling Smørgrav wrote:
> It so happens that the log message for revision 31 contains an ISO8859-1
> character.

Looks like it's something else:
$ svn propget --revprop -r31 svn:log http://svn.des.no/svn/openpam | file -
/dev/stdin: ASCII English text

> It is not the only such revision in the repo, but it is the
> first.  In the other direction, a plain 'svn log' stops at revision 191,
> because the log message for revision 190 also contains ISO8859-1
> characters.
> 
> Obviously, filtering a dump through iconv is not going to work, since the
> size of the log messages (and anything else that might contain non-ASCII
> characters) will change.  I could write a script that fixed both the log
> messages and the lengths, but I'd rather not, unless there is no other
> solution.
> 
> Any suggestions?

I'd just edit the log messages using svn propedit:
  svn propedit --revprop -r31 svn:log http://svn.des.no/svn/openpam

If you need to script it you can use svn propget, iconv, and svn propset.
There is no need to dump/filter/load at all to fix log messages, it can
be done with an svn client.
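
A scripted version of that might look like the Python sketch below. The
helper names and the temporary file name are hypothetical. It assumes that
revision property changes are permitted by the repository's
pre-revprop-change hook, that it is run from a UTF-8 locale, and that
'svn propget' hands back the stored bytes unchanged, which may not hold
for messages that are not valid UTF-8.

```python
import subprocess

def propget_cmd(url: str, rev: int) -> list:
    # --strict suppresses the extra newline svn would otherwise append.
    return ["svn", "propget", "--strict", "--revprop", "-r", str(rev), "svn:log", url]

def propset_cmd(url: str, rev: int, value_file: str) -> list:
    # -F reads the new property value from a file instead of the command line.
    return ["svn", "propset", "--revprop", "-r", str(rev), "svn:log", "-F", value_file, url]

def recode(data: bytes, from_charset: str) -> bytes:
    # Same job as piping through iconv with UTF-8 as the target charset.
    return data.decode(from_charset).encode("utf-8")

def fix_log(url: str, rev: int, from_charset: str, value_file: str = "svn-log.tmp") -> None:
    # Fetch the current message, convert it, and write it back.
    old = subprocess.run(propget_cmd(url, rev), check=True, capture_output=True).stdout
    with open(value_file, "wb") as fh:
        fh.write(recode(old, from_charset))
    subprocess.run(propset_cmd(url, rev, value_file), check=True)
```

Nothing runs until fix_log is called, for example as
fix_log(repo_url, 31, "iso-8859-1") with repo_url pointing at a writable
repository URL.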

Log messages in the repository should always be in UTF-8 but
unfortunately older servers didn't enforce this. As of 1.6.x the server
does enforce it.

I don't like the way 'svn log' is behaving here BTW. Subversion should at
least print an error saying what is wrong with the log message instead
of bailing out after writing the first few opening tags of XML.

(By the way, it's probably one of the darkest log messages I've ever
seen :)

Stefan

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2408710
