Posted to users@subversion.apache.org by Phil Pinkerton <pc...@gmail.com> on 2012/02/08 20:00:01 UTC

Error doing a svnsync

We have been running a few hundred svnsync operations from 1.6.5
repositories to 1.7.2 repositories.

For the most part this has gone quite well, but we have encountered an
error that is not too clear, and we would welcome any insight into it:

svnsync: E000022: Valid UTF-8 data
(hex: 53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20)
followed by invalid UTF-8 sequence
(hex: 96 20 42 61)


-- 
" The fundamental principle here is that the justification for a
physical concept lies exclusively in its clear and unambiguous
relation to the facts that it can be experienced"   AE

Please Feed and Educate the Children... it's the least any of us can do.

Re: Error doing a svnsync

Posted by Stephen Butler <sb...@elego.de>.
[ cc'ing users@ again]

On Feb 8, 2012, at 23:25 , Phil Pinkerton wrote:

> Thanks. When I try using --source-prop-encoding I get
> 
> svnsync: E000022: Safe data 'Server Currency ' was followed by
> non-ASCII byte 150: unable to convert to/from UTF-8

What was the name of the (non-UTF-8) encoding?  I think

  --source-prop-encoding cp1252

might work, thanks to a handy table I found at

  http://www.prismnet.com/~jdawson/cp1252.html

[[[
>>> s = "".join([chr(int(n, 16)).decode("cp1252") for n in "53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20 96 20 42 61".split()])
>>> with open("log.txt", "w") as f:
...     f.write(s.encode("utf-8"))
]]]

When I open log.txt in Emacs I see a long dash:

  Server Currency – Ba

Bingo! :-)
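
The session above is Python 2. A rough Python 3 equivalent, as a sketch
only: it assumes the log message really was written in cp1252, which the
error output suggests but does not prove.

```python
# Bytes copied from the svnsync error message: the valid prefix
# followed by the sequence it rejected.
raw = bytes.fromhex("53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20 96 20 42 61")

# Assumption: the original client wrote the text in Windows-1252 (cp1252).
text = raw.decode("cp1252")

# ascii() keeps the output readable on terminals that are not UTF-8.
print(ascii(text))

# Re-encode as UTF-8, the encoding the 1.7 server insists on.
utf8 = text.encode("utf-8")
print(utf8.hex())
```

In cp1252 the byte 0x96 maps to U+2013 (en dash), which becomes the three
bytes e2 80 93 in UTF-8.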

Steve

> 
> On Wed, Feb 8, 2012 at 4:28 PM, Stephen Butler <sb...@elego.de> wrote:
>> 
>> On Feb 8, 2012, at 20:00 , Phil Pinkerton wrote:
>> 
>>> We have been running a few hundred svnsync operations from 1.6.5
>>> repositories to 1.7.2 repositories.
>>> 
>>> For the most part this has gone quite well, but we have encountered an
>>> error that is not too clear, and we would welcome any insight into it:
>>> 
>>> svnsync: E000022: Valid UTF-8 data
>>> (hex: 53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20)
>>> followed by invalid UTF-8 sequence
>>> (hex: 96 20 42 61)
>> 
>> 
>> Indeed, the 0x96 is invalid in UTF-8.
>> 
>> >>> "".join([chr(int(n, 16)).decode("utf-8") for n in "53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20".split()])
>> u'Server Currency '
>> 
>> >>> "".join([chr(int(n, 16)).decode("utf-8") for n in "96 20 42 61".split()])
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
>>    return codecs.utf_8_decode(input, errors, True)
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte
>> 
>> Does that text appear in a log message?  The 1.7 server is stricter
>> about UTF-8.
>> 
>> The svnsync command has a new option --source-prop-encoding,
>> which may be useful if some old client committed a log message in
>> some other encoding.
>> 
>> Regards,
>> Steve

--
Stephen Butler | Consultant
elego Software Solutions GmbH
Gustav-Meyer-Allee 25, 13355 Berlin, Germany
tel: +49 30 2345 8696 | mobile: +49 163 25 45 015
fax: +49 30 2345 8695 | http://www.elego.de
Geschäftsführer: Olaf Wagner | Sitz der Gesellschaft: Berlin
Amtsgericht Charlottenburg HRB 77719



Re: Error doing a svnsync

Posted by Stephen Butler <sb...@elego.de>.
On Feb 8, 2012, at 20:00 , Phil Pinkerton wrote:

> We have been running a few hundred svnsync operations from 1.6.5
> repositories to 1.7.2 repositories.
> 
> For the most part this has gone quite well, but we have encountered an
> error that is not too clear, and we would welcome any insight into it:
> 
> svnsync: E000022: Valid UTF-8 data
> (hex: 53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20)
> followed by invalid UTF-8 sequence
> (hex: 96 20 42 61)


Indeed, the 0x96 is invalid in UTF-8.

>>> "".join([chr(int(n, 16)).decode("utf-8") for n in "53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20".split()])
u'Server Currency '

>>> "".join([chr(int(n, 16)).decode("utf-8") for n in "96 20 42 61".split()])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte
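
The traceback above comes from Python 2.7. A minimal Python 3 sketch of
the same check follows. It splits the data at the first invalid byte,
much as the svnsync message does. The helper name split_at_invalid_utf8
is hypothetical; it is not part of Subversion or the Python standard
library.

```python
def split_at_invalid_utf8(data: bytes):
    """Return (valid_prefix, remainder); remainder is empty if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return data[:err.start], data[err.start:]
    return data, b""


# Bytes copied from the svnsync error message.
raw = bytes.fromhex("53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20 96 20 42 61")
valid, invalid = split_at_invalid_utf8(raw)
print(valid)
print(invalid.hex())

# 0x96 is 0b10010110. Bytes of the form 10xxxxxx are continuation bytes
# in UTF-8, so one cannot begin a character on its own.
is_continuation = (invalid[0] & 0xC0) == 0x80
print(is_continuation)
```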

Does that text appear in a log message?  The 1.7 server is stricter 
about UTF-8.  

The svnsync command has a new option --source-prop-encoding, 
which may be useful if some old client committed a log message in 
some other encoding.
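
One way to narrow down which "other encoding" was used is to decode the
offending byte with a few candidates and see what character comes out.
The following is only a sketch: the candidate list is an assumption
chosen for illustration, and a plausible-looking character is a hint,
not proof.

```python
import unicodedata

# Bytes copied from the svnsync error message.
raw = bytes.fromhex("53 65 72 76 65 72 20 43 75 72 72 65 6e 63 79 20 96 20 42 61")
offset = 16  # position of the byte 0x96 that UTF-8 decoding rejects

# Assumed candidates; extend with whatever the old clients may have used.
candidates = ["cp1252", "latin-1", "cp437", "mac_roman"]

results = {}
for name in candidates:
    try:
        ch = raw[offset:offset + 1].decode(name)
    except UnicodeDecodeError:
        results[name] = None
        continue
    # Control characters have no Unicode name, hence the fallback string.
    results[name] = (ord(ch), unicodedata.name(ch, "<unnamed control character>"))

for name, info in results.items():
    if info is None:
        print(f"{name}: cannot decode 0x96")
    else:
        print(f"{name}: U+{info[0]:04X} {info[1]}")
```

With cp1252 the byte comes out as U+2013 EN DASH, which reads naturally
between two words. With latin-1 it becomes the control character U+0096,
which is unlikely to be what the author typed.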

Regards,
Steve