Posted to users@subversion.apache.org by Ben Schonle <be...@amigos24.net> on 2008/03/12 16:04:14 UTC

Problem with non-UTF8

Hello,

When importing a folder with multiple subfolders and files, I get the error:

svn: Valid UTF-8 data
(hex: 4b 61 69 72 6f 73 20 70)
followed by invalid UTF-8 sequence
(hex: f5 68 69 76)

As I understand it, some characters in the file names are not UTF-8.
I was thinking of renaming the respective files / folders, but I would
first need to know their names and locations.

How do you suggest I proceed?

Thx,
Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Problem with non-UTF8

Posted by Erik Huelsmann <eh...@gmail.com>.
On 3/19/08, Ben Schonle <be...@amigos24.net> wrote:
> Hey Ryan,
>
> I have now renamed the respective files to use only ASCII characters.
> However, I would still be interested to know WHERE to set the LANG
> environment variable. Are you referring to the OS or to SVN settings?

Which operating system? (and probably some more questions depending on
that answer)


Bye,

Erik

Re: Problem with non-UTF8

Posted by Ben Schonle <be...@amigos24.net>.
Ryan Schmidt wrote:
> One solution is to rename the files to have no non-ASCII characters.
>
> What you probably want to do, though, is set the LANG environment 
> variable correctly so that svn knows what character encoding to use to 
> read the file names.
>
Hey Ryan,

I have now renamed the respective files to use only ASCII characters.
However, I would still be interested to know WHERE to set the LANG
environment variable. Are you referring to the OS or to SVN settings?


Re: Problem with non-UTF8

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 18, 2008, at 03:50, Ben Schonle wrote:

> For me the problem is that some folder / file names contain German
> umlauts.
>
> According to Lars I should hex-decode the string. Thus the questions:
>
>    * How do I find out which folders / files I need to hex-decode?
>    * How do I hex-decode them once found?

Take the hex in the error message and turn it into characters.

>>> svn: Valid UTF-8 data
>>> (hex: 4b 61 69 72 6f 73 20 70)

That's all ASCII data (all bytes are less than hex 80), so in any
ASCII-compatible character encoding that's "Kairos p".

>>> followed by invalid UTF-8 sequence
>>> (hex: f5 68 69 76)

If we assume the character encoding of these bytes is ISO-8859-1,  
then this is "õhiv".
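
As a minimal sketch of that decoding step in Python: the helper name
decode_hex_dump is made up for illustration, and ISO-8859-1 is only the
assumption stated above, not something the error message confirms.

```python
def decode_hex_dump(hex_text, encoding):
    """Turn a space-separated hex dump such as '4b 61' into text."""
    raw = bytes.fromhex(hex_text)
    return raw.decode(encoding)


# The part svn accepted: every byte is below 0x80, so plain ASCII works.
valid_part = decode_hex_dump("4b 61 69 72 6f 73 20 70", "ascii")

# The part svn rejected, read under the ISO-8859-1 assumption.
invalid_part = decode_hex_dump("f5 68 69 76", "iso-8859-1")

print(valid_part + invalid_part)  # Kairos põhiv

# The same four bytes are rejected by a UTF-8 decoder, because 0xf5 can
# never start a UTF-8 sequence.
try:
    decode_hex_dump("f5 68 69 76", "utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The decoded fragment can then be searched for among the file names in the
folder being imported.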

One solution is to rename the files to have no non-ASCII characters.
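
To find out which files or folders are affected before renaming them, one
option is to walk the tree and test every name. A rough sketch, assuming a
POSIX file system where names are stored as raw bytes;
find_non_utf8_names is a hypothetical helper, not something svn provides.

```python
import os


def find_non_utf8_names(root):
    """Return paths under root (a bytes path) whose name is not valid UTF-8."""
    bad = []
    # Passing bytes to os.walk makes it yield names as undecoded bytes.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad


# Example use, run from the folder that is about to be imported:
#     for path in find_non_utf8_names(b"."):
#         print(path)
```

The paths are printed as bytes literals, so an offending byte shows up as
an escape such as \xf5.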

What you probably want to do, though, is set the LANG environment  
variable correctly so that svn knows what character encoding to use  
to read the file names.




Re: Problem with non-UTF8

Posted by Ben Schonle <be...@amigos24.net>.
Hi,

For me the problem is that some folder / file names contain German umlauts.

According to Lars I should hex-decode the string. Thus the questions:

    * How do I find out which folders / files I need to hex-decode?
    * How do I hex-decode them once found?


Thx!
Ben




Re: Problem with non-UTF8

Posted by Lars Grunewaldt <lg...@dark-reality.de>.
Hi,

this happens mostly (for me) when entering a commit message that
contains German umlauts or other non-ASCII characters on a Unix
terminal. Maybe that's the case for you, too?

Otherwise, you could hex-decode the string (treat the numbers as
8-bit ANSI characters); the result should be part of your file name.

Best regards,
   Lars

--
Lars Grunewaldt - Dipl. Inf. (FH)
* software development
* multimedia design
skills: C/C++/Java/Python/PHP/(X)HTML/Flash/audio/video
web: http://www.dark-reality.de
mail: lgw@dark-reality.de

Re: Problem with non-UTF8

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 12, 2008, at 11:04, Ben Schonle wrote:

> When importing a folder with multiple subfolders and files, I get the
> error:
>
> svn: Valid UTF-8 data
> (hex: 4b 61 69 72 6f 73 20 70)
> followed by invalid UTF-8 sequence
> (hex: f5 68 69 76)
>
> As I understand it, some characters in the file names are not UTF-8.
> I was thinking of renaming the respective files / folders, but I would
> first need to know their names and locations.
>
> How do you suggest I proceed?

You shouldn't need to rename your files. You should be able to commit  
them as they are, provided you've explained to Subversion how to  
translate the character encodings by setting the LANG environment  
variable to an appropriate locale first.
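
LANG is an ordinary environment variable of the process that runs svn, so
it is set in the operating system (for example in the shell), not in a
Subversion setting. A minimal sketch of preparing such an environment from
Python: env_with_lang is a hypothetical helper, and the locale name is an
assumption that has to match one actually installed on the machine.

```python
import os


def env_with_lang(lang, base=None):
    """Copy an environment and set LANG, dropping variables that override it."""
    env = dict(os.environ if base is None else base)
    # LC_ALL and LC_CTYPE take precedence over LANG for character handling,
    # so they are removed from the copy to let LANG take effect.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = lang
    return env


# "de_DE.ISO-8859-1" is an assumed locale name; `locale -a` lists the names
# that are really available.
env = env_with_lang("de_DE.ISO-8859-1")

# Hypothetical invocation; the folder name and REPOSITORY_URL are placeholders:
#     import subprocess
#     subprocess.run(["svn", "import", "project", "REPOSITORY_URL",
#                     "-m", "Initial import"], env=env, check=True)
```

The locale has to describe the encoding the file names are really stored
in, otherwise the same error can be expected again.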


