Posted to users@subversion.apache.org by Jamie <ja...@gmail.com> on 2006/05/04 06:23:31 UTC

character encoding on import

I'm trying to import a project into Subversion for the first time.
The project is 2.5 GB and has thousands of files. Some of the file
names in the project contain non-UTF-8 characters. I'm getting errors
such as the following when I try to import:

svn: Valid UTF-8 data
(hex: 72 6f 6d 61 6e 20 26 20 6a)
followed by invalid UTF-8 sequence
(hex: 9a 72 6e 20)

The system is RHEL 4. Is there any way either to convert all
filenames to valid UTF-8 or to make Subversion import without errors?
It's taking a very long time to figure out where the invalid filenames
are, because each import of such a large amount of data runs for a long
time before an error crops up, and then I have to start all over again
once it's been fixed.

Thanx
Jamie

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: character encoding on import

Posted by Kalin KOZHUHAROV <ka...@thinrope.net>.
Jamie wrote:
> I'm trying to import a project for the first time into subversion, the 
> project is 2.5gb and has thousands of files. Some of the file names in 
> the project contain non utf-8 characters. I'm getting errors when I try 
> to import such as the following:
> 
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
> 
> The system is RHEL 4, is there any way to either, convert all filenames 
> to valid utf-8, or make subversion import without errors? Its taking a 
> very long time to figure out where the invalid filenames are as its 
> taking a long time to import such a large amount of data only to have an 
> error crop up and then have to start all over again once its been fixed.

Have you tried convmv [1]?

Do you have your locale set up correctly? What does `locale` give?

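One crude method is:
 find /your/start/dir >/tmp/list
 iconv -f UTF-8 -t UTF-8 </tmp/list
and see where it fails. (-f UTF-8 makes iconv read the list as UTF-8
regardless of your locale; it stops at the first invalid sequence.)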
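
The same check can be sketched in Python so that every offending name is
reported in one pass rather than one per failed import. This is only a
minimal sketch: the function names are made up for illustration, and it
assumes a POSIX system such as RHEL where file names are raw bytes.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes form valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_names(root):
    """Yield full paths (as bytes) whose last component is not valid UTF-8."""
    # Passing bytes to os.walk makes it return raw byte names,
    # so nothing is decoded according to the current locale.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)
```

Running `for p in find_non_utf8_names("/your/start/dir"): print(p)`
before `svn import` lists the names that need attention; the paths are
printed as bytes literals, so the bad bytes show up as \x.. escapes.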

[1]	http://j3e.de/linux/convmv/

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|




Re: character encoding on import

Posted by Markus KARG <ma...@quipsy.de>.
Actually I do not know the solution, but just for my own curiosity: 
UTF-8 is able to encode ANY UNICODE character (which includes virtually 
all characters of all known languages of the world). So what character 
is that...?
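
One way to explore that question is to decode the bytes from the error
message under a few legacy encodings. The sketch below is an
illustration only: the list of candidate encodings is a guess, since
the thread does not say where the files came from. Byte 0x9a is not a
valid start of a UTF-8 sequence, but it is "ö" in Mac OS Roman and "š"
in Windows-1252, so the name was probably written in some such
single-byte encoding.

```python
# Bytes copied from the svn error message: the valid part, then the invalid part.
RAW_NAME = bytes.fromhex("726f6d616e2026206a") + bytes.fromhex("9a726e20")

# Assumed candidates; add whatever encodings the files' origin suggests.
CANDIDATES = ["utf-8", "mac_roman", "cp1252", "cp850", "iso8859_15"]


def decode_candidates(raw: bytes, encodings=CANDIDATES) -> dict:
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


def show_candidates(raw: bytes = RAW_NAME) -> None:
    """Print one line per candidate encoding for visual comparison."""
    for encoding, text in decode_candidates(raw).items():
        print(f"{encoding:12} {text!r}")
```

Calling `show_candidates()` prints the candidates side by side; the one
that yields a sensible name is the likely source encoding.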

Jamie wrote:

> I'm trying to import a project for the first time into subversion,  
> the project is 2.5gb and has thousands of files. Some of the file  
> names in the project contain non utf-8 characters. I'm getting errors  
> when I try to import such as the following:
>
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
>
> The system is RHEL 4, is there any way to either, convert all  
> filenames to valid utf-8, or make subversion import without errors?  
> Its taking a very long time to figure out where the invalid filenames  
> are as its taking a long time to import such a large amount of data  
> only to have an error crop up and then have to start all over again  
> once its been fixed.
>
> Thanx
> Jamie


Re: character encoding on import

Posted by Ryan Schmidt <su...@ryandesign.com>.
On May 4, 2006, at 08:23, Jamie wrote:

> I'm trying to import a project for the first time into subversion,  
> the project is 2.5gb and has thousands of files. Some of the file  
> names in the project contain non utf-8 characters. I'm getting  
> errors when I try to import such as the following:
>
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
>
> The system is RHEL 4, is there any way to either, convert all  
> filenames to valid utf-8, or make subversion import without errors?  
> Its taking a very long time to figure out where the invalid  
> filenames are as its taking a long time to import such a large  
> amount of data only to have an error crop up and then have to start  
> all over again once its been fixed.

As far as I know, all you need to do is set the LANG environment
variable to the correct value. See the directory listing of
/usr/share/locale on your system for the possible values you can
assign to this variable.
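
To see which character set a given LANG value actually selects, a small
Python sketch like the following may help. It is an illustration under
assumptions: the function names are invented here, and
`locale.nl_langinfo` is available on Unix-like systems only. The idea
is that file names must be decodable in the locale's character set
before they can be converted to UTF-8.

```python
import locale


def locale_codeset() -> str:
    """Return the character set implied by LANG / LC_* for this process."""
    try:
        # An empty string means "use the environment settings".
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # the process then stays in its default locale.
        pass
    return locale.nl_langinfo(locale.CODESET)


def decodes_in(raw_name: bytes, codeset: str) -> bool:
    """Return True if the raw file name is valid in the given character set."""
    try:
        raw_name.decode(codeset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers character sets Python has no codec for.
        return False
    return True
```

For example, `decodes_in(b"roman & j\x9arn", locale_codeset())` tells
you whether the problem name from the error message is acceptable under
the current setting; try it again after exporting a different LANG.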


