Posted to users@subversion.apache.org by Jamie <ja...@gmail.com> on 2006/05/04 06:23:31 UTC
character encoding on import
I'm trying to import a project into Subversion for the first time.
The project is 2.5 GB and has thousands of files. Some of the file
names in the project contain non-UTF-8 characters. When I try to
import, I get errors such as the following:
svn: Valid UTF-8 data
(hex: 72 6f 6d 61 6e 20 26 20 6a)
followed by invalid UTF-8 sequence
(hex: 9a 72 6e 20)
The system is RHEL 4. Is there any way either to convert all file
names to valid UTF-8 or to make Subversion import without errors?
It's taking a very long time to find the invalid file names, because
importing this much data takes a long time, and every time an error
crops up I have to fix it and start the import all over again.
Thanks
Jamie
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Re: character encoding on import
Posted by Kalin KOZHUHAROV <ka...@thinrope.net>.
Jamie wrote:
> I'm trying to import a project for the first time into subversion, the
> project is 2.5gb and has thousands of files. Some of the file names in
> the project contain non utf-8 characters. I'm getting errors when I try
> to import such as the following:
>
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
>
> The system is RHEL 4, is there any way to either, convert all filenames
> to valid utf-8, or make subversion import without errors? Its taking a
> very long time to figure out where the invalid filenames are as its
> taking a long time to import such a large amount of data only to have an
> error crop up and then have to start all over again once its been fixed.
Have you tried convmv [1]?
Do you have your locale set up correctly? What does `locale` give?
One dumb method is to do:
find /your/start/dir >/tmp/list
iconv -f UTF-8 -t UTF-8 /tmp/list
and look at where it reports an error.
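Because iconv stops at the first bad sequence, the list approach finds one offending name per run. A minimal sketch of a per-name check that reports all of them in one pass might look like the following. The function name bad_utf8_names is made up for illustration, and the sketch assumes an iconv that exits non-zero on invalid input, as GNU iconv does.

```shell
# Print every path under directory $1 whose name is not valid UTF-8.
# bad_utf8_names is a hypothetical helper, not part of any tool.
# It starts one iconv process per path, so it is slow but simple.
bad_utf8_names() {
    find "$1" -exec sh -c '
        for path do
            # iconv fails if the path bytes are not well-formed UTF-8.
            if ! printf "%s" "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
            then
                printf "%s\n" "$path"
            fi
        done' sh {} +
}
```

Usage would be something like `bad_utf8_names /your/start/dir`. The check runs on the whole path, so a directory with a bad name also flags everything beneath it.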
[1] http://j3e.de/linux/convmv/
Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
Re: character encoding on import
Posted by Markus KARG <ma...@quipsy.de>.
I don't actually know the solution, but just out of curiosity:
UTF-8 is able to encode ANY Unicode character (which includes virtually
all characters of all known languages of the world). So what character
is that...?
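One way to look at this: the byte 0x9a is a UTF-8 continuation byte (binary 10xxxxxx), which cannot start a character, so the name is most likely stored in some older single-byte encoding rather than containing a character UTF-8 cannot represent. The sketch below feeds the bytes from the error message through iconv. The helper names are invented for illustration, and CP1252 and MACINTOSH are only guesses at candidate encodings: CP1252 reads 0x9a as "š" and Mac OS Roman reads it as "ö". Which one is right depends on where the files came from.

```shell
# The bytes quoted in the svn error, written with octal escapes
# (\232 is 0x9a). name_bytes and is_utf8 are hypothetical helpers.
name_bytes() { printf 'roman & j\232rn '; }

# Succeeds only if standard input is well-formed UTF-8.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

if name_bytes | is_utf8; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi

# Show how two candidate legacy encodings (assumptions) read the bytes.
for enc in CP1252 MACINTOSH; do
    printf '%s: ' "$enc"
    name_bytes | iconv -f "$enc" -t UTF-8 || printf '(not supported by this iconv)'
    printf '\n'
done
```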
Jamie wrote:
> I'm trying to import a project for the first time into subversion,
> the project is 2.5gb and has thousands of files. Some of the file
> names in the project contain non utf-8 characters. I'm getting errors
> when I try to import such as the following:
>
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
>
> The system is RHEL 4, is there any way to either, convert all
> filenames to valid utf-8, or make subversion import without errors?
> Its taking a very long time to figure out where the invalid filenames
> are as its taking a long time to import such a large amount of data
> only to have an error crop up and then have to start all over again
> once its been fixed.
>
> Thanx
> Jamie
Re: character encoding on import
Posted by Ryan Schmidt <su...@ryandesign.com>.
On May 4, 2006, at 08:23, Jamie wrote:
> I'm trying to import a project for the first time into subversion,
> the project is 2.5gb and has thousands of files. Some of the file
> names in the project contain non utf-8 characters. I'm getting
> errors when I try to import such as the following:
>
> svn: Valid UTF-8 data
> (hex: 72 6f 6d 61 6e 20 26 20 6a)
> followed by invalid UTF-8 sequence
> (hex: 9a 72 6e 20)
>
> The system is RHEL 4, is there any way to either, convert all
> filenames to valid utf-8, or make subversion import without errors?
> Its taking a very long time to figure out where the invalid
> filenames are as its taking a long time to import such a large
> amount of data only to have an error crop up and then have to start
> all over again once its been fixed.
As far as I know, all you need to do is set the LANG environment
variable to the correct value. See the directory listing of
/usr/share/locale on your system for the possible values you can
assign to this variable.
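Setting LANG only helps if nothing overrides it: per POSIX, a non-empty LC_ALL takes precedence, then LC_CTYPE, then LANG. A small sketch for checking which locale name is actually in effect and which character set it names follows. Both function names are hypothetical and exist only for this example.

```shell
# Both helpers are hypothetical names used only for this sketch.

# Print the locale name that decides character handling: a non-empty
# LC_ALL wins, then LC_CTYPE, then LANG.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

# Print the charset part of a locale name such as "de_DE.ISO-8859-1@euro".
# Returns 1 and prints nothing when the name carries no charset (e.g. "C").
locale_charset() {
    lc_name=${1%%@*}                    # drop an optional @modifier
    case $lc_name in
        *.*) printf '%s\n' "${lc_name#*.}" ;;
        *)   return 1 ;;
    esac
}

locale_charset "$(effective_ctype)" ||
    echo "no charset in locale name; try: locale charmap"
```

If the reported charset does not match the encoding the file names are really in, the import can be run with a different value for that one command, for example `LANG=<locale> svn import ...`, where `<locale>` is a placeholder for a name that `locale -a` lists on the system.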