
Posted to dev@subversion.apache.org by Vincent Lefevre <vi...@vinc17.net> on 2016/02/29 16:24:00 UTC

Unversioned files with invalid UTF-8 sequence in name confuse svn

With:

svn, version 1.9.3 (r1718519)
   compiled Jan 16 2016, 04:46:46 on x86_64-pc-linux-gnu

I have a working copy where "make check" has created files whose
names contain invalid UTF-8 sequences. The consequence is that
such files confuse svn:

$ =svn st
svn: E000022: Error converting entry in directory '/home/vlefevre/software/mpfr-3.1/tests' to UTF-8
svn: E000022: Valid UTF-8 data
(hex: 04 10 40 04 04 04 02 01 46 04 40)
followed by invalid UTF-8 sequence
(hex: c0 2e 69 64)
zsh: exit 1     =svn st

I think that a fatal error is a bug, i.e. "svn st" should just report
that these files are unversioned. The requirement on the validity of
filenames should just apply to versioned files or files to be versioned.

These files can't even be removed with svn-clean, which reports the
same error.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
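
To illustrate the error report above: the message splits the name into the longest valid UTF-8 prefix and the remainder starting at the first offending byte. A minimal Python sketch of that split, plus a directory scan for such names, might look as follows. The function names are invented for this illustration and this is not how svn itself is implemented; the sample bytes are the ones printed in the error message (the real file name may be longer).

```python
import os
from typing import List, Tuple


def split_valid_utf8(name: bytes) -> Tuple[bytes, bytes]:
    """Split name into (valid UTF-8 prefix, remainder from the first bad byte)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return name[:exc.start], name[exc.start:]
    return name, b""


def invalid_names(directory: str) -> List[bytes]:
    """Return the raw entry names in directory that are not valid UTF-8."""
    # Passing bytes to os.listdir() makes it return undecoded bytes names.
    return [n for n in os.listdir(os.fsencode(directory))
            if split_valid_utf8(n)[1]]


def to_hex(data: bytes) -> str:
    """Format bytes as space-separated hex pairs, as in the svn message."""
    return " ".join("%02x" % b for b in data)


if __name__ == "__main__":
    # Bytes copied from the error message above.
    sample = bytes.fromhex("04 10 40 04 04 04 02 01 46 04 40 c0 2e 69 64")
    valid, rest = split_valid_utf8(sample)
    print("valid:  ", to_hex(valid))
    print("invalid:", to_hex(rest))
```

Calling invalid_names() on a directory such as the tests directory of a working copy would list the names that a UTF-8 conversion rejects, without needing svn to succeed first.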

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Philip Martin <ph...@wandisco.com>.

Stefan Sperling <st...@elego.de> writes:

> I agree this is a problem. 'svn cleanup --remove-unversioned' should
> remove such files, but it won't in the current implementation.

Subversion works in non-UTF8 locales.  If one normally uses Subversion
in a non-UTF8 locale then non-UTF8 paths on disk can represent versioned
files.  If one were to accidentally invoke Subversion with a UTF8 locale
any such files would trigger the invalid UTF8 problem.  When the client
encounters this problem there are two interpretations: the file could be
unversioned or the user could be using the wrong locale.  If we choose
to treat such files as unversioned then we would be deleting what might
be a versioned file with modifications.

-- 
Philip Martin
WANdisco
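
The ambiguity described above can be seen with a single file name. The following Python sketch is only an illustration (the file name is made up): the same bytes on disk are a perfectly valid name under an ISO-8859-1 locale and an invalid sequence under a UTF-8 locale, so a decoding failure alone cannot tell "junk file" apart from "wrong locale".

```python
def decodes(raw: bytes, encoding: str) -> bool:
    """Return True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A name created under an ISO-8859-1 locale: the accented letter is the
# single byte 0xE9, which is not a complete sequence in UTF-8.
name_on_disk = "caf\u00e9.txt".encode("iso-8859-1")

for enc in ("iso-8859-1", "utf-8"):
    status = "decodable" if decodes(name_on_disk, enc) else "invalid"
    print(enc, "->", status)
```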

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Feb 29, 2016 at 07:30:14PM +0100, Vincent Lefevre wrote:
> For "svn st", I do not try to access the file. A file with an invalid
> name cannot be a versioned file anyway. So, it could also just be
> ignored, and outputting a non-fatal warning would be sufficient, IMHO.
> Note that even "svn st -q" fails!

All parts of Subversion handle paths (or any string, really) as UTF-8.
The downside is that invalid UTF-8 leads to problems like you're seeing.
The upside is that things tend to work just fine for any encoding, as long
as the encodings involved are valid and configured as intended.

> Concerning svn-clean, I think that instead of failing, svn-clean
> should fall back to some alternate way. After all, a part of its code
> does not use the internal filename representation.

Fixing svn-clean to solve this problem is probably your easiest way out.
'svn cleanup --remove-unversioned' would be the built-in equivalent but
it performs a status walk internally so it won't help you :-/

svn-clean is in the contrib directory, which means it's not officially
maintained by the Subversion project itself. Please contact the author
or provide a patch.

> The problem is that it is too easy to create files with a name using
> invalid UTF-8 sequences (in my case, it seems just to be due to a bug
> in Automake or Libtool). But the user should not be required to find
> them and delete manually.

I agree this is a problem. 'svn cleanup --remove-unversioned' should
remove such files, but it won't in the current implementation.

But I doubt it's possible to solve with our current set of APIs without
breaking API guarantees Subversion provides. If you believe otherwise,
please try to write a patch to solve this and see for yourself. I imagine
that we'd quickly find ourselves hitting a barrier in the form of a public API
promise that can't be broken. Perhaps we could add a special API just
for this use case, though. Would that be worth the effort?

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Vincent Lefevre <vi...@vinc17.net>.

On 2016-03-01 17:12:05 +0100, Branko Čibej wrote:
> On 01.03.2016 16:58, Markus Schaber wrote:
> > Hi, Bert,
> >
> > From: Bert Huijben [mailto:bert@qqmail.nl]
> >> From: Markus Schaber [mailto:m.schaber@codesys.com]
> >>> Hi, Brane and Vincent,
> >>>
> >>> From: Branko Čibej [mailto:brane@apache.org]
> >>>>> However Subversion doesn't handle that (BTW it would be much
> >>>>> better to
> >>>>> remember the expected locale by storing it in the .svn directory
> >         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >>>>> rather than giving obscure error messages: if it did, Subversion
> >>>>> would know that the user was using an incorrect locale without any
> >>>>> ambiguity).
> >>>> And if the user changes the locale for valid reasons, the Subversion
> >>>> working copy would break in a different way.
> >>> I guess we would need some "change locale" operation, which would at
> >>> least update the saved locale in the .svn directory.
> >> There is no saved locale in the .svn directory...
> > Currently, yes, but it was suggested in the discussion, see the line above.
> 
> The major problem with the "saved locale" idea is that it creates yet
> another potential discrepancy. I really can't imagine how we'd be doing
> the majority of our users a service by adding another knob that can
> seriously break things when it's misconfigured, but doesn't do anything
> useful most of the time.

I don't understand what you mean here. As you said, Subversion
expects the user to always use the same locale with a given working
copy.

Currently:

  * If the user always uses the same locale, then everything is fine
    most of the time. But if some tool writes filenames that cannot
    be interpreted in this locale (this was the beginning of this
    thread), the working copy gets unusable until the user removes
    these files manually (which can be tedious).

  * If the user changes his locale, then the working copy is in an
    inconsistent state; it is either unusable, or usable but with
    incorrect information, which could lead to incorrect commits.

If the locale were recorded in .svn:

  * If the user always uses the same locale, then everything would be
    fine all the time. If some tool writes filenames that cannot be
    interpreted in this locale, then svn would know that it is not
    a problem due to the change of the locale, i.e. it would know that
    the file is necessarily unversioned. So, "svn st" would no longer
    have any reason to fail, and svn-clean could work as expected
    without this risk of being wrong.

  * If the user changes his locale, then svn would be able to emit
    a clear error message about the locale mismatch. Then the user
    could easily know what was wrong and change back to the previous
    locale (the one recorded in .svn).

So, this would be a big improvement.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
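
The "recorded locale" proposal above could be sketched roughly as follows. This is purely hypothetical: as noted elsewhere in the thread, Subversion stores no locale in .svn, and the marker file name, the exception, and both functions are invented here for illustration only.

```python
import codecs
import os

# Hypothetical marker file inside the administrative directory.
MARKER = "locale-encoding"


class LocaleMismatch(Exception):
    """Raised when the current encoding differs from the recorded one."""


def record_encoding(admin_dir: str, encoding: str) -> None:
    """Store the normalized name of the encoding used at checkout time."""
    with open(os.path.join(admin_dir, MARKER), "w", encoding="ascii") as f:
        f.write(codecs.lookup(encoding).name)


def check_encoding(admin_dir: str, current: str) -> str:
    """Return the recorded encoding, or raise if current does not match it."""
    with open(os.path.join(admin_dir, MARKER), encoding="ascii") as f:
        recorded = f.read().strip()
    # codecs.lookup() normalizes aliases such as "UTF-8" and "utf8".
    if codecs.lookup(current).name != recorded:
        raise LocaleMismatch(
            "working copy was created with %s, current locale uses %s"
            % (recorded, current))
    return recorded
```

With such a check in place, a name that fails to decode while the encodings match could be treated as unversioned, and a mismatch would produce a clear message instead. A real design would also need the "change locale" operation discussed elsewhere in the thread.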

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Branko Čibej <br...@apache.org>.

On 01.03.2016 16:58, Markus Schaber wrote:
> Hi, Bert,
>
> From: Bert Huijben [mailto:bert@qqmail.nl]
>> From: Markus Schaber [mailto:m.schaber@codesys.com]
>>> Hi, Brane and Vincent,
>>>
>>> From: Branko Čibej [mailto:brane@apache.org]
>>>>> However Subversion doesn't handle that (BTW it would be much
>>>>> better to
>>>>> remember the expected locale by storing it in the .svn directory
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>>>>> rather than giving obscure error messages: if it did, Subversion
>>>>> would know that the user was using an incorrect locale without any
>>>>> ambiguity).
>>>> And if the user changes the locale for valid reasons, the Subversion
>>>> working copy would break in a different way.
>>> I guess we would need some "change locale" operation, which would at
>>> least update the saved locale in the .svn directory.
>> There is no saved locale in the .svn directory...
> Currently, yes, but it was suggested in the discussion, see the line above.

The major problem with the "saved locale" idea is that it creates yet
another potential discrepancy. I really can't imagine how we'd be doing
the majority of our users a service by adding another knob that can
seriously break things when it's misconfigured, but doesn't do anything
useful most of the time.

-- Brane

RE: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Markus Schaber <m....@codesys.com>.

Hi, Bert,

From: Bert Huijben [mailto:bert@qqmail.nl]
> From: Markus Schaber [mailto:m.schaber@codesys.com]
> > Hi, Brane and Vincent,
> >
> > From: Branko Čibej [mailto:brane@apache.org]

> > > > However Subversion doesn't handle that (BTW it would be much
> > > > better to

> > > > remember the expected locale by storing it in the .svn directory
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > > > rather than giving obscure error messages: if it did, Subversion
> > > > would know that the user was using an incorrect locale without any
> > > > ambiguity).
> > >
> > > And if the user changes the locale for valid reasons, the Subversion
> > > working copy would break in a different way.
> >
> > I guess we would need some "change locale" operation, which would at
> > least update the saved locale in the .svn directory.
> 
> There is no saved locale in the .svn directory...

Currently, yes, but it was suggested in the discussion, see the line above.

Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.schaber@codesys.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received
this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure
or distribution of the material in this e-mail is strictly forbidden.

RE: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Bert Huijben <be...@qqmail.nl>.


> -----Original Message-----
> From: Markus Schaber [mailto:m.schaber@codesys.com]
> Sent: Tuesday, 1 March 2016 15:07
> To: dev@subversion.apache.org
> Cc: Vincent Lefevre <vi...@vinc17.net>
> Subject: RE: Unversioned files with invalid UTF-8 sequence in name confuse
> svn
> 
> Hi, Brane and Vincent,
> 
> From: Branko Čibej [mailto:brane@apache.org]
> > >> A fairly plausible cause for getting the wrong representation is
> > >> changing the locale for the duration of a script invocation. Another
> > >> plausible way is to create files based on the contents of some
> > >> script, which are not encoded as expected by the current locale.
> > > However Subversion doesn't handle that (BTW it would be much better
> to
> > > remember the expected locale by storing it in the .svn directory
> > > rather than giving obscure error messages: if it did, Subversion would
> > > know that the user was using an incorrect locale without any
> > > ambiguity).
> >
> > And if the user changes the locale for valid reasons, the Subversion
> > working copy would break in a different way.
> 
> I guess we would need some "change locale" operation, which would at least
> update the saved locale in the .svn directory.

There is no saved locale in the .svn directory...

	Bert

RE: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Markus Schaber <m....@codesys.com>.

Hi, Brane and Vincent,

From: Branko Čibej [mailto:brane@apache.org]
> >> A fairly plausible cause for getting the wrong representation is
> >> changing the locale for the duration of a script invocation. Another
> >> plausible way is to create files based on the contents of some
> >> script, which are not encoded as expected by the current locale.
> > However Subversion doesn't handle that (BTW it would be much better to
> > remember the expected locale by storing it in the .svn directory
> > rather than giving obscure error messages: if it did, Subversion would
> > know that the user was using an incorrect locale without any
> > ambiguity).
> 
> And if the user changes the locale for valid reasons, the Subversion
> working copy would break in a different way.

I guess we would need some "change locale" operation, which would at least update the saved locale in the .svn directory.

(Updating the actual on-disk filenames could be left to the tools the user uses to also update his other filenames...)

> > Currently you can't avoid the problem: if the user has used UTF-8 then
> > runs Subversion under ISO-8859-1 locales, the "misconfiguration"
> > is not detected, and "svn up" can yield a corrupt working copy as
> > shown in the past. Subversion should remember the locale that was used
> > initially to avoid such a problem.
> 
> Well? This issue isn't limited to Subversion; most applications will fail
> at some point once you start playing games with the locale and/or filename
> encoding. That's why both Windows and OS X mandate one of the Unicode
> representations for filenames.

Python actually adopted a workaround to this problem called "surrogate escaping".
https://www.python.org/dev/peps/pep-0383/

This mechanism is applied to filenames and similar "byte strings" during communication with the outer world, with the limitation that their purpose is just to transfer the contents of the 8 bit string from one OS interface to the other, with only limited interpretation or processing of them.

Basically, invalid bytes (those that cannot be decoded to the internal Unicode representation) are each mapped to a lone surrogate, which is encoded back to the original byte on the output side.
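
As a small illustration of that round trip, here is a Python sketch using the "surrogateescape" error handler defined by PEP 383 (on POSIX systems, os.fsdecode() and os.fsencode() use the same handler). The helper names to_internal and to_os are invented for this example, and the sample bytes are the invalid tail from the error message at the start of the thread.

```python
def to_internal(raw: bytes) -> str:
    """Decode an OS file name; each undecodable byte 0xNN becomes U+DCNN."""
    return raw.decode("utf-8", "surrogateescape")


def to_os(text: str) -> bytes:
    """Encode back to bytes, restoring the original undecodable bytes."""
    return text.encode("utf-8", "surrogateescape")


raw = b"\xc0.id"
text = to_internal(raw)

# The invalid byte 0xC0 is carried as the lone surrogate U+DCC0.
assert text == "\udcc0.id"
# The round trip is lossless.
assert to_os(text) == raw

# A strict encoder refuses the escaped name, which is one way a layer
# that must only see valid UTF-8 could reject it.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("strict UTF-8 encoding rejects the escaped name")
```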

A solution like this could help SVN deal with miscoded filenames, and would allow e.g. "svn rm" or "svn mv" etc.

When adopting such a solution, it should be strictly restricted to local filenames (the RA layers should refuse them), and I guess we could get away with not even allowing them to enter the local working copy database.

For screen output, we could translate them to escape sequences like \x1A, so "svn status" could work...
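
A possible shape for such display escaping, again as a hedged Python sketch rather than anything svn does today: undecodable bytes and non-printable characters are shown as backslash escapes. The function name display_name is made up, and the sketch does not escape literal backslashes, so its output is meant for humans only and cannot be parsed back unambiguously.

```python
def display_name(raw: bytes) -> str:
    """Return a printable form of a raw file name for status-like output."""
    # Undecodable bytes become \xNN instead of raising an error.
    text = raw.decode("utf-8", "backslashreplace")
    # Control characters and other non-printables are escaped as well.
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


if __name__ == "__main__":
    print(display_name(b"\x04ab\xc0.id"))
```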

However, I'm not sure whether it's worth the work to support basically broken environments, but on the other hand, the Python guys did go that way.

> You might as well say that Unix (Linux) is broken and should be fixed (with
> which I'd heartily agree, but that's water under the bridge).

All recent Linux installations I saw had UTF-8 as their encoding (independent of the language / country settings actually in use). And I don't see any valid reason to use anything else nowadays, except for keeping compatibility with existing installations...


Best regards

Markus Schaber


Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Branko Čibej <br...@apache.org>.

On 29.02.2016 20:45, Vincent Lefevre wrote:
>>> The problem is that it is too easy to create files with a name using
>>> invalid UTF-8 sequences
>> File names on disk DO NOT have to be represented in UTF-8. They do have
>> to be represented consistently with the current locale settings.
> which must in practice be UTF-8. Otherwise one gets failures sooner
> or later.

I'm just going to say "nonsense" to that without much further
discussion. The encoding "must" be consistent, but by no means must it
be UTF-8.

>> A fairly plausible cause for getting the wrong representation is
>> changing the locale for the duration of a script invocation. Another
>> plausible way is to create files based on the contents of some script,
>> which are not encoded as expected by the current locale.
> However Subversion doesn't handle that (BTW it would be much better
> to remember the expected locale by storing it in the .svn directory
> rather than giving obscure error messages: if it did, Subversion
> would know that the user was using an incorrect locale without any
> ambiguity).

And if the user changes the locale for valid reasons, the Subversion
working copy would break in a different way.

> Currently you can't avoid the problem: if the user has used UTF-8
> then runs Subversion under ISO-8859-1 locales, the "misconfiguration"
> is not detected, and "svn up" can yield a corrupt working copy as
> shown in the past. Subversion should remember the locale that was
> used initially to avoid such a problem.

Well? This issue isn't limited to Subversion; most applications will
fail at some point once you start playing games with the locale and/or
filename encoding. That's why both Windows and OS X mandate one of the
Unicode representations for filenames.

You might as well say that Unix (Linux) is broken and should be fixed
(with which I'd heartily agree, but that's water under the bridge).

>> I'd really, really strongly suggest not to make such a thing the
>> default in Subversion.
> Then fix Subversion.

Patches welcome.

-- Brane

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Vincent Lefevre <vi...@vinc17.net>.

On 2016-02-29 19:57:04 +0100, Branko Čibej wrote:
> On 29.02.2016 19:30, Vincent Lefevre wrote:
> > On 2016-02-29 17:00:01 +0100, Bert Huijben wrote:
> >> The problem is most likely not that they have an invalid utf-8 sequence in
> >> their name, but that your settings report that filenames are encoded in one
> >> way, while there is a file whose name can't be expressed by that format.
> >>
> >> You get this error when Subversion isn't able to convert the filename to its
> >> internal utf-8 format, which should be capable of expressing any valid
> >> filename. (If you declare that all filenames are utf-8, there wouldn't be a
> >> conversion, so in most cases not an error)
> >>
> >> To just handle it as unversioned as you suggest we need to at least be able
> >> to express its name.
> > There are two ways to express a filename:
> >   1. The one from the OS (e.g., in POSIX, this is just a sequence
> >      of bytes).
> 
> This isn't entirely correct. It's true as far as most (but certainly not
> all) filesystem implementations are concerned; but applications expect
> to interpret those bytes in the context of the active locale.

Not all applications. Most command-line utilities run fine without
having to interpret those bytes (which is very useful, in particular
for "rm"). The point is that they do not need to interpret them for
what they are required to do.

> >   2. The one used by Subversion internally.
> >
> > (2) is necessary for versioned files, but for unversioned files,
> > you do not need to do the (1) -> (2) conversion.
> 
> Sure you do. How else are you going to know that the file is
> unversioned? (The working copy database stores paths encoded as UTF-8.)

Well, you need to do the (1) -> (2) conversion only to test whether
the file is versioned or not. But if the (1) -> (2) conversion fails,
this means that the file is unversioned.
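
The reasoning above can be sketched in Python as a status walk that does not abort on conversion failure. This is an illustration of the argument, not Subversion's API: the set "versioned" stands in for the working copy database, classify() is an invented name, and the sketch assumes the current locale really is the one the working copy was created with (it does not address the wrong-locale case raised elsewhere in the thread).

```python
from typing import Iterable, List, Set, Tuple


def classify(raw_names: Iterable[bytes], versioned: Set[str],
             encoding: str = "utf-8") -> List[Tuple[str, bytes]]:
    """Return (status, raw name) pairs: ' ' for versioned, '?' for unversioned."""
    result = []
    for raw in raw_names:
        try:
            name = raw.decode(encoding)      # the (1) -> (2) conversion
        except UnicodeDecodeError:
            # No valid internal representation exists, so the name cannot
            # match any versioned path: report it instead of failing.
            result.append(("?", raw))
            continue
        result.append((" " if name in versioned else "?", raw))
    return result


if __name__ == "__main__":
    entries = [b"a.c", b"b.o", b"\xc0.id"]
    for status, raw in classify(entries, {"a.c"}):
        print(status, raw)
```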

> > The problem is that it is too easy to create files with a name using
> > invalid UTF-8 sequences
> 
> File names on disk DO NOT have to be represented in UTF-8. They do have
> to be represented consistently with the current locale settings.

which must in practice be UTF-8. Otherwise one gets failures sooner
or later.

> A fairly plausible cause for getting the wrong representation is
> changing the locale for the duration of a script invocation. Another
> plausible way is to create files based on the contents of some script,
> which are not encoded as expected by the current locale.

However Subversion doesn't handle that (BTW it would be much better
to remember the expected locale by storing it in the .svn directory
rather than giving obscure error messages: if it did, Subversion
would know that the user was using an incorrect locale without any
ambiguity).

> > (in my case, it seems just to be due to a bug in Automake or Libtool).
> 
> Or the way you're using them, perhaps?

I've eventually found that this is a bug in dash, which reexecutes
a command for a foreign architecture as a shell script instead of
giving an exec format error like the other shells.

> > But the user should not be required to find them and delete manually.
> 
> It's also too easy to ignore (or delete) files because someone managed
> to misconfigure their locale.

Currently you can't avoid the problem: if the user has used UTF-8
then runs Subversion under ISO-8859-1 locales, the "misconfiguration"
is not detected, and "svn up" can yield a corrupt working copy as
shown in the past. Subversion should remember the locale that was
used initially to avoid such a problem.

> I'd really, really strongly suggest not to make such a thing the
> default in Subversion.

Then fix Subversion.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Branko Čibej <br...@apache.org>.

On 29.02.2016 19:30, Vincent Lefevre wrote:
> On 2016-02-29 17:00:01 +0100, Bert Huijben wrote:
>> The problem is most likely not that they have an invalid utf-8 sequence in
>> their name, but that your settings report that filenames are encoded in one
>> way, while there is a file whose name can't be expressed by that format.
>>
>> You get this error when Subversion isn't able to convert the filename to its
>> internal utf-8 format, which should be capable of expressing any valid
>> filename. (If you declare that all filenames are utf-8, there wouldn't be a
>> conversion, so in most cases not an error)
>>
>> To just handle it as unversioned as you suggest we need to at least be able
>> to express its name.
> There are two ways to express a filename:
>   1. The one from the OS (e.g., in POSIX, this is just a sequence
>      of bytes).

This isn't entirely correct. It's true as far as most (but certainly not
all) filesystem implementations are concerned; but applications expect
to interpret those bytes in the context of the active locale.

>   2. The one used by Subversion internally.
>
> (2) is necessary for versioned files, but for unversioned files,
> you do not need to do the (1) -> (2) conversion.

Sure you do. How else are you going to know that the file is
unversioned? (The working copy database stores paths encoded as UTF-8.)

...

> The problem is that it is too easy to create files with a name using
> invalid UTF-8 sequences

File names on disk DO NOT have to be represented in UTF-8. They do have
to be represented consistently with the current locale settings.

A fairly plausible cause for getting the wrong representation is
changing the locale for the duration of a script invocation. Another
plausible way is to create files based on the contents of some script,
which are not encoded as expected by the current locale.

> (in my case, it seems just to be due to a bug in Automake or Libtool).

Or the way you're using them, perhaps?

> But the user should not be required to find them and delete manually.

It's also too easy to ignore (or delete) files because someone managed
to misconfigure their locale. I'd really, really strongly suggest not to
make such a thing the default in Subversion.

-- Brane

Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Vincent Lefevre <vi...@vinc17.net>.

On 2016-02-29 17:00:01 +0100, Bert Huijben wrote:
> The problem is most likely not that they have an invalid utf-8 sequence in
> their name, but that your settings report that filenames are encoded in one
> way, while there is a file whose name can't be expressed by that format.
> 
> You get this error when Subversion isn't able to convert the filename to its
> internal utf-8 format, which should be capable of expressing any valid
> filename. (If you declare that all filenames are utf-8, there wouldn't be a
> conversion, so in most cases not an error)
> 
> To just handle it as unversioned as you suggest we need to at least be able
> to express its name.

There are two ways to express a filename:
  1. The one from the OS (e.g., in POSIX, this is just a sequence
     of bytes).
  2. The one used by Subversion internally.

(2) is necessary for versioned files, but for unversioned files,
you do not need to do the (1) -> (2) conversion.

> As you found out cleanup is not going to help here... we just can't access
> this file (or directory, or symlink), so we can't delete it or anything to
> help you.

For "svn st", I do not try to access the file. A file with an invalid
name cannot be a versioned file anyway. So, it could also just be
ignored, and outputting a non-fatal warning would be sufficient, IMHO.
Note that even "svn st -q" fails!

Concerning svn-clean, I think that instead of failing, svn-clean
should fall back to some alternate way. After all, a part of its code
does not use the internal filename representation.

The problem is that it is too easy to create files with a name using
invalid UTF-8 sequences (in my case, it seems just to be due to a bug
in Automake or Libtool). But the user should not be required to find
them and delete manually.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

RE: Unversioned files with invalid UTF-8 sequence in name confuse svn

Posted by Bert Huijben <be...@qqmail.nl>.


> -----Original Message-----
> From: Vincent Lefevre [mailto:vincent-svn@vinc17.net]
> Sent: Monday, 29 February 2016 16:24
> To: dev@subversion.apache.org
> Subject: Unversioned files with invalid UTF-8 sequence in name confuse svn
> 
> With:
> 
> svn, version 1.9.3 (r1718519)
>    compiled Jan 16 2016, 04:46:46 on x86_64-pc-linux-gnu
> 
> I have a working copy where "make check" has created files whose
> names contain invalid UTF-8 sequences. The consequence is that
> such files confuse svn:
> 
> $ =svn st
> svn: E000022: Error converting entry in directory
> '/home/vlefevre/software/mpfr-3.1/tests' to UTF-8
> svn: E000022: Valid UTF-8 data
> (hex: 04 10 40 04 04 04 02 01 46 04 40)
> followed by invalid UTF-8 sequence
> (hex: c0 2e 69 64)
> zsh: exit 1     =svn st
> 
> I think that a fatal error is a bug, i.e. "svn st" should just report
> that these files are unversioned. The requirement on the validity of
> filenames should just apply to versioned files or files to be versioned.
> 
> These files can't even be removed with svn-clean, which reports the
> same error.

The problem is most likely not that they have an invalid utf-8 sequence in
their name, but that your settings report that filenames are encoded in one
way, while there is a file whose name can't be expressed by that format.

You get this error when Subversion isn't able to convert the filename to its
internal utf-8 format, which should be capable of expressing any valid
filename. (If you declare that all filenames are utf-8, there wouldn't be a
conversion, so in most cases not an error)

To just handle it as unversioned as you suggest we need to at least be able
to express its name.

As you found out cleanup is not going to help here... we just can't access
this file (or directory, or symlink), so we can't delete it or anything to
help you.

	Bert