Posted to users@subversion.apache.org by "Ignacio González (Eliop)" <ig...@googlemail.com> on 2012/02/01 09:00:39 UTC

check-mime-type, Windows client, non-ASCII path

Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)
Server: Linux (CentOS), svn 1.6.16 (Spanish)

Repository created OK
Hundreds of revisions already checked-in OK
Hook "check-mime-type" (bash) added in server
A couple of revisions checked-in OK
New file added with non-ASCII characters -> Problem:
Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
(note the u with an acute accent: ú)

C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
Adding         acentos
Adding         acentos\In£til.TXT
Transmitting file data .svn: Commit failed (details follow):
svn: Commit blocked by pre-commit hook (exit code 1) with output:
/opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
`/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist

To help diagnose it, I tried to check out an already existing file with
accents in its name
(checked in before the Hook "check-mime-type" (bash) was added in the
server).
Check out fails.
Oh, my God.

Suggestions ?
-- 
Ignacio Gonzalez T.

Re: check-mime-type, Windows client, non-ASCII path

Posted by Ulrich Eckhardt <ul...@dominolaser.com>.
On 02.02.2012 08:37, Ignacio González (Eliop) wrote:
> I cannot imagine a sensible way of using svn mv with the files that are
> already checked-in with Windows clients and want to check out from Linux
> clients.
>
> Is there some way to force the Linux server / svn server / svn hooks to
> use the 'Windows'  filename convention, whatever it is?

The paths inside the repository are stored as UTF-8, which provides full 
Unicode capabilities. UTF-8 is often (though not always!) used on Linux 
systems as encoding for paths in the filesystem, so this is not a 
problem. For MS Windows, the filesystem encoding is UTF-16, to which 
UTF-8 can be converted without loss, so this is also not a problem.
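
A minimal sketch of the round trip described above, assuming well-formed UTF-8 input. The function name is hypothetical, and this uses Python's codecs for illustration, not Subversion's own conversion code:

```python
def utf8_to_utf16_and_back(utf8_path: bytes) -> bytes:
    """Convert a UTF-8 path to UTF-16 (as used on MS Windows) and back."""
    text = utf8_path.decode("utf-8")      # repository form -> abstract text
    utf16 = text.encode("utf-16-le")      # form a Windows filesystem API works with
    return utf16.decode("utf-16-le").encode("utf-8")


# The file name from this thread, in the UTF-8 form stored in the repository.
original = "acentos/In\u00fatil.TXT".encode("utf-8")

# The round trip returns exactly the same bytes, so nothing is lost.
assert utf8_to_utf16_and_back(original) == original
print(original.hex())
```

The conversion is only lossless when the stored bytes really are valid UTF-8; invalid input makes the first decode step fail.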

To address your question, there is no filesystem convention to adjust 
to, and it also doesn't matter from which system something was checked in.


> Is there some way to reprocess (dump / load or whatever) the whole
> repository in order to make happy both Linux clients and Windows clients
> without changing the non-ASCII characters as seen by both clients?

This should not be necessary, unless you have garbage inside the 
repository, see my earlier response.


> What I was really surprised is to see that the filenames flowing through the
> communication channel is not normalised (well, that is just speculation,
> am I wrong?). Is there a way to force the clients to convert the names
> to same canonical form (e.g., UTF-8) and back to the resident filesystem
> convention again?

This is exactly what is done, what makes you think otherwise? Generally, 
as Stefan suggested, please don't tell us your interpretation of what 
you saw without first telling us what you saw, because that makes it 
impossible to distinguish between a fault in what happened and a fault 
in your interpretation.


Uli
**************************************************************************************
Domino Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Deutschland
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
**************************************************************************************
Visit our website at http://www.dominolaser.com
**************************************************************************************
This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published or otherwise used.
E-mails can be read by third parties and may contain viruses and unauthorised changes. Domino Laser GmbH is not responsible for these consequences.
**************************************************************************************


Re: check-mime-type, Windows client, non-ASCII path

Posted by "Ignacio González (Eliop)" <ig...@googlemail.com>.
Hello, Nico.

On 1 February 2012 13:17, Nico Kadel-Garcia <nk...@gmail.com> wrote:

2012/2/1 Ignacio González (Eliop) <ig...@googlemail.com>:
> > Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)
> > Server: Linux (CentOS), svn 1.6.16 (Spanish)
> >
> > Repository created OK
> > Hundreds of revisions already checked-in OK
> > Hook "check-mime-type" (bash) added in server
> > A couple of revisions checked-in OK
> > New file added with non-ASCII characters -> Problem:
> > Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
> > (note the u with an acute accent: ú)
> >
> > C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
> > Adding         acentos
> > Adding         acentos\In£til.TXT
> > Transmitting file data .svn: Commit failed (details follow):
> > svn: Commit blocked by pre-commit hook (exit code 1) with output:
> > /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
> > `/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
> > svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist
> >
> > To help diagnose it, I tried to check out an already existing file with
> > accents in its name
> > (checked in before the Hook "check-mime-type" (bash) was added in the
> > server).
> > Check out fails.
> > Oh, my God.
> >
> > Suggestions ?
>
> First, avoid non-ascii characters in file names.


Yes, I tend to do so, especially with software code.
But, at last, we convinced our hardware, marketing and field people
to use Subversion as well. Please don't ask me to tell them to stop
using accents and ñ in the names of the documents they produce :-)
I can hear them: 'What the heck? Until yesterday I could browse your
repository with my web browser and all the documents had the names I
gave them. Now you are telling me that you are renaming them? Crazy!'


> It causes real
> problems with pre-commit hooks


Yes!


> and post-commit hooks that do not
> properly quote their arguments, and that's not something you can
> handle on the client side. And I've had network file shares used for
> working copies, such as NetApp serviced home directories NFS mounted
> as /home/$USER on the Linux side and CIFS mounted $HOMEDIR on the
> Windows side, where people generated files on the Windows side,
> successfully checked them in, and couldn't successfully read or delete
> the files on the Linux NFS side because the server hadn't been set up
> with UTF compatible NetApp filesystems.
>
> Second, can you check out the *directory* that contains the file by
> using the directory name, and letting the client deal with the
> wackiness?


Done it. No joy :-(


> And can you use 'svn mv', or do your operations with a
> TortoiseSVN client, which is really handy for getting files with funny
> names correctly processed?
>

95% of our users use TortoiseSVN to check in and check out. Many of them
use a browser to surf the repository. In fact, the problems I reported were
found using TortoiseSVN, and the first thing I always do in these cases
is to reproduce the problem with the svn command line. In this case,
the observed behaviour was the same.

I cannot imagine a sensible way of using svn mv with files that were
already checked in with Windows clients and that we want to check out from
Linux clients.

Is there some way to force the Linux server / svn server / svn hooks to use
the 'Windows' filename convention, whatever it is?

Is there some way to reprocess (dump / load or whatever) the whole
repository in order to keep both Linux clients and Windows clients happy
without changing the non-ASCII characters as seen by both clients?

What really surprised me is that the filenames flowing through the
communication channel do not seem to be normalised (well, that is just
speculation, am I wrong?). Is there a way to force the clients to convert
the names to the same canonical form (e.g., UTF-8) and back to the resident
filesystem convention again?

What are my Czech, German and Turkish friends in this forum doing?

Re: check-mime-type, Windows client, non-ASCII path

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
2012/2/1 Ignacio González (Eliop) <ig...@googlemail.com>:
> Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)
> Server: Linux (CentOS), svn 1.6.16 (Spanish)
>
> Repository created OK
> Hundreds of revisions already checked-in OK
> Hook "check-mime-type" (bash) added in server
> A couple of revisions checked-in OK
> New file added with non-ASCII characters -> Problem:
> Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
> (note the u with an acute accent: ú)
>
> C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
> Adding         acentos
> Adding         acentos\In£til.TXT
> Transmitting file data .svn: Commit failed (details follow):
> svn: Commit blocked by pre-commit hook (exit code 1) with output:
> /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
> `/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
> svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist
>
> To help diagnose it, I tried to check out an already existing file with
> accents in its name
> (checked in before the Hook "check-mime-type" (bash) was added in the
> server).
> Check out fails.
> Oh, my God.
>
> Suggestions ?

First, avoid non-ascii characters in file names. It causes real
problems with pre-commit hooks and post-commit hooks that do not
properly quote their arguments, and that's not something you can
handle on the client side. And I've had network file shares used for
working copies, such as NetApp serviced home directories NFS mounted
as /home/$USER on the Linux side and CIFS mounted $HOMEDIR on the
Windows side, where people generated files on the Windows side,
successfully checked them in, and couldn't successfully read or delete
the files on the Linux NFS side because the server hadn't been set up
with UTF compatible NetApp filesystems.

Second, can you check out the *directory* that contains the file by
using the directory name, and letting the client deal with the
wackiness? And can you use 'svn mv', or do your operations with a
TortoiseSVN client, which is really handy for getting files with funny
names correctly processed?

Re: check-mime-type, Windows client, non-ASCII path

Posted by Ulrich Eckhardt <ul...@dominolaser.com>.
On 01.02.2012 09:00, Ignacio González (Eliop) wrote:
> Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)

Just to make sure, are you using a native MS Windows client or are you 
using Cygwin? Also, the clients differ slightly depending on the 
distribution, so it would be helpful to know which distribution you use, too.

> Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
> (note the u with an acute accent: ú)
>
> C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
> Adding         acentos
> Adding         acentos\In£til.TXT

Hmmm. Here something already failed, the accented u changed to a pound 
sign. Or is that just a transmission error, caused by email?
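
One plausible explanation, offered as an assumption rather than something confirmed in this thread: the Windows console writes text in an OEM code page (850 on many Western European systems), and the captured output was later read as the ANSI code page 1252. Byte 0xA3 is 'ú' in code page 850 and '£' in code page 1252. A small Python sketch of that mix-up, with a hypothetical function name:

```python
def misread_console_output(text: str,
                           console_codepage: str = "cp850",
                           reader_codepage: str = "cp1252") -> str:
    """Encode text as the console would, then decode it with another code page."""
    raw = text.encode(console_codepage)
    return raw.decode(reader_codepage)


# 'ú' becomes byte 0xA3 in cp850, and 0xA3 reads as the pound sign in cp1252.
assert "\u00fa".encode("cp850") == b"\xa3"
assert misread_console_output("acentos\\In\u00fatil.TXT") == "acentos\\In\u00a3til.TXT"
```

If this is what happened, the pound sign is only a display artifact of the console and says nothing about the bytes sent to the server.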


> Transmitting file data .svn: Commit failed (details follow):
> svn: Commit blocked by pre-commit hook (exit code 1) with output:
> /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
> `/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
> svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist

Just for the record, I guess the ?\195?\186 could be a representation 
derived from the byte values of UTF-8, but I haven't verified that. What 
I'm not 100% sure about is whether this is a fault in the hook script and 
the way it handles those arguments. It would be interesting to know 
whether the commit works with the hook script deactivated.
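
The escaped form in the error message is consistent with every non-ASCII byte being written as `?\` followed by its decimal value. The sketch below reproduces that format from the UTF-8 bytes of the file name. The function is hypothetical and only mimics the output seen above; it is not Subversion's code:

```python
def escape_non_ascii(utf8_path: bytes) -> str:
    """Render ASCII bytes as-is and every other byte as ?\\DDD (decimal)."""
    parts = []
    for byte in utf8_path:
        if byte < 0x80:
            parts.append(chr(byte))           # plain ASCII passes through
        else:
            parts.append("?\\%03d" % byte)    # e.g. 0xC3 -> ?\195
    return "".join(parts)


print(escape_non_ascii("acentos/In\u00fatil.TXT".encode("utf-8")))
# prints: acentos/In?\195?\186til.TXT
```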


> To help diagnose it, I tried to check out an already existing file with
> accents in its name (checked in before the Hook "check-mime-type" (bash)
> was added in the server).
> Check out fails.

That shouldn't happen, no matter what the hook scripts say. What exactly 
is the error? What is the name of the file?

The only reason I could imagine is when you somehow got a path into the 
repository that is invalid UTF-8. While checking out that path, the 
client would then try to transcode the UTF-8 to MS Windows' native 
UTF-16 and fail. I believe some older SVNs relied on the client sending 
well-formed UTF-8, instead of validating it server-side. With the client 
being less than perfect, this could then lead to invalid paths. How old 
is your repository? Can you back it up and run svnadmin verify on it?
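
To illustrate what "invalid UTF-8" means here, a minimal Python sketch with a hypothetical helper name: the same file name is well-formed when 'ú' is stored as the two UTF-8 bytes, and malformed when it is stored as the single Latin-1 byte 0xFA.

```python
def is_well_formed_utf8(path: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_well_formed_utf8(b"In\xc3\xbatil.TXT"))  # True: UTF-8 bytes for 'ú'
print(is_well_formed_utf8(b"In\xfatil.TXT"))      # False: Latin-1 byte for 'ú'
```

A client that has to transcode the second form to UTF-16 has nothing valid to convert, which is the kind of checkout failure described above.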

That said, I have been using lots of different characters (extended 
Latin, Greek, Chinese, Japanese, Indian) inside a Linux-hosted repo, 
accessed via svnserve by clients on MS Windows XP and 7 without any 
issues. The warning by Nico IMHO only applies if you want to share 
working copies between different systems, which is discouraged anyway, 
but those problems are actually not specific to SVN.


Uli


Re: check-mime-type, Windows client, non-ASCII path

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Feb 02, 2012 at 08:12:52PM +0100, Ignacio González (Eliop) wrote:
> OK, I'll investigate further.
> Just to summarize, I have a problem and a no-problem:
> 
> Problem: how to use the aforementioned check-mime-type with 'accented' files
> checked-in from Windows clients.

This is exactly what issue 2487 is about (I've already linked to it
but here is the link again for completeness in the archives):
http://subversion.tigris.org/issues/show_bug.cgi?id=2487

You see, whatever locale settings you have on your machine do not
affect mod_dav_svn in any way. They do not ever reach mod_dav_svn
because httpd makes sure all apache modules run in the "C" locale,
which only supports ASCII. This was done in httpd deliberately,
for security reasons.

One workaround discussed in the issue is to use mod_setlocale to set
a locale within httpd, bypassing httpd's default behaviour.
This will then indirectly enable the Subversion libraries called from
mod_dav_svn (and thus, hook scripts) to deal with non-ASCII characters.
This would require you to compile an apache module against the httpd
you use as Subversion server (i.e. you'd have to compile a module
against the version of httpd shipped with Subversion Edge) and then
use this module to configure a locale for mod_dav_svn that supports UTF-8.

The other way to fix this problem is to apply the patch which was
committed to trunk in r1239203. It adds a SVNUseUTF8 option to mod_dav_svn
which can be used to work around this problem by enabling UTF-8 support
in the Subversion libraries independently of locale settings.
We cannot backport this fix to 1.7 because doing so would violate our
release-compatibility guarantees. The 1.8 release will include this fix
but it is still far off.

I'm afraid there is no ready-made solution for Subversion Edge at this time.
Though you could ask CollabNet to include a patch or the mod_setlocale
module in Subversion Edge.

> No-problem: how to check out 'accented' files already in the
> repository with a Linux client. "Solved".

Ok, that's good to know :)

Re: check-mime-type, Windows client, non-ASCII path

Posted by "Ignacio González (Eliop)" <ig...@googlemail.com>.
Hello, Stefan.

On 2 February 2012 10:33, Stefan Sperling <st...@elego.de> wrote:
> On Wed, Feb 01, 2012 at 09:00:39AM +0100, Ignacio González (Eliop) wrote:
> > Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)
> > Server: Linux (CentOS), svn 1.6.16 (Spanish)
> >
> > Repository created OK
> > Hundreds of revisions already checked-in OK
> > Hook "check-mime-type" (bash) added in server
> > A couple of revisions checked-in OK
> > New file added with non-ASCII characters -> Problem:
> > Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
> > (note the u with an acute accent: ú)
> >
> > C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
> > Adding         acentos
> > Adding         acentos\In£til.TXT
> > Transmitting file data .svn: Commit failed (details follow):
> > svn: Commit blocked by pre-commit hook (exit code 1) with output:
> > /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
> > `/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
> > svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist
>
> 195 186 in hex is 0xc3ba
>
> $ echo 0xc3ba | xxd -r | ExplicateUTF8
> The sequence 0xC3     0xBA
>             11000011 10111010
> is a valid UTF-8 character encoding equivalent to UTF32 0x000000FA.
>
> (ExplicateUTF8 is part of the 'uniutils' suite).
>
> Written out as UTF-8 in email, unicode code point 0xfa is the character 'ú'.

Right.

> > To help diagnose it, I tried to check out an already existing file with
> > accents in its name
> > (checked in before the Hook "check-mime-type" (bash) was added in the
> > server).
> > Check out fails.
>
> And how exactly does it fail? What's the error message?
> Does it print the same error message as you get with the hook?
>
> Whenever you write a problem report and you describe parts of the
> problem by "X fails" without showing how X fails, recipients of your
> report can only make wild guesses.

Agree, I forgot to detail this part.

And I should really have been more careful! What I was trying to do is
to checkout the file directly, instead of its parent directory. So:

svn co http://localhost/svn/arenero/pru/úsame.TXT

fails telling me that blah,blah was a file, not a directory, but

svn co http://localhost/svn/arenero/pru/

succeeds.

Stupid, stupid, stupid.

> > Oh, my God.
>
> Don't panic. This is nothing that cannot be fixed.
> You'll just have to figure out where it goes wrong.
>
> You didn't specify what type of server you are running (svnserve or
> mod_dav_svn), so I'm going to guess that you're using mod_dav_svn,
> i.e. an Apache HTTPD server is serving your repositories.

I'm using  httpd / mod_dav_svn, in fact, CollabNet Subversion Edge.

> In that case, issue #2487 might be the problem:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2487
> Though this would not explain a failing checkout, only problems
> in the hook script. Does your hook script set any of the LANG, LC_CTYPE
> or LC_ALL environment variables to some value? (If possible, please just
> show us the entire hook script.)

Locale in this Linux server is:

[csvn@svn tmp]$ locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
[csvn@svn tmp]$


Here's the hook script (note that I have to comment out the line with the
check-mime-type invocation in order to check in new 'accented' files):


[csvn@svn tmp]$ cat /opt/csvn/data/repositories/arenero/hooks/pre-commit
#!/bin/sh

# pre-commit
# PRE-COMMIT HOOK

REPOS="$1"
TXN="$2"

# Make sure that the log message contains some text.
SVNLOOK=/opt/csvn/bin/svnlook
$SVNLOOK log -t "$TXN" "$REPOS" | grep "[a-zA-Z0-9]" > /dev/null
if [ $? -ne 0 ]
then
  echo "*** Debe introducir un texto para ***" > /dev/stderr
  echo "*** describir los cambios realizados ***" > /dev/stderr
  exit 1
fi

# Check that every added file has the svn:mime-type property set
# and every added file with a mime-type matching text/* also has
# svn:eol-style set
#/opt/csvn/data/repositories/telecontrol/hooks/check-mime-type "$REPOS" "$TXN" || exit 1

# All checks passed, so allow the commit.
exit 0
[csvn@svn tmp]$


And /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type is:


[csvn@svn tmp]$ cat /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type
#!/usr/bin/env perl

# ====================================================================
# commit-mime-type-check.pl: check that every added file has the
# svn:mime-type property set and every added file with a mime-type
# matching text/* also has svn:eol-style set. If any file fails this
# test the user is sent a verbose error message suggesting solutions and
# the commit is aborted.
#
# Usage: commit-mime-type-check.pl REPOS TXN-NAME
# ====================================================================
# Most of commit-mime-type-check.pl was taken from
# commit-access-control.pl, Revision 9986, 2004-06-14 16:29:22 -0400.
# ====================================================================
# Copyright (c) 2000-2004 CollabNet.  All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution.  The terms
# are also available at http://subversion.tigris.org/license.html.
# If newer versions of this license are posted there, you may use a
# newer version instead, at your option.
#
# This software consists of voluntary contributions made by many
# individuals.  For exact contribution history, see the revision
# history and logs, available at http://subversion.tigris.org/.
# ====================================================================

# Turn on warnings the best way depending on the Perl version.
BEGIN {
  if ( $] >= 5.006_000)
    { require warnings; import warnings; }
  else
    { $^W = 1; }
}

use strict;
use Carp;


######################################################################
# Configuration section.

# Svnlook path.
my $svnlook = "/opt/csvn/bin/svnlook";

# Since the path to svnlook depends upon the local installation
# preferences, check that the required program exists to insure that
# the administrator has set up the script properly.
{
  my $ok = 1;
  foreach my $program ($svnlook)
    {
      if (-e $program)
        {
          unless (-x $program)
            {
              warn "$0: required program `$program' is not executable, ",
                   "edit $0.\n";
              $ok = 0;
            }
        }
      else
        {
          warn "$0: required program `$program' does not exist, edit $0.\n";
          $ok = 0;
        }
    }
  exit 1 unless $ok;
}

######################################################################
# Initial setup/command-line handling.

&usage unless @ARGV == 2;

my $repos        = shift;
my $txn          = shift;

unless (-e $repos)
  {
    &usage("$0: repository directory `$repos' does not exist.");
  }
unless (-d $repos)
  {
    &usage("$0: repository directory `$repos' is not a directory.");
  }

# Define two constant subroutines to stand for read-only or read-write
# access to the repository.
sub ACCESS_READ_ONLY  () { 'read-only' }
sub ACCESS_READ_WRITE () { 'read-write' }


######################################################################
# Harvest data using svnlook.

# Change into /tmp so that svnlook diff can create its .svnlook
# directory.
my $tmp_dir = '/tmp';
chdir($tmp_dir)
  or die "$0: cannot chdir `$tmp_dir': $!\n";

# Figure out what files have added using svnlook.
my @files_added;
foreach my $line (&read_from_process($svnlook, 'changed', $repos, '-t', $txn))
  {
                # Add only files that were added to @files_added
    if ($line =~ /^A.  (.*[^\/])$/)
      {
        push(@files_added, $1);
      }
  }

my @errors;
foreach my $path ( @files_added )
        {
                my $mime_type;
                my $eol_style;

                # Parse the complete list of property values of the file
                # $path to extract the mime-type and eol-style
                foreach my $prop (&read_from_process($svnlook, 'proplist',
                                  $repos, '-t', $txn, '--verbose', $path))
                        {
                                if ($prop =~ /^\s*svn:mime-type : (\S+)/)
                                        {
                                                $mime_type = $1;
                                        }
                                elsif ($prop =~ /^\s*svn:eol-style : (\S+)/)
                                        {
                                                $eol_style = $1;
                                        }
                        }

                # Detect error conditions and add them to @errors
                if (not $mime_type)
                        {
                                push @errors, "$path : svn:mime-type is not set";
                        }
                elsif ($mime_type =~ /^text\// and not $eol_style)
                        {
                                push @errors, "$path : svn:mime-type=$mime_type but svn:eol-style is not set";
                        }
        }

# If there are any errors list the problem files and give information
# on how to avoid the problem. Hopefully people will set up auto-props
# and will not see this verbose message more than once.
if (@errors)
  {
    warn "$0:\n\n",
         join("\n", @errors), "\n\n",
                                 <<EOS;

    Every added file must have the svn:mime-type property set. In
    addition text files must have the svn:eol-style property set.

    For binary files try running
    svn propset svn:mime-type application/octet-stream path/of/file

    For text files try
    svn propset svn:mime-type text/plain path/of/file
    svn propset svn:eol-style native path/of/file

    You may want to consider uncommenting the auto-props section
    in your ~/.subversion/config file. Read the Subversion book
    (http://svnbook.red-bean.com/), Chapter 7, Properties section,
    Automatic Property Setting subsection for more help.
EOS
    exit 1;
  }
else
  {
    exit 0;
  }

sub usage
{
  warn "@_\n" if @_;
  die "usage: $0 REPOS TXN-NAME\n";
}

sub safe_read_from_pipe
{
  unless (@_)
    {
      croak "$0: safe_read_from_pipe passed no arguments.\n";
    }
  print "Running @_\n";
  my $pid = open(SAFE_READ, '-|');
  unless (defined $pid)
    {
      die "$0: cannot fork: $!\n";
    }
  unless ($pid)
    {
      open(STDERR, ">&STDOUT")
        or die "$0: cannot dup STDOUT: $!\n";
      exec(@_)
        or die "$0: cannot exec `@_': $!\n";
    }
  my @output;
  while (<SAFE_READ>)
    {
      chomp;
      push(@output, $_);
    }
  close(SAFE_READ);
  my $result = $?;
  my $exit   = $result >> 8;
  my $signal = $result & 127;
  my $cd     = $result & 128 ? "with core dump" : "";
  if ($signal or $cd)
    {
      warn "$0: pipe from `@_' failed $cd: exit=$exit signal=$signal\n";
    }
  if (wantarray)
    {
      return ($result, @output);
    }
  else
    {
      return $result;
    }
}

sub read_from_process
  {
  unless (@_)
    {
      croak "$0: read_from_process passed no arguments.\n";
    }
  my ($status, @output) = &safe_read_from_pipe(@_);
  if ($status)
    {
      if (@output)
        {
          die "$0: `@_' failed with this output:\n", join("\n", @output), "\n";
        }
      else
        {
          die "$0: `@_' failed with no output.\n";
        }
    }
  else
    {
      return @output;
    }
}
[csvn@svn tmp]$
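
A possible reading of the failure, stated as an assumption: if `svnlook changed` prints the path in escaped form because the hook runs in an ASCII-only locale, the script captures that escaped text and hands it back to `svnlook proplist`, which then looks for a path that does not exist. The Python sketch below applies the script's own regular expression to sample lines. The sample input is made up to match the error message, and the function name is hypothetical:

```python
import re

# Same pattern as in the Perl script: an added entry whose path does not
# end in '/', i.e. an added file rather than an added directory.
ADDED_FILE = re.compile(r"^A.  (.*[^/])$")


def added_files(changed_output):
    """Return the paths of added files from 'svnlook changed'-style lines."""
    files = []
    for line in changed_output:
        match = ADDED_FILE.match(line)
        if match:
            files.append(match.group(1))
    return files


sample = [
    "A   acentos/",                        # added directory: skipped
    "A   acentos/In?\\195?\\186til.TXT",   # added file, name already escaped
    "U   pru/notas.txt",                   # modified file: skipped
]
for path in added_files(sample):
    print(path)
```

Only the second sample line is kept, and the path that would be passed on to the next `svnlook` call is the escaped text, not the real UTF-8 name.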





> See the issue link for more information and some workarounds (patches,
> but also an additional apache module you could load).
> A fix has just recently been committed but it is for 1.8. We cannot
> backport it to 1.7 because it requires API changes.

I will give it a try when I understand it :-) I hope to find some free
time soon.

> The character ú is a character which has a diacritic so another
> possible explanation is a problem with NFC/NFD normalisation.
> See http://subversion.tigris.org/issues/show_bug.cgi?id=2464
> This usually happens when MacOS X clients are involved. But in theory any
> Windows or Linux client could cause the same problem depening on how
> tools used on the client machine normalise UTF-8.

Ditto, I'll give it a try.

> Can you check if either of these apply?
> If not, we'll need to dig further.

OK, I'll investigate further.
Just to summarize, I have a problem and a no-problem:

Problem: how to use the aforementioned check-mime-type with 'accented' files
checked-in from Windows clients.

No-problem: how to check out 'accented' files already in the
repository with a Linux client. "Solved".

Re: check-mime-type, Windows client, non-ASCII path

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 01, 2012 at 09:00:39AM +0100, Ignacio González (Eliop) wrote:
> Clients: Windows-XP, Windows 7, svn 1.6.16 (Spanish)
> Server: Linux (CentOS), svn 1.6.16 (Spanish)
> 
> Repository created OK
> Hundreds of revisions already checked-in OK
> Hook "check-mime-type" (bash) added in server
> A couple of revisions checked-in OK
> New file added with non-ASCII characters -> Problem:
> Path name (in Windows, client): C:\Usuarios\arenero\Inútil.TXT
> (note the u with an acute accent: ú)
> 
> C:\Usuarios\arenero>svn ci acentos -m "Prueba 1"
> Adding         acentos
> Adding         acentos\In£til.TXT
> Transmitting file data .svn: Commit failed (details follow):
> svn: Commit blocked by pre-commit hook (exit code 1) with output:
> /opt/csvn/data/repositories/telecontrol/hooks/check-mime-type:
> `/opt/csvn/bin/svnlook proplist /opt/csvn/data/repositories/arenero -t 44-1e --verbose acentos/In?\195?\186til.TXT' failed with this output:
> svnlook: Path 'acentos/In?\195?\186til.TXT' does not exist

195 186 in hex is 0xc3ba

$ echo 0xc3ba | xxd -r | ExplicateUTF8
The sequence 0xC3     0xBA     
             11000011 10111010 
is a valid UTF-8 character encoding equivalent to UTF32 0x000000FA.

(ExplicateUTF8 is part of the 'uniutils' suite).

Written out as UTF-8 in email, unicode code point 0xfa is the character 'ú'.
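
The same check can be sketched in Python without extra tools, using only the standard library (the helper name is hypothetical):

```python
def decode_decimal_bytes(values):
    """Interpret a list of decimal byte values as one UTF-8 string."""
    return bytes(values).decode("utf-8")


raw = bytes([195, 186])
char = decode_decimal_bytes([195, 186])

print(raw.hex())        # c3ba
print(hex(ord(char)))   # 0xfa
assert char == "\u00fa"  # LATIN SMALL LETTER U WITH ACUTE
```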

> To help diagnose it, I tried to check out an already existing file with
> accents in its name
> (checked in before the Hook "check-mime-type" (bash) was added in the
> server).
> Check out fails.

And how exactly does it fail? What's the error message?
Does it print the same error message as you get with the hook?

Whenever you write a problem report and you describe parts of the
problem by "X fails" without showing how X fails, recipients of your
report can only make wild guesses.

> Oh, my God.

Don't panic. This is nothing that cannot be fixed.
You'll just have to figure out where it goes wrong.

You didn't specify what type of server you are running (svnserve or
mod_dav_svn), so I'm going to guess that you're using mod_dav_svn,
i.e. an Apache HTTPD server is serving your repositories.
In that case, issue #2487 might be the problem:
http://subversion.tigris.org/issues/show_bug.cgi?id=2487
Though this would not explain a failing checkout, only problems
in the hook script. Does your hook script set any of the LANG, LC_CTYPE
or LC_ALL environment variables to some value? (If possible, please just
show us the entire hook script.)
See the issue link for more information and some workarounds (patches,
but also an additional apache module you could load).
A fix has just recently been committed but it is for 1.8. We cannot
backport it to 1.7 because it requires API changes.
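
Subversion starts hook scripts with an empty environment, so any locale the hook needs has to be set by the hook itself before it calls other programs. The Python sketch below shows the general pattern of passing an explicit locale to a child process. The locale name is taken from the server output in this thread, the child command is a stand-in for a real tool, and whether this resolves the problem under mod_dav_svn is not confirmed here:

```python
import os
import subprocess
import sys


def run_with_locale(command, locale_name="es_ES.UTF-8"):
    """Run command with LC_ALL set explicitly and return its standard output."""
    env = dict(os.environ)          # copy, so the caller's environment is untouched
    env["LC_ALL"] = locale_name     # LC_ALL overrides LANG and the other LC_* values
    result = subprocess.run(command, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


# Stand-in child process that just reports the locale variable it received.
child = [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL'))"]
print(run_with_locale(child).strip())  # es_ES.UTF-8
```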

The character ú is a character which has a diacritic so another
possible explanation is a problem with NFC/NFD normalisation.
See http://subversion.tigris.org/issues/show_bug.cgi?id=2464
This usually happens when MacOS X clients are involved. But in theory any
Windows or Linux client could cause the same problem depending on how
tools used on the client machine normalise UTF-8.
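
To make the NFC/NFD difference concrete, a short Python sketch using the standard `unicodedata` module (the helper name is hypothetical). The same visible character has two different UTF-8 byte sequences depending on the normalisation form:

```python
import unicodedata


def utf8_bytes(text, form):
    """Return the UTF-8 byte values of text after Unicode normalisation."""
    return list(unicodedata.normalize(form, text).encode("utf-8"))


print(utf8_bytes("\u00fa", "NFC"))  # [195, 186]       precomposed U+00FA
print(utf8_bytes("\u00fa", "NFD"))  # [117, 204, 129]  'u' + combining acute U+0301
```

The bytes in the error message above (195 186) correspond to the NFC form, so a path stored in NFD form would have looked different in that output.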

Can you check if either of these apply?
If not, we'll need to dig further.