Posted to users@subversion.apache.org by "B. Smith-Mannschott" <be...@gmail.com> on 2006/05/22 10:02:22 UTC

mangled ?\nnn encoding; os.popen; only in hook script

I've got an SVN repository containing files whose names contain
non-ascii characters. For purposes of discussion, we'll consider just
one such character:

"ü" - UNICODE (0x00FC = 252 = latin small letter u with diaeresis)
When encoded as UTF-8 this produces two bytes: hex: 0xC3, 0xBC;
decimal: 195, 188.
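
The byte values above can be checked with a short sketch. It is written
for Python 3 (an assumption; the session described in this post uses
Python 2.3.4), and the variable names are only illustrative.

```python
# Python 3 sketch; the thread itself uses Python 2.3.4.
u_umlaut = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS, code point 252

encoded = u_umlaut.encode("utf-8")        # the two UTF-8 bytes
byte_values = list(encoded)               # decimal value of each byte
hex_values = [hex(b) for b in encoded]    # the same bytes in hex

print(byte_values)  # [195, 188]
print(hex_values)   # ['0xc3', '0xbc']
```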

When I perform svnlook -r 76 changed /production/subversion/hooktest
from the command line, the names of the files are returned correctly
as UTF-8 and displayed correctly in my terminal (which is set to
UTF-8).

In an interactive Python session, /usr/bin/python ('2.3.4 (#1, Mar 20
2006, 00:23:47) \n[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)]') also
behaves as expected. The leading 0x00FC in the first two file names
is returned as UTF-8 from the svnlook subprocess and echoed legibly to
the terminal.

>>> print os.popen("svnlook -r 76 changed /production/subversion/hooktest").read()
_U  übermittlung.txt
_U  übermittlung2
_U  FOO
_U  scratch/

When, however, I run the *same* code in a hook script in the *same*
version of Python, my previously nice UTF-8 bytes come back as what
I'll call "shellmangled" (a chunk out of the pre-commit script's log
file):

:cmd:
svnlook -t 76-1 changed /production/subversion/hooktest
:out:
_U  ?\195?\188bermittlung.txt
_U  ?\195?\188bermittlung2
_U  FOO
_U  scratch/

First, this is not some strange notation my editor uses for characters
it cannot display. The two bytes of the UTF-8 character really are
getting expanded to ten bytes: ? \ 1 9 5 ? \ 1 8 8. Second, what I'm
seeing are the printed decimal values of the UTF-8 bytes I want.

**Questions
Why is there this difference?
Has anyone encountered something comparable when writing their hook scripts?
**

Now, I've written something (unshellmangle), which undoes the damage
and gives me back a nice UTF-8 string. Unfortunately, I need to use
these very file names in another call to svnlook (to check for the
presence of a property on *.txt files.)
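
The unshellmangle code itself is not shown in this post. The following
is only a hypothetical sketch of how such a helper might look, assuming
Python 3 and assuming every escaped byte appears as "?\" followed by
exactly three decimal digits, as in the log excerpt above. The function
and variable names are illustrative.

```python
import re

# Matches "?\" followed by exactly three decimal digits, e.g. ?\195
_ESCAPE = re.compile(rb"\?\\(\d{3})")


def unshellmangle(data: bytes) -> bytes:
    """Turn each ?\\nnn escape back into the single byte with value nnn."""
    def restore(match):
        value = int(match.group(1))
        if value > 255:
            # Not a valid byte value: leave the text untouched.
            return match.group(0)
        return bytes([value])
    return _ESCAPE.sub(restore, data)


mangled = rb"?\195?\188bermittlung.txt"
restored = unshellmangle(mangled)

print(len(rb"?\195?\188"))  # the two original bytes became ten
print(restored)             # raw bytes, starting with 0xC3 0xBC
```

A file name that legitimately contains the text "?\195" would be
indistinguishable from an escape, so this kind of repair is a workaround
at best.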

Now, I know that I can make this work from the interactive shell:

print os.popen("svnlook -r 76 proplist /production/subversion/hooktest übermittlung.txt").read()
  foo
  svn:eol-style

But, when I try to do so from the hook script...

(1) using a UTF-8 string (my source file's encoding is set to UTF-8)

    cmd = r"svnlook -t %s proplist %s übermittlung.txt" % (TRANSACTION, REPOSITORY)
    log("cmd", cmd)
    log("out", os.popen(cmd).read())
    log("sys.getdefaultencoding()", sys.getdefaultencoding())

svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svn: Can't convert string from native encoding to 'UTF-8':
svn: ?\195?\188bermittlung.txt

(2) trying to imitate the ?\ notation I'm getting back (but that
obviously doesn't work).

    cmd = r"svnlook -t %s proplist %s ?\195?\188bermittlung.txt" % (TRANSACTION, REPOSITORY)
    log("cmd", cmd)
    log("out", os.popen(cmd).read())
    log("sys.getdefaultencoding()", sys.getdefaultencoding())

svn: Commit failed (details follow):
svn: 'pre-commit' hook failed with error output:
svnlook: Path '?195?188bermittlung.txt' does not exist

**Question
How can I pass the name of a file containing a non-ascii character to
svnlook from a running hook script?
**

Thankful for any help
// Ben

Re: mangled ?\nnn encoding; os.popen; only in hook script

Posted by "B. Smith-Mannschott" <be...@gmail.com>.
On 5/22/06, Ryan Schmidt <su...@ryandesign.com> wrote:
> On May 22, 2006, at 12:02, B. Smith-Mannschott wrote:
>
> [snip]
>
> > _U  ?\195?\188bermittlung.txt
> > _U  ?\195?\188bermittlung2
> > _U  FOO
>
> [snip]
>
> I believe the solution is to set the LANG environment variable in
> your script? In a shell script that would be:
>
> export LANG=en_US.utf8
>
> or whatever charset you're using. As far as I know, it doesn't matter
> if you use a UTF-8 locale or not, it just matters that you set *some*
> locale so that Subversion knows what you're using so it can convert
> from that to UTF-8.
>

Thanks! This did the trick for me:

    os.environ["LANG"]="en_US.UTF-8"
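
For anyone who would rather not change the hook's own environment, a
rough Python 3 sketch of the same idea follows (an assumption on my
part; the hook here runs Python 2.3.4). It sets LANG only in a copy of
the environment handed to the child process. The helper names are
hypothetical, and a child Python process stands in for svnlook so the
sketch runs without a repository.

```python
import os
import subprocess
import sys


def utf8_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LANG set to locale_name."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = locale_name
    return env


def run_with_locale(argv, locale_name="en_US.UTF-8"):
    """Run argv without a shell, with LANG set for the child only."""
    result = subprocess.run(
        argv,
        env=utf8_env(locale_name=locale_name),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# Stand-in for svnlook: a child process that reports the LANG it sees.
# In a hook the argument list would instead mirror the command above,
# e.g. ["svnlook", "-t", TRANSACTION, "proplist", REPOSITORY, path].
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
seen_by_child = run_with_locale(child).decode("ascii").strip()
print(seen_by_child)
```

If LC_ALL is already set in the hook's environment it takes precedence
over LANG, so it may need to be adjusted as well.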

// ben

Re: mangled ?\nnn encoding; os.popen; only in hook script

Posted by Ryan Schmidt <su...@ryandesign.com>.
On May 22, 2006, at 12:02, B. Smith-Mannschott wrote:

[snip]

> _U  ?\195?\188bermittlung.txt
> _U  ?\195?\188bermittlung2
> _U  FOO

[snip]

I believe the solution is to set the LANG environment variable in  
your script? In a shell script that would be:

export LANG=en_US.utf8

or whatever charset you're using. As far as I know, it doesn't matter  
if you use a UTF-8 locale or not, it just matters that you set *some*  
locale so that Subversion knows what you're using so it can convert  
from that to UTF-8.

