You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Greg Ames <am...@gmail.com> on 2010/05/11 23:36:48 UTC

[PATCH] readable error messages on non-ASCII systems

I am attempting to get the subversion client running on z/OS.  I am starting
with svn 1.5.6 because I don't want to port SQLite first, at least not until
I have a working "svn co".  autogen.sh, configure, and make run to
completion after some build hacks.  Then make install dies with:

subversion/svnversion/svnversion . /repos/svn/trunk >
/u/USER193/svn/cvs-1.5.6/subversion-1.5.6/built/include/subversion-1/svn-revision.txt
?\162?\165?\149: ?\229?\129?\147?\137?\132 ?\228?\227?\198-?\248
?\132?\129?\163
?\129
(?\136?\133?\167: ?\242?\133 ?\246?\241 ?\244?\130)
?\134?\150?\147?\147?\150?\166?\133?\132 ?\130?\168
?\137?\149?\165?\129?\147?\1
37?\132 ?\228?\227?\198-?\248 ?\162?\133?\152?\164?\133?\149?\131?\133
(?\136?\133?\167: ?\129?\242 ?\129?\245 ?\249?\245 ?\246?\241)
FSUM8226 make: Error code 8

With the following patch, the situation improves and I can read the error
messages without looking at EBCDIC code point charts.

subversion/svnversion/svnversion . /repos/svn/trunk >
/u/USER193/svn/cvs-1.5.6/subversion-1.5.6/built/include/subversion-1/svn-revision.txt
svn: Valid UTF-8 data
(hex: 2e 61 4b)
followed by invalid UTF-8 sequence
(hex: a2 a5 95 61)
FSUM8226 make: Error code 8

The error messages are in the native code page to start with, so running
them through a UTF-8 -> native conversion doesn't do anything helpful.

I have a handful of other patches that get subversion 1.5.6 to where
svnversion and "svn help [subcommand]" work properly.  I'm pretty confident
that I can take it beyond that, but I want to start posting patches one baby
step at a time.

Regards,
Greg


Index: subversion/libsvn_subr/cmdline.c
===================================================================
--- subversion/libsvn_subr/cmdline.c    (revision 943316)
+++ subversion/libsvn_subr/cmdline.c    (working copy)
@@ -318,24 +318,15 @@
 svn_error_t *
 svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
 {
-  svn_error_t *err;
-  const char *out;
+  /* "string" is native.  do not try to convert from UTF-8 */

-  err = svn_cmdline_cstring_from_utf8(&out, string, pool);
-
-  if (err)
-    {
-      svn_error_clear(err);
-      out = svn_cmdline_cstring_from_utf8_fuzzy(string, pool);
-    }
-
   /* On POSIX systems, errno will be set on an error in fputs, but this
might
      not be the case on other platforms.  We reset errno and only
      use it if it was set by the below fputs call.  Else, we just return
      a generic error. */
   errno = 0;

-  if (fputs(out, stream) == EOF)
+  if (fputs(string, stream) == EOF)
     {
       if (errno)
         return svn_error_wrap_apr(errno, _("Write error"));

Re: [PATCH] readable error messages on non-ASCII systems

Posted by Greg Ames <am...@gmail.com>.

On Wed, May 12, 2010 at 3:09 PM, Mark Phippard <ma...@gmail.com> wrote:

> You should look at this now deleted branch for Subversion on EBCDIC
> systems.
>
> http://svn.apache.org/viewvc/subversion/branches/ebcdic/?pathrev=876004
>
> It was written for AS/400, not z/OS but examining the diff of this
> branch and what it was synched with will help get an idea on where to
> look.


will do, thanks!

Greg

Re: [PATCH] readable error messages on non-ASCII systems

Posted by Mark Phippard <ma...@gmail.com>.

You should look at this now deleted branch for Subversion on EBCDIC systems.

http://svn.apache.org/viewvc/subversion/branches/ebcdic/?pathrev=876004

It was written for AS/400, not z/OS but examining the diff of this
branch and what it was synched with will help get an idea on where to
look.

The branch was deleted in r876005.  So be sure to use a peg-revision
of 876004 when looking at the repository.



On Wed, May 12, 2010 at 2:59 PM, Greg Ames <am...@gmail.com> wrote:
> ooops, I meant to cc: dev at subversion.  Thanks, Daniel.  I'm more used to
> dev at httpd where there is some magic involving replyto:
>
> Here is what I see if I run dbx against svnversion with a stop set in
> svn_cmdline_fputs:
>
> [2] stopped in svn_cmdline_fputs at line 288 in file "cmdline.c"  ($t1)
>  288   svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t
> *pool)
> (dbx64) where
> svn_cmdline_fputs(string = "svn: Valid UTF-8 data.(hex: 2e 61 4b).followed
> by invalid UTF-8 sequence.(hex: a2 a5 95 61).", stream = 0xD6C84F8, pool =
> 0xE4E4040), line 288 in "cmdline.c"
> svn_cmdline_fprintf(stream = 0xD6C84F8, pool = 0xE4E4040, fmt = "%s%s.",
> ...), line 284 in "cmdline.c"
> print_error.$b20, line 332 in "error.c"
> print_error(err = 0xE4E4078, stream = 0xD6C84F8, prefix = "svn: "), line 332
> in "error.c"
> svn_handle_error2.$b25.$b29, line 405 in "error.c"
> svn_handle_error2.$b25, line 405 in "error.c"
> svn_handle_error2(err = 0xE4E4078, stream = 0xD6C84F8, fatal = 0, prefix =
> "svn: "), line 405 in "error.c"
> main.$b17.$b18, line 216 in "main.c"
> main.$b17, line 216 in "main.c"
> main(argc = 1, argv = 0xDF76878), line 216 in "main.c"
>
> Any strings that are printable with z/OS dbx "where" or "p" are in the
> native EBCDIC encoding.  If I stop in svn_handle_error2 and print the err
> structure, I see:
>
> (dbx64) p *err
> (apr_err = 121, message = "Valid UTF-8 data.(hex: 2e 61 4b).followed by
> invalid UTF-8 sequence.(hex: a2 a5 95 61)", child = 0x0, pool = 0xE4E4040,
> file = "./subversion/libsvn_subr/utf.c", line = 632)
>
> ...so we have native strings here which never become UTF-8.  I could patch
> print_error() to do the UTF-8 conversion prior to calling
> svn_cmdline_fputs(), but the back-to-back conversions seem silly.  Maybe it
> would be better to define svn_cmdline_fputs_native_cstring() or some such
> and call that from print_error() and any other caller that passes native
> strings.
>
> Greg
>
> On Wed, May 12, 2010 at 10:42 AM, Greg Ames <am...@gmail.com> wrote:
>
>>
>>
>> On Wed, May 12, 2010 at 12:59 AM, Daniel Shahaf <d....@daniel.shahaf.name>wrote:
>>
>>> Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
>>> > The error messages are in the native code page to start with, so running
>>> > them through a UTF-8 -> native conversion doesn't do anything helpful.
>>> >
>>> ...
>>> > Index: subversion/libsvn_subr/cmdline.c
>>> > ===================================================================
>>> > --- subversion/libsvn_subr/cmdline.c    (revision 943316)
>>> > +++ subversion/libsvn_subr/cmdline.c    (working copy)
>>> > @@ -318,24 +318,15 @@
>>> >  svn_error_t *
>>> >  svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>>> >  {
>>> > -  svn_error_t *err;
>>> > -  const char *out;
>>> > +  /* "string" is native.  do not try to convert from UTF-8 */
>>>
>>> The doc string of this function (see subversion/include/svn_cmdline.h)
>>> specifically promises that it'll do conversion from UTF-8.
>>
>>
>> ok, but
>>
>> a) that's not appropriate for error messages
>> b) it's not enforced.
>>
>>
>>> We cannot make it unconditionally do the opposite.
>>
>>
>> I have done exactly that with good results
>>
>>
>>> (Perhaps with suitable #ifdef's we could do it; or perhaps your problem
>>> can be fixed elsewhere (e.g., the error-printing code).)
>>>
>>>
>> The SVN_ERR() macro and supporting functions produce native strings, not
>> UTF-8, and they are widely used.
>>
>>
>>> Is your issue only with the encoding of error messages?
>>
>>
>> This patch addresses only the encoding of error messages. There are a few
>> other places where there is confusion about the encoding of input or
>> literals.
>>
>>
>>> Or with the the encoding of all svn output?
>>>
>>
>> I think it's a great idea to have svn metadata and text files in the
>> repository in UTF-8 to promote universal access.  But error messages are
>> local and shouldn't be munged much or Bad Things can happen.  Yes, someone
>> could inject code after SVN_ERR() to convert all the literal strings and
>> characters in error messages throughout subversion to UTF-8.  But what's the
>> point of doing that then converting it back to native to write to stderr?
>> And what are the odds of picking up 100% of the literal strings and
>> characters and doing exactly one UTF-8 conversion on all of them prior to
>> calling svn_cmdline_fputs()?  Simplicity is good, especially in error
>> situations, and it saves a few cycles on non-UTF-8 systems.
>>
>> Greg
>>
>



-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: [PATCH] readable error messages on non-ASCII systems

Posted by Greg Ames <am...@gmail.com>.

ooops, I meant to cc: dev at subversion.  Thanks, Daniel.  I'm more used to
dev at httpd where there is some magic involving replyto:

Here is what I see if I run dbx against svnversion with a stop set in
svn_cmdline_fputs:

[2] stopped in svn_cmdline_fputs at line 288 in file "cmdline.c"  ($t1)
  288   svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t
*pool)
(dbx64) where
svn_cmdline_fputs(string = "svn: Valid UTF-8 data.(hex: 2e 61 4b).followed
by invalid UTF-8 sequence.(hex: a2 a5 95 61).", stream = 0xD6C84F8, pool =
0xE4E4040), line 288 in "cmdline.c"
svn_cmdline_fprintf(stream = 0xD6C84F8, pool = 0xE4E4040, fmt = "%s%s.",
...), line 284 in "cmdline.c"
print_error.$b20, line 332 in "error.c"
print_error(err = 0xE4E4078, stream = 0xD6C84F8, prefix = "svn: "), line 332
in "error.c"
svn_handle_error2.$b25.$b29, line 405 in "error.c"
svn_handle_error2.$b25, line 405 in "error.c"
svn_handle_error2(err = 0xE4E4078, stream = 0xD6C84F8, fatal = 0, prefix =
"svn: "), line 405 in "error.c"
main.$b17.$b18, line 216 in "main.c"
main.$b17, line 216 in "main.c"
main(argc = 1, argv = 0xDF76878), line 216 in "main.c"

Any strings that are printable with z/OS dbx "where" or "p" are in the
native EBCDIC encoding.  If I stop in svn_handle_error2 and print the err
structure, I see:

(dbx64) p *err
(apr_err = 121, message = "Valid UTF-8 data.(hex: 2e 61 4b).followed by
invalid UTF-8 sequence.(hex: a2 a5 95 61)", child = 0x0, pool = 0xE4E4040,
file = "./subversion/libsvn_subr/utf.c", line = 632)

...so we have native strings here which never become UTF-8.  I could patch
print_error() to do the UTF-8 conversion prior to calling
svn_cmdline_fputs(), but the back-to-back conversions seem silly.  Maybe it
would be better to define svn_cmdline_fputs_native_cstring() or some such
and call that from print_error() and any other caller that passes native
strings.

Greg

On Wed, May 12, 2010 at 10:42 AM, Greg Ames <am...@gmail.com> wrote:

>
>
> On Wed, May 12, 2010 at 12:59 AM, Daniel Shahaf <d....@daniel.shahaf.name>wrote:
>
>> Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
>> > The error messages are in the native code page to start with, so running
>> > them through a UTF-8 -> native conversion doesn't do anything helpful.
>> >
>> ...
>> > Index: subversion/libsvn_subr/cmdline.c
>> > ===================================================================
>> > --- subversion/libsvn_subr/cmdline.c    (revision 943316)
>> > +++ subversion/libsvn_subr/cmdline.c    (working copy)
>> > @@ -318,24 +318,15 @@
>> >  svn_error_t *
>> >  svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>> >  {
>> > -  svn_error_t *err;
>> > -  const char *out;
>> > +  /* "string" is native.  do not try to convert from UTF-8 */
>>
>> The doc string of this function (see subversion/include/svn_cmdline.h)
>> specifically promises that it'll do conversion from UTF-8.
>
>
> ok, but
>
> a) that's not appropriate for error messages
> b) it's not enforced.
>
>
>> We cannot make it unconditionally do the opposite.
>
>
> I have done exactly that with good results
>
>
>> (Perhaps with suitable #ifdef's we could do it; or perhaps your problem
>> can be fixed elsewhere (e.g., the error-printing code).)
>>
>>
> The SVN_ERR() macro and supporting functions produce native strings, not
> UTF-8, and they are widely used.
>
>
>> Is your issue only with the encoding of error messages?
>
>
> This patch addresses only the encoding of error messages. There are a few
> other places where there is confusion about the encoding of input or
> literals.
>
>
>> Or with the the encoding of all svn output?
>>
>
> I think it's a great idea to have svn metadata and text files in the
> repository in UTF-8 to promote universal access.  But error messages are
> local and shouldn't be munged much or Bad Things can happen.  Yes, someone
> could inject code after SVN_ERR() to convert all the literal strings and
> characters in error messages throughout subversion to UTF-8.  But what's the
> point of doing that then converting it back to native to write to stderr?
> And what are the odds of picking up 100% of the literal strings and
> characters and doing exactly one UTF-8 conversion on all of them prior to
> calling svn_cmdline_fputs()?  Simplicity is good, especially in error
> situations, and it saves a few cycles on non-UTF-8 systems.
>
> Greg
>

Re: [PATCH] readable error messages on non-ASCII systems

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
> The error messages are in the native code page to start with, so running
> them through a UTF-8 -> native conversion doesn't do anything helpful.
> 
...
> Index: subversion/libsvn_subr/cmdline.c
> ===================================================================
> --- subversion/libsvn_subr/cmdline.c    (revision 943316)
> +++ subversion/libsvn_subr/cmdline.c    (working copy)
> @@ -318,24 +318,15 @@
>  svn_error_t *
>  svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>  {
> -  svn_error_t *err;
> -  const char *out;
> +  /* "string" is native.  do not try to convert from UTF-8 */

The doc string of this function (see subversion/include/svn_cmdline.h) 
specifically promises that it'll do conversion from UTF-8.  We cannot make 
it unconditionally do the opposite.

(Perhaps with suitable #ifdef's we could do it; or perhaps your problem 
can be fixed elsewhere (e.g., the error-printing code).)

Is your issue only with the encoding of error messages?  Or with the the
encoding of all svn output?