Posted to dev@subversion.apache.org by Greg Ames <am...@gmail.com> on 2010/05/21 23:18:07 UTC

[PATCH] use APR's ctypes on EBCDIC systems

The current svn_ctype_* implementation depends on ASCII character encoding.
APR's ctype functions are portable, so use those on EBCDIC systems.

A somewhat surprising side effect is that svn messages become mostly
readable on z/OS with the help of svn_utf__cstring_from_utf8_fuzzy().

I thought it would be more readable to have one #if around the entire block
of svn_ctype_is* macros rather than around each individual macro.  I did not
want to create a second set of nearly duplicate Doxygen docs so I just
punted on those.  Let me know if you'd prefer to see it formatted
differently.
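
For readers without the patch at hand, the layout described above (one #if around the whole block of classification macros) might look roughly like this. It is only a sketch, not the patch itself: the demo_ctype_* names and the DEMO_CHARSET_EBCDIC switch are hypothetical, and the C library's <ctype.h> stands in for APR's ctype wrappers.

```c
#include <ctype.h>

/* Hypothetical switch; a real build would use whatever macro the
   platform headers provide to signal an EBCDIC native charset. */
#ifndef DEMO_CHARSET_EBCDIC
#define DEMO_CHARSET_EBCDIC 0
#endif

#if DEMO_CHARSET_EBCDIC

/* Native-charset branch: defer to the C library, which knows the
   platform encoding.  The cast avoids undefined behaviour for
   negative char values. */
#define demo_ctype_isalpha(c) (isalpha((unsigned char)(c)) != 0)
#define demo_ctype_isdigit(c) (isdigit((unsigned char)(c)) != 0)
#define demo_ctype_isspace(c) (isspace((unsigned char)(c)) != 0)

#else

/* ASCII branch: classify by fixed byte values, independent of the
   platform and locale.  Note that these macros evaluate their
   argument more than once. */
#define demo_ctype_isalpha(c) \
  (((unsigned char)(c) >= 0x41 && (unsigned char)(c) <= 0x5A) \
   || ((unsigned char)(c) >= 0x61 && (unsigned char)(c) <= 0x7A))
#define demo_ctype_isdigit(c) \
  ((unsigned char)(c) >= 0x30 && (unsigned char)(c) <= 0x39)
#define demo_ctype_isspace(c) \
  ((unsigned char)(c) == 0x20 \
   || ((unsigned char)(c) >= 0x09 && (unsigned char)(c) <= 0x0D))

#endif
```

With a single #if, each branch reads as a complete set, at the cost of the documentation comments applying to only one of the two definitions.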

Greg

[[[ Use portable ctypes on EBCDIC systems ]]]

Fwd: [PATCH] use APR's ctypes on EBCDIC systems

Posted by Greg Ames <am...@gmail.com>.
forgot to cc: dev

On Mon, May 24, 2010 at 8:13 PM, Branko Čibej <br...@xbc.nu> wrote:

>
> >
> >     Others have ported Subversion to EBCDIC systems
> >
> >
> > It was OS/400 (aka. iSeries).  OS/390 was an earlier brand name for z/OS.
>
> (ISTR now that you had something to do with that port, right?)
>

I contributed a bunch to the httpd 2.x port to z/OS and to the MPMs, and to
APR on z/OS to a lesser extent.  I haven't worked on OS/400.  I suspect that
it has more complete runtime library support for ASCII than the z/OS
LIBASCII does.


> > printf style format strings and args for other (but not all) library
> > functions are expected to have native encoding on z/OS, so using the
> > #pragmas is not as easy as I would like.
>
> Oh ouch. That's going to make things really evil. Imagine, our
> translations are coded in UTF-8 and quite a few printf-style format
> strings come from there.


$ find . -name .svn -prune -o -print | xargs grep "_(.*%"
[...]
./subversion/svnserve/main.c:                    _("svnserve: Root path '%s' does not exist "
./subversion/svnserve/serve.c:                             _("Unknown revprop word '%s' in log command"),
[...]
$ find . -name .svn -prune -o -print | xargs grep "_(.*%" | wc -l
1072
$

Yeah, it will be challenging.  I did notice that the _() macro could pick up
either literal strings or real UTF-8 depending on ENABLE_NLS.

> However ... if I'm not too much mistaken, we
> use APR's formatter functions almost everywhere -- I believe the notable
> exception are the command-line output functions.


that's where I first noticed problems.

> Are you sure you can't just tickle those, making them (elegantly)
> z/OS-specific?


It is amazing how much better the output looks after deleting one charset
conversion from the command line fputs function.  That's not a final
solution, but it suggests that the problem shouldn't be too bad.


> For example, we do some special magic just for the console output on
> Windows.
>

thanks, I'll see if I can find the magic.

Greg

Re: [PATCH] use APR's ctypes on EBCDIC systems

Posted by Branko Čibej <br...@xbc.nu>.
On 24.05.2010 20:10, Greg Ames wrote:
> On Sun, May 23, 2010 at 5:40 PM, Branko Čibej <brane@xbc.nu> wrote:
>
>
>     This is very, very wrong, because we use the ctypes for other things,
>     not just for string literals. 
>
>
> I'm aware that ctypes are used for other things.  I don't see why
> using APR's portable version of ctypes across the board would break
> anything.  I couldn't find a reason why subversion requires a custom
> version of ctypes which happens to be non-portable.

Because we assume all strings are in a subset of UTF-8, and those ctypes
are intended for such strings. If you suddenly start using
EBCDIC-specific "portable" ctypes, then other parts of the code are
likely to break horribly. At least, IIRC; it's been a long time since I
was there.
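
The concern can be illustrated with byte values alone. The sketch below is not Subversion or APR code: demo_ascii_isalpha, demo_ebcdic_isalpha and demo_count_alpha are hypothetical names, and the EBCDIC rule uses the standard EBCDIC letter positions. It applies both rules to the UTF-8 bytes of "svn".

```c
#include <stddef.h>

/* Letter test by ASCII byte value (also valid for the ASCII subset
   of UTF-8). */
static int demo_ascii_isalpha(unsigned char b)
{
  return (b >= 0x41 && b <= 0x5A) || (b >= 0x61 && b <= 0x7A);
}

/* Letter test by EBCDIC byte value: a-i, j-r, s-z, then A-I, J-R, S-Z.
   The letters are not contiguous in EBCDIC. */
static int demo_ebcdic_isalpha(unsigned char b)
{
  return (b >= 0x81 && b <= 0x89) || (b >= 0x91 && b <= 0x99)
      || (b >= 0xA2 && b <= 0xA9) || (b >= 0xC1 && b <= 0xC9)
      || (b >= 0xD1 && b <= 0xD9) || (b >= 0xE2 && b <= 0xE9);
}

/* Count how many of the first n bytes satisfy the given predicate. */
static size_t demo_count_alpha(const unsigned char *s, size_t n,
                               int (*pred)(unsigned char))
{
  size_t i, count = 0;
  for (i = 0; i < n; i++)
    if (pred(s[i]))
      count++;
  return count;
}

/* "svn" spelled out as UTF-8 bytes, so the example does not depend on
   the encoding the compiler uses for string literals. */
static const unsigned char demo_utf8_svn[] = { 0x73, 0x76, 0x6E };
```

Under the ASCII rule all three bytes count as letters; under the EBCDIC rule none do. A native ctype applied to UTF-8 data inside the library would misclassify in the same way.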
 
>
>     There is a very deep-rooted assumption
>     within the code that inside the library, all strings are encoded in (a
>     subset of) UTF-8, and that implies that we expect string literals
>     to be in ascii.
>
>
> There certainly are a lot of string literals that are assumed to be
> UTF-8, agreed.
>  
>
>     Others have ported Subversion to EBCDIC systems
>
>
> It was OS/400 (aka. iSeries).  OS/390 was an earlier brand name for z/OS.

(ISTR now that you had something to do with that port, right?)

>     but IIRC they always told their compilers to treat the source as
>     ASCII. There was a port that used some sort of #pragma or
>     preprocessing, I don't recall which, to handle string literals,
>     but it died off because it was too easy to just tell the compiler
>     to do the right thing.
>
>
> printf style format strings and args for other (but not all) library
> functions are expected to have native encoding on z/OS, so using the
> #pragmas is not as easy as I would like.

Oh ouch. That's going to make things really evil. Imagine, our
translations are coded in UTF-8 and quite a few printf-style format
strings come from there. However ... if I'm not too much mistaken, we
use APR's formatter functions almost everywhere -- I believe the notable
exception are the command-line output functions. Are you sure you can't
just tickle those, making them (elegantly) z/OS-specific? For example,
we do some special magic just for the console output on Windows.

-- Brane

Re: [PATCH] use APR's ctypes on EBCDIC systems

Posted by Greg Ames <am...@gmail.com>.
On Sun, May 23, 2010 at 5:40 PM, Branko Čibej <br...@xbc.nu> wrote:

>
> This is very, very wrong, because we use the ctypes for other things,
> not just for string literals.


I'm aware that ctypes are used for other things.  I don't see why using
APR's portable version of ctypes across the board would break anything.  I
couldn't find a reason why subversion requires a custom version of ctypes
which happens to be non-portable.


> There is a very deep-rooted assumption
> within the code that inside the library, all strings are encoded in (a
> subset of) UTF-8, and that implies that we expect string literals to be
> in ascii.
>

There certainly are a lot of string literals that are assumed to be UTF-8,
agreed.


> Others have ported Subversion to EBCDIC systems
>

It was OS/400 (aka. iSeries).  OS/390 was an earlier brand name for z/OS.


> but IIRC they always told their compilers to treat the source as
> ASCII. There was a port that used some sort of #pragma or preprocessing,
> I don't recall which, to handle string literals, but it died off because
> it was too easy to just tell the compiler to do the right thing.
>

printf style format strings and args for other (but not all) library
functions are expected to have native encoding on z/OS, so using the
#pragmas is not as easy as I would like.
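
To make the format-string problem concrete, here is a small sketch that relies only on byte values. The names (demo_count_byte, demo_utf8_format, the DEMO_PERCENT_* constants) are hypothetical, and this is not how any real printf is implemented. It shows only that a formatter scanning for the native '%' byte would not find the conversion in an ASCII-encoded string.

```c
#include <stddef.h>

#define DEMO_PERCENT_ASCII  0x25  /* '%' in ASCII and UTF-8 */
#define DEMO_PERCENT_EBCDIC 0x6C  /* '%' in EBCDIC (0x6C is 'l' in ASCII) */

/* Count occurrences of one byte value in a NUL-terminated byte string. */
static size_t demo_count_byte(const unsigned char *s, unsigned char byte)
{
  size_t n = 0;
  for (; *s != 0; s++)
    if (*s == byte)
      n++;
  return n;
}

/* The format string "path '%s'" spelled out as UTF-8 bytes, so the
   example does not depend on the compiler's literal encoding. */
static const unsigned char demo_utf8_format[] = {
  0x70, 0x61, 0x74, 0x68, 0x20, 0x27, 0x25, 0x73, 0x27, 0x00
};
```

Scanning demo_utf8_format for DEMO_PERCENT_ASCII finds one conversion. Scanning for DEMO_PERCENT_EBCDIC finds none, and in a string that contained the ASCII letter 'l' it would find a spurious one.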

Greg

Re: [PATCH] use APR's ctypes on EBCDIC systems

Posted by Branko Čibej <br...@xbc.nu>.
On 21.05.2010 23:18, Greg Ames wrote:
> The current svn_ctype_* implementation depends on ASCII character
> encoding.  APR's ctype functions are portable, so use those on EBCDIC
> systems.
>
> A somewhat surprising side effect is that svn messages become mostly
> readable on z/OS with the help of svn_utf__cstring_from_utf8_fuzzy().
>
> I thought it would be more readable to have one #if around the entire
> block of svn_ctype_is* macros rather than around each individual
> macro.  I did not want to create a second set of nearly duplicate
> Doxygen docs so I just punted on those.  Let me know if you'd prefer
> to see it formatted differently.
>
> Greg
>
> [[[ Use portable ctypes on EBCDIC systems ]]]
>
This is very, very wrong, because we use the ctypes for other things,
not just for string literals. There is a very deep-rooted assumption
within the code that inside the library, all strings are encoded in (a
subset of) UTF-8, and that implies that we expect string literals to be
in ascii.

Others have ported Subversion to EBCDIC systems (I recall an OS/390
port?) but IIRC they always told their compilers to treat the source as
ASCII. There was a port that used some sort of #pragma or preprocessing,
I don't recall which, to handle string literals, but it died off because
it was too easy to just tell the compiler to do the right thing.

In my opinion, that's what you should be doing, too.

-- Brane