You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Paul Burba <pa...@softlanding.com> on 2006/03/07 13:39:34 UTC

Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.

Mark Phippard <ma...@softlanding.com> wrote on 02/24/2006 08:11:25 PM:

> > There's no difference between "text" and "binary" files on Unix.
> 
> As always, thanks for the education.  Other than OS/400, is Cygwin the 
only
> environment where there is a difference?
> 
> My only exposure to using these system calls is on OS/400 (via RPG 
programs
> no less) so I am just used to this "feature" of OS/400.  The high-level
> OS/400 "CPY" command even has a Text/Binary option.
> 
> > Yes, I think that APR should be using APR_BINARY, too. I guess nobody
> > really noticed there was a problem because most of the testing was 
done
> > on platforms where APR_BINARY has no meaning.
> 
> Well maybe Max will try to get this into APR.  That will be a long time
> before it trickles down to Subversion though.  Can this patch be 
reviewed
> in the meantime?  Perhaps it can be tweaked (by Max) so that it could be
> used for Cygwin too so that he does not have to patch APR for Cygwin?  A
> luxury we do not have for OS/400.
> 
> Thanks
> 
> Mark

Hey All,

We're down to the last two "core" patches to enable Subversion to run on 
OS400 V5R4.  I originally posted this patch back on 2/24 - 
http://svn.haxx.se/dev/archive-2006-02/1418.shtml. It got discussed a 
little then fell off the radar.  I know, I know, don't post patches on 
Friday!

Here is the original post of the patch again, please check it out if you 
have some time.

Thanks,

Paul B.

----------------------------------------

(Don't know what the OS400/EBCDIC port is about?  See: 
http://svn.haxx.se/dev/archive-2006-02/0519.shtml)

Following on the heels of my last patch comes another related to file 
CCSIDs - see http://svn.haxx.se/dev/archive-2006-02/1298.shtml for more on 

the horror of CCSIDs.  That noise you hear is Julian throwing himself off 
a cliff after reading this...or possibly plotting to throw me off a cliff 
8-o

On OS400 apr_file_copy() is sensitive to the source and destination files' 

CCSIDs in that it attempts to convert the contents of the source file from 

its CCSID to the CCSID of the destination file.  If the files' CCSIDs 
differ from each other and/or the system CCSID, the destination file is 
likely to be corrupted.

My solution in the pre-V5R4 port (a.k.a. /svn/branches/ebcdic) was to add 
a function (very closely) based on apr_file_transfer_contents() from 
srclib/apr/file_io/unix/copy.c that copies the file in a CCSID-insensitive 

manner.

I looked at this problem afresh for V5R4 and despite coming up with a lot 
of ideas, I didn't see a better solution.  So I further streamlined and 
recommented the ebcdic branch approach and here it is for your 
consideration, let me know what you think, thanks.

Paul B. 

----------------------------------------------------------------------
[[[
OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.

This is one of several patches to allow Subversion to run on IBM's
OS400 V5R4.  It makes svn_io_copy_file() insensitive to the CCSIDs of
the source and destination files, thereby preventing potential OS400
conversion of the files' contents.

* subversion/libsvn_subr/io.c
   (os400_file_copy): New function.
   (svn_io_copy_file): Replace call to apr_file_copy() with new
    function.
]]]


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.

Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 03/10/2006 08:11:35 AM:

> Paul Burba wrote:
> > Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00 
AM:
> > 
> >>(1) [...]  The destination should be opened in the same 
> >>mode, and have the same CCSID as the source. [...]
> > 
> > FWIW, IBM APR does not provide any way to obtain, specify, or 
otherwise 
> > manipulate a file's CCSID.  It can be done with lower-level system 
calls; 
> > IBM provides a rather "cumbersome" (as in requiring a new 30+ line 
> > function) API (Qp0lSetAttr) to change the CCSID of an IFS file.
> 
> OK.  Perhaps it's safe to pretty much ignore the CCSIDs, always setting 
it to 
> UTF-8.  I still think the CCSID should be copied by any "file copy" 
function, 
> in theory, but, in practical terms, I was thinking that either
> 
> * the unwanted change of CCSID when copying a non-UTF8 file could bea 
problem 
> in some cases, or
> 
> * all files being copied by Subversion will already have a UTF-8 
> CCSID in which 
> case no translation will happen when copied in text mode so the 
> patch is not needed
> 
> Now I can see that the second isn't true (e.g. "svn add non-UTF8-file"), 
and 
> maybe the first isn't either; I'll leave that to your judgement, as I 
haven't 
> thought through the possible scenarios and don't know the system well 
enough 
> anyway.  If it's not a problem in practice, then fair enough.

The svn add scenario is a concern.  With my current set of patches if a 
user adds and commits a file with a CCSID of 37 and another user checks 
that file out, the second user gets the file's contents correctly in a 
byte-for-byte sense, but the file's CCSID will be 1208 in their working 
copy.  Recall that OS400 V5R4 APR (built with UTF support) assumes binary 
files have CCSID 37 and text files have CCSID 1208.  But any other utility 
or application on V5R4 that *isn't* built with UTF support assumes binary 
and text files both have CCSID 37.  So there is a fundamental disagreement 
between them as to how to encode and tag a text file, but this is a 
problem bigger than Subversion, it's just life on the iSeries. 

Ideally, if a user commits a CCSID 37 text file and another user checks it 
out, the CCSID of the checked out file would be 37.  But I don't see 
svn_io_copy_file() as the place to solve this - even if it did preserve 
CCSID, the svn add/ci/co scenario previously described still has the same 
result.  Perhaps the ultimate solution is to support some new svn: 
properties (i.e. svn:ebcdic) that causes a checked out file to be retagged 
with Qp0lSetAttr()... 

...which brings us to my cop out :-)  Mark and I have ported Subversion to 
the iSeries primarily to run as a server.  We don't know of anyone who is 
using the client to commit ebcdic text files nor has anyone so much as 
hinted to us that this is something they want.  If a demand for this 
arises we'll certainly reconsider beefing up support for it or possibly 
get some other folks in the wider OS400 world to contribute some patches. 
For now we feel safer knowing that all files created by Subversion (via 
apr_file_open and apr_file_copy) are consistently tagged with 1208.

> >>To complete this patch, svn_io_copy_file() needs to say in its 
> doc-string that 
> >>it does a "byte-for-byte" copy, as it is no longer just a wrapper 
around 
> >>apr_file_copy().  (Eventually, I hope apr_file_copy() will be fixed 
and 
> >>documented to do the same.)
> > 
> > Do we really want to do that?  It becomes true for OS400 yes, and it's 

> > been implicitly true for *nix, but it still isn't true for CygWin, no? 
 Am 
> > I being too much of a stickler here...or did you mean for the 
doc-string 
> > to indicate that there are platform dependent differences?
> 
> Sorry, I thought svn_io_copy_file() said it was a wrapper around 
> apr_file_copy(), but it doesn't.
> 
> We obviously want svn_io_copy_file() to perform a byte-for-byte copy. 
> Therefore we want to document it as such, because, as we have seen in 
APR, if 
> we don't then there is confusion about what it should do to text files. 
The 
> fact that it does not actually do so on all platforms then becomes 
adefinite 
> bug, rather than just unspecified behaviour as it is at present, and I 
think 
> that's a good thing.
> 
> It seems logical to document the intended behavior at the same timeas 
fixing 
> the implementation (on OS400) to conform to that behaviour.

Ok, I understand now, I'll tweak the doc-string.

> Apart from that, this patch is fine.
> 
> Sorry for the delay in responding this time.
> 
> - Julian

Thanks for all the help,

Paul B.

_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00 AM:
> 
>>(1) [...]  The destination should be opened in the same 
>>mode, and have the same CCSID as the source. [...]
> 
> FWIW, IBM APR does not provide any way to obtain, specify, or otherwise 
> manipulate a file's CCSID.  It can be done with lower-level system calls; 
> IBM provides a rather "cumbersome" (as in requiring a new 30+ line 
> function) API (Qp0lSetAttr) to change the CCSID of an IFS file.

OK.  Perhaps it's safe to pretty much ignore the CCSIDs, always setting it to 
UTF-8.  I still think the CCSID should be copied by any "file copy" function, 
in theory, but, in practical terms, I was thinking that either

* the unwanted change of CCSID when copying a non-UTF8 file could be a problem 
in some cases, or

* all files being copied by Subversion will already have a UTF-8 CCSID in which 
case no translation will happen when copied in text mode so the patch is not needed

Now I can see that the second isn't true (e.g. "svn add non-UTF8-file"), and 
maybe the first isn't either; I'll leave that to your judgement, as I haven't 
thought through the possible scenarios and don't know the system well enough 
anyway.  If it's not a problem in practice, then fair enough.


>>(2) [...]  That's the bug that Max Bowsher found for the 
>>CygWin port of APR: it needs to read and write in binary mode.  The same fix 
>>should make the reads and writes insensitive to CCSIDs on OS400.
>>
>>This should all be done within APR, of course.  I can accept this patch on the 
>>basis that it won't get into APR soon enough, but I would like some assurance 
>>that one of you will make an effort to get it into APR.
> 
> We'll push IBM to make this change, as we have with others, in their port 
> of APR as soon as it's in open source APR.  We'll also suggest to them 
> that they may simply want to fix the problem in their port of APR, 
> regardless of what happens with open source APR, but this is less likely. 
> Regardless of what IBM elects to do, please understand that this is an 
> informal process; there's no mailing list for IBM APR or anything like 
> that.  We simply don't have much power to make them do anything they don't 
> want to do.

OK.

We - that is, Subversion developers, some of whom are also APR developers - can 
and should independently push for APR to be fixed in this way.  I've just sent 
a message about this ("[APR] apr_file_copy() improvements").


>>To complete this patch, svn_io_copy_file() needs to say in its doc-string that 
>>it does a "byte-for-byte" copy, as it is no longer just a wrapper around 
>>apr_file_copy().  (Eventually, I hope apr_file_copy() will be fixed and 
>>documented to do the same.)
> 
> Do we really want to do that?  It becomes true for OS400 yes, and it's 
> been implicitly true for *nix, but it still isn't true for CygWin, no?  Am 
> I being too much of a stickler here...or did you mean for the doc-string 
> to indicate that there are platform dependent differences?

Sorry, I thought svn_io_copy_file() said it was a wrapper around 
apr_file_copy(), but it doesn't.

We obviously want svn_io_copy_file() to perform a byte-for-byte copy. 
Therefore we want to document it as such, because, as we have seen in APR, if 
we don't then there is confusion about what it should do to text files.  The 
fact that it does not actually do so on all platforms then becomes a definite 
bug, rather than just unspecified behaviour as it is at present, and I think 
that's a good thing.

It seems logical to document the intended behaviour at the same time as fixing 
the implementation (on OS400) to conform to that behaviour.

Apart from that, this patch is fine.

Sorry for the delay in responding this time.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.

Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00 AM:

Hi Julian,

Thanks for reviewing this.  I know this OS400 stuff can be a haze...

> Paul Burba wrote:
> > 
> > [[[
> > OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
> > 
> > This is one of several patches to allow Subversion to run on IBM's
> > OS400 V5R4.  It makes svn_io_copy_file() insensitive to the CCSIDs of
> > the source and destination files, thereby preventing potential OS400
> > conversion of the files' contents.
> > 
> > * subversion/libsvn_subr/io.c
> >    (os400_file_copy): New function.
> >    (svn_io_copy_file): Replace call to apr_file_copy() with new
> >     function.
> > ]]]
> 
> What do we want svn_io_copy_file() to do, in high-level terms?  It seems 
we 
> want it to make an exact copy of a file, at least an exact copy of the 
> contents, but perhaps not the meta-data.  But it's silly, isn't it, 
> to preserve 
> the contents while not preserving the CCSID, because then the new 
> file's CCSID 
> would be more likely to be wrong, so this function should copy the 
source 
> file's CCSID too.
> 
> I believe the apr_file_copy() was intended to do a byte-for-byte 
> copy and thus 
> correctly copy any file, text or binary, even on systems that otherwise 
might 
> do translations on text files.  However, it's broken on non-Unix 
systems.
> 
> There are two possible causes of unwanted modification during a copy.
> 
> (1) If the source and destination are opened in different modes (text 
vs. 
> binary) or, on OS400, have different CCSIDs.  This should not occur 
because 
> this is a complete COPY operation in which the destination file is 
completely 
> under the function's control.  The destination should be opened in the 
same 
> mode, and have the same CCSID as the source.  If that's not easy to do 
with 
> APR, that's a bug in APR or needs to be done with lower-level system 
calls.

FWIW, IBM APR does not provide any way to obtain, specify, or otherwise 
manipulate a file's CCSID.  It can be done with lower-level system calls; 
IBM provides a rather "cumbersome" (as in requiring a new 30+ line 
function) API (Qp0lSetAttr) to change the CCSID of an IFS file.

> (2) If translating reads and writes are used, and the translation is not 

> exactly reversible, so read + write causes an unintended translation. 
This 
> should be avoided by opening the files in "binary" mode so that 
byte-for-byte 
> reads and writes are performed.  That's the bug that Max Bowsher 
> found for the 
> CygWin port of APR: it needs to read and write in binary mode.  The same 
fix 
> should make the reads and writes insensitive to CCSIDs on OS400.
> 
> This should all be done within APR, of course.  I can accept this 
> patch on the 
> basis that it won't get into APR soon enough, but I would like some 
assurance 
> that one of you will make an effort to get it into APR.

We'll push IBM to make this change, as we have with others, in their port 
of APR as soon as it's in open source APR.  We'll also suggest to them 
that they may simply want to fix the problem in their port of APR, 
regardless of what happens with open source APR, but this is less likely. 
Regardless of what IBM elects to do, please understand that this is an 
informal process; there's no mailing list for IBM APR or anything like 
that.  We simply don't have much power to make them do anything they don't 
want to do.
 
> To complete this patch, svn_io_copy_file() needs to say in its doc-
> string that 
> it does a "byte-for-byte" copy, as it is no longer just a wrapper around 

> apr_file_copy().  (Eventually, I hope apr_file_copy() will be fixed and 
> documented to do the same.)

Do we really want to do that?  It becomes true for OS400 yes, and it's 
been implicitly true for *nix, but it still isn't true for CygWin, no?  Am 
I being too much of a stickler here...or did you mean for the doc-string 
to indicate that there are platform dependent differences?

> > +#ifdef AS400
> > +/* CCSID insensitive replacement for apr_file_copy() on OS400.
> 
> It's not simply a replacement, as it has an extra argument.  Please 
either 
> remove or document the extra argument.

I removed that argument, it was not necessary.  I should have read my own 
comment, which explained why:

   * apr_file_copy() does not require the destination file to exist and 
will
   * overwrite it if it does.  Since this is a replacement for
   * apr_file_copy() we enforce similar behavior.
 
> > + * 
> > + * (See comments for file_open() for more info on CCSIDs.)
> > + * 
> > + * On OS400 apr_file_copy() attempts to convert the contents of the 
source
> > + * file from its CCSID to the CCSID of the desination file.  This may
> 
> ("desination" -> "destination")

Fixed.
 
> > + * corrupt the destination file's contents if the files' CCSIDs 
differ from
> > + * each other and/or the system CCSID.
> > + * 
> > + * This new function prevents this by forcing a binary copy.  It is
> > + * stripped down copy of the private function 
apr_file_transfer_contents in
> > + * srclib/apr/file_io/unix/copy.c of version 2.0.54 of the Apache 
HTTP
> > + * Server (http://httpd.apache.org/) excepting that APR_LARGEFILE is 
not
> > + * used, from_path is always opened with APR_BINARY, and
> > + * APR_FILE_SOURCE_PERMS is not supported. 
> > + */ 
> > +static apr_status_t
> > +os400_file_copy(const char *from_path,
> > +                const char *to_path,
> > +                apr_int32_t flags,
> > +                apr_fileperms_t perms,
> > +                apr_pool_t *pool)
> 
> 
> > +#ifndef AS400
> >    apr_err = apr_file_copy(src_apr, dst_tmp_apr, APR_OS_DEFAULT, 
pool);
> > +#else
> > +  /* Do a CCSID blind file copy on OS400. */
> 
> (No need for that comment.  No space before paren in next line(s).)

Gone & gone.

Paul B.

[[[
OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.

This is one of several patches to allow Subversion to run on IBM's
OS400 V5R4.  It makes svn_io_copy_file() insensitive to the CCSIDs of
the source and destination files, thereby preventing potential OS400
conversion of the files' contents.

* subversion/libsvn_subr/io.c
   (os400_file_copy): New function.
   (svn_io_copy_file): Replace call to apr_file_copy() with new
    function.
]]]


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> 
> [[[
> OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
> 
> This is one of several patches to allow Subversion to run on IBM's
> OS400 V5R4.  It makes svn_io_copy_file() insensitive to the CCSIDs of
> the source and destination files, thereby preventing potential OS400
> conversion of the files' contents.
> 
> * subversion/libsvn_subr/io.c
>    (os400_file_copy): New function.
>    (svn_io_copy_file): Replace call to apr_file_copy() with new
>     function.
> ]]]

What do we want svn_io_copy_file() to do, in high-level terms?  It seems we 
want it to make an exact copy of a file, at least an exact copy of the 
contents, but perhaps not the meta-data.  But it's silly, isn't it, to preserve 
the contents while not preserving the CCSID, because then the new file's CCSID 
would be more likely to be wrong, so this function should copy the source 
file's CCSID too.

I believe the apr_file_copy() was intended to do a byte-for-byte copy and thus 
correctly copy any file, text or binary, even on systems that otherwise might 
do translations on text files.  However, it's broken on non-Unix systems.

There are two possible causes of unwanted modification during a copy.

(1) If the source and destination are opened in different modes (text vs. 
binary) or, on OS400, have different CCSIDs.  This should not occur because 
this is a complete COPY operation in which the destination file is completely 
under the function's control.  The destination should be opened in the same 
mode, and have the same CCSID as the source.  If that's not easy to do with 
APR, that's a bug in APR or needs to be done with lower-level system calls.

(2) If translating reads and writes are used, and the translation is not 
exactly reversible, so read + write causes an unintended translation.  This 
should be avoided by opening the files in "binary" mode so that byte-for-byte 
reads and writes are performed.  That's the bug that Max Bowsher found for the 
CygWin port of APR: it needs to read and write in binary mode.  The same fix 
should make the reads and writes insensitive to CCSIDs on OS400.

This should all be done within APR, of course.  I can accept this patch on the 
basis that it won't get into APR soon enough, but I would like some assurance 
that one of you will make an effort to get it into APR.


To complete this patch, svn_io_copy_file() needs to say in its doc-string that 
it does a "byte-for-byte" copy, as it is no longer just a wrapper around 
apr_file_copy().  (Eventually, I hope apr_file_copy() will be fixed and 
documented to do the same.)


> +#ifdef AS400
> +/* CCSID insensitive replacement for apr_file_copy() on OS400.

It's not simply a replacement, as it has an extra argument.  Please either 
remove or document the extra argument.

> + * 
> + * (See comments for file_open() for more info on CCSIDs.)
> + * 
> + * On OS400 apr_file_copy() attempts to convert the contents of the source
> + * file from its CCSID to the CCSID of the desination file.  This may

("desination" -> "destination")

> + * corrupt the destination file's contents if the files' CCSIDs differ from
> + * each other and/or the system CCSID.
> + * 
> + * This new function prevents this by forcing a binary copy.  It is
> + * stripped down copy of the private function apr_file_transfer_contents in
> + * srclib/apr/file_io/unix/copy.c of version 2.0.54 of the Apache HTTP
> + * Server (http://httpd.apache.org/) excepting that APR_LARGEFILE is not
> + * used, from_path is always opened with APR_BINARY, and
> + * APR_FILE_SOURCE_PERMS is not supported. 
> + */ 
> +static apr_status_t
> +os400_file_copy(const char *from_path,
> +                const char *to_path,
> +                apr_int32_t flags,
> +                apr_fileperms_t perms,
> +                apr_pool_t *pool)


> +#ifndef AS400
>    apr_err = apr_file_copy(src_apr, dst_tmp_apr, APR_OS_DEFAULT, pool);
> +#else
> +  /* Do a CCSID blind file copy on OS400. */

(No need for that comment.  No space before paren in next line(s).)

> +  apr_err = os400_file_copy (src_apr, dst_tmp_apr,
> +                             APR_WRITE | APR_CREATE | APR_TRUNCATE |
> +                             APR_BINARY, APR_OS_DEFAULT, pool);
> +#endif

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org