You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Paul Burba <pa...@softlanding.com> on 2006/03/07 13:39:34 UTC
Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.
Mark Phippard <ma...@softlanding.com> wrote on 02/24/2006 08:11:25 PM:
> > There's no difference between "text" and "binary" files on Unix.
>
> As always, thanks for the education. Other than OS/400, is Cygwin the
only
> environment where there is a difference?
>
> My only exposure to using these system calls is on OS/400 (via RPG
programs
> no less) so I am just used to this "feature" of OS/400. The high-level
> OS/400 "CPY" command even has a Text/Binary option.
>
> > Yes, I think that APR should be using APR_BINARY, too. I guess nobody
> > really noticed there was a problem because most of the testing was
done
> > on platforms where APR_BINARY has no meaning.
>
> Well maybe Max will try to get this into APR. That will be a long time
> before it trickles down to Subversion though. Can this patch be
reviewed
> in the meantime? Perhaps it can be tweaked (by Max) so that it could be
> used for Cygwin too so that he does not have to patch APR for Cygwin? A
> luxury we do not have for OS/400.
>
> Thanks
>
> Mark
Hey All,
We're down to the last two "core" patches to enable Subversion to run on
OS400 V5R4. I originally posted this patch back on 2/24 -
http://svn.haxx.se/dev/archive-2006-02/1418.shtml. It got discussed a
little then fell off the radar. I know, I know, don't post patches on
Friday!
Here is the original post of the patch again, please check it out if you
have some time.
Thanks,
Paul B.
----------------------------------------
(Don't know what the OS400/EBCDIC port is about? See:
http://svn.haxx.se/dev/archive-2006-02/0519.shtml)
Following on the heels of my last patch comes another related to file
CCSIDs - see http://svn.haxx.se/dev/archive-2006-02/1298.shtml for more on
the horror of CCSIDs. That noise you hear is Julian throwing himself off
a cliff after reading this...or possibly plotting to throw me off a cliff
8-o
On OS400 apr_file_copy() is sensitive to the source and destination files'
CCSIDs in that it attempts to convert the contents of the source file from
its CCSID to the CCSID of the destination file. If the files' CCSIDs
differ from each other and/or the system CCSID, the destination file is
likely to be corrupted.
My solution in the pre-V5R4 port (a.k.a. /svn/branches/ebcdic) was to add
a function (very closely) based on apr_file_transfer_contents() from
srclib/apr/file_io/unix/copy.c that copies the file in a CCSID-insensitive
manner.
I looked at this problem afresh for V5R4 and despite coming up with a lot
of ideas, I didn't see a better solution. So I further streamlined and
recommented the ebcdic branch approach and here it is for your
consideration, let me know what you think, thanks.
Paul B.
----------------------------------------------------------------------
[[[
OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
This is one of several patches to allow Subversion to run on IBM's
OS400 V5R4. It makes svn_io_copy_file() insensitive to the CCSIDs of
the source and destination files, thereby preventing potential OS400
conversion of the files' contents.
* subversion/libsvn_subr/io.c
(os400_file_copy): New function.
(svn_io_copy_file): Replace call to apr_file_copy() with new
function.
]]]
_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs.
_____________________________________________________________________________
Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.
Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 03/10/2006 08:11:35 AM:
> Paul Burba wrote:
> > Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00
AM:
> >
> >>(1) [...] The destination should be opened in the same
> >>mode, and have the same CCSID as the source. [...]
> >
> > FWIW, IBM APR does not provide any way to obtain, specify, or
otherwise
> > manipulate a file's CCSID. It can be done with lower-level system
calls;
> > IBM provides a rather "cumbersome" (as in requiring a new 30+ line
> > function) API (Qp0lSetAttr) to change the CCSID of an IFS file.
>
> OK. Perhaps it's safe to pretty much ignore the CCSIDs, always setting
it to
> UTF-8. I still think the CCSID should be copied by any "file copy"
function,
> in theory, but, in practical terms, I was thinking that either
>
> * the unwanted change of CCSID when copying a non-UTF8 file could bea
problem
> in some cases, or
>
> * all files being copied by Subversion will already have a UTF-8
> CCSID in which
> case no translation will happen when copied in text mode so the
> patch is not needed
>
> Now I can see that the second isn't true (e.g. "svn add non-UTF8-file"),
and
> maybe the first isn't either; I'll leave that to your judgement, as I
haven't
> thought through the possible scenarios and don't know the system well
enough
> anyway. If it's not a problem in practice, then fair enough.
The svn add scenario is a concern. With my current set of patches if a
user adds and commits a file with a CCSID of 37 and another user checks
that file out, the second user gets the file's contents correctly in a
byte-for-byte sense, but the file's CCSID will be 1208 in their working
copy. Recall that OS400 V5R4 APR (built with UTF support) assumes binary
files have CCSID 37 and text files have CCSID 1208. But any other utility
or application on V5R4 that *isn't* built with UTF support assumes binary
and text files both have CCSID 37. So there is a fundamental disagreement
between them as to how to encode and tag a text file, but this is a
problem bigger than Subversion, it's just life on the iSeries.
Ideally, if a user commits a CCSID 37 text file and another user checks it
out, the CCSID of the checked out file would be 37. But I don't see
svn_io_copy_file() as the place to solve this - even if it did preserve
CCSID, the svn add/ci/co scenario previously described still has the same
result. Perhaps the ultimate solution is to support some new svn:
properties (i.e. svn:ebcdic) that causes a checked out file to be retagged
with Qp0lSetAttr()...
...which brings us to my cop out :-) Mark and I have ported Subversion to
the iSeries primarily to run as a server. We don't know of anyone who is
using the client to commit ebcdic text files nor has anyone so much as
hinted to us that this is something they want. If a demand for this
arises we'll certainly reconsider beefing up support for it or possibly
get some other folks in the wider OS400 world to contribute some patches.
For now we feel safer knowing that all files created by Subversion (via
apr_file_open and apr_file_copy) are consistently tagged with 1208.
> >>To complete this patch, svn_io_copy_file() needs to say in its
> doc-string that
> >>it does a "byte-for-byte" copy, as it is no longer just a wrapper
around
> >>apr_file_copy(). (Eventually, I hope apr_file_copy() will be fixed
and
> >>documented to do the same.)
> >
> > Do we really want to do that? It becomes true for OS400 yes, and it's
> > been implicitly true for *nix, but it still isn't true for CygWin, no?
Am
> > I being too much of a stickler here...or did you mean for the
doc-string
> > to indicate that there are platform dependent differences?
>
> Sorry, I thought svn_io_copy_file() said it was a wrapper around
> apr_file_copy(), but it doesn't.
>
> We obviously want svn_io_copy_file() to perform a byte-for-byte copy.
> Therefore we want to document it as such, because, as we have seen in
APR, if
> we don't then there is confusion about what it should do to text files.
The
> fact that it does not actually do so on all platforms then becomes
adefinite
> bug, rather than just unspecified behaviour as it is at present, and I
think
> that's a good thing.
>
> It seems logical to document the intended behavior at the same timeas
fixing
> the implementation (on OS400) to conform to that behaviour.
Ok, I understand now, I'll tweak the doc-string.
> Apart from that, this patch is fine.
>
> Sorry for the delay in responding this time.
>
> - Julian
Thanks for all the help,
Paul B.
_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs.
_____________________________________________________________________________
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.
Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00 AM:
>
>>(1) [...] The destination should be opened in the same
>>mode, and have the same CCSID as the source. [...]
>
> FWIW, IBM APR does not provide any way to obtain, specify, or otherwise
> manipulate a file's CCSID. It can be done with lower-level system calls;
> IBM provides a rather "cumbersome" (as in requiring a new 30+ line
> function) API (Qp0lSetAttr) to change the CCSID of an IFS file.
OK. Perhaps it's safe to pretty much ignore the CCSIDs, always setting it to
UTF-8. I still think the CCSID should be copied by any "file copy" function,
in theory, but, in practical terms, I was thinking that either
* the unwanted change of CCSID when copying a non-UTF8 file could be a problem
in some cases, or
* all files being copied by Subversion will already have a UTF-8 CCSID in which
case no translation will happen when copied in text mode so the patch is not needed
Now I can see that the second isn't true (e.g. "svn add non-UTF8-file"), and
maybe the first isn't either; I'll leave that to your judgement, as I haven't
thought through the possible scenarios and don't know the system well enough
anyway. If it's not a problem in practice, then fair enough.
>>(2) [...] That's the bug that Max Bowsher found for the
>>CygWin port of APR: it needs to read and write in binary mode. The same fix
>>should make the reads and writes insensitive to CCSIDs on OS400.
>>
>>This should all be done within APR, of course. I can accept this patch on the
>>basis that it won't get into APR soon enough, but I would like some assurance
>>that one of you will make an effort to get it into APR.
>
> We'll push IBM to make this change, as we have with others, in their port
> of APR as soon as it's in open source APR. We'll also suggest to them
> that they may simply want to fix the problem in their port of APR,
> regardless of what happens with open source APR, but this is less likely.
> Regardless of what IBM elects to do, please understand that this is an
> informal process; there's no mailing list for IBM APR or anything like
> that. We simply don't have much power to make them do anything they don't
> want to do.
OK.
We - that is, Subversion developers, some of whom are also APR developers - can
and should independently push for APR to be fixed in this way. I've just sent
a message about this ("[APR] apr_file_copy() improvements").
>>To complete this patch, svn_io_copy_file() needs to say in its doc-string that
>>it does a "byte-for-byte" copy, as it is no longer just a wrapper around
>>apr_file_copy(). (Eventually, I hope apr_file_copy() will be fixed and
>>documented to do the same.)
>
> Do we really want to do that? It becomes true for OS400 yes, and it's
> been implicitly true for *nix, but it still isn't true for CygWin, no? Am
> I being too much of a stickler here...or did you mean for the doc-string
> to indicate that there are platform dependent differences?
Sorry, I thought svn_io_copy_file() said it was a wrapper around
apr_file_copy(), but it doesn't.
We obviously want svn_io_copy_file() to perform a byte-for-byte copy.
Therefore we want to document it as such, because, as we have seen in APR, if
we don't then there is confusion about what it should do to text files. The
fact that it does not actually do so on all platforms then becomes a definite
bug, rather than just unspecified behaviour as it is at present, and I think
that's a good thing.
It seems logical to document the intended behaviour at the same time as fixing
the implementation (on OS400) to conform to that behaviour.
Apart from that, this patch is fine.
Sorry for the delay in responding this time.
- Julian
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.
Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 03/07/2006 09:56:00 AM:
Hi Julian,
Thanks for reviewing this. I know this OS400 stuff can be a haze...
> Paul Burba wrote:
> >
> > [[[
> > OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
> >
> > This is one of several patches to allow Subversion to run on IBM's
> > OS400 V5R4. It makes svn_io_copy_file() insensitive to the CCSIDs of
> > the source and destination files, thereby preventing potential OS400
> > conversion of the files' contents.
> >
> > * subversion/libsvn_subr/io.c
> > (os400_file_copy): New function.
> > (svn_io_copy_file): Replace call to apr_file_copy() with new
> > function.
> > ]]]
>
> What do we want svn_io_copy_file() to do, in high-level terms? It seems
we
> want it to make an exact copy of a file, at least an exact copy of the
> contents, but perhaps not the meta-data. But it's silly, isn't it,
> to preserve
> the contents while not preserving the CCSID, because then the new
> file's CCSID
> would be more likely to be wrong, so this function should copy the
source
> file's CCSID too.
>
> I believe the apr_file_copy() was intended to do a byte-for-byte
> copy and thus
> correctly copy any file, text or binary, even on systems that otherwise
might
> do translations on text files. However, it's broken on non-Unix
systems.
>
> There are two possible causes of unwanted modification during a copy.
>
> (1) If the source and destination are opened in different modes (text
vs.
> binary) or, on OS400, have different CCSIDs. This should not occur
because
> this is a complete COPY operation in which the destination file is
completely
> under the function's control. The destination should be opened in the
same
> mode, and have the same CCSID as the source. If that's not easy to do
with
> APR, that's a bug in APR or needs to be done with lower-level system
calls.
FWIW, IBM APR does not provide any way to obtain, specify, or otherwise
manipulate a file's CCSID. It can be done with lower-level system calls;
IBM provides a rather "cumbersome" (as in requiring a new 30+ line
function) API (Qp0lSetAttr) to change the CCSID of an IFS file.
> (2) If translating reads and writes are used, and the translation is not
> exactly reversible, so read + write causes an unintended translation.
This
> should be avoided by opening the files in "binary" mode so that
byte-for-byte
> reads and writes are performed. That's the bug that Max Bowsher
> found for the
> CygWin port of APR: it needs to read and write in binary mode. The same
fix
> should make the reads and writes insensitive to CCSIDs on OS400.
>
> This should all be done within APR, of course. I can accept this
> patch on the
> basis that it won't get into APR soon enough, but I would like some
assurance
> that one of you will make an effort to get it into APR.
We'll push IBM to make this change, as we have with others, in their port
of APR as soon as it's in open source APR. We'll also suggest to them
that they may simply want to fix the problem in their port of APR,
regardless of what happens with open source APR, but this is less likely.
Regardless of what IBM elects to do, please understand that this is an
informal process; there's no mailing list for IBM APR or anything like
that. We simply don't have much power to make them do anything they don't
want to do.
> To complete this patch, svn_io_copy_file() needs to say in its doc-
> string that
> it does a "byte-for-byte" copy, as it is no longer just a wrapper around
> apr_file_copy(). (Eventually, I hope apr_file_copy() will be fixed and
> documented to do the same.)
Do we really want to do that? It becomes true for OS400 yes, and it's
been implicitly true for *nix, but it still isn't true for CygWin, no? Am
I being too much of a stickler here...or did you mean for the doc-string
to indicate that there are platform dependent differences?
> > +#ifdef AS400
> > +/* CCSID insensitive replacement for apr_file_copy() on OS400.
>
> It's not simply a replacement, as it has an extra argument. Please
either
> remove or document the extra argument.
I removed that argument, it was not necessary. I should have read my own
comment, which explained why:
* apr_file_copy() does not require the destination file to exist and
will
* overwrite it if it does. Since this is a replacement for
* apr_file_copy() we enforce similar behavior.
> > + *
> > + * (See comments for file_open() for more info on CCSIDs.)
> > + *
> > + * On OS400 apr_file_copy() attempts to convert the contents of the
source
> > + * file from its CCSID to the CCSID of the desination file. This may
>
> ("desination" -> "destination")
Fixed.
> > + * corrupt the destination file's contents if the files' CCSIDs
differ from
> > + * each other and/or the system CCSID.
> > + *
> > + * This new function prevents this by forcing a binary copy. It is
> > + * stripped down copy of the private function
apr_file_transfer_contents in
> > + * srclib/apr/file_io/unix/copy.c of version 2.0.54 of the Apache
HTTP
> > + * Server (http://httpd.apache.org/) excepting that APR_LARGEFILE is
not
> > + * used, from_path is always opened with APR_BINARY, and
> > + * APR_FILE_SOURCE_PERMS is not supported.
> > + */
> > +static apr_status_t
> > +os400_file_copy(const char *from_path,
> > + const char *to_path,
> > + apr_int32_t flags,
> > + apr_fileperms_t perms,
> > + apr_pool_t *pool)
>
>
> > +#ifndef AS400
> > apr_err = apr_file_copy(src_apr, dst_tmp_apr, APR_OS_DEFAULT,
pool);
> > +#else
> > + /* Do a CCSID blind file copy on OS400. */
>
> (No need for that comment. No space before paren in next line(s).)
Gone & gone.
Paul B.
[[[
OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
This is one of several patches to allow Subversion to run on IBM's
OS400 V5R4. It makes svn_io_copy_file() insensitive to the CCSIDs of
the source and destination files, thereby preventing potential OS400
conversion of the files' contents.
* subversion/libsvn_subr/io.c
(os400_file_copy): New function.
(svn_io_copy_file): Replace call to apr_file_copy() with new
function.
]]]
_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs.
_____________________________________________________________________________
Re: [PATCH] #7 OS400/EBCDIC Port: Make svn_io_copy_file() CCSID insensitive.
Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
>
> [[[
> OS400/EBCDIC Port: Make svn_io_copy_file CCSID insensitive.
>
> This is one of several patches to allow Subversion to run on IBM's
> OS400 V5R4. It makes svn_io_copy_file() insensitive to the CCSIDs of
> the source and destination files, thereby preventing potential OS400
> conversion of the files' contents.
>
> * subversion/libsvn_subr/io.c
> (os400_file_copy): New function.
> (svn_io_copy_file): Replace call to apr_file_copy() with new
> function.
> ]]]
What do we want svn_io_copy_file() to do, in high-level terms? It seems we
want it to make an exact copy of a file, at least an exact copy of the
contents, but perhaps not the meta-data. But it's silly, isn't it, to preserve
the contents while not preserving the CCSID, because then the new file's CCSID
would be more likely to be wrong, so this function should copy the source
file's CCSID too.
I believe the apr_file_copy() was intended to do a byte-for-byte copy and thus
correctly copy any file, text or binary, even on systems that otherwise might
do translations on text files. However, it's broken on non-Unix systems.
There are two possible causes of unwanted modification during a copy.
(1) If the source and destination are opened in different modes (text vs.
binary) or, on OS400, have different CCSIDs. This should not occur because
this is a complete COPY operation in which the destination file is completely
under the function's control. The destination should be opened in the same
mode, and have the same CCSID as the source. If that's not easy to do with
APR, that's a bug in APR or needs to be done with lower-level system calls.
(2) If translating reads and writes are used, and the translation is not
exactly reversible, so read + write causes an unintended translation. This
should be avoided by opening the files in "binary" mode so that byte-for-byte
reads and writes are performed. That's the bug that Max Bowsher found for the
CygWin port of APR: it needs to read and write in binary mode. The same fix
should make the reads and writes insensitive to CCSIDs on OS400.
This should all be done within APR, of course. I can accept this patch on the
basis that it won't get into APR soon enough, but I would like some assurance
that one of you will make an effort to get it into APR.
To complete this patch, svn_io_copy_file() needs to say in its doc-string that
it does a "byte-for-byte" copy, as it is no longer just a wrapper around
apr_file_copy(). (Eventually, I hope apr_file_copy() will be fixed and
documented to do the same.)
> +#ifdef AS400
> +/* CCSID insensitive replacement for apr_file_copy() on OS400.
It's not simply a replacement, as it has an extra argument. Please either
remove or document the extra argument.
> + *
> + * (See comments for file_open() for more info on CCSIDs.)
> + *
> + * On OS400 apr_file_copy() attempts to convert the contents of the source
> + * file from its CCSID to the CCSID of the desination file. This may
("desination" -> "destination")
> + * corrupt the destination file's contents if the files' CCSIDs differ from
> + * each other and/or the system CCSID.
> + *
> + * This new function prevents this by forcing a binary copy. It is
> + * stripped down copy of the private function apr_file_transfer_contents in
> + * srclib/apr/file_io/unix/copy.c of version 2.0.54 of the Apache HTTP
> + * Server (http://httpd.apache.org/) excepting that APR_LARGEFILE is not
> + * used, from_path is always opened with APR_BINARY, and
> + * APR_FILE_SOURCE_PERMS is not supported.
> + */
> +static apr_status_t
> +os400_file_copy(const char *from_path,
> + const char *to_path,
> + apr_int32_t flags,
> + apr_fileperms_t perms,
> + apr_pool_t *pool)
> +#ifndef AS400
> apr_err = apr_file_copy(src_apr, dst_tmp_apr, APR_OS_DEFAULT, pool);
> +#else
> + /* Do a CCSID blind file copy on OS400. */
(No need for that comment. No space before paren in next line(s).)
> + apr_err = os400_file_copy (src_apr, dst_tmp_apr,
> + APR_WRITE | APR_CREATE | APR_TRUNCATE |
> + APR_BINARY, APR_OS_DEFAULT, pool);
> +#endif
- Julian
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org