Posted to dev@subversion.apache.org by Paul Burba <pa...@softlanding.com> on 2006/04/14 12:57:07 UTC

[PROPOSAL] Using binary mode in Python open() calls

Hi All,

A long time back I hacked up the Python test scripts to test the 
Subversion OS400/EBCDIC port.  At first all I did was tweak the scripts 
running on Windows to actually test a remote AS400 server.  Later, once we 
ported the client to pre-V5R4 OS400, I ported the scripts themselves.*

Unfortunately what I've done so far is, I admit, fairly ugly and isn't 
suitable for trunk.  It's also a lot of work to keep up with as the tests 
change.

So I'm taking a new look at this and applying lessons learned in the hope 
I can make some changes to the scripts that are relatively unobtrusive, 
won't affect other platforms, and will make our lives testing the OS400 
port easier.

The first category of "fixes" I'd like to present for consideration is 
those that are truly "general" in nature, i.e. no OS400-dependent script 
code.  The first of these relates to the built-in Python function open().  
For those of you who remember the CCSID mess with apr_file_open(), the 
situation is not all that different: files on the OS400 are tagged with 
CCSIDs representing the file's encoding, and reading, writing, or creating 
text in these files presents a host of problems when the file is not 
opened in binary mode.  All of these problems are easily avoided by using 
the 'b' mode with open().
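
For readers who want to see the idea concretely, here is a minimal sketch 
of a binary-mode round trip.  It is written in current Python 3 syntax, 
which is an assumption (the scripts discussed in this thread predate it), 
and the helper names write_bytes() and read_bytes() and the file name are 
hypothetical, not taken from the test suite.

```python
import os
import tempfile


def write_bytes(path, data):
    # 'wb' asks the runtime to store the bytes exactly as given,
    # with no newline or encoding translation.
    with open(path, "wb") as f:
        f.write(data)


def read_bytes(path):
    # 'rb' returns the bytes exactly as they are stored on disk.
    with open(path, "rb") as f:
        return f.read()


payload = "This is the file 'iota'.\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iota")
    write_bytes(path, payload)
    round_tripped = read_bytes(path)
    size_on_disk = os.path.getsize(path)
```

With both ends in binary mode, what is read back should be byte-for-byte 
what was written, whatever the platform does to text-mode files.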

So my question is: If all open() calls in the test scripts were done in 
binary mode would this have any adverse effect on other platforms?

I've already tried this on XP and the tests run normally.  I've looked at 
the scripts and don't see any potential problems (yet).  If any Python 
experts would like to weigh in on this I'd appreciate it.

Thanks,

Paul B.

* A huge thank you to Per Gummedal, who ported Python to the AS400 - 
http://www.iseriespython.com/

P.S. Don't know what the OS400/EBCDIC port is about?  See: 
http://svn.haxx.se/dev/archive-2006-02/0519.shtml

P.P.S. I attached the patch to trunk for this change in case anyone wants 
to try it out.


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Jonathan Gilbert <o2...@sneakemail.com>.
At 07:33 PM 19/04/2006 +0200, Brane wrote:
[snip]
> If there's a Unix out there that distinguishes between text and binary, it
> was probably written somewhere close to Seattle. :)
>
> (And no, I don't consider Cygwin to be Unix.)

IBM's headquarters are located in New York.

Jonathan Gilbert

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Branko Čibej <br...@xbc.nu>.
Paul Burba wrote:
> rooneg@gmail.com wrote on 04/17/2006 02:46:20 PM:
>
>> On 4/14/06, Paul Burba <pa...@softlanding.com> wrote:
>>> Hi All,
>>>
>>> A long time back I hacked up the Python test scripts to test the
>>> Subversion OS400/EBCDIC port.  At first all I did was tweak the scripts
>>> running on Windows to actually test a remote AS400 server.  Later, once we
>>> ported the client to pre-V5R4 OS400, I ported the scripts themselves.*
>>>
>>> Unfortunately what I've done so far is, I admit, fairly ugly and isn't
>>> suitable for trunk.  It's also a lot of work to keep up with as the tests
>>> change.
>>>
>>> So I'm taking a new look at this and applying lessons learned in the hope
>>> I can make some changes to the scripts that are relatively unobtrusive,
>>> won't affect other platforms, and will make our lives testing the OS400
>>> port easier.
>>>
>>> The first category of "fixes" I'd like to present for consideration are
>>> the truly "general" in nature, i.e. no OS400 dependent script code.  And
>>> the first of these relates to the built-in Python function open(). For
>>> those of you who remember the CCSID mess with apr_file_open() the
>>> situation is not all that different: files on the OS400 are tagged with
>>> CCSIDs representing the file's encoding and reading/writing/creating test
>>> to these files presents a host of problems when the file is not opened in
>>> binary mode.  All of these problems are easily avoided by using the 'b'
>>> mode with open().
>>>
>>> So my question is: If all open() calls in the test scripts were done in
>>> binary mode would this have any adverse effect on other platforms?
>>>
>>> I've already tried this on XP and the tests run normally.  I've looked at
>>> the scripts and don't see any potential problems (yet).  If any Python
>>> experts would like to weigh in on this I'd appreciate it.
>>
>> I'm hardly a python expert, but it doesn't seem likely to be a problem
>> as far as I can tell...  I'd recommend trying on windows and unix and
>> seeing what happens.
>
> Hi Garrett,
>
> I tested this patch via ra_local and ra_dav on Windows XP (Apache 2.0.54,
> Neon 0.24.7) with fsfs.  Everything passes.  I don't have a unix box to
> test on, but I can remedy that given some time.  In the meantime if
> someone could apply this patch and run it on some flavor of *nix that
> would be quite helpful...though in my limited *nix understanding there is
> no concept of a binary vs. text file so *how* could it matter?

I must admit I'm very surprised that the tests pass on Windows, but 
using binary mode should certainly have no effect on Unix. If there's a 
Unix out there that distinguishes between text and binary, it was 
probably written somewhere close to Seattle. :)

(And no, I don't consider Cygwin to be Unix.)

-- Brane




Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Paul Burba <pa...@softlanding.com>.
rooneg@gmail.com wrote on 04/17/2006 02:46:20 PM:

> On 4/14/06, Paul Burba <pa...@softlanding.com> wrote:
> > Hi All,
> >
> > A long time back I hacked up the Python test scripts to test the
> > Subversion OS400/EBCDIC port.  At first all I did was tweak the scripts
> > running on Windows to actually test a remote AS400 server.  Later, once we
> > ported the client to pre-V5R4 OS400, I ported the scripts themselves.*
> >
> > Unfortunately what I've done so far is, I admit, fairly ugly and isn't
> > suitable for trunk.  It's also a lot of work to keep up with as the tests
> > change.
> >
> > So I'm taking a new look at this and applying lessons learned in the hope
> > I can make some changes to the scripts that are relatively unobtrusive,
> > won't affect other platforms, and will make our lives testing the OS400
> > port easier.
> >
> > The first category of "fixes" I'd like to present for consideration are
> > the truly "general" in nature, i.e. no OS400 dependent script code.  And
> > the first of these relates to the built-in Python function open(). For
> > those of you who remember the CCSID mess with apr_file_open() the
> > situation is not all that different: files on the OS400 are tagged with
> > CCSIDs representing the file's encoding and reading/writing/creating test
> > to these files presents a host of problems when the file is not opened in
> > binary mode.  All of these problems are easily avoided by using the 'b'
> > mode with open().
> >
> > So my question is: If all open() calls in the test scripts were done in
> > binary mode would this have any adverse effect on other platforms?
> >
> > I've already tried this on XP and the tests run normally.  I've looked at
> > the scripts and don't see any potential problems (yet).  If any Python
> > experts would like to weigh in on this I'd appreciate it.
>
> I'm hardly a python expert, but it doesn't seem likely to be a problem
> as far as I can tell...  I'd recommend trying on windows and unix and
> seeing what happens.

Hi Garrett,

I tested this patch via ra_local and ra_dav on Windows XP (Apache 2.0.54, 
Neon 0.24.7) with fsfs.  Everything passes.  I don't have a unix box to 
test on, but I can remedy that given some time.  In the meantime if 
someone could apply this patch and run it on some flavor of *nix that 
would be quite helpful...though in my limited *nix understanding there is 
no concept of a binary vs. text file so *how* could it matter?

Paul B.



Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 4/14/06, Paul Burba <pa...@softlanding.com> wrote:
> Hi All,
>
> A long time back I hacked up the Python test scripts to test the
> Subversion OS400/EBCDIC port.  At first all I did was tweak the scripts
> running on Windows to actually test a remote AS400 server.  Later, once we
> ported the client to pre-V5R4 OS400, I ported the scripts themselves.*
>
> Unfortunately what I've done so far is, I admit, fairly ugly and isn't
> suitable for trunk.  It's also a lot of work to keep up with as the tests
> change.
>
> So I'm taking a new look at this and applying lessons learned in the hope
> I can make some changes to the scripts that are relatively unobtrusive,
> won't affect other platforms, and will make our lives testing the OS400
> port easier.
>
> The first category of "fixes" I'd like to present for consideration are
> the truly "general" in nature, i.e. no OS400 dependent script code.  And
> the first of these relates to the built-in Python function open().  For
> those of you who remember the CCSID mess with apr_file_open() the
> situation is not all that different: files on the OS400 are tagged with
> CCSIDs representing the file's encoding and reading/writing/creating test
> to these files presents a host of problems when the file is not opened in
> binary mode.  All of these problems are easily avoided by using the 'b'
> mode with open().
>
> So my question is: If all open() calls in the test scripts were done in
> binary mode would this have any adverse effect on other platforms?
>
> I've already tried this on XP and the tests run normally.  I've looked at
> the scripts and don't see any potential problems (yet).  If any Python
> experts would like to weigh in on this I'd appreciate it.

I'm hardly a python expert, but it doesn't seem likely to be a problem
as far as I can tell...  I'd recommend trying on windows and unix and
seeing what happens.

-garrett



Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Jesper Steen Møller <je...@selskabet.org>.
Mark Phippard wrote:

> Michael Haggerty <mh...@alum.mit.edu> wrote on 04/17/2006 04:27:58 PM:
>
> > - The contents of the file itself would not be in the correct text
> > format for the local platform.  This would make it difficult to look at
> > or process a test's intermediate results using the platform's standard
> > tools.
>
> Other than OS/400, I cannot think of any platform that does any kind 
> of translation of the text encoding.  What problems are you 
> anticipating here?

If I read the docs correctly, then the new Visual Studio 2005 CRT will 
actually look for BOMs if a file is opened in text mode, and silently do 
Unicode->ANSI conversion as needed - yuck!

<http://msdn2.microsoft.com/en-us/library/yeby3zcb(VS.80).aspx>

This will be a problem for UTF-8 files read as text, which is how the 
config files are de-facto read (you may recall the thread with the 
Chinese mod_dav_svn user with an authz problem).

Letting the CRT convert UTF-8 -> ANSI would be a dangerous thing. 
Another good reason for using binary!
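
To illustrate the general hazard (this is only a sketch and does not 
reproduce the Visual Studio 2005 CRT behaviour described above), the 
Python 3 snippet below contrasts keeping UTF-8 data as raw bytes with 
running it through a BOM-aware decoder.  The sample line "[groups]" is a 
made-up stand-in for config file content.

```python
import codecs

# A UTF-8 config-style line preceded by a byte order mark (BOM).
raw = codecs.BOM_UTF8 + "[groups]\n".encode("utf-8")

# Handling the data as bytes keeps every byte, BOM included.
as_bytes = raw

# A BOM-aware text decoder silently drops the mark, so the content
# the program sees no longer matches what is on disk.
as_text = raw.decode("utf-8-sig")
```

The point is only that a text layer which inspects the BOM changes what 
the caller sees, whereas the binary view leaves that decision to us.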

-Jesper


Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Michael Haggerty <mh...@alum.mit.edu>.
Mark Phippard wrote:
> Michael Haggerty <mh...@alum.mit.edu> wrote on 04/17/2006 04:27:58 PM:
>> Paul Burba wrote:
>> I can think of lots of reasons to *expect* problems switching all file
>> access to binary mode.  The fact that no problems turned up in your XP
>> test should be considered to be a lucky coincidence until proven
>> otherwise.
> 
> What sort of problems would you expect?  In C opening a file in binary
> has no effect on most platforms.  In fact, when we were working on the
> C-side of the port, several different committers told us it has no
> effect on any *nix port.  Is Python different?  If so, then how exactly?

No, the "b"inary option doesn't have any effect on *nix systems.
Problems, if any, would be expected under Windows, old MacOS, CP/M,
ENIAC, etc.

Python allows C's stdio libraries to do text <-> binary translation.
Therefore the situation is no different than that for C.

>> - The contents of the file itself would not be in the correct text
>> format for the local platform.  This would make it difficult to look at
>> or process a test's intermediate results using the platform's standard
>> tools.
> 
> Other than OS/400, I cannot think of any platform that does any kind of
> translation of the text encoding.  What problems are you anticipating here?

I'm just referring to Windows' '\n' -> '\r\n' conversion and old MacOS's
(I believe) '\n' -> '\r'.  If the file is opened in binary mode, then
these translations are not done and therefore the files on disk are not
in the platforms' expected text file format.  I know from experience
that this confuses many Windows tools (for example, some editors).
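
A small sketch of the difference, in Python 3 syntax (an assumption; the 
scripts under discussion are older).  The newline="\r\n" argument is used 
here only to imitate Windows text-mode output on any platform, and the 
file names are hypothetical.

```python
import os
import tempfile

text = "line one\nline two\n"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.txt")
    binary_path = os.path.join(tmp, "binary_mode.txt")

    # newline="\r\n" makes text mode translate each "\n" to "\r\n" on
    # write, imitating what Windows text mode does.
    with open(text_path, "w", newline="\r\n", encoding="ascii") as f:
        f.write(text)

    # Binary mode writes the bytes untouched, so the file keeps bare "\n".
    with open(binary_path, "wb") as f:
        f.write(text.encode("ascii"))

    # Read both files back as raw bytes to compare what is on disk.
    with open(text_path, "rb") as f:
        text_mode_bytes = f.read()
    with open(binary_path, "rb") as f:
        binary_mode_bytes = f.read()
```

The text-mode file ends up with CRLF line endings and the binary-mode file 
with bare LF, which is the on-disk mismatch I am describing.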

>> > All of these problems are easily avoided by using the 'b' mode with
>> > open().
>>
>> Just a very naive question then: why does the OS400 version of stdio (or
>> Python) do translation when opening a text-mode file?  Why not treat
>> text files the same as binary in general on this platform?
> 
> OS/400 is EBCDIC native.  If you search the dev@ list for EBCDIC you can
> probably find several messages where Paul has explained this in great
> detail for the C-side of the port.  Basically, files are tagged with a
> CCSID that indicates their encoding.  When opened in text mode, the
> contents are converted to the job CCSID, which is always EBCDIC.  For
> Subversion purposes, we want to keep stuff in UTF-8, so opening the
> files in binary mode tells OS/400 not to translate the encoding of the
> content into EBCDIC.

Thanks for the explanation.

My point of view is that writing a text file in binary mode is
nonstandard and nonportable, and therefore the burden of proof should be
on the proposer to explain why the change will not be a problem on any
platform.  (The fact that a test ran successfully, by itself, is not
such a proof.)  Something along the lines of "after this change, files
x, y, and z will be written with different line-end conventions, but
that is not a problem because..." would be much more persuasive.

Michael


Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Mark Phippard <ma...@softlanding.com>.
Michael Haggerty <mh...@alum.mit.edu> wrote on 04/17/2006 04:27:58 PM:

> Paul Burba wrote:
> I can think of lots of reasons to *expect* problems switching all file
> access to binary mode.  The fact that no problems turned up in your XP
> test should be considered to be a lucky coincidence until proven
> otherwise.

What sort of problems would you expect?  In C, opening a file in binary 
mode has no effect on most platforms.  In fact, when we were working on the 
C side of the port, several different committers told us it has no effect 
on any *nix port.  Is Python different?  If so, then how exactly?

> - It will not occur to authors of future tests to open text files in
> non-text mode, so future tests will likely be broken on OS400.

We expect this.  Paul has full commit rights; once the general concept is 
approved, he can fix tests as necessary if/when someone commits changes and 
forgets.  This is much easier than having to make changes in the WC 
whenever we want to run the tests.

> - The contents of the file itself would not be in the correct text
> format for the local platform.  This would make it difficult to look at
> or process a test's intermediate results using the platform's standard
> tools.

Other than OS/400, I cannot think of any platform that does any kind of 
translation of the text encoding.  What problems are you anticipating 
here?

> > All of these problems are easily avoided by using the 'b' mode with
> > open().
> 
> Just a very naive question then: why does the OS400 version of stdio (or
> Python) do translation when opening a text-mode file?  Why not treat
> text files the same as binary in general on this platform?

OS/400 is EBCDIC native.  If you search the dev@ list for EBCDIC you can 
probably find several messages where Paul has explained this in great 
detail for the C-side of the port.  Basically, files are tagged with a 
CCSID that indicates their encoding.  When opened in text mode, the 
contents are converted to the job CCSID, which is always EBCDIC.  For 
Subversion purposes, we want to keep stuff in UTF-8, so opening the files 
in binary mode tells OS/400 not to translate the encoding of the content 
into EBCDIC.
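
As a rough illustration of what that translation does to the bytes, the 
Python 3 sketch below re-encodes a UTF-8 string as EBCDIC.  The cp037 
codec is only an assumed stand-in for a job CCSID, and the sample string 
is made up; the actual conversion on OS/400 is done by the platform, not 
by code like this.

```python
# Bytes as Subversion wants them kept: UTF-8.
utf8_bytes = "A/mu/rho\n".encode("utf-8")

# What a text-mode translation to an EBCDIC code page would produce.
# cp037 is used here purely as an example EBCDIC encoding.
ebcdic_bytes = utf8_bytes.decode("utf-8").encode("cp037")

# The same characters, but different bytes on the wire and on disk.
same_text = ebcdic_bytes.decode("cp037") == utf8_bytes.decode("utf-8")
same_bytes = ebcdic_bytes == utf8_bytes
```

The text is the same in both cases but the byte values are not, which is 
why anything comparing or hashing file contents needs the untranslated, 
binary view.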

Mark




Re: [PROPOSAL] Using binary mode in Python open() calls

Posted by Michael Haggerty <mh...@alum.mit.edu>.
Paul Burba wrote:
> So I'm taking a new look at this and applying lessons learned in the hope 
> I can make some changes to the scripts that are relatively unobtrusive, 
> won't affect other platforms, and will make our lives testing the OS400 
> port easier.
> 
> The first category of "fixes" I'd like to present for consideration are 
> the truly "general" in nature, i.e. no OS400 dependent script code.  And 
> the first of these relates to the built-in Python function open().  For 
> those of you who remember the CCSID mess with apr_file_open() the 
> situation is not all that different: files on the OS400 are tagged with 
> CCSIDs representing the file's encoding and reading/writing/creating test 
> to these files presents a host of problems when the file is not opened in 
> binary mode.  All of these problems are easily avoided by using the 'b' 
> mode with open().
> 
> So my question is: If all open() calls in the test scripts were done in 
> binary mode would this have any adverse effect on other platforms?
> 
> I've already tried this on XP and the tests run normally.  I've looked at 
> the scripts and don't see any potential problems (yet).  If any Python 
> experts would like to weigh in on this I'd appreciate it.

I can think of lots of reasons to *expect* problems switching all file
access to binary mode.  The fact that no problems turned up in your XP
test should be considered to be a lucky coincidence until proven otherwise.

Therefore, I think that this change has to be justified on a
case-by-case basis.  In the cases that you have looked at, why does it work?

Perhaps it works because the only files that are read by Python using
the binary option were also written by Python in binary mode.  This
would result in the errors canceling each other out and the tests passing.
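
Here is a sketch of the cancellation I have in mind, in Python 3 syntax 
(an assumption, as are the file names and the read_lines_binary() helper). 
One file is written by the test itself in binary mode; the other imitates 
a file produced by a native Windows tool with CRLF line endings.

```python
import os
import tempfile


def read_lines_binary(path):
    # Split on bare "\n" only, as code that expects LF endings would.
    with open(path, "rb") as f:
        return f.read().split(b"\n")


with tempfile.TemporaryDirectory() as tmp:
    ours = os.path.join(tmp, "written_by_test.txt")
    theirs = os.path.join(tmp, "written_by_other_tool.txt")

    # Written and read by the same script in binary mode: symmetric.
    with open(ours, "wb") as f:
        f.write(b"alpha\nbeta\n")

    # Imitates a file created by a tool that uses CRLF line endings.
    with open(theirs, "wb") as f:
        f.write(b"alpha\r\nbeta\r\n")

    ours_lines = read_lines_binary(ours)
    theirs_lines = read_lines_binary(theirs)
```

The first file reads back cleanly, while the second leaves a stray "\r" on 
every line, so a passing run tells us little about files the scripts did 
not write themselves.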

However, this leaves other problems unaddressed:

- If a file is opened with the binary option, people reading the code
will be misled to think that the file contains non-text information.
Therefore, code clarity is reduced.

- It will not occur to authors of future tests to open text files in
non-text mode, so future tests will likely be broken on OS400.

- The contents of the file itself would not be in the correct text
format for the local platform.  This would make it difficult to look at
or process a test's intermediate results using the platform's standard
tools.

> All of these problems are easily avoided by using the 'b' mode with
> open().

Just a very naive question then: why does the OS400 version of stdio (or
Python) do translation when opening a text-mode file?  Why not treat
text files the same as binary in general on this platform?

Michael

