Posted to dev@harmony.apache.org by Oliver Deakin <ol...@googlemail.com> on 2009/08/05 18:23:21 UTC

Re: [jira] Resolved: (HARMONY-6290) BufferedReader.readLine() breaks at EBCDIC newline, violating the spec

Hi all,

I have been looking at the charset encoders/decoders for EBCDIC (IBM1047) 
as a result of HARMONY-6290, and I noticed that the character mappings 
appear to be slightly different from those originally generated by the 
TableGenerator tool contributed as part of HARMONY-3593.

When I run the tool on my local machine against the RI, byte 0x15 
(EBCDIC NEL) maps to 0x0A (Unicode LF) and byte 0x25 (EBCDIC LF) maps 
to 0x85 (Unicode NEL). The Harmony tables have these values the other 
way around, i.e. byte 0x15 maps to 0x85 and byte 0x25 maps to 0x0A. So 
our character mapping currently differs from the RI's. I have opened 
[1] for this issue and attached a patch that changes our mapping to 
match the RI.
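
For anyone who wants to check the mapping on their own runtime without 
the TableGenerator tool, here is a minimal sketch. The class and method 
names (EbcdicNewlineCheck, decodeByte, encodeChar) are made up for 
illustration and are not part of Harmony or the tool. It assumes the 
runtime ships an "IBM1047" charset and a Java 7+ class library; what it 
prints depends on the runtime under test, so no expected output is shown.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Harmony or the TableGenerator tool.
final class EbcdicNewlineCheck {

    /** Decodes one byte with the given single-byte charset and returns the resulting char value. */
    static int decodeByte(Charset cs, int b) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.charAt(0);
    }

    /** Encodes one char with the given single-byte charset and returns the unsigned byte value. */
    static int encodeChar(Charset cs, char c) {
        byte[] out = String.valueOf(c).getBytes(cs);
        return out[0] & 0xFF;
    }

    public static void main(String[] args) {
        String name = "IBM1047"; // assumption: this charset is installed in the runtime
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this runtime");
            return;
        }
        Charset cs = Charset.forName(name);

        // The two bytes under discussion: 0x15 (EBCDIC NEL) and 0x25 (EBCDIC LF).
        for (int b : new int[] { 0x15, 0x25 }) {
            System.out.printf("byte 0x%02X decodes to U+%04X%n", b, decodeByte(cs, b));
        }

        // The reverse direction: Unicode LF and Unicode NEL.
        for (char c : new char[] { '\n', '\u0085' }) {
            System.out.printf("U+%04X encodes to byte 0x%02X%n", (int) c, encodeChar(cs, c));
        }
    }
}
```

Running it on both the RI and Harmony and comparing the four lines of 
output should show whether the two mappings agree.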

Before I make the commit, are there any objections/comments on this?

Regards,
Oliver

[1] https://issues.apache.org/jira/browse/HARMONY-6294


Oliver Deakin (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/HARMONY-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Oliver Deakin resolved HARMONY-6290.
> ------------------------------------
>
>        Resolution: Fixed
>     Fix Version/s: 5.0M11
>          Assignee: Oliver Deakin
>
> Fix and test case applied with a minor change at repo revision r801230 - please check that it was applied as expected.
>
>   
>> BufferedReader.readLine() breaks at EBCDIC newline, violating the spec
>> ----------------------------------------------------------------------
>>
>>                 Key: HARMONY-6290
>>                 URL: https://issues.apache.org/jira/browse/HARMONY-6290
>>             Project: Harmony
>>          Issue Type: Bug
>>          Components: Classlib
>>         Environment: SVN Revision: 800827
>>            Reporter: Jesse Wilson
>>            Assignee: Oliver Deakin
>>             Fix For: 5.0M11
>>
>>         Attachments: readLine_no_EBCDIC.patch
>>
>>   Original Estimate: 0.33h
>>  Remaining Estimate: 0.33h
>>
>> The spec says that BufferedReader.readLine() considers only "\r", "\n" and "\r\n" to be line separators. We must not permit additional separator characters. I admit that the RI's behaviour is surprising, and incompatible with its own Pattern and Scanner classes. But this is the specified behaviour; the doc explicitly calls out which character sequences are used as newlines. It does not permit additional characters to break lines. 
>> For users reading EBCDIC-encoded files, a better practice is to read through the files using a Scanner. That way, the application will behave the same when executed on either Harmony or on the RI.
>> #Android
>>     
>
>   
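
To illustrate the behaviour described in the quoted HARMONY-6290 report, 
here is a minimal sketch that contrasts the two ways of splitting 
lines. The class and method names (LineSplitDemo, withBufferedReader, 
withScanner) are hypothetical, and the input is an already decoded 
String, so no EBCDIC charset is involved. It assumes a class library 
that follows the documented behaviour: readLine() breaks only at "\r", 
"\n" and "\r\n", while Scanner.nextLine() also treats U+0085 (NEL) as a 
line separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of Harmony.
final class LineSplitDemo {

    /** Splits text with BufferedReader.readLine(), which breaks only at \r, \n and \r\n. */
    static List<String> withBufferedReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Splits text with Scanner.nextLine(), whose line separators also include U+0085 (NEL). */
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // "a", NEL, "b", LF, "c"
        String text = "a\u0085b\nc";
        System.out.println("readLine() line count: " + withBufferedReader(text).size());
        System.out.println("Scanner line count:    " + withScanner(text).size());
    }
}
```

On a conforming class library the readLine() version keeps the NEL 
inside the first line, while the Scanner version breaks at it.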

-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU