Posted to dev@harmony.apache.org by Oliver Deakin <ol...@googlemail.com> on 2009/08/05 18:23:21 UTC
Re: [jira] Resolved: (HARMONY-6290) BufferedReader.readLine() breaks
at EBCDIC newline, violating the spec
Hi all,
I have been looking at the charset encoders/decoders for EBCDIC (IBM1047)
as a result of HARMONY-6290, and I noticed that the character mappings
differ slightly from those originally generated by the TableGenerator
tool contributed as part of HARMONY-3593.
When I run the tool on my local machine using the RI, I get byte 0x15
(NEL) mapped to 0x0A (Unicode LF) and byte 0x25 (LF) mapped to 0x85
(Unicode NEL). However, the Harmony tables have these values the other
way around - i.e. byte 0x15 mapped to 0x85 and byte 0x25 mapped to 0x0A.
So it appears our character mapping currently differs from the RI's. I
have opened [1] for this issue and attached a patch to alter our mapping
to match the RI.
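For anyone who wants to check the two bytes without running the
TableGenerator tool, here is a minimal sketch that decodes 0x15 and 0x25
with whatever IBM1047 decoder the running JVM provides. The class and
method names (Ebcdic1047Probe, decodeByte, describe) are made up for
illustration and are not part of Harmony or the tool; the sketch also
assumes the JVM ships an IBM1047 charset and simply reports if it does
not.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Harmony or TableGenerator.
public class Ebcdic1047Probe {

    // Decode a single byte with the given charset and return the
    // first code point of the result.
    public static int decodeByte(Charset cs, int b) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.codePointAt(0);
    }

    // Format the mapping as "byte 0xNN -> U+NNNN".
    public static String describe(Charset cs, int b) {
        return String.format("byte 0x%02X -> U+%04X", b, decodeByte(cs, b));
    }

    public static void main(String[] args) {
        String name = "IBM1047";
        // Assumption: the charset may be missing on some JVMs.
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available on this JVM");
            return;
        }
        Charset cs = Charset.forName(name);
        System.out.println(describe(cs, 0x15)); // EBCDIC NEL
        System.out.println(describe(cs, 0x25)); // EBCDIC LF
    }
}
```

Running the same class on the RI and on Harmony and comparing the two
output lines should show whether the decoders agree for these bytes.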
Before I make the commit, are there any objections/comments on this?
Regards,
Oliver
[1] https://issues.apache.org/jira/browse/HARMONY-6294
Oliver Deakin (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/HARMONY-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Oliver Deakin resolved HARMONY-6290.
> ------------------------------------
>
> Resolution: Fixed
> Fix Version/s: 5.0M11
> Assignee: Oliver Deakin
>
> Fix and test case applied with minor change at repo revision r801230 - please check it applied as expected.
>
>
>> BufferedReader.readLine() breaks at EBCDIC newline, violating the spec
>> ----------------------------------------------------------------------
>>
>> Key: HARMONY-6290
>> URL: https://issues.apache.org/jira/browse/HARMONY-6290
>> Project: Harmony
>> Issue Type: Bug
>> Components: Classlib
>> Environment: SVN Revision: 800827
>> Reporter: Jesse Wilson
>> Assignee: Oliver Deakin
>> Fix For: 5.0M11
>>
>> Attachments: readLine_no_EBCDIC.patch
>>
>> Original Estimate: 0.33h
>> Remaining Estimate: 0.33h
>>
>> The spec says that BufferedReader.readLine() considers only "\r", "\n" and "\r\n" to be line separators. We must not permit additional separator characters. I admit that the RI's behaviour is surprising, and incompatible with its own Pattern and Scanner classes. But this is the specified behaviour; the doc explicitly calls out which character sequences are used as newlines. It does not permit additional characters to break lines.
>> For users reading EBCDIC-encoded files, a better practice is to read through the files using a Scanner. That way, the application will behave the same when executed on either Harmony or on the RI.
>> #Android
>>
>
>
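To illustrate the difference described in the quoted report, here is a
small sketch comparing BufferedReader.readLine() with Scanner.nextLine()
on a string containing U+0085 (NEL). The class and method names
(LineSplitDemo, readerLines, scannerLines) are made up for illustration;
only the java.io and java.util calls are real APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of Harmony.
public class LineSplitDemo {

    // Split using BufferedReader.readLine(), which is specified to
    // break only on CR, LF, or CR followed by LF.
    public static List<String> readerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Split using Scanner.nextLine(), which also treats U+0085,
    // U+2028 and U+2029 as line separators.
    public static List<String> scannerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "first\u0085second";
        System.out.println("readLine: " + readerLines(text).size() + " line(s)");
        System.out.println("Scanner:  " + scannerLines(text).size() + " line(s)");
    }
}
```

On a spec-conforming class library, readLine() should return the whole
string as one line, while Scanner should split it in two at the NEL.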
--
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU