Posted to issues@maven.apache.org by "Håvard Wigtil (JIRA)" <ji...@codehaus.org> on 2008/11/28 10:25:19 UTC

[jira] Created: (MASSEMBLY-371) Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8

Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8
---------------------------------------------------------------------------------

                 Key: MASSEMBLY-371
                 URL: http://jira.codehaus.org/browse/MASSEMBLY-371
             Project: Maven 2.x Assembly Plugin
          Issue Type: Bug
    Affects Versions: 2.2-beta-2
         Environment: Linux with platform encoding set to UTF-8
            Reporter: Håvard Wigtil
         Attachments: assembly-encoding.zip

Converting line endings for a text file encoded in ISO-8859-1 replaces every character outside the ASCII range with the three characters ï¿½.
What happens is that the file to be converted is read as text in the platform encoding (this seems to happen in method readFile in class FileFormatter). When the platform encoding is UTF-8, a non-ASCII byte from the ISO-8859-1 file is generally not a valid UTF-8 sequence, so it is decoded as the Unicode replacement character U+FFFD (i.e. the placeholder for an unknown or broken character). Written back out as UTF-8, that placeholder becomes three bytes, which show up as three characters when the result is viewed as ISO-8859-1.
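
The effect described above can be reproduced outside the plugin. The following is only a minimal sketch, not the code of FileFormatter: the class and method names are made up for illustration, and it assumes the file content is decoded and re-encoded with UTF-8 as the platform encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of the assembly plugin.
public class EncodingCorruptionDemo {

    // Mimics "read as text in the platform encoding, then write it back",
    // with UTF-8 assumed to be the platform encoding.
    static byte[] readAndRewriteAsUtf8(byte[] fileBytes) {
        // Malformed input is replaced with U+FFFD while decoding.
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        // U+FFFD is encoded as the three bytes EF BF BD.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Håvard" stored as ISO-8859-1: the å is the single byte 0xE5.
        byte[] latin1 = "H\u00e5vard".getBytes(StandardCharsets.ISO_8859_1);
        byte[] result = readAndRewriteAsUtf8(latin1);

        StringBuilder hex = new StringBuilder();
        for (byte b : result) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Viewing the rewritten bytes as ISO-8859-1 shows the single byte 0xE5 turned into the three characters ï¿½, which matches the symptom reported here.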

I've attached a small sample project that shows this problem on Linux with platform encoding set to UTF-8.

I see two possible fixes for this: one is to read the file as bytes and do a search/replace for line endings, and the other is to make it possible to specify the encoding for a fileset or file.
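
Both suggestions can be sketched in a few lines of Java. This is an illustration only, under stated assumptions: the class and method names are hypothetical, nothing here is taken from the plugin or from the eventual fix, and the byte-level variant assumes an ASCII-compatible encoding (such as ISO-8859-1 or UTF-8) in which the bytes 0x0D and 0x0A only ever mean CR and LF. It would not be safe for UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;

// Hypothetical helper class; it is not part of the assembly plugin.
public class LineEndingSketch {

    // Fix 1: work on raw bytes, so no character decoding takes place.
    // CRLF, lone CR and lone LF are each replaced with the requested ending.
    static byte[] convertBytes(byte[] input, byte[] eol) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length + 16);
        for (int i = 0; i < input.length; i++) {
            byte b = input[i];
            if (b == '\r') {
                // Treat CR LF as one line ending by skipping the LF.
                if (i + 1 < input.length && input[i + 1] == '\n') {
                    i++;
                }
                out.write(eol, 0, eol.length);
            } else if (b == '\n') {
                out.write(eol, 0, eol.length);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Fix 2: decode and encode with an explicitly given encoding
    // instead of relying on the platform default.
    static byte[] convertWithCharset(byte[] input, Charset charset, String eol) {
        String text = new String(input, charset);
        String converted = text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(eol));
        return converted.getBytes(charset);
    }
}
```

The first variant never needs to know the file's encoding, while the second handles any encoding Java supports but depends on the user configuring the right one.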

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] (MASSEMBLY-371) Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8

Posted by "Dennis Lundberg (JIRA)" <ji...@codehaus.org>.
     [ https://jira.codehaus.org/browse/MASSEMBLY-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Lundberg closed MASSEMBLY-371.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.4
         Assignee: Dennis Lundberg

Fixed in [r1402965|http://svn.apache.org/viewvc?view=revision&revision=1402965].
                


       

[jira] (MASSEMBLY-371) Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8

Posted by "Dennis Lundberg (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MASSEMBLY-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312643#comment-312643 ] 

Dennis Lundberg commented on MASSEMBLY-371:
-------------------------------------------

More fixes in [r1403897|http://svn.apache.org/viewvc?view=revision&revision=1403897].
                


       

[jira] Updated: (MASSEMBLY-371) Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8

Posted by "Dennis Lundberg (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MASSEMBLY-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Lundberg updated MASSEMBLY-371:
--------------------------------------

    Affects Version/s: 2.2

