Posted to dev@maven.apache.org by "Naoki Nose (JIRA)" <ji...@codehaus.org> on 2005/11/03 15:00:06 UTC

[jira] Created: (MNG-1409) Various encoding problems with InputStream and XML

Various encoding problems with InputStream and XML
--------------------------------------------------

         Key: MNG-1409
         URL: http://jira.codehaus.org/browse/MNG-1409
     Project: Maven 2
        Type: Bug
    Reporter: Naoki Nose


There are various encoding problems with InputStream and XML in different components.
- The property resource file is encoded with UTF-8, but Java reads the bundle with ISO-8859-1.
- In several components, a Reader is constructed with the default system encoding.
- MXParser ignores the encoding attribute in the XML declaration.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_55057 ] 

Brett Porter commented on MSITE-19:
-----------------------------------

I have not applied the i18n patch. I like the idea of doing native2ascii in process-resources better.

Do you know if there will be any negative side effects of the change to XmlWriter? What was that attempting to address?

Is there anything else necessary to get this issue resolved other than the above patches and the native2ascii'ing?

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug
>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff
>
>
> There are various encoding problems with InputStream and XML in different components.
> - The property resource file is encoded with UTF-8, but Java reads the bundle with ISO-8859-1.
> - In several components, a Reader is constructed with the default system encoding.
> - MXParser ignores the encoding attribute in the XML declaration.



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Yue Ni (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_58415 ] 

Yue Ni commented on MSITE-19:
-----------------------------

I translated the site and project-info-report resource bundles into Simplified Chinese and attached them here. Could anyone help commit them to the svn repository?



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Yue Ni (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Yue Ni updated MSITE-19:
------------------------

    Attachment: project-info-report_zh_CN.properties



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_55056 ] 

Brett Porter commented on MSITE-19:
-----------------------------------

The plexus-site-renderer patch is no longer required, as it has moved to parsing with the Modello-generated model, which accounts for encoding.



[jira] Commented: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Naoki Nose (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MNG-1409?page=comments#action_53350 ] 

Naoki Nose commented on MNG-1409:
---------------------------------

I've looked into the source code for the cause of the encoding problems.

Problem 1. 
  The encoding detection for input files relies heavily on 
  the default system encoding.  
Problem 2. 
  In the site generation process, String to byte array conversions occur many times. 
  This leads to problems that are difficult to solve.

For problem 1, I have some ideas for solutions.

There are several types of input files, for example:

- property resource file
- XML file
- apt file

There should be a way 
of specifying the encoding according to the file type.

For property resource files, I would like to use native2ascii.
Certainly, the result is not human readable, but it rarely causes encoding problems.
The readability problem can be avoided by automating 
the native2ascii processing. The build lifecycle phase 
"process-resources" would be 
a good place to hold such a process.
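
To illustrate what a native2ascii step would do to a bundle, here is a minimal sketch in Java. The class name Native2AsciiSketch and the sample key are made up for illustration. The real native2ascii tool also takes an -encoding option for the source file; this sketch sidesteps that by starting from a String that is already decoded.

```java
final class Native2AsciiSketch {

    /**
     * Replaces every char outside 7-bit ASCII with a backslash-u escape
     * (four hex digits), which is the form java.util.Properties understands.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal is written with escapes so that the encoding of this
        // source file does not matter; it holds three Japanese characters.
        System.out.println(escape("report.title=\u65e5\u672c\u8a9e"));
    }
}
```

Running it prints the key followed by three backslash-u escapes (65e5, 672c, 8a9e), so the resulting file contains only ASCII and reads the same under any default encoding.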

For XML files, I think the encoding detection should 
follow the W3C XML specification. 
So MXParser should be changed to support automatic 
encoding detection.
http://www.w3.org/TR/REC-xml/#sec-guessing
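
As a rough idea of what such detection involves, here is a simplified sketch in Java. It is not MXParser code; the class XmlEncodingSniffer and its guess method are hypothetical. It only checks for a UTF-8 or UTF-16 byte order mark and then for an encoding attribute in the XML declaration, falling back to UTF-8. UTF-32, EBCDIC and UTF-16 without a byte order mark, which the section linked above also covers, are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the
    // very start of the document.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the encoding name from the leading bytes of an XML document. */
    static String guess(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        // No byte order mark: decode as ISO-8859-1, which maps every byte to
        // exactly one char, and look for the declared encoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(text);
        return m.find() ? m.group(1) : "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><project/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(head));
    }
}
```

A caller would read the first few hundred bytes of the stream, pass them to guess, and then construct the Reader with the returned name instead of the platform default.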

For apt files, I think the encoding detection should follow
the POM configuration. The configuration would look like the following:

<configuration>
  <inputEncoding>Shift_JIS</inputEncoding>
  <outputEncoding>UTF-8</outputEncoding>
  <locales>en,ja</locales>
</configuration>
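
How a plugin might honour such settings can be sketched in Java as follows. This is only an assumption about a possible implementation, not existing site plugin code: the class TranscodeSketch is hypothetical, and the two Charset parameters stand in for the proposed inputEncoding and outputEncoding values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {

    /** Copies text from in to out, decoding and encoding with explicit charsets. */
    static void transcode(InputStream in, Charset inputEncoding,
                          OutputStream out, Charset outputEncoding) throws IOException {
        // Neither the Reader nor the Writer falls back to file.encoding here.
        Reader reader = new InputStreamReader(in, inputEncoding);
        Writer writer = new OutputStreamWriter(out, outputEncoding);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush();
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] transcode(byte[] source, Charset inputEncoding, Charset outputEncoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(source), inputEncoding, out, outputEncoding);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xFC};  // u-umlaut in ISO-8859-1
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length);  // two bytes in UTF-8
    }
}
```

With the configuration above, the charsets would be obtained with Charset.forName("Shift_JIS") and Charset.forName("UTF-8"); whether Shift_JIS is available depends on the JRE in use.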

For problem 2, I have no idea for a good solution yet.
The String to byte array conversion occurs many times 
in the process of getting the site descriptor. In that process, 
the characters seem to be converted incorrectly.
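
The kind of damage such conversions can cause is easy to reproduce in isolation. The following Java sketch is not taken from the site generation code; RoundTripSketch is a hypothetical class that encodes a String with one charset and decodes it with another, which is what effectively happens when one side of a conversion uses the default encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripSketch {

    /** Encodes text to bytes with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c";  // two Japanese characters

        // Same charset on both sides: the text survives.
        System.out.println(original.equals(
                roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Mismatched charsets: each 3-byte UTF-8 sequence turns into 3 chars.
        System.out.println(
                roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length());

        // A charset that cannot represent the text: characters are replaced.
        System.out.println(
                roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
    }
}
```

Each extra conversion is another chance for one of the two failing cases, and once a character has been replaced by a question mark it cannot be recovered later in the pipeline.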





[jira] Updated: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MNG-1409?page=all ]

Vincent Siveton updated MNG-1409:
---------------------------------

    Attachment: plexus-utils_2.diff

plexus-utils_2.diff



[jira] Updated: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MNG-1409?page=all ]

Vincent Siveton updated MNG-1409:
---------------------------------

    Attachment: plexus-i18n.diff

plexus-i18n.diff



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_57590 ] 

Vincent Siveton commented on MSITE-19:
--------------------------------------

Brett,

Any news about potential side effects? Could we close this issue?



[jira] Updated: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MNG-1409?page=all ]

Vincent Siveton updated MNG-1409:
---------------------------------

    Attachment: plexus-utils.diff

plexus-utils.diff



[jira] Updated: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MNG-1409?page=all ]

Vincent Siveton updated MNG-1409:
---------------------------------

    Component: maven-site-plugin



[jira] Updated: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MNG-1409?page=all ]

Vincent Siveton updated MNG-1409:
---------------------------------

    Attachment: plexus-site-renderer.diff

plexus-site-renderer.diff



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Yue Ni (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Yue Ni updated MSITE-19:
------------------------

    Attachment: site-plugin_zh_CN.properties



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Naoki Nose (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Naoki Nose updated MSITE-19:
----------------------------

    Attachment: site-plugin_ja.properties



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Yue Ni (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Yue Ni updated MSITE-19:
------------------------

    Attachment: project-info-report_zh_CN.properties



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_55124 ] 

Vincent Siveton commented on MSITE-19:
--------------------------------------

Brett,

I tried to generate a dummy site in Japanese and in the other available languages.
To do so, I used the plexus-utils trunk version and converted all bundles with native2ascii.
It works a treat for me with outputEncoding=UTF-8 :) 
Naoki, could you confirm too?
Also, some Japanese translations are missing (e.g. in the dependencies page).

From my point of view, I don't see any negative side effects.

I think we could close this issue after native2ascii'ing all bundles (whether or not native2ascii is automated in the process-resources phase).




[jira] Commented: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MNG-1409?page=comments#action_49947 ] 

Vincent Siveton commented on MNG-1409:
--------------------------------------

This issue currently appears with a Japanese translation and may affect other East Asian languages (CJK charsets).

- Using a VM parameter could be a good starting point: -Dfile.encoding=UTF-8 (to be added to MAVEN_OPTS).

- Java reads bundle streams with the ISO-8859-1 charset.
The PropertyResourceBundle class uses Properties internally, and the ISO 8859-1 character encoding is used to load properties. 
Have a look at the API:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/PropertyResourceBundle.html
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Properties.html
So I propose to fix plexus-i18n and use it instead of ResourceBundle.getBundle() calls (specifically, I think, in the maven-project-info-reports-plugin subproject). See plexus-i18n.diff.
Another solution could be to run native2ascii on each bundle, but IMHO the result is not really human readable. 

- Xpp3DomBuilder in plexus-utils does not seem to handle the encoding parameter in the XML header correctly. As a result, the plexus-site-renderer component doesn't generate a site descriptor with special characters.
Have a look at plexus-utils.diff and plexus-site-renderer.diff.
Another issue could be in the toString() method of the Xpp3Dom class: we need to add a default encoding. See plexus-utils_2.diff.

- Finally, IMHO, the StringInputStream class in the plexus-utils component does not have a good implementation, because no encoding is defined. Maybe we could migrate to the StringInputStream class from the Ant project.
http://svn.apache.org/repos/asf/ant/core/trunk/src/main/org/apache/tools/ant/filters/StringInputStream.java

Charset problems are hard to debug and depend on several factors. 
Other ideas are welcome.
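
The ISO-8859-1 behaviour of Properties described above can be reproduced with a small Java sketch. BundleLoadingSketch is a hypothetical class, not part of plexus-i18n or the attached patches. Note that the Reader overload of Properties.load used in the second method was only added in Java 6, so it is not in the 1.4.2 API linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class BundleLoadingSketch {

    /** Loads properties the classic way: the stream is decoded as ISO-8859-1. */
    static Properties loadLatin1(byte[] raw) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Loads properties through a Reader with an explicit UTF-8 charset. */
    static Properties loadUtf8(byte[] raw) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A bundle entry saved as UTF-8; the value starts with u-umlaut.
        byte[] raw = "title=\u00fcber".getBytes(StandardCharsets.UTF_8);

        // Four chars as intended when decoded as UTF-8.
        System.out.println(loadUtf8(raw).getProperty("title").length());

        // Five chars when decoded as ISO-8859-1: the umlaut became two chars.
        System.out.println(loadLatin1(raw).getProperty("title").length());
    }
}
```

Either approach in the comment above avoids the second case: a loader that decodes with an explicit charset, or bundles converted with native2ascii so that the bytes are plain ASCII.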




[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Naoki Nose (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_55307 ] 

Naoki Nose commented on MSITE-19:
---------------------------------

I also tried to generate a dummy site that includes Japanese.
My environment is Debian GNU/Linux and the default encoding is EUC-JP.
I used the trunk versions of maven-site-plugin, doxia, modello and plexus, and Japanese rendered correctly.
Thanks!  Many Japanese developers will appreciate this improvement.

> Moreover some translation in japanese are missing (eg in the dependencies page).
Some property items have been added to the original property file since I first sent the Japanese translation.
I will update the Japanese translation later.

>I think we could close this issue after native2ascii'ing all bundles (automating native2ascii with process-resources phase or not)

There are some desirable improvements related to this problem.
1. The XML parser in plexus-utils should handle the encoding parameter in the XML declaration correctly.
2. Doxia constructs its reader with a default encoding. The encoding of the site documents should be declared explicitly.

May I create new issues for these?



[jira] Commented: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Michael Schnake (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MNG-1409?page=comments#action_50466 ] 

Michael Schnake commented on MNG-1409:
--------------------------------------

I tried to build a default site (that is, with no APT or other source files of my own) in German with the current Maven 2 build from SVN, and getting a correct result seems to be impossible right now. My default file.encoding is UTF-8. I have maven-site-plugin configured with <outputEncoding>UTF-8</outputEncoding>.

The situation "out of the box" with regard to the German umlauts in the generated site is:
=> Result: site content has garbage, site navigator is correct, organization name (from pom.xml) in copyright statement is correct.

1. Despite the statement at http://maven.apache.org/plugins/maven-site-plugin/i18n.html all Java .properties files must be encoded "ISO-8859-1 with unicode escapes as needed" (as defined by the Java API and already stated above). So I converted site-plugin_de.properties from UTF-8 to ISO-8859-1.
=> Result: Site content is correct, site navigator has garbage, organization name is correct.

2. Well, the component building the site navigator seems to read site-plugin_de.properties using my platform default encoding (= UTF-8), which is incorrect, or at least does not conform to the Properties API documentation. So I called "mvn site" with MAVEN_OPTS="-Dfile.encoding=ISO-8859-1".
=> Result: Site content is correct, site navigator is correct, organization name has garbage.
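
The rule behind step 1 can be reproduced in isolation. The following is a minimal sketch, assuming a modern JDK; the class name and the key "report.title" are made up for illustration and are not taken from the site plugin bundles. It relies only on the documented behaviour of java.util.Properties.load(InputStream), which decodes the bytes as ISO-8859-1 and understands \uXXXX escapes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Hypothetical key; the real bundle keys are not shown in this thread.
    static final String KEY = "report.title";

    // Load a .properties file from raw bytes. Properties.load(InputStream)
    // always decodes those bytes as ISO-8859-1.
    public static String loadValue(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) {
        String expected = "\u00dcbersicht"; // German word starting with U-umlaut
        String line = KEY + "=" + expected + "\n";

        byte[] utf8File = line.getBytes(StandardCharsets.UTF_8);
        byte[] latin1File = line.getBytes(StandardCharsets.ISO_8859_1);
        // Pure ASCII file: the umlaut is spelled as a literal backslash-u escape.
        byte[] escapedFile = (KEY + "=\\u00dcbersicht\n").getBytes(StandardCharsets.US_ASCII);

        System.out.println(expected.equals(loadValue(utf8File)));    // false
        System.out.println(expected.equals(loadValue(latin1File)));  // true
        System.out.println(expected.equals(loadValue(escapedFile))); // true
    }
}
```

The escaped variant is the one that survives regardless of editor or platform settings, which is why native2ascii'ing the bundles is attractive.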

So now the organization name has garbage, although it comes from my pom.xml, which explicitly states <?xml version="1.0" encoding="UTF-8"?>. But the parser reading the organization name from there seems to ignore that and use the platform encoding (= ISO-8859-1 in the step above) instead.
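
The difference between honouring and ignoring the XML declaration can be shown with a minimal sketch. It uses the JDK's built-in DOM parser, not MXParser from plexus-utils, and the class name and the sample organization name are assumptions for illustration. Handing the parser a byte stream lets it detect the encoding from the declaration, while handing it a Reader that was already decoded with the wrong charset leaves it nothing to fix.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PomEncodingDemo {

    // A tiny POM-like document (hypothetical content), stored as UTF-8 bytes.
    static final byte[] POM_BYTES = (
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<project><organization><name>M\u00fcller GmbH</name></organization></project>"
    ).getBytes(StandardCharsets.UTF_8);

    static String organizationName(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(source);
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte stream: the parser reads the XML declaration and decodes as UTF-8.
    public static String viaByteStream() {
        return organizationName(new InputSource(new ByteArrayInputStream(POM_BYTES)));
    }

    // Character stream decoded as ISO-8859-1 (standing in for a mismatched
    // platform default): the declaration can no longer help.
    public static String viaMisdecodedReader() {
        return organizationName(new InputSource(new InputStreamReader(
                new ByteArrayInputStream(POM_BYTES), StandardCharsets.ISO_8859_1)));
    }

    public static void main(String[] args) {
        System.out.println("M\u00fcller GmbH".equals(viaByteStream()));       // true
        System.out.println("M\u00fcller GmbH".equals(viaMisdecodedReader())); // false
    }
}
```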

The net result is that you currently have to sacrifice one of site [content | navigator | copyright]. But, hey, two out of three is not that bad ;-) Note that the previous comments for this bug already seem to explain (and probably fix) that behaviour. But perhaps this comment helps those struggling with site i18n until this is fixed.

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MNG-1409
>          URL: http://jira.codehaus.org/browse/MNG-1409
>      Project: Maven 2
>         Type: Bug
>   Components: maven-site-plugin
>     Reporter: Naoki Nose
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_58672 ] 

Brett Porter commented on MSITE-19:
-----------------------------------

Applied the Simplified Chinese translation, thanks. Please attach new translations to a new issue!

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff, project-info-report_ja.properties, project-info-report_zh_CN.properties, project-info-report_zh_CN.properties, site-plugin_ja.properties, site-plugin_zh_CN.properties
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Naoki Nose (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Naoki Nose updated MSITE-19:
----------------------------

    Attachment: project-info-report_ja.properties

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff, project-info-report_ja.properties, site-plugin_ja.properties
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Updated: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
     [ http://jira.codehaus.org/browse/MSITE-19?page=all ]

Brett Porter updated MSITE-19:
------------------------------

    Fix Version: 2.0

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_58045 ] 

Brett Porter commented on MSITE-19:
-----------------------------------

We still need to set up the native2ascii'ing.
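
For reference, the conversion itself is small. The following is a minimal sketch of what a native2ascii-style step does to text that has already been decoded correctly; the class and method names are made up, and this is not the JDK tool or any Maven plugin. Every character outside 7-bit ASCII is replaced by a \uXXXX escape, so the resulting file reads the same under ISO-8859-1, UTF-8 or plain ASCII.

```java
public class Native2AsciiSketch {

    // Replace every UTF-16 code unit above 0x7F with a backslash-u escape.
    // Characters outside the BMP come out as two escapes (a surrogate pair),
    // which is the form .properties files expect.
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical bundle line; the key is invented for illustration.
        String line = "report.title=\u00dcbersicht";
        System.out.println(escapeNonAscii(line)); // report.title=\u00dcbersicht
    }
}
```

Running such a step during process-resources would let translators keep their source bundles in whatever encoding they prefer, as long as that encoding is declared to the conversion step.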

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff, project-info-report_ja.properties, site-plugin_ja.properties
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Commented: (MNG-1409) Various encoding problems with InputStream and XML

Posted by "Lukas Theussl (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MNG-1409?page=comments#action_49955 ] 

Lukas Theussl commented on MNG-1409:
------------------------------------

The German translation also has some problems. The properties files are UTF-8 encoded, but the HTML output is unreadable (even with -Dfile.encoding=UTF-8, LC_ALL=en_US.UTF-8, checked with test9 of the site plugin). Strangely, the French properties files are not UTF-8 encoded (contrary to our own standards), but the HTML result is correct in UTF-8. This definitely has to be sorted out before more translations come in...

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MNG-1409
>          URL: http://jira.codehaus.org/browse/MNG-1409
>      Project: Maven 2
>         Type: Bug
>   Components: maven-site-plugin
>     Reporter: Naoki Nose
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Brett Porter (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_55480 ] 

Brett Porter commented on MSITE-19:
-----------------------------------

Naoki,
yes, please create new issues for your 2 points, and for the updated Japanese translation. Thanks!

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.



[jira] Commented: (MSITE-19) Various encoding problems with InputStream and XML

Posted by "Vincent Siveton (JIRA)" <ji...@codehaus.org>.
    [ http://jira.codehaus.org/browse/MSITE-19?page=comments#action_57589 ] 

Vincent Siveton commented on MSITE-19:
--------------------------------------

Applied in SVN. Thanks for the translation!

> Various encoding problems with InputStream and XML
> --------------------------------------------------
>
>          Key: MSITE-19
>          URL: http://jira.codehaus.org/browse/MSITE-19
>      Project: Maven 2.x Site Plugin
>         Type: Bug

>     Reporter: Naoki Nose
>      Fix For: 2.0
>  Attachments: plexus-i18n.diff, plexus-site-renderer.diff, plexus-utils.diff, plexus-utils_2.diff, project-info-report_ja.properties, site-plugin_ja.properties
>
>
> There are various encoding problems with InputStream and XML in different components.
> - Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
> - In different components Reader is constructed with default system encoding.
> - MXParser ignores encoding attribute in xml declaration.
